
Recommendation for Standard Operations at Remote Sites

By Greg Ferguson,2014-10-29 19:41

GWD-I DRAFT Reagan W. Moore, SDSC

    Category: Informational

    Data Transport Research Group 13-October-2003

     26-December-2003

    Operations for Access, Management, and Transport at Remote Sites

Status of This Memo: Informational

This memo provides information to the Grid community specifying the types of operations needed for access to remote data. Distribution is unlimited.

Copyright Notice

    Copyright © Global Grid Forum (2003). All Rights Reserved.

Abstract

Remote data access can be viewed as simply the “get” or “put” of files. However, experience with Data Grids, digital libraries, and persistent archives that access remote data has identified additional remote operations. This paper describes five types of operations that are performed at remote storage repositories, and illustrates each type with examples from production systems. The transport of data often involves the coordinated execution of these remote operations, and thus the operations can be viewed as extensions to standard data transport.

Contents

Abstract

    1. Transport Operations on Remote Data

    1.1 Categories of extended transport operations

    1.2 GGF Research Groups

    2. Operations

    2.1 Byte level access

    2.2 Latency management mechanisms

    2.3 Remote Execution of Functions

    2.4 Protocol conversion

    2.5 Administrative tasks

    3. Implementations

    4. Summary

    5. Author Information

    6. Acknowledgements

    7. Glossary

    7.1 Data Grid terms for a logical name space

    7.2 Data Grid terms for a storage repository abstraction

    7.3 Data Grid terms for an information repository abstraction

    7.4 Data Grid terms for a distributed resilient scalable architecture

    7.5 Data Grid terms for a virtual Data Grid

    8. Intellectual Property Statement

    9. Full Copyright Notice

    10. References

    moore@sdsc.edu 1

GWD-I DRAFT 13-October-2003

    1. Transport Operations on Remote Data

    Access to remote data involves aspects of data management, data manipulation, and data transport. From the perspective of differentiated grid services, one would like to be able to implement each type of data operation as a separate service, and then apply the services sequentially. The services would generate a dataset or file, and would then transport the file using GridFTP. We examine the conjecture that remote data operations may need to be combined with the file transport mechanisms to improve performance.

    This informational paper describes five types of remote operations and examines the impact of combining remote operations with data transport. Each type of operation is illustrated with examples from production systems. Experience with Data Grids, digital libraries, and persistent archives shows that access to remote data in file systems can involve additional coordinated operations beyond the “get” or “put” of files [12]. Access to other types of storage repositories, such as databases, also requires additional types of remote operations. In particular, the Data Access and Integration Services (DAIS) Working Group of the Global Grid Forum is developing standards recommendations for data access and integration based on the Open Grid Services Architecture. While the DAIS specification depends on transport mechanisms for delivering data to clients and third parties, the DAIS group has not defined extended transport operations to improve service performance.

    The key concept behind the integration of remote operations with transport is the observation that support for remote data access requires the installation of a server at each storage repository that is accessed [11]. The server manages the transport protocol, converts from the transport protocol to the access protocol required by the local storage system, coordinates authentication information, maps from the Unix identifier under which the data is stored to the Unix identifier of the requestor, applies access control lists, etc. In addition to these access management operations, the remote server is also responsible for packaging the data before movement. This can be as simple as aggregation into streams of buffered data that are then transported. In production data management systems, however, the server may also be required to perform additional manipulations on the data. These additional data manipulations constitute extended transport operations that can be implemented in the server that manages data movement from remote storage systems.

    The extended transport operations identified in this document are either supported by the San Diego Supercomputer Center Storage Resource Broker (SRB) [39, 2, 3, 20, 24], or are being developed in other Data Grids. To provide usage examples, nineteen data management system implementations that are based upon the SRB are discussed, covering Data Grids for the sharing of data across administrative domains, digital libraries for publishing data and supporting web services, and persistent archives for managing technology evolution. The extended transport operations needed by these three types of data management systems have many common components. The common components can be organized into five basic categories of extended transport operations.

The objective of this document is to:

    o Identify types of operations executed during remote data access

    o Organize the extended transport operations into categories

    o Survey the range of extended transport operations currently in use on production Data Grids

1.1 Categories of extended transport operations

    Five basic categories of extended transport operations are collectively required by Data Grids, digital libraries, and persistent archives:


    1. Byte level access

    2. Latency management mechanisms

    3. Remote execution of functions

    4. Heterogeneous system access

    5. Administrative tasks

    Byte level access transport operations correspond to the standard operations supported by Unix file systems. Latency management transport operations are typically operations that facilitate bulk data and bulk metadata access and update. The remote execution of functions provides the ability to process data directly at the remote storage system. Transport operations related to access of heterogeneous systems typically involve protocol conversion and data repackaging. Administrative tasks involve access to information catalogs to either control the transfer or manage the name space under which the digital entities are referenced. Each of these categories of extended transport operations is examined in more detail in section 2.

1.2 GGF Research Groups

    Multiple groups within the Global Grid Forum are addressing the issue of defining the sets of extended operations that should be performed on remote storage resources:

    o Persistent Archive Research Group is defining preservation operations, such as checksums, digital signatures, and migration, all of which can be invoked at the remote storage system as a component of the transport operation.

    o Open Grid Services Architecture - Data Access and Integration Services Working Group (OGSA-DAI) is defining a set of operations that can be applied across both databases and file systems, as Data Set Services. This includes manipulation of records in databases, and formatting of query results before transport.

    o Grid File System Birds of a Feather session is defining operations on logical name spaces, and is mapping these operations to actions performed at remote file systems.

    o Data Format Description Language Working Group is defining operations on digital entities, which may be executed as remote processes invoked through remote data access mechanisms.

    It is possible to build an environment in which all manipulations of data are performed with the result stored at the remote storage site, and then a separate request is issued to transmit the result [6,8,9]. This assumes space is available at the remote storage system to save the intermediate results. For very large data sets this may be impractical, and partial results must instead be transmitted while the intermediate data are still being generated. At the other extreme, when dealing with small data sets, it may be much faster to make a single bulk request to the remote storage system than many individual requests. Each request incurs latency, whether in the wide area network or within the storage repository itself. By making a single request for multiple small data sets, the performance can be substantially improved.

    For large data sets, whose size is greater than the product of the access latency and the bandwidth, the additional messages that are transmitted when data formatting commands are sent separately from data transport commands do not significantly impact performance. However, when the data set size is smaller than this bandwidth*delay product, performance can be degraded by arbitrarily large factors, because the latency of each extra message dominates the time spent moving data. All of the extended transport operations are intended to improve the performance of data access and manipulation over wide area networks. The extended operations either reduce the number of messages that must be sent to achieve a result, or migrate operations to the location where they can be performed optimally.
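    The bandwidth*delay argument can be made concrete with a simple cost model. The sketch below is an illustration under stated assumptions, not a measurement: each request is assumed to cost one access latency plus size/bandwidth, and protocol overhead, parallel streams, and TCP windowing are ignored. It compares one request per data set with a single bulk request for the same data sets; the latency and bandwidth values are arbitrary examples.

```python
def individual_requests_time(n_files: int, file_size: float,
                             latency: float, bandwidth: float) -> float:
    """Time for n_files separate requests: each one pays the access latency."""
    return n_files * (latency + file_size / bandwidth)


def bulk_request_time(n_files: int, file_size: float,
                      latency: float, bandwidth: float) -> float:
    """Time for one aggregated request: the latency is paid only once."""
    return latency + n_files * file_size / bandwidth


def slowdown(n_files: int, file_size: float,
             latency: float, bandwidth: float) -> float:
    """Ratio of per-file request time to bulk request time (simplified model)."""
    return (individual_requests_time(n_files, file_size, latency, bandwidth)
            / bulk_request_time(n_files, file_size, latency, bandwidth))


if __name__ == "__main__":
    latency = 0.05      # assumed 50 ms access latency
    bandwidth = 100e6   # assumed 100 MB/s
    print(f"bandwidth*delay product = {latency * bandwidth:.0f} bytes")
    for size in (1e3, 1e6, 1e9):  # 1 KB, 1 MB, and 1 GB files
        ratio = slowdown(1000, size, latency, bandwidth)
        print(f"{size:>12.0f} bytes per file: slowdown = {ratio:.1f}")
```

    With these assumed values the bandwidth*delay product is 5 MB. A thousand 1 KB files take several hundred times longer when requested individually, 1 MB files about six times longer, and 1 GB files only about half a percent longer. As the file size shrinks toward zero the ratio approaches the number of requests, so the degradation grows without bound as more small data sets are requested separately.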


    Each of these Global Grid Forum groups is, in effect, defining the set of extended transport operations that improves the performance of its particular service.

2. Operations

    The types of operations that will be considered are focused on file and aggregated file level operations. For each of the major categories of extended transport operations, we provide explicit examples of the capabilities that are in use in production data management systems.

2.1 Byte level access

The traditional Unix file system operations include:

    o creat(), open(), close(), unlink()

    o read(), write(), seek(), sync()

    o stat(), fstat(), chmod()

    o mkdir(), rmdir(), opendir(), closedir(), readdir()

    o chown(), chdir()

    Data Grids are developing additional file system operations for directory manipulation:

    o rewinddir(): reset a directory handle to the first entry in a directory

    o seekdir(): set the position for the next read of a directory

    o telldir(): get the current seek pointer on a directory handle

    o scandir(): scan a directory and return the list of files selected by a filter function and ordered by a comparison function
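    The directory operations above follow the POSIX calls of the same names. The sketch below models their semantics on an in-memory snapshot of a remote listing; `DirHandle` and this `scandir` helper are hypothetical names for illustration, not a Data Grid API. A server implementing them at the storage site would return only the entries requested rather than shipping the whole listing.

```python
from typing import Callable, Iterable, List, Optional


class DirHandle:
    """Hypothetical directory handle over a snapshot of a remote listing."""

    def __init__(self, entries: Iterable[str]):
        self._entries = list(entries)
        self._pos = 0

    def readdir(self) -> Optional[str]:
        """Return the next entry, or None at the end of the directory."""
        if self._pos >= len(self._entries):
            return None
        entry = self._entries[self._pos]
        self._pos += 1
        return entry

    def telldir(self) -> int:
        """Return the current position, usable later with seekdir."""
        return self._pos

    def seekdir(self, pos: int) -> None:
        """Set the position for the next readdir."""
        self._pos = pos

    def rewinddir(self) -> None:
        """Reset the handle to the first entry."""
        self._pos = 0


def scandir(entries: Iterable[str],
            select: Callable[[str], bool],
            key: Callable[[str], object] = str) -> List[str]:
    """Return the selected entries, ordered by key (like POSIX filter + compar)."""
    return sorted((e for e in entries if select(e)), key=key)
```

    For example, after `d = DirHandle(["a.dat", "b.log", "c.dat"])`, a call to `d.readdir()` followed by `mark = d.telldir()` lets a client later return to the second entry with `d.seekdir(mark)`.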

    The ability to apply these operations directly at the remote storage system is one of the design goals of the Grid File System Research Group. In particular, read and write at the remote site are needed for partial file access, which makes it possible to retrieve a subset of a file; this is especially important when dealing with very large files. The ability to seek is needed by paging systems for visualization (such as the San Diego Supercomputer Center 3D Visualization Toolkit). The ability to synchronize (sync) is needed when manipulating containers of files and when staging files from archives to disk.
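    Partial file access reduces to a seek followed by a bounded read. A minimal sketch, assuming the server has local POSIX-style access to the file; `read_range` is a hypothetical helper name:

```python
import os


def read_range(path: str, offset: int, length: int) -> bytes:
    """Return up to `length` bytes starting at byte `offset` of a file.

    Only the requested byte range is read, so a server running this next to
    the storage system transmits `length` bytes rather than the whole file.
    """
    with open(path, "rb") as f:
        f.seek(offset, os.SEEK_SET)
        return f.read(length)
```

    A read that extends past the end of the file simply returns the remaining bytes, which matches the behavior of the Unix read() call.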

    The ability to list and modify the remote directory structure is needed when manipulating remote collections that contain millions of files. The performance of the remote storage system depends upon the number of physical files within a directory. While one can map all logical names to a single physical directory, the performance of physical file systems improves when the logical names are mapped to multiple physical directories.
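    One way to map logical names onto multiple physical directories is to hash each logical name into a fixed number of buckets. This is an assumed scheme for illustration; the document does not say how any particular Data Grid chooses the physical directory. `physical_path` and its `fanout` parameter are hypothetical.

```python
import hashlib
import posixpath


def physical_path(logical_name: str, root: str, fanout: int = 256) -> str:
    """Map a logical file name to one of `fanout` physical subdirectories.

    Hashing spreads a large collection evenly, so no single physical
    directory has to hold millions of entries. Illustrative scheme only.
    """
    digest = hashlib.sha256(logical_name.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % fanout
    base = posixpath.basename(logical_name)
    # The full digest in the file name keeps files with equal base names apart.
    return posixpath.join(root, f"{bucket:03d}", f"{digest}_{base}")
```

    Keeping the full digest in the physical file name avoids collisions between logical files that share a base name but live in different logical collections.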

    The remote server that implements the Unix file system operations can be the same server that is also used to support the transmission of the results of the operations over the wide area networks.

2.2 Latency management mechanisms

    Explicit latency management mechanisms are used to manage and manipulate large numbers of files stored at remote sites. The operations involve some form of aggregation, whether of data into containers, metadata into XML files, or I/O commands into execution of remote processes. The operations also may invoke a mechanism that will improve future operations, such as staging of files onto faster media. The following latency management functions are in production use in Data Grids:

    o Bulk registration of files

    o Bulk data load

    o Bulk data unload


    o Aggregation into a container

    o Extraction from a container

    o Staging, required by the Hierarchical Resource Manager

    o Status, required by the Hierarchical Resource Manager

    Each of these operations requires the execution of a process at the remote storage system, which is invoked simultaneously with the transport request. Again, the largest performance improvement is seen when dealing with small files.

    Registration is the process of recursively traversing a physical file system directory and creating corresponding logical file records in the Data Grid metadata catalog [17]. The information recorded in each logical file record can include the physical file name, length, registration date, owner, physical source location, etc. The information also includes administrative metadata that is needed to map from the logical file name to the physical file name. The physical file system directory structure can be replicated within the logical name space, and the logical file name can be set equal to the physical file name. This makes it possible to register an existing directory structure into the logical name space using a similar organization of the files. The user of the system can also choose to register the physical files into an entirely different organizational structure in the logical name space, using logical file names that are different from the original physical file names.
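    A minimal sketch of the registration step, assuming the logical name simply mirrors the relative physical path under a chosen collection; `LogicalFileRecord` and `register_tree` are hypothetical names, and a real catalog would record additional administrative metadata:

```python
import os
import time
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalFileRecord:
    """One entry in a (hypothetical) logical name space catalog."""
    logical_name: str
    physical_name: str
    length: int
    owner_uid: int
    registration_date: str


def register_tree(physical_root: str, logical_root: str) -> List[LogicalFileRecord]:
    """Recursively walk a physical directory and build one record per file.

    The physical directory structure is replicated under `logical_root`,
    so each logical name mirrors the file's relative physical path.
    """
    now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    records = []
    for dirpath, _dirnames, filenames in os.walk(physical_root):
        for name in sorted(filenames):
            physical = os.path.join(dirpath, name)
            relative = os.path.relpath(physical, physical_root).replace(os.sep, "/")
            info = os.stat(physical)
            records.append(LogicalFileRecord(
                logical_name=logical_root.rstrip("/") + "/" + relative,
                physical_name=physical,
                length=info.st_size,
                owner_uid=info.st_uid,
                registration_date=now,
            ))
    return records
```

    Registering into a different organizational structure would replace the line that builds `logical_name` with whatever naming rule the user chooses.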

    Bulk registration corresponds to packaging the file system metadata before transmission over the network, and then bulk import of the metadata into the logical name space catalog. Standards have been developed in the digital library community for the organization of the metadata. The Metadata Encoding and Transmission Standard (METS) is used to aggregate metadata for bulk movement [18]. The registration process implements one aspect of consistency management for associating administrative metadata with logical file names [19]. The METS schema is encoded in XML [48], as a standard syntax for annotating metadata. The encoding in XML and organization into a METS schema take place at the remote storage system before transmission of the file system metadata over the wide area network, and can be done by the same server as used for data transmission.
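    Bulk registration packages the metadata for many files into one document so that a single message crosses the network. The sketch below uses Python's `xml.etree.ElementTree`; the element names and function names are simplified and hypothetical, and the output is not a schema-valid METS document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def package_metadata(records: Iterable[dict]) -> bytes:
    """Remote side: serialize many file-metadata records into one XML document."""
    root = ET.Element("registration")
    for rec in records:
        f = ET.SubElement(root, "file", logicalName=rec["logical_name"])
        ET.SubElement(f, "physicalName").text = rec["physical_name"]
        ET.SubElement(f, "size").text = str(rec["size"])
        ET.SubElement(f, "owner").text = rec["owner"]
    return ET.tostring(root, encoding="utf-8")


def import_metadata(document: bytes) -> Dict[str, dict]:
    """Catalog side: parse the document and load all records in one step."""
    root = ET.fromstring(document)
    return {
        f.get("logicalName"): {
            "physical_name": f.findtext("physicalName"),
            "size": int(f.findtext("size")),
            "owner": f.findtext("owner"),
        }
        for f in root.iter("file")
    }
```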

    Data load is the process of registration of the physical file into the logical name space, and the import of the file onto a storage system under the control of the Data Grid. Thus both metadata registration and data movement are needed. When dealing with small files, it is much faster to aggregate the small files before transmission. This can be done by explicit use of containers, physical aggregations of data that are managed by the Data Grid. The files are written into the container before transmission, and the container is stored as an entity within the Data Grid. Small files can also be aggregated for transport without storing the aggregation into the Data Grid. When the files reach the remote storage system, they are stored as independent files. The fastest mechanism in practice for dealing with small files is the explicit aggregation of the files into containers that are then managed by the Data Grid.

    Data unload is the export of files from the Data Grid and their movement to the requesting application. Again, when dealing with small files, it is faster to move containers of data. The application then needs an index into the container for the extraction of individual files. In this case, data transport is the coordinated movement of the container and the associated XML file that defines the bitfile offsets of the multiple files stored within the container.

    When containers are created, the operations that load the files into the container are performed at the remote storage repository. Similarly, when files are accessed, they may be extracted from the container, again at the remote storage system, with just the individual file transmitted to the requestor.
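    Aggregation into a container and extraction from it can be sketched as concatenation plus an offset index. For brevity this sketch records the index as JSON rather than the XML file described above; `build_container` and `extract` are hypothetical names.

```python
import json
from typing import Dict, Tuple


def build_container(files: Dict[str, bytes]) -> Tuple[bytes, str]:
    """Concatenate small files into one container and record their offsets.

    Returns (container_bytes, index_json); the index plays the role of the
    file that lists the offset and length of each member.
    """
    parts, index, offset = [], {}, 0
    for name, data in files.items():
        index[name] = {"offset": offset, "length": len(data)}
        parts.append(data)
        offset += len(data)
    return b"".join(parts), json.dumps(index)


def extract(container: bytes, index_json: str, name: str) -> bytes:
    """Extract one member using the index."""
    entry = json.loads(index_json)[name]
    start = entry["offset"]
    return container[start:start + entry["length"]]
```

    At the storage site, extraction would normally read just the indexed byte range from the container file with a seek and a bounded read, rather than slicing an in-memory copy, so that only the requested file is transmitted.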

    Staging and status operations are needed for interactions with resource managers that reorder data access requests. A staging command is issued to request the movement of a file from an archive onto a disk cache. A status command is issued to check whether the staging request has been completed.
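    The staging and status interaction is a request followed by polling. In this sketch `Stager` is a hypothetical interface standing in for a hierarchical resource manager client, `stage_and_wait` is a hypothetical helper, and the polling interval and timeout defaults are arbitrary.

```python
import time
from typing import Protocol


class Stager(Protocol):
    """Hypothetical interface to a hierarchical resource manager."""

    def stage(self, path: str) -> str: ...

    def status(self, request_id: str) -> bool: ...


def stage_and_wait(hrm: Stager, path: str,
                   poll_interval: float = 1.0, timeout: float = 3600.0) -> str:
    """Issue a staging command, then poll status until the file is on disk."""
    request_id = hrm.stage(path)
    deadline = time.monotonic() + timeout
    while not hrm.status(request_id):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"staging of {path} not complete after {timeout} s")
        time.sleep(poll_interval)
    return request_id
```

    Because the resource manager may reorder requests, the client cannot predict when staging completes; the timeout bounds how long a transport request waits before giving up.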

2.3 Remote Execution of Functions

    Commercial file system providers are examining the ability to support the execution of functions within file systems under the name of object oriented storage [34]. The idea is that the file system can support operations at the object level rather than the block level. Object level manipulations would be implemented through execution of defined functions on the files. This concept is already supported by database vendors.

    The central idea behind remote execution of functions is that low complexity operations (defined as a sufficiently small number of operations per byte moved) should always be performed at the remote storage repository to decrease the total time for access and manipulation. Conversely, high complexity operations (a sufficiently large number of operations per byte moved) should always be performed at the most powerful computer that is available. The exact crossover point depends upon the type of data movement that is being supported (streaming, pipelining), the load on the systems, the amount of data reduction that could be achieved, the complexity of the transport mechanism, the ratio of the execution rate to the product of the transmission bandwidth and the operation complexity (rate / (bandwidth*complexity)), and the relative execution rates. Object oriented access takes advantage of the ability of object oriented storage systems to perform appropriate processing steps directly on the remote storage repository.
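    The trade-off can be illustrated with a deliberately simple model, assuming processing and transfer do not overlap (no streaming or pipelining) and that system load is fixed. Here `size` is in bytes, `complexity` in operations per byte, the rates in operations per second, `bandwidth` in bytes per second, and `reduction` is the ratio of output size to input size. The function names are hypothetical.

```python
def remote_time(size, complexity, remote_rate, bandwidth, reduction):
    """Process at the storage site, then ship only the reduced result."""
    return size * complexity / remote_rate + reduction * size / bandwidth


def local_time(size, complexity, local_rate, bandwidth):
    """Ship the raw data, then process on the (faster) local machine."""
    return size / bandwidth + size * complexity / local_rate


def where_to_execute(size, complexity, remote_rate, local_rate,
                     bandwidth, reduction):
    """Return "remote" or "local", whichever this model predicts is faster."""
    t_remote = remote_time(size, complexity, remote_rate, bandwidth, reduction)
    t_local = local_time(size, complexity, local_rate, bandwidth)
    return "remote" if t_remote <= t_local else "local"
```

    With a 10 MB/s wide area link, a remote rate of 1e8 operations/s, a local rate of 1e10 operations/s, and a 100:1 data reduction, a 1 GB job at 1 operation per byte is about nine times faster at the remote site, while at 1000 operations per byte it is about fifty times faster locally.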

    Example operations that can be executed more efficiently on the remote storage system include:

    o Metadata extraction from files: this typically extracts a few hundred bytes from a file. Unless the file contains only metadata, the operation is best performed at the remote storage repository.

    o Extraction of a file from a container: if the file size is much smaller than the container size, the extraction should be done at the remote storage repository.

    o Validation of a digital signature: if the local access bandwidth is greater than the wide area network bandwidth, the validation should be done at the remote storage repository.

    o Data subsetting: this is similar to reading data out of a container, and should be done at the remote storage system when the data subset is small [4].

    o Data filtering: this is similar to data subsetting, but also involves decisions that are made during the filtering process [4]. When the result set is small, the process is better done at the remote storage system.

    o Server-initiated parallel I/O streams: the decision on the number of I/O streams to use when sending data in parallel over a wide area network depends strongly on the number of independent resources from which the data can be accessed. This is typically known only by the remote storage system. When data filtering is involved, the transport decisions and filtering results have to be coordinated. In the case of access to very large files that are filtered and then streamed to another storage system, the filtering is an active part of the process and controls the number of I/O streams.

    o Checksum checking: on storage, it can be worth checking that data was transmitted correctly, and on transmission, it may be worth checking that data has not been corrupted while being stored [10].

    o Encryption as a property of the data file: for biomedical data, all transmissions of personal data must be encrypted. The encryption process must be invoked at the remote storage system on every file transfer.

    o Compression as a property of the data file: the decision to compress typically depends upon the bandwidth of the final network leg. The compression takes place at the remote storage repository to guarantee that the network over which the data is sent can handle the load.
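    Checksum verification and compression can be combined in the server that moves the data. A minimal sketch, assuming SHA-256 as the checksum and zlib as the compressor (the document does not prescribe either); encryption is omitted because it depends on key management outside this example. The function names and message layout are hypothetical.

```python
import hashlib
import zlib


def package_for_transport(data: bytes, compress: bool) -> dict:
    """Sender side: checksum the stored bytes, optionally compress for the WAN leg."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "compressed": compress,
        "payload": zlib.compress(data) if compress else data,
    }


def receive(message: dict) -> bytes:
    """Receiver side: undo compression, then verify the checksum."""
    payload = message["payload"]
    data = zlib.decompress(payload) if message["compressed"] else payload
    if hashlib.sha256(data).hexdigest() != message["sha256"]:
        raise ValueError("checksum mismatch: data corrupted in transit or storage")
    return data
```

    The checksum is computed on the uncompressed bytes, so the same value can also be compared against one recorded when the file was first stored.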


    An extension of the concept of remote execution of functions is the automated conversion of the encoding format of the digital entity to a desired encoding format. The Data Format Description Language Research Group is developing mechanisms to characterize the structure of digital entities, the semantic labels that are applied to the structures, and the operations that can be performed upon the structures. The characterizations correspond to digital ontologies that can be applied at the remote storage repository during access [21]. Preservation environments depend upon the ability to migrate digital entities to new encoding formats to ensure the ability to display archived material [22, 23, 40, 41, 42, 44]. The digital ontologies describe the structural relationships present within the digital entity, usually expressed using the Resource Description Framework syntax [36].

2.4 Protocol conversion

    File based access, such as that provided by GridFTP, assumes that a dataset can be generated by the remote storage system and then transported using the File Transfer Protocol. A form of extended transport operations occurs when the remote storage system that is being accessed does not provide standard Unix file system operations. In these cases, the data must be manipulated into a suitable form for transport, or the access mechanism must be modified to work with the protocol used by the remote storage system. The following additional types of storage systems are being accessed by production data grids:

    o Database blob access: support reading and writing of blobs in databases.

    o Database metadata access: support aggregation of query results into an XML file before transport.

    o Object ring buffer access: support queries on the objects in the ring buffer, and return only the objects that satisfy the query, while aggregating the objects into a single file.

    o Archive access: manage archive access requirements such as server-initiated parallel I/O. In this case the remote storage system determines the optimal number of I/O streams to use.

    o Hierarchical Resource Manager access: support staging requests and status requests for the placement of data within the remote storage system.

    o Preferred API: support access methods such as Python, Java, C library, Shell command, Open Archives Initiative, Web Services Description Language, Open Grid Services Architecture, http, Dynamic Load Libraries, GridFTP. In this case, the transport mechanism that delivers the data is determined by the access mechanism. At some point in the transfer, the transport system will have to convert from the transport protocols of the remote storage system to the buffering scheme required by the chosen access method.
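    Aggregating query results into a single XML file before transport can be sketched with the standard library `sqlite3` module standing in for the remote database. `query_to_xml` and the element names are hypothetical, and column names are assumed to be valid XML tag names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn: sqlite3.Connection, sql: str, params=()) -> bytes:
    """Run a query next to the database and aggregate all rows into one XML file."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    root = ET.Element("resultSet")
    for row in cur:
        rec = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(rec, col).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="utf-8")
```

    The whole result set then travels as one file, instead of one message per row.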

    The DAIS working group has defined a set of services for access to database management systems based on administrative tasks (publish, subscribe, propagate, consume) and operational tasks (createConsumption, alterConsumption, startConsumption, stopConsumption, dropConsumption, publishData, deliverData, deliverEvent, getData). For transport, the result sets can often be handled with Unix file-like operations. However, there are still cases where additional operations can be integrated with the data transport:

    o Asynchronous application of SQL commands, with results delivered under propagation rules. For large result sets, multiple partial result sets may need to be delivered.

    o Asynchronous delivery of results from DAIS patterns executed at a remote site, with transport to an intermediate site for joins across result sets before delivery to a third party.

    o Specification of a workflow with interaction between the steps requiring joins across result sets.
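    Delivering a large result set as multiple partial result sets can be sketched with a cursor read in fixed-size batches. This synchronous generator is a simplification of the asynchronous, rule-driven delivery described above; `partial_result_sets` is a hypothetical name, and `sqlite3` stands in for the remote database.

```python
import sqlite3
from typing import Iterator, List, Tuple


def partial_result_sets(conn: sqlite3.Connection, sql: str,
                        batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield a large result set as a sequence of partial result sets.

    Each batch can be handed to the transport as soon as it is ready,
    instead of first materializing the whole result at the remote site.
    """
    cur = conn.execute(sql)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            return
        yield batch
```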

    An important aspect of preservation environments is the ability to manage access to multiple types of storage repositories. In particular, when new technology is developed, a persistent archive needs the ability to migrate data from the old technology to the new technology. This requires the ability to interoperate with multiple storage repository protocols while moving data. If third party transport is used, then the protocol conversion takes place entirely within the transport mechanism. In addition, preservation systems use a standard data encoding format, called an Archival Information Package (AIP), to encapsulate preservation metadata with each digital entity. AIPs are defined in the Open Archival Information System standard [32, 33]. During transport, elements of the AIP may need to be updated to reflect the new storage location. On third party transport of AIPs, the update takes place within the transport mechanism.
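    Updating an AIP during transport amounts to rewriting a metadata element as the package passes through the transport mechanism. In this sketch the element names (`preservationMetadata`, `storageLocation`) and the function name are hypothetical; OAIS defines the AIP concept, and the concrete packaging schema is chosen by the archive.

```python
import xml.etree.ElementTree as ET


def update_aip_location(aip_xml: bytes, new_location: str) -> bytes:
    """Rewrite the storage-location element of an AIP while it is in transit."""
    root = ET.fromstring(aip_xml)
    location = root.find("./preservationMetadata/storageLocation")
    if location is None:
        raise ValueError("AIP has no storageLocation element")
    location.text = new_location
    return ET.tostring(root, encoding="utf-8")
```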

2.5 Administrative tasks

    A fruitful area for discussion in future documents is the integration of distributed administrative tasks with transport operations. Examples of such tasks include:

    o Data replication: should the transport mechanism decide whether to replicate data? Current schemes use an access history to decide when performance can be improved by replication. If the access history is maintained at the remote site, the transport mechanism can check the frequency of access against the location of the requestor, and automate the replication of the file to a closer resource.

    o Archive/restore functions: should the transport mechanism force the archiving of less frequently used data to make room for the current transfer? The restore function is equivalent to a staging request to a hierarchical resource manager, and may also be implemented within the transport mechanism.

    o Data transformation and translation: should the transport mechanism enforce the conversion of binary objects into the binary encoding format used by the receiving operating system? An example is the implementation of support for the External Data Representation Standard (XDR) within the transport protocol [47].

    o Data integration in a distributed environment: should the mapping from a logical name space to physical file names occur as part of the transport protocol? An example is the specification of a logical file name for a transport operation, instead of a physical file name.

    o Data integration between federated name spaces: should the mapping between logical name spaces in multiple Data Grids occur as part of the transport protocol? An example is the automated forwarding of a transport request to the Data Grid that is managing the desired digital entity.

    o Generating load average: should the transport mechanism access time-dependent host information? An example is the Grid Datafarm, which supports generation of load averages [10].
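    The replication question can be illustrated with a simple rule applied to an access history kept at the remote site. This is an assumed policy for illustration, not a scheme from any of the systems cited; `replica_target` and its `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def replica_target(access_sites: Iterable[str], current_site: str,
                   threshold: int) -> Optional[str]:
    """Suggest a site to replicate a file to, based on where requests come from.

    `access_sites` is the access history: one entry per request, naming the
    requestor's site. If another site accounts for at least `threshold`
    requests, suggest replicating the file there; otherwise return None.
    """
    counts = Counter(s for s in access_sites if s != current_site)
    if not counts:
        return None
    site, n = counts.most_common(1)[0]
    return site if n >= threshold else None
```

    A transport mechanism applying such a rule would trigger the replication after completing the current transfer, so the requestor is not delayed by the extra copy.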

3. Implementations

    The SDSC Storage Resource Broker is being used to support Data Grids [19, 20], digital libraries [3, 19], and persistent archives [22, 23]. Example implementations from each of these environments are listed, along with the extended transport operations that are used. Note that all of the SRB implementations specify transport operations through use of logical file names.

    o National Aeronautics and Space Administration (NASA) Information Power Grid “traditional” Data Grid [28]. Bulk operations are used to register files into the Grid. Containers are used to package (aggregate) files before loading into an archive. Transport operations are specified through logical file names.

    o NASA Advanced Data Grid Data Grid [25]. Bulk operations are used to register files and metadata.

    o NASA Data Management System/Global Modeling and Assimilation Office Data Grid [26]. Containers are used for interacting with archives. The logical name space is partitioned across multiple physical directories to improve performance.

    o NASA Earth Observing Satellite Data Grid [27]. Read and write operations are supported against a “WORM” (write once, read many) file system. This means that all updates cause a new version to be written.

    o Department of Energy (DOE) Particle Physics Data Grid (PPDG) / BaBar high energy physics experiment Data Grid [35]. Bulk operations are used to register files, load files into the Data Grid, and unload files from the Data Grid. A bulk remove operation has been requested to complement the bulk registration operation. Staging and status operations are used to interact with a Hierarchical Storage Manager.

    o Japan/High Energy Research Accelerator Program (KEK) Data Grid [14]. Bulk operations are used to register files, load files, and unload files.

    o National Virtual Observatory (NVO)/United States Naval Observatory-B Data Grid [30, 46]. Files are registered through the physical movement of Grid Bricks: data is first written locally to a disk cache (the Grid Brick), the Grid Brick is then physically moved to a remote site, and bulk registration and bulk load are invoked on it there to import the data into the Data Grid.

    o National Science Foundation (NSF)/National Partnership for Advanced Computational Infrastructure Data Grid [31]. Containers are used to minimize the impact of large collections of small files on the archive name space. Remote processes are used for metadata extraction. Data subsetting is done through DataCutter remote filters [4]. The seek operation is used to optimize paging of data for a 4D visualization rendering system. Data transfers are invoked using server-initiated parallel I/O to optimize interactions with the HPSS archive. Bulk registration, load, and unload are used for collections of small files. Results from queries on databases are aggregated into XML files for transport.

    o National Institutes of Health/Biomedical Informatics Research Network Data Grid [5]. Encryption and compression of data are managed at the remote storage system as a property of the logical name space, which ensures the privacy of data during transport.

    o Library of Congress Data Grid [16]. Bulk registration is used to import large collections of small files.

    o NSF/Real-time Observatories, Applications, and Data management Network (Roadnet) Data Grid [37]. Queries across object ring buffers return result sets.

    o NSF/Joint Center for Structural Genomics Data Grid [15]. Parallel I/O is used to push experimental data into remote archives, with data aggregated into containers.

    o NVO/2-Micron All Sky Survey digital library [1, 30]. Containers are used to organize five million images. An image cutout service is implemented as a remote process, executed directly on the remote storage system. A metadata extraction service also runs as a remote process: metadata is parsed from each image and aggregated before transfer.

    o NVO/Digital Palomar Observatory Sky Survey digital library [7]. Bulk registration is used to register the images. An image cutout service is implemented as a remote process, executed directly on the remote storage repository.

    o NSF/Southern California Earthquake Center digital library [38]. Bulk registration of files is used to load simulation output files into the logical name space (1.5 million files generated in a simulation using 3000 time steps).

    o National Archives and Records Administration (NARA) persistent archive [45]. Bulk registration, load, and unload are used to access digital entities from web archives. Containers are used to aggregate files before storage in archives. Transport operations are automatically forwarded to the appropriate Data Grid for execution through peer-to-peer federation mechanisms.

    o NSF/National Science Digital Library (NSDL) persistent archive [29]. Bulk registration, load, and unload are used to import digital entities into an archive. Web browsers are used to access and display the imported data over HTTP.

    o University of California San Diego persistent archive [43]. Bulk registration, load, and unload are used to import digital entities into an archive.
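Several of the implementations above combine containers with bulk registration when handling large collections of small files. The sketch below illustrates that combination under stated assumptions: a standard tar archive stands in for a container (the SRB container format is not described here), and `build_container` returns registration records that a hypothetical catalog could ingest in one bulk call instead of one call per file. The function names and logical names are illustrative.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def build_container(files: dict[str, bytes], container_path: Path) -> list[tuple[str, str, str]]:
    """Aggregate many small files into one tar container.

    Returns (logical_name, container, member) records for a single
    bulk registration call; the archive stores one file, not many.
    """
    records = []
    with tarfile.open(container_path, "w") as tar:
        for logical_name, data in sorted(files.items()):
            member = logical_name.lstrip("/")  # tar member names are relative
            info = tarfile.TarInfo(name=member)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            records.append((logical_name, str(container_path), member))
    return records


def read_member(container_path: Path, member: str) -> bytes:
    """Retrieve one small file from its container."""
    with tarfile.open(container_path, "r") as tar:
        extracted = tar.extractfile(member)
        if extracted is None:  # member exists but is not a regular file
            raise KeyError(member)
        return extracted.read()


if __name__ == "__main__":
    small_files = {f"/collection/img{i:03d}.txt": f"image {i}".encode() for i in range(5)}
    with tempfile.TemporaryDirectory() as tmp:
        container = Path(tmp) / "container0001.tar"
        records = build_container(small_files, container)
        print(len(records), "records for one bulk registration")
        print(read_member(container, records[2][2]))  # b'image 2'
```

The design trade-off is the one the implementations describe: the archive name space holds one entry per container, while the logical name space still resolves each small file individually.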

    The GridFTP transport system also incorporates extended operations beyond the traditional “get” and “put” of the original FTP mechanism [12]. The extended operations include:


    o Partial file access. Read and write operations can address parts of files, both for reading portions of a file and for modifying portions of it.

    o Parallel I/O. Data is sent to the requestor over multiple I/O streams.

    o Guaranteed data transmission. An interrupted transport is restarted as needed, so the transfer completes despite interruptions.
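The three GridFTP extensions interact: each parallel stream carries a byte range, which relies on partial file access, and a restart only needs to resend the ranges that did not complete. The sketch below illustrates that relationship under simplifying assumptions. It is not the GridFTP protocol, which defines its own transfer modes and restart markers. `split_ranges`, `missing_ranges`, `transfer`, and `reassemble` are hypothetical names, and threads stand in for network streams.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence

Range = tuple[int, int]  # (offset, length) of a partial-file read


def split_ranges(size: int, streams: int) -> list[Range]:
    """Divide a file of `size` bytes into up to `streams` contiguous ranges."""
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams >= 1")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def missing_ranges(all_ranges: Sequence[Range], completed: Sequence[Range]) -> list[Range]:
    """Ranges that must still be sent after an interruption."""
    done = set(completed)
    return [r for r in all_ranges if r not in done]


def transfer(source: bytes, streams: int, completed: Sequence[Range] = ()) -> dict[int, bytes]:
    """Send the still-missing ranges of `source` using parallel workers.

    Each worker performs one partial read (offset, length), standing in
    for one I/O stream; the result maps offset -> received bytes.
    """
    todo = missing_ranges(split_ranges(len(source), streams), completed)

    def read_range(r: Range) -> tuple[int, bytes]:
        offset, length = r
        return offset, source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        return dict(pool.map(read_range, todo))


def reassemble(chunks: dict[int, bytes]) -> bytes:
    """Join received chunks in offset order."""
    return b"".join(chunks[off] for off in sorted(chunks))


if __name__ == "__main__":
    data = b"abcdefghij"
    first = {0: b"abcd"}                          # range (0, 4) arrived before the interruption
    rest = transfer(data, 3, completed=[(0, 4)])  # resends only (4, 3) and (7, 3)
    print(reassemble({**first, **rest}) == data)  # True
```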

4. Summary

    Five major categories of extended transport operations have been identified. For each category, an example production Data Grid has been identified, along with the particular extensions that it uses. To improve the analysis, comparisons with additional data management systems are warranted to decide whether important extensions have been overlooked. In practice, multiple projects now face the challenge of deciding where computations should take place within the grid. The manipulations can take place at the remote site where the data is stored [4], or the data can be transported to the location where it is processed, as in the Grid Physics Network’s use of Chimera [13]. Of interest to the Data Transport Working Group is an understanding of when transport and remote manipulation need to be combined, with partial transfer of results as the data is generated.
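Combining remote manipulation with partial transfer of results can be sketched with chained Python generators. A filter running at the storage site yields matching records, and the transport step ships them in small batches as soon as each batch fills, rather than after the whole result set exists. Both functions and the batch size are hypothetical illustrations; a real system would place a network boundary between the two stages.

```python
from typing import Iterable, Iterator


def remote_filter(records: Iterable[str], keyword: str) -> Iterator[str]:
    """Hypothetical subsetting filter executed where the data is stored."""
    for record in records:
        if keyword in record:
            yield record


def send_in_batches(results: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Transport partial results as soon as a batch is full,
    instead of waiting for the complete filtered result set."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: list[str] = []
    for item in results:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


if __name__ == "__main__":
    records = ["star 1", "galaxy 2", "star 3", "star 4"]
    for shipped in send_in_batches(remote_filter(records, "star"), 2):
        print(shipped)  # ['star 1', 'star 3'], then ['star 4']
```

Because both stages are lazy, the first batch can leave the storage site before the filter has finished scanning the data.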

5. Author Information

Reagan W. Moore

    San Diego Supercomputer Center (SDSC)

    9500 Gilman Drive, MC-0505

    La Jolla, CA 92093-0505

    moore@sdsc.edu
