Survey of Distributed File Systems
Table of Contents
1. The Coda Distributed File System
The Coda distributed file system is a state-of-the-art experimental file system developed in the group of M. Satyanarayanan at Carnegie Mellon University. Numerous people contributed to Coda, which now incorporates many features not found in other systems:
- Mobile computing
  - disconnected operation for mobile clients
  - reintegration of data from disconnected clients
  - bandwidth adaptation
- Failure resilience
  - read/write replication of servers
  - resolution of server/server conflicts
  - handling of network failures that partition the servers
  - handling of disconnection of clients
- Performance and scalability
  - client-side persistent caching of files, directories and attributes for high performance
  - write-back caching
- Security
  - Kerberos-like authentication
  - access control lists (ACLs)
- Well-defined semantics of sharing
- Freely available source code
Coda was originally implemented on Mach 2.6 and has recently been ported to Linux, NetBSD and FreeBSD. Michael Callahan ported a large portion of Coda to Windows 95, and we are studying Windows NT to understand the feasibility of porting Coda to NT. Currently, our efforts are focused on ports and on making the system more robust. A few new features are being implemented (write-back caching and cells, for example), and in several areas components of Coda are being reorganized.
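Coda's disconnected operation can be sketched as follows. This is a minimal, hypothetical model (not Coda's actual code or API; all names here are illustrative): the client serves reads and writes from its persistent cache while the server is unreachable, logs each mutation, and reintegrates the log against the server when connectivity returns.

```python
# Hypothetical sketch of Coda-style disconnected operation: while the
# server is unreachable, the client logs mutating operations locally
# and replays ("reintegrates") them on reconnection.

class Server:
    def __init__(self):
        self.files = {}                   # path -> file contents

    def store(self, path, data):
        self.files[path] = data


class DisconnectedClient:
    def __init__(self, server=None):
        self.server = server              # None while disconnected
        self.cache = {}                   # persistent client-side cache
        self.replay_log = []              # mutations made while disconnected

    def write(self, path, data):
        self.cache[path] = data
        if self.server is None:
            self.replay_log.append(("write", path, data))
        else:
            self.server.store(path, data)

    def reconnect(self, server):
        """Reintegrate: replay logged mutations against the server."""
        self.server = server
        for op, path, data in self.replay_log:
            if op == "write":
                server.store(path, data)
        self.replay_log.clear()
```

A real implementation must also detect conflicts during reintegration (another client may have updated the same file while this one was disconnected), which is where Coda's conflict-resolution machinery comes in.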
2. Distributed File System (Microsoft)
DFS allows administrators to group shared folders located on different servers by transparently connecting them to one or more DFS namespaces. A DFS namespace is a virtual view of shared folders in an organization. Using the DFS tools, an administrator selects which shared folders to present in the namespace, designs the hierarchy in which those folders appear, and determines the names that the shared folders show in the namespace. When a user views the namespace, the folders appear to reside on a single, high-capacity hard disk. Users can navigate the namespace without needing to know the server names or shared folders hosting the data. DFS also provides other benefits, including the following:
- Simplified data migration
  DFS simplifies the process of moving data from one file server to another.
- Increased availability of file server data
  In the event of a server failure, DFS refers client computers to the next available server, so users can continue to access shared folders without interruption.
- Load sharing
  DFS provides a degree of load sharing by mapping a given logical name to shared folders on multiple file servers.
- Security integration
  Administrators do not need to configure additional security for DFS namespaces, because file and folder security is enforced by the existing NTFS file system and shared-folder permissions on each target.
The following terms are used to describe the basic components of DFS:
- DFS namespace
  A virtual view of shared folders on different servers, as provided by DFS. A DFS namespace consists of a root and many links and targets. The namespace starts with a root that maps to one or more root targets. Below the root are links that map to their own targets.
- DFS link
  A component in a DFS path that lies below the root and maps to one or more link targets.
- DFS path
  Any Universal Naming Convention (UNC) path that starts with a DFS root.
- DFS root
  The starting point of the DFS namespace. The root is often used to refer to the namespace as a whole. A root maps to one or more root targets, each of which corresponds to a shared folder on a separate server. The DFS root must reside on an NTFS volume. A DFS root has one of the following formats: \\ServerName\RootName or \\DomainName\RootName.
- domain-based DFS namespace
  A DFS namespace whose configuration information is stored in Active Directory. The path to access the root or a link starts with the host domain name. A domain-based DFS root can have multiple root targets, which offers fault tolerance and load sharing.
- link referral
  A type of referral that contains a list of link targets for a particular link.
- link target
  The mapping destination of a link. A link target can be any UNC path. For example, a link target could be a shared folder or another DFS path.
- referral
  A list of targets, transparent to the user, that a DFS client receives from DFS when the user accesses a root or a link in the DFS namespace. The referral information is cached on the DFS client for a time period specified in the DFS configuration.
- root referral
  A type of referral that contains a list of root targets for a particular root.
- root target
  A physical server that hosts a DFS namespace. A domain-based DFS root can have multiple root targets, whereas a stand-alone DFS root can have only one root target. Root targets are also called root servers.
- stand-alone DFS namespace
  A DFS namespace whose configuration information is stored locally in the registry of the root server. The path to access the root or a link starts with the root server name. A stand-alone DFS root has only one root target. Stand-alone roots are not fault tolerant; when the root target is unavailable, the entire DFS namespace is inaccessible. You can make stand-alone DFS roots fault tolerant by creating them on server clusters.
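The interplay of namespaces, referrals, and targets described above can be sketched in code. The following is an illustrative model only (it does not reflect the real Windows DFS implementation or wire protocol; all class and parameter names are invented): a client requests a referral for a link, caches the target list for a time-to-live, and fails over to the next available target.

```python
import time

# Hypothetical sketch of DFS-style name resolution: the namespace maps
# a root and its links to lists of targets; the client caches referrals
# and picks the first available target, giving failover and a degree of
# load sharing.

class DfsNamespace:
    def __init__(self, root, root_targets, links):
        self.root = root                  # e.g. "\\\\corp\\public"
        self.root_targets = root_targets  # list of root-target shares
        self.links = links                # link name -> list of link targets

    def referral(self, link=None):
        """Return the target list for the root, or for one link."""
        return list(self.root_targets if link is None else self.links[link])


class DfsClient:
    def __init__(self, namespace, ttl=300.0):
        self.ns = namespace
        self.ttl = ttl                    # referral cache lifetime, seconds
        self._cache = {}                  # link -> (expiry_time, targets)

    def resolve(self, link=None, is_up=lambda target: True):
        """Pick the first available target, using the cached referral."""
        expiry, targets = self._cache.get(link, (0.0, None))
        if time.time() >= expiry:         # cache miss or expired: re-fetch
            targets = self.ns.referral(link)
            self._cache[link] = (time.time() + self.ttl, targets)
        for target in targets:
            if is_up(target):             # fail over to the next target
                return target
        raise OSError("no available target for link %r" % (link,))
```

The `is_up` predicate stands in for the client's reachability check; in the real system, ordering and selection of targets can also take site/cost information into account.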
3. Fraunhofer Parallel File System (FhGFS)
The Fraunhofer Parallel File System (FhGFS) is the new parallel file system from the Fraunhofer Competence Center for High Performance Computing. FhGFS was written from scratch and incorporates results from our experience with existing systems. FhGFS is a fully POSIX-compliant, scalable file system with features such as:
- Distributed metadata:
  Although parallel file systems usually distribute the file contents over multiple storage nodes, the metadata is often bound to a single node. This leads to performance bottlenecks and limited fault tolerance. FhGFS distributes the metadata across all available storage nodes in a way that keeps lookup time to a minimum.
- Easy installation:
  FhGFS requires no kernel patches, is able to connect storage nodes and servers with zero configuration, and allows you to add more clients and storage nodes to the running system whenever you want.
- Support for high-performance technologies:
  FhGFS is built on a scalable multithreaded architecture with native InfiniBand support. Storage nodes can serve InfiniBand and Ethernet clients at the same time and automatically switch to a redundant connection path in case one of them fails.
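One common way to distribute metadata while keeping lookup time constant is to hash the path onto the set of metadata-holding nodes. The sketch below illustrates that general idea only; it is not FhGFS's actual placement algorithm, and the function and node names are invented for illustration.

```python
import hashlib

# Illustrative sketch (not FhGFS's real scheme): hashing a path to pick
# the responsible metadata node gives O(1) lookup with no central
# metadata server, avoiding a single-node bottleneck.

def metadata_node(path, nodes):
    """Deterministically map a path to one of the metadata nodes."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

A simple modulo mapping like this reshuffles most paths when a node is added; production systems typically use consistent hashing or explicit placement tables so that growing the system moves only a small fraction of the metadata.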
4. Lustre
Lustre is an object-based, distributed file system, generally used for large scale cluster computing. The name Lustre is a blend of the words Linux and cluster. The project aims to provide a file system for clusters of tens of thousands of nodes with petabytes of storage capacity, without compromising speed or security. Lustre is available under the GNU GPL.
Lustre file systems can support up to tens of thousands of client systems, petabytes (PBs) of storage and hundreds of gigabytes per second (GB/s) of I/O throughput. Businesses ranging from Internet service providers to large financial institutions deploy Lustre file systems in their data centers. Due to the high scalability of Lustre file systems, Lustre deployments are popular in the oil and gas, manufacturing, rich media and finance sectors.
A Lustre file system has three major functional units:
- A single metadata target (MDT) per filesystem that stores metadata, such as filenames, directories, permissions, and file layout, on the metadata server (MDS)
- One or more object storage targets (OSTs) that store file data on one or more object storage servers (OSSes). Depending on the server's hardware, an OSS typically serves between two and eight targets, each target a local disk filesystem up to 8 terabytes (TB) in size. The capacity of a Lustre file system is the sum of the capacities provided by the targets.
- Client(s) that access and use the data. Lustre presents all clients with standard POSIX semantics and concurrent read and write access to the files in the filesystem.
The MDT, OST, and client can be on the same node or on different nodes, but in typical installations these functions are on separate nodes, with two to four OSTs per OSS node, communicating over a network. Lustre supports several network types, including InfiniBand, TCP/IP on Ethernet, Myrinet, Quadrics, and other proprietary technologies. Lustre can take advantage of remote direct memory access (RDMA) transfers, when available, to improve throughput and reduce CPU usage.
The storage attached to the servers is partitioned, optionally organized with logical volume management (LVM) and/or RAID, and formatted as file systems. The Lustre OSS and MDS servers read, write, and modify data in the format imposed by these file systems.
An OST is a dedicated filesystem that exports an interface to byte ranges of objects for read/write operations. An MDT is a dedicated filesystem that controls file access and tells clients which object(s) make up a file. MDTs and OSTs currently use a modified version of ext3 to store data. In the future, Sun's ZFS/DMU will also be used to store data.
When a client accesses a file, it completes a filename lookup on the MDS. As a result, a file is created on behalf of the client, or the layout of an existing file is returned to the client. For read or write operations, the client then passes the layout to a logical object volume (LOV), which maps the offset and size to one or more objects, each residing on a separate OST. The client then locks the file range being operated on and executes one or more parallel read or write operations directly to the OSTs. With this approach, bottlenecks for client-to-OST communications are eliminated, so the total bandwidth available for the clients to read and write data scales almost linearly with the number of OSTs in the filesystem.
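The offset-to-object mapping performed by the LOV can be illustrated with a simple RAID-0-style striping calculation. This is a simplified sketch: real Lustre layouts are more general, and the function name and parameters here are chosen for illustration.

```python
# Illustrative sketch of round-robin file striping as done by Lustre's
# LOV: a file is split into fixed-size stripes placed round-robin
# across its objects, one object per OST. A byte range therefore maps
# to chunks on several objects, which the client reads/writes in
# parallel.

def map_range(offset, size, stripe_size, stripe_count):
    """Yield (object_index, object_offset, length) chunks covering the
    byte range [offset, offset + size) of a striped file."""
    end = offset + size
    while offset < end:
        stripe = offset // stripe_size        # global stripe number
        obj = stripe % stripe_count           # which object/OST holds it
        within = offset % stripe_size         # offset inside the stripe
        obj_off = (stripe // stripe_count) * stripe_size + within
        length = min(stripe_size - within, end - offset)
        yield (obj, obj_off, length)
        offset += length
```

For example, with a 4-byte stripe size and two objects, the first 10 bytes of a file map to bytes 0-3 of object 0, bytes 0-3 of object 1, and bytes 4-5 of object 0: consecutive stripes land on alternating OSTs, which is what makes aggregate bandwidth scale with the number of OSTs.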
Clients do not directly modify the objects on the OST filesystems; instead, they delegate this task to OSSes. This approach ensures scalability for large-scale clusters and supercomputers, as well as improved security and reliability. In contrast, shared block-based filesystems such as Global File System and OCFS must allow direct access to the underlying storage by all of the clients in the filesystem, and they risk filesystem corruption from misbehaving or defective clients.
Features and Benefits
Lustre's unprecedented scalability, bulletproof reliability, and proven performance help you meet the uptime requirements of your most demanding business and national-security applications.
; Unparalleled scalability