General Parallel File System

The transition of the electric power industry towards a smarter grid will greatly increase the amount of data that is communicated and stored for both real-time and offline use. One example is the intention to store synchrophasor data at a rate of one sample per cycle (potentially rising to four samples per cycle), collected from thousands of phasor measurement units (PMUs) distributed across the system. This data may be used for visualization, situational awareness, or analysis of wide-area disturbances, which requires simultaneous access to the same data by different users and applications at different remote sites. Meeting all of these goals requires a storage technology that can satisfy the varied requirements of these clients and applications.
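To give a sense of scale, the raw data volume can be estimated by multiplying the number of PMUs, the samples per second (nominal grid frequency times samples per cycle), the size of each stored sample, and the seconds in a day. The sketch below is a back-of-the-envelope calculation only: the 2,000 PMUs, the 50 Hz nominal frequency, and the 100-byte frame size are hypothetical values chosen for illustration, not figures from this article.

```python
# Back-of-the-envelope estimate of raw synchrophasor storage volume.
# All parameter values used below are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(num_pmus: int, nominal_hz: int,
                       samples_per_cycle: int, bytes_per_sample: int) -> int:
    """Raw bytes written per day, ignoring compression, indexing and replication."""
    samples_per_second = nominal_hz * samples_per_cycle
    return num_pmus * samples_per_second * bytes_per_sample * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical scenario: 2,000 PMUs on a 50 Hz system, 100-byte frames.
    for spc in (1, 4):  # one sample per cycle now, four in the future
        vol = daily_volume_bytes(num_pmus=2000, nominal_hz=50,
                                 samples_per_cycle=spc, bytes_per_sample=100)
        print(f"{spc} sample(s)/cycle: {vol / 1e9:.1f} GB/day")
```

Under these assumptions, one sample per cycle amounts to 864 GB of raw data per day, and four samples per cycle to roughly 3.5 TB per day, before any replication for fault tolerance.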

One available technology that appears well suited to this task is IBM's General Parallel File System (IBM GPFS™), a high-performance, shared-disk file management solution designed to provide fast, reliable access to a common set of file data. In addition to presenting distributed data under a single global namespace across platforms, GPFS is designed to provide:

  • Online storage management
  • Scalable data access through tightly integrated information lifecycle tools capable of managing petabytes of data and billions of files
  • Centralized administration
  • Shared access to file systems from remote GPFS clusters
  • Improved storage use

GPFS is built to manage growing quantities of unstructured data effectively. Its cluster architecture automatically spreads users' file data across multiple storage devices, so that data can be read from many devices at once and the available storage is used efficiently to deliver high performance. There is no single-server bottleneck or protocol overhead for data transfer. GPFS thus takes file management beyond a single system by providing scalable access from multiple systems to a global namespace. Applications interact with GPFS as they would with a local file system, but because multiple systems can access the data directly and in parallel, GPFS is designed to deliver higher performance, scalability, and fault tolerance. It also allows multiple applications or users to access a single file simultaneously while maintaining file-data integrity.
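The idea of spreading file data across multiple storage devices can be illustrated with a simplified striping model: a file is cut into fixed-size blocks, and consecutive blocks are placed on different disks, so a large sequential read is served by several disks in parallel. This is a sketch under stated assumptions: plain round-robin placement, the 4 MiB block size, and the function names `stripe_map` and `disks_for_read` are hypothetical, and GPFS's real block allocation is more involved.

```python
# Simplified model of block striping across the disks of a storage pool.
# Round-robin placement and the sizes in the example are illustrative
# assumptions, not GPFS's actual allocation algorithm.

from collections import defaultdict


def stripe_map(file_size: int, block_size: int, num_disks: int) -> dict[int, list[int]]:
    """Return {disk index: [block numbers stored on that disk]}."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    layout: dict[int, list[int]] = defaultdict(list)
    for block in range(num_blocks):
        layout[block % num_disks].append(block)  # next block goes to next disk
    return dict(layout)


def disks_for_read(offset: int, length: int, block_size: int, num_disks: int) -> set[int]:
    """Set of disks touched by a read of `length` bytes starting at `offset`."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return {b % num_disks for b in range(first, last + 1)}


if __name__ == "__main__":
    MiB = 1024 * 1024
    print(stripe_map(file_size=40 * MiB, block_size=4 * MiB, num_disks=4))
    # {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
    print(disks_for_read(0, 16 * MiB, 4 * MiB, 4))
    # {0, 1, 2, 3}: a 16 MiB read is served by all four disks at once
```

Because consecutive blocks land on different disks, aggregate bandwidth grows with the number of disks rather than being capped by any single device, which is the property the paragraph above relies on.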

GPFS also offers an efficient Network Shared Disk (NSD) model, which transparently forwards I/O requests from a client application node that has no direct access to a disk to an NSD server node; the server performs the disk I/O operation and then passes the data back to the client.
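The forwarding path described above can be sketched as a toy model: a client node reads directly from disks it is attached to, and otherwise hands the request to the NSD server responsible for that disk, which performs the I/O and returns the bytes. The classes `NSDServer` and `ClientNode`, their fields, and the disk names are hypothetical and only illustrate the request flow; they are not the GPFS API, and real NSD traffic travels over the cluster network rather than through in-process calls.

```python
# Toy model of NSD-style request forwarding. Names are hypothetical;
# this illustrates the request flow only and is not the GPFS API.

from dataclasses import dataclass, field


@dataclass
class NSDServer:
    name: str
    disks: dict[str, bytearray] = field(default_factory=dict)  # disk name -> contents

    def read(self, disk: str, offset: int, length: int) -> bytes:
        # Server-side disk I/O on storage attached to this server.
        return bytes(self.disks[disk][offset:offset + length])


@dataclass
class ClientNode:
    name: str
    local_disks: dict[str, bytearray] = field(default_factory=dict)
    nsd_servers: dict[str, NSDServer] = field(default_factory=dict)  # disk name -> server

    def read(self, disk: str, offset: int, length: int) -> bytes:
        if disk in self.local_disks:
            # Direct access: the client can do the I/O itself.
            return bytes(self.local_disks[disk][offset:offset + length])
        # No direct path: forward to the NSD server, which returns the data.
        # The caller uses the same read() call either way.
        return self.nsd_servers[disk].read(disk, offset, length)


if __name__ == "__main__":
    server = NSDServer("nsd01", disks={"nsd_disk1": bytearray(b"synchrophasor-frames")})
    client = ClientNode("app01", nsd_servers={"nsd_disk1": server})
    print(client.read("nsd_disk1", 0, 13))  # b'synchrophasor'
```

The point of the design is that the application calls the same `read` interface whether the disk is attached locally or reached through a server, which mirrors the transparency the NSD model provides.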

BeijingSifang June 2016