A Simple Measuring Model for Evaluating the Performance of Small Block Size Accesses in Lustre File System

N. Jayakumar, A. M. Kulkarni

Abstract


Storage performance is one of the vital characteristics of a big data environment. Data throughput can be increased to some extent using storage virtualization and parallel data paths. Advances in SANs and storage topologies have made them adaptable to diverse applications, improving end-to-end performance. In big data environments, the most widely used file systems are HDFS (Hadoop Distributed File System) and Lustre. There are environments in which both HDFS and Lustre are connected and the applications work directly on Lustre. In a Lustre architecture with an out-of-band storage virtualization system, separating the data path from the metadata path is acceptable (and even desirable) for large files, since one MDT (Metadata Target) open RPC is typically a small fraction of the total number of read or write RPCs. For small files, however, this separation hurts performance significantly, because there is only a single read or write RPC for the file data. Because applications need data for processing, and because an in-situ architecture brings data or metadata close to the applications that process them, this dissertation work examines how in-situ processing can be exploited in Lustre. Earlier research exploited Lustre's support for in-situ processing when Hadoop/MapReduce is integrated with Lustre, but scope for performance improvement in Lustre remained. The aim of this research is to check whether it is feasible and beneficial to move small files to the MDT, so that additional RPCs and I/O overhead can be eliminated and the read/write performance of the Lustre file system can be improved.
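The RPC argument above can be illustrated with a small calculation. The following is a minimal sketch, not the measuring model of the paper: the function name `metadata_rpc_fraction` is hypothetical, and the 1 MiB data RPC size and the single open RPC per file are assumptions made only for illustration.

```python
import math


def metadata_rpc_fraction(file_size_bytes, rpc_size_bytes=1 << 20, open_rpcs=1):
    """Return the share of RPCs for one file access that are MDT open RPCs.

    Assumptions (illustrative, not taken from the paper):
    - each data RPC carries at most rpc_size_bytes (1 MiB here),
    - every file needs at least one data RPC,
    - opening the file costs open_rpcs metadata RPCs.
    """
    data_rpcs = max(1, math.ceil(file_size_bytes / rpc_size_bytes))
    return open_rpcs / (open_rpcs + data_rpcs)


if __name__ == "__main__":
    # Compare a small file, a file of exactly one RPC, and a large file.
    for size in (4 * 1024, 1 << 20, 1 << 30):
        share = metadata_rpc_fraction(size)
        print(f"{size:>12} bytes: metadata share of RPCs = {share:.4f}")
```

Under these assumptions, a file that fits in one data RPC spends half of its RPCs on metadata, while for a file spanning many data RPCs the open RPC becomes a negligible share. This is the imbalance that placing small files on the MDT is meant to remove.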


Keywords


Big Data; Metadata; Lustre; Active Storage; Small File





eISSN: 1792-8036     pISSN: 2241-4487