A Simple Measuring Model for Evaluating the Performance of Small Block Size Accesses in Lustre File System
Abstract
Storage performance is one of the vital characteristics of a big data environment. Data throughput can be increased to some extent using storage virtualization and parallel data paths. Advances in technology have made the various SANs and storage topologies adaptable to diverse applications, improving end-to-end performance. In big data environments, the most widely used file systems are HDFS (Hadoop Distributed File System) and Lustre. There are environments in which both HDFS and Lustre are connected and the applications work directly on Lustre. In a Lustre architecture with an out-of-band storage virtualization system, separating the data path from the metadata path is acceptable (and even desirable) for large files, since one MDT (Metadata Target) open RPC is typically a small fraction of the total number of read or write RPCs. For small files, however, this separation hurts performance significantly, because the file data needs only a single read or write RPC and the open RPC then makes up a large share of the total. Applications require data for processing, and an in-situ architecture brings data or metadata close to the applications that process it; how in-situ processing can be exploited in Lustre is the domain of this dissertation work. Earlier research exploited Lustre's support for in-situ processing when Hadoop/MapReduce is integrated with Lustre, but scope for performance improvement in Lustre remained. The aim of the research is to check whether it is feasible and beneficial to move small files to the MDT so that additional RPCs and I/O overhead can be eliminated and the read/write performance of the Lustre file system can be improved.
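The RPC argument above can be illustrated with a rough back-of-the-envelope sketch. It is not the measuring model of the paper: it assumes exactly one MDT open RPC per file access and a fixed bulk RPC size of 1 MiB, and the function name `open_rpc_fraction` and its parameters are hypothetical, chosen only for illustration.

```python
def open_rpc_fraction(file_size_bytes: int, rpc_size_bytes: int = 1024 * 1024) -> float:
    """Return the share of RPCs taken by the single MDT open RPC.

    Assumptions (not from the paper): one open RPC per file access,
    and file data moved in fixed-size bulk RPCs of rpc_size_bytes.
    """
    if file_size_bytes <= 0 or rpc_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: number of data RPCs needed to move the file.
    data_rpcs = -(-file_size_bytes // rpc_size_bytes)
    return 1 / (1 + data_rpcs)


if __name__ == "__main__":
    for label, size in [("4 KiB", 4 * 1024), ("1 MiB", 1024**2), ("1 GiB", 1024**3)]:
        print(f"{label}: open RPC is {open_rpc_fraction(size):.4%} of all RPCs")
```

Under these assumptions, any file that fits in a single data RPC spends half of its RPCs on the open, while for a 1 GiB file the open is roughly one RPC in a thousand. This is the asymmetry that motivates placing small files on the MDT.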
Keywords:
Big Data, Metadata, Lustre, Active Storage, Small File
License
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.