Publication details
- Client-Side Data Transformation in Lustre (Anna Fuchs), Master's Thesis, School: Universität Hamburg, 2016-05-25
Abstract
Due to the increasing gap between computation power on the one hand and storage speed and capacity on the other, compression techniques that compensate for the I/O bottleneck are becoming more urgent than ever. Although some file systems already support compression, none of the distributed ones do. Lustre is a widely used distributed parallel file system in the HPC area, which so far can only profit from compression in its ZFS backend. Apart from the archival goal of reducing storage space, client-side compression can also benefit network throughput. Userspace benchmarks showed that compression can increase throughput by up to a factor of 1.2 while halving the required storage space. This thesis primarily aims to analyze the suitability of compression for the Lustre client and to introduce online compression based on stripes. This purpose places certain demands on the compression algorithm: slow algorithms can have adverse effects and decrease the system's overall performance. Nevertheless, a higher compression ratio at the expense of lower speed can be worthwhile, because it sharply reduces the amount of data to be transferred. LZ4 is one of the fastest compression algorithms and a good candidate for on-the-fly use. A prototype of LZ4 fast compression within the Lustre client is presented for a limited number of use cases. In the course of the design, different approaches are discussed with regard to transparency and avoidance of code duplication. Finally, some ideas for adaptive compression, client hints and server-side support are presented.
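The speed/ratio trade-off described in the abstract can be made concrete with a simple back-of-the-envelope model. The sketch below is not taken from the thesis; it assumes that compression and network transfer are fully pipelined, so the slower stage bounds the rate: with compression speed c, compression ratio r (uncompressed size divided by compressed size) and link bandwidth b, the application sees an effective throughput of min(c, r·b), versus b without compression. The `Codec` type, the `pick_codec` helper (a hypothetical take on adaptive algorithm selection) and all numbers are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Codec:
    """Hypothetical description of a compression algorithm (illustrative numbers only)."""
    name: str
    speed_mb_s: float  # compression speed, measured on uncompressed input
    ratio: float       # uncompressed size / compressed size


def effective_throughput(codec: Codec, net_mb_s: float) -> float:
    """Write throughput in MB/s of *uncompressed* data.

    Assumes compression and transfer are fully pipelined: the network moves
    net_mb_s of compressed bytes, i.e. net_mb_s * ratio of original data,
    and the slower of the two stages limits the overall rate.
    """
    return min(codec.speed_mb_s, net_mb_s * codec.ratio)


def pick_codec(codecs: Sequence[Codec], net_mb_s: float) -> Optional[Codec]:
    """Return the codec with the highest effective throughput, or None if
    sending the data uncompressed (throughput == net_mb_s) is at least as fast."""
    best = max(codecs, key=lambda c: effective_throughput(c, net_mb_s), default=None)
    if best is None or effective_throughput(best, net_mb_s) <= net_mb_s:
        return None
    return best


if __name__ == "__main__":
    net = 1000.0  # hypothetical link bandwidth in MB/s
    fast = Codec("fast", speed_mb_s=4000.0, ratio=1.2)
    strong = Codec("strong", speed_mb_s=1500.0, ratio=3.0)
    slow = Codec("slow", speed_mb_s=500.0, ratio=4.0)
    for c in (fast, strong, slow):
        print(c.name, effective_throughput(c, net))
    chosen = pick_codec([fast, strong, slow], net)
    print("chosen:", chosen.name if chosen else "none")
```

With these made-up figures, the "strong" codec (1500 MB/s) beats the faster "fast" codec (1200 MB/s, limited by its low ratio), while the "slow" codec (500 MB/s) would halve throughput compared to sending raw data. This mirrors the abstract's two points: a slower algorithm can still win through its ratio, but an algorithm that is too slow hurts overall performance. A non-pipelined (serial) model would instead use 1/(1/c + 1/(r·b)) and favour compression less.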
BibTeX
@mastersthesis{CDTILF16,
  author = {Anna Fuchs},
  title = {{Client-Side Data Transformation in Lustre}},
  advisors = {Michael Kuhn},
  year = {2016},
  month = {05},
  school = {Universität Hamburg},
  type = {Master's Thesis},
  abstract = {Due to the increasing gap between computation power on the one hand and storage speed and capacity on the other, compression techniques that compensate for the I/O bottleneck are becoming more urgent than ever. Although some file systems already support compression, none of the distributed ones do. Lustre is a widely used distributed parallel file system in the HPC area, which so far can only profit from compression in its ZFS backend. Apart from the archival goal of reducing storage space, client-side compression can also benefit network throughput. Userspace benchmarks showed that compression can increase throughput by up to a factor of 1.2 while halving the required storage space. This thesis primarily aims to analyze the suitability of compression for the Lustre client and to introduce online compression based on stripes. This purpose places certain demands on the compression algorithm: slow algorithms can have adverse effects and decrease the system's overall performance. Nevertheless, a higher compression ratio at the expense of lower speed can be worthwhile, because it sharply reduces the amount of data to be transferred. LZ4 is one of the fastest compression algorithms and a good candidate for on-the-fly use. A prototype of LZ4 fast compression within the Lustre client is presented for a limited number of use cases. In the course of the design, different approaches are discussed with regard to transparency and avoidance of code duplication. Finally, some ideas for adaptive compression, client hints and server-side support are presented.},
}