IPCC for Lustre

IPCC Description

“Intel® Parallel Computing Centers are universities, institutions, and labs that are leaders in their field. The primary focus is to modernize applications to increase parallelism and scalability through optimizations that leverage cores, caches, threads, and vector capabilities of microprocessors and coprocessors.” – Intel

General Information

Due to the increasing gap between computational speed, network speed and storage capacity, it has become necessary to investigate data reduction techniques. Storage systems have become a significant part of the total cost of ownership due to the growing number of storage devices and their associated acquisition cost and energy consumption.

Ultimately, we are aiming for compression support in Lustre at multiple levels:

Client-side compression allows using the available network and storage capacity more efficiently,
Client hints empower applications to provide information useful for compression and
Adaptive compression makes it possible to choose appropriate settings depending on performance metrics and projected benefits.
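To illustrate the idea behind adaptive compression, the following is a minimal sketch and not the project's implementation. It assumes a simple cost model in which a block is first compressed and then sent, so the total time is the compression time plus the transfer time of the compressed block. The names Candidate, transfer_time and choose_setting, as well as all numbers in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical compression setting described by measured metrics."""
    name: str
    ratio: float         # uncompressed size / compressed size
    compress_bps: float  # compression throughput in input bytes per second


def transfer_time(size_bytes: float, link_bps: float,
                  cand: Optional[Candidate] = None) -> float:
    """Estimated seconds to ship one block, optionally compressing it first.

    Assumption: compression and transfer are serialized, not pipelined.
    """
    if cand is None:
        return size_bytes / link_bps
    compress_time = size_bytes / cand.compress_bps
    send_time = (size_bytes / cand.ratio) / link_bps
    return compress_time + send_time


def choose_setting(size_bytes: float, link_bps: float,
                   candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the lowest estimated time.

    Returns None when sending the block uncompressed is estimated to be
    at least as fast as every candidate.
    """
    best: Optional[Candidate] = None
    best_time = transfer_time(size_bytes, link_bps)
    for cand in candidates:
        t = transfer_time(size_bytes, link_bps, cand)
        if t < best_time:
            best, best_time = cand, t
    return best
```

With a fast setting (ratio 1.5 at 4 GB/s) and a slow setting (ratio 3.0 at 50 MB/s), this model picks the slow setting on a 10 MB/s link, the fast setting on a 1 GB/s link and no compression on a 100 GB/s link. A real implementation would also have to account for decompression cost, CPU load and pipelining.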

Compression will be completely transparent to applications because the client and/or server will perform it on their behalf. Users will nevertheless be able to tune Lustre's behavior, for example to favor performance or compression ratio. With client-side compression, the single-stream performance bottleneck benefits directly because less data has to be transferred. Initial studies have shown that a compression ratio of 1.5 can be achieved for scientific data using lz4.
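As a rough illustration of what a compression ratio of 1.5 means, the sketch below measures the ratio of a buffer and derives the space saving and the effective bandwidth. It is only an approximation of the setup described here: it uses zlib from the Python standard library as a stand-in because lz4 is not part of the standard library, and the helper names are hypothetical.

```python
import zlib


def compression_ratio(data: bytes, level: int = 1) -> float:
    """Return uncompressed size divided by compressed size.

    zlib is used as a stand-in compressor; ratios for lz4 will differ.
    """
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


def space_saving(ratio: float) -> float:
    """Fraction of storage saved at a given compression ratio."""
    return 1.0 - 1.0 / ratio


def effective_bandwidth(link_bytes_per_s: float, ratio: float) -> float:
    """Uncompressed bytes delivered per second over a link.

    Assumes that compression itself is never the bottleneck.
    """
    return link_bytes_per_s * ratio
```

Under these assumptions, a ratio of 1.5 saves about one third of the storage space and lets a 1 GB/s link deliver 1.5 GB/s of uncompressed data.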

Publications

State of the Art and Future Trends in Data Reduction for High-Performance Computing (Kira Duwe, Jakob Lüttgau, Georgiana Mania, Jannek Squar, Anna Fuchs, Michael Kuhn, Eugen Betke, Thomas Ludwig), In Supercomputing Frontiers and Innovations, Series: Volume 7, Number 1, pp. 4–36, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2020-04

Analyzing Data Properties using Statistical Sampling – Illustrated on Scientific File Formats (Julian Kunkel), In Supercomputing Frontiers and Innovations, Series: Volume 3, Number 3, pp. 19–33, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2016-10

Data Compression for Climate Data (Michael Kuhn, Julian Kunkel, Thomas Ludwig), In Supercomputing Frontiers and Innovations, Series: Volume 3, Number 1, pp. 75–94, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2016-06

Contributions

So far, we have contributed several changes to Lustre and other projects.

lz4fast in Linux (versions 4.11 and later, also see https://lwn.net/Articles/713175/)
lz4fast in ZFS
Autocompression in ZFS
QoS for compression in ZFS

Funding

Intel Parallel Computing Center for Lustre “Enhanced Adaptive Compression in Lustre”

Working period: 2016-06–2020-05

People from WR

Prof. Dr. Michael Kuhn (contact person)
Anna Fuchs (development)

Prof. Dr. Thomas Ludwig (principal investigator)
