Publication details
- State of the Art and Future Trends in Data Reduction for High-Performance Computing (Kira Duwe, Jakob Lüttgau, Georgiana Mania, Jannek Squar, Anna Fuchs, Michael Kuhn, Eugen Betke, Thomas Ludwig), In Supercomputing Frontiers and Innovations, Volume 7, Number 1, pp. 4–36, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), April 2020
Abstract
Research into data reduction techniques has gained popularity in recent years as storage capacity and performance become a growing concern. This survey paper provides an overview of leveraging points found in high-performance computing (HPC) systems and suitable mechanisms to reduce data volumes. We present the underlying theories and their application throughout the HPC stack and also discuss related hardware acceleration and reduction approaches. After introducing relevant use cases, an overview of modern lossless and lossy compression algorithms and their respective usage at the application and file system layer is given. In anticipation of their increasing relevance for adaptive and in situ approaches, dimensionality reduction techniques are summarized with a focus on non-linear feature extraction. Adaptive approaches and in situ compression algorithms and frameworks follow. The key stages of and new opportunities for deduplication are covered next. An unconventional but promising method is recomputation, which is presented last. We conclude the survey with an outlook on future developments.
BibTeX
@article{SOTAAFTIDR20,
  author    = {Kira Duwe and Jakob Lüttgau and Georgiana Mania and Jannek Squar and Anna Fuchs and Michael Kuhn and Eugen Betke and Thomas Ludwig},
  title     = {{State of the Art and Future Trends in Data Reduction for High-Performance Computing}},
  journal   = {Supercomputing Frontiers and Innovations},
  volume    = {7},
  number    = {1},
  pages     = {4--36},
  year      = {2020},
  month     = apr,
  editor    = {Jack Dongarra and Vladimir Voevodin},
  publisher = {Publishing Center of South Ural State University},
  address   = {454080, Lenin prospekt, 76, Chelyabinsk, Russia},
  doi       = {10.14529/jsfi200101},
  url       = {https://superfri.org/superfri/article/view/303},
  abstract  = {Research into data reduction techniques has gained popularity in recent years as storage capacity and performance become a growing concern. This survey paper provides an overview of leveraging points found in high-performance computing (HPC) systems and suitable mechanisms to reduce data volumes. We present the underlying theories and their application throughout the HPC stack and also discuss related hardware acceleration and reduction approaches. After introducing relevant use cases, an overview of modern lossless and lossy compression algorithms and their respective usage at the application and file system layer is given. In anticipation of their increasing relevance for adaptive and in situ approaches, dimensionality reduction techniques are summarized with a focus on non-linear feature extraction. Adaptive approaches and in situ compression algorithms and frameworks follow. The key stages of and new opportunities for deduplication are covered next. An unconventional but promising method is recomputation, which is presented last. We conclude the survey with an outlook on future developments.},
}