Publication details
- Evaluating Lossy Compression on Climate Data (Nathanael Hübbe, Al Wegener, Julian Kunkel, Yi Ling, Thomas Ludwig), In Supercomputing, Lecture Notes in Computer Science (7905), pp. 343–356, (Editors: Julian Martin Kunkel, Thomas Ludwig, Hans Werner Meuer), Springer (Berlin, Heidelberg), ISC 2013, Leipzig, Germany, ISBN: 978-3-642-38749-4, ISSN: 0302-9743, 2013-06
Abstract
While the amount of data used by today’s high-performance computing (HPC) codes is huge, HPC users have not broadly adopted data compression techniques, apparently because of a fear that compression will either unacceptably degrade data quality or that compression will be too slow to be worth the effort. In this paper, we examine the effects of three lossy compression methods (GRIB2 encoding, GRIB2 using JPEG 2000 and LZMA, and the commercial Samplify APAX algorithm) on decompressed data quality, compression ratio, and processing time. A careful evaluation of selected lossy and lossless compression methods is conducted, assessing their influence on data quality, storage requirements and performance. The differences between input and decoded datasets are described and compared for the GRIB2 and APAX compression methods. Performance is measured using the compressed file sizes and the time spent on compression and decompression. Test data consists of both 9 synthetic datasets that expose compression behavior and 123 climate variables output from a climate model. The benefits of lossy compression for HPC systems are described and are related to our findings on data quality.
BibTeX
@inproceedings{ELCOCDHWKL13,
	author	 = {Nathanael Hübbe and Al Wegener and Julian Kunkel and Yi Ling and Thomas Ludwig},
	title	 = {{Evaluating Lossy Compression on Climate Data}},
	year	 = {2013},
	month	 = {06},
	booktitle	 = {{Supercomputing}},
	editor	 = {Julian Martin Kunkel and Thomas Ludwig and Hans Werner Meuer},
	publisher	 = {Springer},
	address	 = {Berlin, Heidelberg},
	series	 = {Lecture Notes in Computer Science},
	number	 = {7905},
	pages	 = {343--356},
	conference	 = {ISC 2013},
	location	 = {Leipzig, Germany},
	isbn	 = {978-3-642-38749-4},
	issn	 = {0302-9743},
	doi	 = {10.1007/978-3-642-38750-0_26},
	abstract	 = {While the amount of data used by today’s high-performance computing (HPC) codes is huge, HPC users have not broadly adopted data compression techniques, apparently because of a fear that compression will either unacceptably degrade data quality or that compression will be too slow to be worth the effort. In this paper, we examine the effects of three lossy compression methods (GRIB2 encoding, GRIB2 using JPEG 2000 and LZMA, and the commercial Samplify APAX algorithm) on decompressed data quality, compression ratio, and processing time. A careful evaluation of selected lossy and lossless compression methods is conducted, assessing their influence on data quality, storage requirements and performance. The differences between input and decoded datasets are described and compared for the GRIB2 and APAX compression methods. Performance is measured using the compressed file sizes and the time spent on compression and decompression. Test data consists of both 9 synthetic datasets that expose compression behavior and 123 climate variables output from a climate model. The benefits of lossy compression for HPC systems are described and are related to our findings on data quality.},
}
