Publication details
- SFS: A Tool for Large Scale Analysis of Compression Characteristics (Julian Kunkel), Research Papers (4), Research Group: Scientific Computing, University of Hamburg (Deutsches Klimarechenzentrum GmbH, Bundesstraße 45a, D-20146 Hamburg), 2017-05-05
Abstract
Data centers manage petabytes of storage. Identifying a fast lossless compression algorithm that can be enabled on the storage system and potentially reduces data by an additional 10% is significant. However, it is not trivial to evaluate algorithms on huge data pools, as this evaluation requires running the algorithms and is thus costly, too. Therefore, there is a need for tools that optimize such an analysis. In this paper, the open source tool SFS is described, which performs these scans efficiently. While based on an existing open source tool, SFS builds on a proven method to scan huge quantities of data using statistical sampling. Additionally, we present results of 162 variants of various algorithms run on three data pools with scientific data and one more general-purpose data pool. Based on this analysis, promising classes of algorithms are identified.
BibTeX
@techreport{SATFLSAOCC17,
  author    = {Julian Kunkel},
  title     = {{SFS: A Tool for Large Scale Analysis of Compression Characteristics}},
  year      = {2017},
  month     = {05},
  publisher = {Research Group: Scientific Computing, University of Hamburg},
  address   = {Deutsches Klimarechenzentrum GmbH, Bundesstraße 45a, D-20146 Hamburg},
  series    = {Research Papers},
  number    = {4},
  abstract  = {Data centers manage petabytes of storage. Identifying a fast lossless compression algorithm that can be enabled on the storage system and potentially reduces data by an additional 10\% is significant. However, it is not trivial to evaluate algorithms on huge data pools, as this evaluation requires running the algorithms and is thus costly, too. Therefore, there is a need for tools that optimize such an analysis. In this paper, the open source tool SFS is described, which performs these scans efficiently. While based on an existing open source tool, SFS builds on a proven method to scan huge quantities of data using statistical sampling. Additionally, we present results of 162 variants of various algorithms run on three data pools with scientific data and one more general-purpose data pool. Based on this analysis, promising classes of algorithms are identified.}
}