Publication details

Efficient handling of compressed data in ZFS (Hauke Stieler), Bachelor's Thesis, School: Universität Hamburg, 2019-01-16

Abstract

Since the beginning of the computer era, the speed of computation, network and storage, as well as storage capacity, have grown exponentially. This effect is well known as Moore's law and leads to increasing gaps between the performance of computation and storage. High performance computing organizations like the Deutsches Klimarechenzentrum (DKRZ) suffer from these gaps and will benefit from solutions introduced by the IPCC for Lustre projects. To close this gap, applications and file systems use compression in order to speed up the storage of data. Even though file systems like ZFS are already able to compress data, distributed file systems like Lustre are not fully able to use this functionality efficiently. A client-side compression implementation by Anna Fuchs, presented in her thesis “Client-Side Data Transformation in Lustre”, reduces the size of the data before it is sent over the network to the storage server. First changes to ZFS were presented by Niklas Behrmann in his thesis “Support for external data transformation in ZFS”, adding support to ZFS for externally compressed data. Sven Schmidt connected these two theses in his thesis “Efficient interaction between Lustre and ZFS for compression”, using the client-side compression together with the new API functions of ZFS. This thesis presents an efficient way to handle externally compressed data in ZFS in order to achieve fast read and write calls. The code changes and the complexity of the solution were kept to a minimum, and the code was kept as maintainable as possible. To achieve this, the existing work was refactored before new features were added to the code base. Introducing explicit flags and simplifying code paths enables ZFS to handle the data as efficiently as possible without neglecting the quality of the software. A correctness and performance analysis, using the test environment of ZFS, shows the efficiency of the implementation and also reveals tasks for future work.

BibTeX

@misc{EHOCDIZS19,
	author	 = {Hauke Stieler},
	title	 = {{Efficient handling of compressed data in ZFS}},
	advisors	 = {Michael Kuhn and Anna Fuchs},
	year	 = {2019},
	month	 = {01},
	school	 = {Universität Hamburg},
	howpublished	 = {{Online \url{https://wr.informatik.uni-hamburg.de/_media/research:theses:hauke_stieler_efficient_handling_of_compressed_data_in_zfs.pdf}}},
	type	 = {Bachelor's Thesis},
	abstract	 = {Since the beginning of the computer era, the speed of computation, network and storage, as well as storage capacity, have grown exponentially. This effect is well known as Moore's law and leads to increasing gaps between the performance of computation and storage. High performance computing organizations like the Deutsches Klimarechenzentrum (DKRZ) suffer from these gaps and will benefit from solutions introduced by the IPCC for Lustre projects. To close this gap, applications and file systems use compression in order to speed up the storage of data. Even though file systems like ZFS are already able to compress data, distributed file systems like Lustre are not fully able to use this functionality efficiently. A client-side compression implementation by Anna Fuchs, presented in her thesis "Client-Side Data Transformation in Lustre", reduces the size of the data before it is sent over the network to the storage server. First changes to ZFS were presented by Niklas Behrmann in his thesis "Support for external data transformation in ZFS", adding support to ZFS for externally compressed data. Sven Schmidt connected these two theses in his thesis "Efficient interaction between Lustre and ZFS for compression", using the client-side compression together with the new API functions of ZFS. This thesis presents an efficient way to handle externally compressed data in ZFS in order to achieve fast read and write calls. The code changes and the complexity of the solution were kept to a minimum, and the code was kept as maintainable as possible. To achieve this, the existing work was refactored before new features were added to the code base. Introducing explicit flags and simplifying code paths enables ZFS to handle the data as efficiently as possible without neglecting the quality of the software. A correctness and performance analysis, using the test environment of ZFS, shows the efficiency of the implementation and also reveals tasks for future work.},
}