	author	 = {Florian Ehmke},
	title	 = {{Adaptive Compression for the Zettabyte File System}},
	advisors	 = {Michael Kuhn},
	year	 = {2015},
	month	 = {02},
	school	 = {Universität Hamburg},
	type	 = {Master's Thesis},
	abstract	 = {Although many modern file systems support compression, much data is still written to disk uncompressed. The reason for this is the overhead of compression, which is a CPU-intensive task. Storing uncompressed data is expensive because it requires more disks, which have to be purchased and then consume more energy. Recent advances have produced compression algorithms (LZ4, LZJB) that meet all requirements for a compression-by-default scenario. These algorithms are so fast that compressing and writing data is actually faster than writing it uncompressed. However, algorithms such as gzip still achieve much higher compression ratios at the cost of higher overhead. In many use cases, compression speed is less important than saving disk space. For a backup archive, for example, (de-)compression speed matters less than for a directory in which a computation stores intermediate results that are reused in its next iteration. Furthermore, an algorithm's performance may vary with the data being compressed. Ideally, the file system would know what the user wants and choose the best algorithm for each file individually. The Zettabyte File System (ZFS) is a modern file system with built-in compression support. It supports four compression algorithms by default (LZ4, LZJB, gzip and ZLE). ZFS already offers some flexibility, as different algorithms can be selected for different datasets (mountable, nested file systems). The main purpose of this thesis is to demonstrate how adaptive compression in the file system can be used to benefit from strong compression algorithms like gzip while avoiding, where possible, the performance penalties they incur. To this end, ZFS's compression capabilities will be extended to allow more flexibility in selecting a compression algorithm. 
The user will be able to choose a use case for a dataset, such as archive, performance or energy. In addition, two features will be implemented. The first will allow the user to select a compression algorithm for a specific file type and use case; file types will be identified by their file name extension. The second will regularly test blocks for compressibility with different algorithms, and the algorithm that wins the test will be used until the next test is scheduled. Depending on the selected use case, the test parameters are weighted differently.},