Publication details

Adaptive Selection of Lossy Compression Algorithms Using Machine Learning (Armin Schaare), Bachelor's Thesis, School: Universität Hamburg, 2016-11-29

Abstract

The goal of this thesis was to evaluate the suitability of machine learning models as an automatic decision feature for compression algorithms. Their task would be to predict which compression algorithms perform best on what kind of data. For this, both the artificially generated data itself and its compression were analyzed, producing a benchmark of different features upon which machine learning models could be trained. The models' goal was to predict the compression and decompression throughput of algorithms. Additionally, models had to correctly attribute data to the algorithm producing the best compression ratios. The machine learning approaches under consideration were Linear Models, Decision Trees, and the trivial Mean Value Model as a comparison baseline. It was found that Decision Trees performed significantly better than Linear Models, which in turn were slightly better than the Mean Value approach. Nevertheless, even Decision Trees did not produce a satisfying result that could be reliably used for practical applications.

BibTeX

@misc{ASOLCAUMLS16,
	author	 = {Armin Schaare},
	title	 = {{Adaptive Selection of Lossy Compression Algorithms Using Machine Learning}},
	advisors	 = {Julian Kunkel and Anastasiia Novikova},
	year	 = {2016},
	month	 = {11},
	school	 = {Universität Hamburg},
	howpublished	 = {{Online \url{https://wr.informatik.uni-hamburg.de/_media/research:theses:armin_schaare_adaptive_selection_of_lossy_compression_algorithms_using_machine_learning.pdf}}},
	type	 = {Bachelor's Thesis},
	abstract	 = {The goal of this thesis was to evaluate the suitability of machine learning models as an automatic decision feature for compression algorithms. Their task would be to predict which compression algorithms perform best on what kind of data. For this, both the artificially generated data itself and its compression were analyzed, producing a benchmark of different features upon which machine learning models could be trained. The models' goal was to predict the compression and decompression throughput of algorithms. Additionally, models had to correctly attribute data to the algorithm producing the best compression ratios. The machine learning approaches under consideration were Linear Models, Decision Trees, and the trivial Mean Value Model as a comparison baseline. It was found that Decision Trees performed significantly better than Linear Models, which in turn were slightly better than the Mean Value approach. Nevertheless, even Decision Trees did not produce a satisfying result that could be reliably used for practical applications.},
}