Publication details

Exploratory Exploitation of Heterogeneous High Performance Architectures using OpenMP in Large Scale Microphysics Applications (Yannik Koenneker), Master's Thesis, School: Universität Hamburg, 2024-09-16
Abstract

The development of modern simulations, exemplified by ICON, is a lengthy process that spans numerous years. Over time, the system architectures for which these simulations were initially developed and optimized undergo significant modifications. This necessitates that programmers adapt to new hardware in order to fully leverage the potential performance gains. The process of porting existing code to new systems and architectures, as well as maintaining them, is inherently time-consuming and therefore costly. First attempts at porting ICON to LUMI, an AMD system, saw significant hurdles because of missing compiler support. In order to reduce costs in the future, parts of the ICON model have been translated to C++ in order to evaluate different programming models that offer performance portability. This thesis is concerned with the topic of OpenMP offloading to GPUs. We implemented a series of optimizations with the objective of enhancing performance and subjected each to a detailed analysis to ascertain the extent of the improvement. The presented work analyzes different optimization approaches using OpenMP. Our findings indicated that the most effective approach was to reduce the number of required registers per thread. It is noteworthy that the implemented changes had a greater impact on smaller problem sizes than on larger ones. This work offers insight into the optimization strategies that could be adapted to other algorithms, as well as a method of enhancing existing algorithms by a factor of anywhere between 1.87 and 39.34, depending on the configuration.

BibTeX

@mastersthesis{EEOHHPAUOI24,
	author	 = {Yannik Koenneker},
	title	 = {{Exploratory Exploitation of Heterogeneous High Performance Architectures using OpenMP in Large Scale Microphysics Applications}},
	advisors	 = {Georgiana Mania},
	year	 = {2024},
	month	 = {09},
	school	 = {Universität Hamburg},
	howpublished	 = {{Online \url{https://wr.informatik.uni-hamburg.de/_media/research:theses:yannik_koenneker_exploratory_exploitation_of_heterogeneous_high_performance_architectures_using_openmp_in_large_scale_microphysics_applications.pdf}}},
	type	 = {Master's Thesis},
	abstract	 = {The development of modern simulations, exemplified by ICON, is a lengthy process that spans numerous years. Over time, the system architectures for which these simulations were initially developed and optimized undergo significant modifications. This necessitates that programmers adapt to new hardware in order to fully leverage the potential performance gains. The process of porting existing code to new systems and architectures, as well as maintaining them, is inherently time-consuming and therefore costly. First attempts at porting ICON to LUMI, an AMD system, saw significant hurdles because of missing compiler support. In order to reduce costs in the future, parts of the ICON model have been translated to C++ in order to evaluate different programming models that offer performance portability. This thesis is concerned with the topic of OpenMP offloading to GPUs. We implemented a series of optimizations with the objective of enhancing performance and subjected each to a detailed analysis to ascertain the extent of the improvement. The presented work analyzes different optimization approaches using OpenMP. Our findings indicated that the most effective approach was to reduce the number of required registers per thread. It is noteworthy that the implemented changes had a greater impact on smaller problem sizes than on larger ones. This work offers insight into the optimization strategies that could be adapted to other algorithms, as well as a method of enhancing existing algorithms by a factor of anywhere between 1.87 and 39.34, depending on the configuration.},
}