Publication details
- High Performance Shallow Water Kernels for Parallel Overland Flow Simulations Based on FullSWOF2D (Roland Wittmann, Hans-Joachim Bungartz, Philipp Neumann), in Computers and Mathematics with Applications, Volume 74, Issue 1, pp. 110–125 (Editor: Jose Galan-Garcia), Elsevier, ISSN: 0898-1221, 2017
DOI: 10.1016/j.camwa.2017.01.005
Abstract
We describe code optimization and parallelization procedures applied to the sequential overland flow solver FullSWOF2D. Major difficulties when simulating overland flows comprise dealing with high-resolution datasets of large-scale areas, which cannot be computed on a single node either due to the limited amount of memory or due to too many (time step) iterations resulting from the CFL condition. We address these issues with two major contributions. First, we demonstrate a generic step-by-step transformation of the second-order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation due to dry and wet cells and propose a solution using an efficient cell counting approach. Finally, scalability results are shown for different test scenarios along with a flood simulation benchmark using the Shaheen II supercomputer.
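To illustrate why high resolution leads to "too many (time step) iterations", the following is a generic CFL bound for explicit finite volume shallow water schemes on a uniform grid. It is a textbook form stated under assumptions, not necessarily the exact condition implemented in FullSWOF2D: $h$ is the water depth, $(u, v)$ the velocity, $g$ gravitational acceleration, and $\nu$ a CFL number whose admissible value depends on the scheme.

```latex
\Delta t \le \nu \, \min_{i,j} \min\left(
  \frac{\Delta x}{|u_{i,j}| + \sqrt{g\,h_{i,j}}},\;
  \frac{\Delta y}{|v_{i,j}| + \sqrt{g\,h_{i,j}}}
\right), \qquad 0 < \nu \le 1

% Refining both grid spacings by a factor k (wave speeds assumed unchanged):
\Delta x \to \Delta x / k, \quad \Delta y \to \Delta y / k
\;\Rightarrow\; N_{\text{cells}} \propto k^{2}, \quad
N_{\text{steps}} = T / \Delta t \propto k, \quad
\text{work} \propto k^{3}
```

Under these assumptions, halving the cell size for a fixed simulated time $T$ doubles the number of time steps and increases the total work roughly eightfold, which is why both memory and runtime motivate distributed-memory parallelization.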
BibTeX
@article{HPSWKFPOFS17,
  author    = {Roland Wittmann and Hans-Joachim Bungartz and Philipp Neumann},
  title     = {{High Performance Shallow Water Kernels for Parallel Overland Flow Simulations Based on FullSWOF2D}},
  year      = {2017},
  editor    = {Jose Galan-Garcia},
  publisher = {Elsevier},
  journal   = {Computers and Mathematics with Applications},
  volume    = {74},
  number    = {1},
  pages     = {110--125},
  issn      = {0898-1221},
  doi       = {10.1016/j.camwa.2017.01.005},
  abstract  = {We describe code optimization and parallelization procedures applied to the sequential overland flow solver FullSWOF2D. Major difficulties when simulating overland flows comprise dealing with high-resolution datasets of large-scale areas, which cannot be computed on a single node either due to the limited amount of memory or due to too many (time step) iterations resulting from the CFL condition. We address these issues with two major contributions. First, we demonstrate a generic step-by-step transformation of the second-order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation due to dry and wet cells and propose a solution using an efficient cell counting approach. Finally, scalability results are shown for different test scenarios along with a flood simulation benchmark using the Shaheen II supercomputer.},
}