author	 = {Christopher Bartz},
	title	 = {{An in-depth analysis of parallel high level I/O interfaces using HDF5 and NetCDF-4}},
	advisors	 = {Konstantinos Chasapis and Michael Kuhn and Petra Nerge},
	year	 = {2014},
	month	 = {04},
	school	 = {Universität Hamburg},
	howpublished	 = {{Online \url{}}},
	type	 = {Master's Thesis},
	abstract	 = {Scientific applications store data in various formats. HDF5 and NetCDF-4 are data formats that are widely used in the scientific community. They are accompanied by high-level I/O interfaces that provide retrieval and manipulation of data. Parallel execution of applications is a key factor for performance. Previous evaluations have shown that high-level I/O interfaces such as NetCDF-4 and HDF5 can exhibit suboptimal I/O performance depending on the application's access patterns. In this thesis we investigate how the parallel versions of the HDF5 and NetCDF-4 interfaces behave when Lustre is used as the underlying parallel file system. The I/O is performed in a layered manner: NetCDF-4 uses HDF5, and HDF5 uses MPI-IO, which itself uses POSIX to perform the I/O. To discover inefficiencies and bottlenecks, we analyse the complete I/O path while using different access patterns and I/O configurations. We use IOR for our analysis. IOR is a configurable benchmark that generates I/O patterns
and is well known in the parallel I/O community. In this thesis we modify IOR to meet the needs of our analysis. We distinguish between two general access patterns for our evaluation: disjoint and interleaved. Disjoint means that each process accesses a contiguous region in the file, whereas interleaved is an access to a non-contiguous region. The results show that neither the disjoint nor the interleaved access outperforms the other in every case. However, when using the interleaved access in a certain configuration, results near the theoretical maximum are achieved. In the last chapter, we provide best practices for choosing the right I/O configuration depending on the needs of the application. The NetCDF-4 interface does not provide a feature to align the data section to particular address boundaries. This is a significant performance disadvantage. We provide an implementation and re-evaluation of this feature and observe a clear performance improvement. When using NetCDF-4 or
HDF5, the data can be broken into pieces called chunks, which are stored in independent locations in the file. We present and evaluate an optimised implementation for determining the default chunk size in the NetCDF-4 interface. Beyond that, we identify an error in the NetCDF-4 implementation and provide a correct solution.},