This page gives examples for debugging C programs, including distributed (parallel) C programs.
GDB is the GNU project debugger.
To debug a program compiled with GCC, pass the -ggdb option (or plain -g) to the compiler so that the binary contains debugging information.
It is recommended to debug unoptimised code, i.e. replace the -O<X> flag with -O0; otherwise the compiler might optimise variables away and reorder the instructions.
To debug a program and pass command line arguments to it, start GDB with --args:
gdb --args ./program args
To stop the program at a certain line, set a breakpoint:
> b <line_number>
Now you can run the program:
> run
To show the source lines around the current position, use:
> list
To look up the value of a variable, use:
> print <variable_name>
You can step into a function with step (or just s), or go to the next line without entering function calls with next (or just n).
With the command backtrace (or bt) you get a summary of how your program got to where it is now, i.e. the chain of function calls leading to the current position.
Type q to quit GDB.
For more information see man gdb
Valgrind contains several powerful tools for debugging and profiling, such as memcheck, callgrind and cachegrind.
memcheck detects and warns about the use of uninitialised memory, reads and writes past the end of malloc'd blocks, reads and writes of memory after it has been freed, and various kinds of memory leaks.
To use memcheck, select it with the --tool option (memcheck is also the default tool if the option is omitted):
valgrind --tool=memcheck ./program
cachegrind is a cache profiler that simulates the CPU caches and identifies cache misses.
To use cachegrind, select it with the --tool option:
valgrind --tool=cachegrind ./program
Warning: be careful when using any Valgrind tool. The program can run many times slower, so choose appropriate (small) input parameters.
An example Score-P session looks like this:
$ spack load -r mpi scorep
$ scorep mpicc -g mpi-speedup.c -o mpi-speedup
# Score-P environment variables
$ export SCOREP_ENABLE_TRACING=TRUE # enable tracing
$ export SCOREP_TOTAL_MEMORY=10000000 # trace buffer size
# export SCOREP_METRIC_PAPI=PAPI_FP_OPS
$ mpiexec -np 2 ./mpi-speedup
This traces the MPI calls and makes it possible to distinguish application calls from communication calls. To understand the internal behavior of the application, one has to compile with the --pdt option:
$ scorep --pdt mpicc -g mpi-speedup.c -o mpi-speedup
Score-P comes with a few command line tools to explore the performance data, for example scorep-score:
$ scorep-score scorep-20170509_1259_6935968279002828/profile.cubex
Estimated aggregate size of event trace:                   2277 bytes
Estimated requirements for largest trace buffer (max_buf): 1139 bytes
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       4097kB
(hint: When tracing set SCOREP_TOTAL_MEMORY=4097kB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)

flt type max_buf[B] visits time[s] time[%] time/visit[us] region
    ALL       1,138     52   10.24   100.0      196829.81 ALL
    MPI         852     30    0.03     0.3        1046.47 MPI
    USR         260     20    0.00     0.0           1.32 USR
    COM          26      2   10.20    99.7     5101864.89 COM
Additionally, the tool cube can be used to explore profiles, and vampir can be used to investigate traces. Both are graphical tools and require X11 forwarding.
Cube is a GUI program for analyzing profiles of parallel applications that were created with Score-P.
Example:
$ spack load -r cube
$ TODO GUI ???
Further information: http://www.scalasca.org/software/cube-4.x/documentation.html
Vampir is a commercial tool for analyzing traces of parallel applications that were created with Score-P.
Example:
$ module load vampir
$ vampir scorep-20170509_1259_6935968279002828/traces.otf2
Further information: Slides
Among other things, the Likwid tool suite can retrieve hardware counter information for a program run.
Example:
$ spack load -r likwid
$ salloc -N 1 -p west
$ srun likwid-perfctr -a # show available counters
$ srun likwid-perfctr -C 0 -g MEM ./hello-world-mpi # pin the application onto one core and measure the memory group
--------------------------------------------------------------------------------
CPU name:  Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
CPU type:  Intel Core Westmere processor
CPU clock: 2.67 GHz
--------------------------------------------------------------------------------
Hello world from process 0 of 1
--------------------------------------------------------------------------------
Group 1: MEM
+--------------------------------+---------+---------+
| Event                          | Counter | Core 0  |
+--------------------------------+---------+---------+
| INSTR_RETIRED_ANY              | FIXC0   | 3015861 |
| CPU_CLK_UNHALTED_CORE          | FIXC1   | 3447668 |
| CPU_CLK_UNHALTED_REF           | FIXC2   | 4552580 |
| UNC_QMC_NORMAL_READS_ANY       | UPMC0   | 64118   |
| UNC_QMC_WRITES_FULL_ANY        | UPMC1   | 52604   |
| UNC_QHL_REQUESTS_REMOTE_READS  | UPMC2   | 673     |
| UNC_QHL_REQUESTS_REMOTE_WRITES | UPMC3   | 4       |
+--------------------------------+---------+---------+
+------------------------------------------+--------------+
| Metric                                   | Core 0       |
+------------------------------------------+--------------+
| Runtime (RDTSC) [s]                      | 1.0135       |
| Runtime unhalted [s]                     | 0.0013       |
| Clock [MHz]                              | 2019.5326    |
| CPI                                      | 1.1432       |
| Memory read bandwidth [MBytes/s]         | 4.0491       |
| Memory data volume [GBytes]              | 0.0041       |
| Memory write bandwidth [MBytes/s]        | 3.3219       |
| Memory data volume [GBytes]              | 0.0034       |
| Memory bandwidth [MBytes/s]              | 7.3710       |
| Memory data volume [GBytes]              | 0.0075       |
| Remote memory read bandwidth [MBytes/s]  | 0.0425       |
| Remote memory read data volume [GBytes]  | 4.307200e-05 |
| Remote memory write bandwidth [MBytes/s] | 0.0003       |
| Remote memory write data volume [GBytes] | 2.560000e-07 |
| Remote memory bandwidth [MBytes/s]       | 0.0428       |
| Remote memory data volume [GBytes]       | 4.332800e-05 |
+------------------------------------------+--------------+
The debugger DDT (Distributed Debugging Tool) is suited for parallel debugging and is available on the cluster. Its graphical user interface makes the tool easy and intuitive to use. This requires X11 forwarding.
DDT can then be started with
ddt
or directly with the application to be examined:
ddt ./application
The application must contain debug information, i.e. it must be compiled with debug information, usually with the -g option:
cc -g
Important note: for ddt to display the source code correctly, the compiler must be forced to use the DWARF4 debug format. To do so, pass the flag -gdwarf-4.