Description
The TotalView debugger is a tool that lets you debug, analyze, and tune the performance of complex serial, multiprocessor, and multithreaded programs.
The Kraken system has a license for 65 processes for all simultaneous TotalView sessions. Since the aprun launch process counts as one, a single user is limited to 64 MPI processes even if no one else is using TotalView.
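The license arithmetic above can be sketched as a small shell check. This is only an illustration: the function names are hypothetical, and the check ignores processes held by other users' TotalView sessions, which would lower the number available to you.

```shell
# Sketch only: numbers come from the license description above;
# the function names are hypothetical, not part of TotalView or Kraken.
LICENSE_TOTAL=65   # processes licensed across all simultaneous sessions
APRUN_OVERHEAD=1   # the aprun launch process counts against the license

# Print the largest MPI process count a single user can debug.
max_mpi_procs() {
    echo $((LICENSE_TOTAL - APRUN_OVERHEAD))
}

# Succeed if the requested process count ($1) fits under the license.
fits_license() {
    [ "$1" -le "$(max_mpi_procs)" ]
}
```

For example, `fits_license 16 && echo "ok to debug"` prints the message, while `fits_license 65` fails.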
TotalView can be executed two ways:
- as a GUI, totalview, and
- from the command line interface, totalviewcli.
Because the TotalView GUI is an X Window System application, your local system must be set up to accept and display X11 traffic. It is best to tunnel the X11 traffic through the SSH connection. See Kraken Access for information about setting this up.
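As a rough check that X11 forwarding is in place before starting the GUI, one option is to test whether the DISPLAY variable is set in the remote shell. The function name below is hypothetical, and a set DISPLAY is a necessary sign of forwarding, not a guarantee that windows will actually open.

```shell
# Sketch only: x11_ready is a hypothetical helper, not a Kraken or TotalView tool.
# When you log in with X11 forwarding enabled (for example with ssh -X),
# ssh normally sets DISPLAY in the remote shell.
x11_ready() {
    [ -n "${DISPLAY:-}" ]
}

# Print a short status message based on the check above.
report_x11() {
    if x11_ready; then
        echo "DISPLAY is set to $DISPLAY"
    else
        echo "DISPLAY is not set; reconnect with X11 forwarding enabled"
    fi
}
```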
For more information please see the TotalView documentation page.
Use
Kraken
On Kraken, the TotalView module (xt-totalview) is loaded by default.
Examine a core file
If you are debugging an existing core file and do not need to run a parallel process, you can launch TotalView with the following command:
$ totalview a.out core
Launch an executable
If you want to use TotalView to start your job and monitor it as it runs, you must take some additional steps. Because parallel jobs cannot be launched directly from the login nodes, you will need to launch TotalView from within a batch job. The easiest way to do this is to start an interactive batch job as follows:
$ qsub -l size=16,walltime=1:00:00 -I -V -X -A project_identifier
Make sure to use -X as an option to qsub in order to enable X11 forwarding during the interactive session.
Once your job starts, you can launch TotalView from the command line. The following example starts TotalView on 16 compute cores:
$ totalview aprun -a -n 16 a.out
The -a option to TotalView is required: it tells TotalView that all arguments appearing after it belong to the executable being launched rather than to TotalView itself. (In most cases that executable will be aprun, as in the above example.)
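To make the argument order concrete, the launch line can be assembled by a small helper. This is a sketch: tv_launch_cmd is a hypothetical name, and it only prints the command so you can inspect it before running it inside your interactive job.

```shell
# Sketch only: tv_launch_cmd is a hypothetical helper.
# Layout of the printed command:
#   totalview aprun   -> TotalView starts aprun as the program to debug
#   -a                -> everything after this is passed to aprun
#   -n <cores> <exe>  -> aprun arguments: core count and your executable
tv_launch_cmd() {
    nprocs=$1          # number of compute cores to run on
    exe=${2:-a.out}    # executable to debug (defaults to a.out)
    printf 'totalview aprun -a -n %s %s\n' "$nprocs" "$exe"
}
```

Calling `tv_launch_cmd 16` prints the same command used in the example above.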
After you run the totalview command, two windows should appear. In the larger window, you will see the assembly code for aprun. Type “G” (capital “G”) in this window to cause all processes to “Go.” TotalView will run for a few seconds and then ask if you’d like to stop your processes before entering “MAIN.” Answer “yes” to stop your program at the beginning so you can add breakpoints, etc., before running.
Connecting to a running job can be done from the job’s aprun node as follows:
- Determine the job’s aprun node using qstat -f <jobid> | grep exec_host.
- Log into the aprun node from a Kraken login node using ssh -X aprun##.
- From the aprun node, start TotalView and connect to the aprun process.
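The first step above can be sketched as a filter over the qstat output. The function name is hypothetical, and the code assumes the exec_host line has the form "exec_host = host/slot", which may differ on your system; check the raw qstat -f output first.

```shell
# Sketch only: aprun_node_from_qstat is a hypothetical helper.
# It reads `qstat -f <jobid>` output on stdin and prints the host name
# from the first exec_host line, dropping anything after "/" or "+".
# Assumption: the line looks like "    exec_host = aprun7/0".
aprun_node_from_qstat() {
    sed -n 's/^[[:space:]]*exec_host = \([^/+]*\).*/\1/p' | head -n 1
}
```

A possible use from a login node is `node=$(qstat -f <jobid> | aprun_node_from_qstat)`, followed by `ssh -X "$node"`.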

