Darter: How do I use the code bisection method to find a bug?

While debugging tools are preferable to print statements, sometimes print statements are the only way to find a bug. In that case, the most effective way to isolate the error in your code is bisection, an iterative process for tracing the program manually.

Step 1: In the main routine of your code, comment out the second half of the code (or approximately the second half).

Step 2: Compile and run the code. Did it crash as before?
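
The halving idea behind these steps can also be sketched as a small script, for the case where the program can be split into numbered steps. This is only an illustration and not part of the procedure above: runs_ok is a hypothetical stand-in for "compile and run with only the first N steps enabled", and here it simply pretends that step 11 is the one that crashes. The sketch also assumes that a run with all steps enabled is already known to fail.

```shell
# Hypothetical stand-in for "compile and run with only the first N steps enabled".
# It succeeds while N < 11, i.e. it pretends that step 11 introduces the crash.
runs_ok() {
    [ "$1" -lt 11 ]
}

# Print the number of the first step whose inclusion makes the run fail.
# Assumes that running all $1 steps is already known to fail.
bisect() {
    lo=1      # lowest step that could be the culprit
    hi=$1     # highest step that could be the culprit
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if runs_ok "$mid"; then
            lo=$(( mid + 1 ))   # first $mid steps are fine, so the bug is later
        else
            hi=$mid             # failure already present, so the bug is at $mid or earlier
        fi
    done
    echo "$lo"
}

bisect 16   # prints 11
```

Each pass halves the range still under suspicion, so 16 steps need only 4 test runs.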

Darter: How do I use Cray ATP to determine where and why a code died abnormally?

Sometimes a code works fine in most cases and circumstances but has a bug that only appears under a particular combination of input case and job size. The code then dies in an unexpected spot, and it is not obvious exactly why or where. In cases like this, Cray's ATP (Abnormal Termination Processing) can likely help!

Simply do

Darter: How to determine memory usage on the compute node

To determine the memory usage of a given process on a compute node, one would normally issue the command "top" and look at the memory usage of the process in question. However, this cannot be done on Darter, since its compute nodes are not directly accessible to the user. Also, OOM (Out of Memory) errors can occur even when a problem has been discretized finely enough to fit in memory: a memory leak in the code can, in the worst case, exhaust the node's memory and cause the program to crash.
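
One possible workaround is to have the job report its own memory usage. The sketch below assumes that the compute nodes expose a Linux /proc filesystem and that the snippet runs on the compute node itself (for example, in a wrapper script started by the job launcher). The function name mem_report is hypothetical.

```shell
# Print the peak (VmHWM) and current (VmRSS) resident memory of a process.
# Assumes a Linux /proc filesystem; defaults to the current shell's PID.
mem_report() {
    pid=${1:-$$}
    grep -E '^Vm(HWM|RSS):' "/proc/$pid/status"
}

mem_report
```

Calling mem_report at several points in a run shows whether resident memory keeps growing, which is the usual signature of a leak.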

Darter: How do I enable the creation of a coredump file when a program crashes in the compute node?

To enable the creation of a coredump file when a program crashes on a compute node of a Cray system like Darter, add the following command to the job script before the aprun call:

Bourne shell: ulimit -c unlimited
C shell: limit coredumpsize unlimited
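
Because the limit can only be raised up to the hard limit set by the system, it can be worth confirming that the setting took effect. A minimal sketch for a Bourne-like shell follows. The variable name core_limit and the message text are illustrative only.

```shell
# Try to raise the soft core file size limit for this shell and its children.
# This fails if the hard limit is lower, so warn instead of aborting.
if ! ulimit -c unlimited 2>/dev/null; then
    echo "warning: could not raise the core file size limit" >&2
fi

# Record and print the limit actually in effect ("unlimited" or a block count).
core_limit=$(ulimit -c)
echo "core file size limit: $core_limit"
```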


For example, if using a Bourne-like job script, the script will look like:

