The National Institute for Computational Sciences

Darter

Darter: Why do I get an 'out of space' error when transferring files from HPSS to Darter?

Your file transfer has filled a Lustre object storage target (OST), resulting in an error like:

ead_cond_timedwait() return error 22, errno=0 OUT OF SPACE condition detected while writing local file

This usually happens because the stripe count of the destination directory is too small (often 1), so the whole file is written to a single OST. To solve the problem, remove the partially transferred file and increase the stripe count of the directory before transferring the file again. To change the stripe count, cd to that directory and run the Lustre lfs setstripe command with a larger stripe count (the -c option).
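The exact command from the original page did not survive. Below is a minimal sketch, assuming the standard Lustre lfs client tool is on the PATH. The helper name restripe_dir, the example file name, and the stripe count of 8 are hypothetical choices for illustration and are not values from this page:

```shell
#!/bin/sh
# Hypothetical helper: widen the default stripe of a directory so that large
# files retrieved from HPSS are spread over COUNT OSTs instead of filling one.
# Usage: restripe_dir DIR COUNT
restripe_dir() {
    if [ "$#" -ne 2 ]; then
        echo "usage: restripe_dir DIR COUNT" >&2
        return 2
    fi
    dir=$1
    count=$2
    # Files created in DIR from now on inherit this stripe count;
    # files that already exist keep their old layout.
    lfs setstripe -c "$count" "$dir" || return 1
    # Print the directory's default layout to confirm the change.
    lfs getstripe -d "$dir"
}

# Example (file name and stripe count are illustrative only):
#   rm -f big_archive.tar     # remove the partial copy first
#   restripe_dir . 8          # run from the destination directory
#   hsi get big_archive.tar   # retrieve the file from HPSS again
```

Lustre fixes a file's stripe layout when the file is created. That is why the partial copy has to be removed before the transfer is repeated.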

Darter: How do I enable the creation of a core dump file when a program crashes on a compute node?

To have a core dump file written when a program crashes on a compute node of a Cray system such as Darter, add the following command to the job script before the aprun call:

Bourne shell: ulimit -c unlimited
C shell: limit coredumpsize unlimited

For example, in a job script written for a Bourne-like shell, put ulimit -c unlimited on a line of its own somewhere before the aprun command.
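A minimal Bourne-shell (bash) job script along those lines is shown below. It assumes a PBS-style batch system. The #PBS resource lines, the rank count, and the executable ./my_program are hypothetical placeholders to adapt to your own job:

```shell
#!/bin/bash
# Assumed resource request: 16 cores (hypothetical, adjust to your job)
#PBS -l size=16
# Assumed walltime (hypothetical)
#PBS -l walltime=00:30:00

# Start in the directory the job was submitted from (current dir outside PBS)
cd "${PBS_O_WORKDIR:-.}"

# Bourne-shell syntax: remove the limit on core file size.
# This must come before aprun so the processes launched on the
# compute nodes inherit the new limit.
ulimit -c unlimited

# Launch the application; ./my_program and -n 16 are hypothetical
aprun -n 16 ./my_program
```

For a C shell job script, replace the ulimit line with limit coredumpsize unlimited, as listed above.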

Darter: How do I get information about my MPICH/Portals settings?

Cray MPICH has a number of settings, controlled through environment variables, that affect which algorithms are used, how much buffer space is allocated, and so on. To list these variables and their settings, set the following in the job script before calling aprun:

export MPICH_ENV_DISPLAY=1

This causes rank 0 to display all MPICH environment variables and their current settings at MPI initialization time. If two or more nodes are used, MPICH/GNI environment settings are also included in the listing.
