• National Institute for Computational Sciences is a UT/ORNL Partnership

Kraken

Why do I get module load errors for software that I used before the CLE 3.1 upgrade?

Default versions have changed for both Cray and 3rd party software, and some software versions are no longer available. Please check the availability and default versions of applications or libraries. You can also check available software with "module avail" on Kraken.

Why does my array job not work (i.e. #PBS -t or qsub -t)?

Array jobs on Kraken are no longer supported. The submission filter will reject jobs which make use of job arrays (i.e. #PBS -t or qsub -t). These jobs (if submitted) will not run and should be deleted.

Why does my submitted job die with strange shell errors?

The shell initiation line in PBS scripts is not guaranteed to be used to determine the interpreting shell. The default behavior is to use the user's default login shell or the value of the PBS option -S (i.e. #PBS -S /bin/bash or qsub -S /bin/bash). If you are using a shell for a PBS script which is different than your default shell, please use the PBS -S option.

What should I do in the event of a Lustre slowdown?

In the event of a Lustre slowdown, there are many things to consider, as Lustre has many working parts and is shared by all users on the system. NICS continually monitors Lustre's performance and seeks to improve researchers' data communications. If you notice that your code's I/O performance or the Lustre filesystem is slower than usual, please answer the following questions to the best of your knowledge and email your answers to the XSEDE Help Desk.

  • When did you first notice the slowdown? How long did it last?
  • Which login node were you on?
  • Can you estimate the magnitude of the slowdown? (e.g., "It took 2 min instead of 3 secs" or "batch job exceeded walltime limit of 10 hours, but normally finishes in 8 hours")
  • What were you doing? Interactive command (like "ls")? Batch job?
  • For interactive commands:
    • Which host were you using?
    • Did you see the same behavior on other hosts?
    • Can you provide the exact command that was run and the directory in which it was run?
  • For batch jobs:
    • Can you supply the job IDs for jobs that were affected?
    • Can you provide any details about the IO pattern for your job?
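
Some of the answers above (time, host, exact command, directory, and a rough timing) can be captured at the moment the slowdown is noticed. The following is a minimal sketch, not a NICS tool: the function name slowdown_report and the choice of "ls -l" as the timed command are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: gather basic facts for a slowdown report.
# slowdown_report is a hypothetical helper name, not an installed command.
slowdown_report() {
    dir=${1:-.}
    echo "Date:      $(date)"
    echo "Host:      $(uname -n)"
    echo "Directory: $dir"
    echo "Command:   ls -l $dir"
    # Time the command in whole seconds.
    start=$(date +%s)
    ls -l "$dir" > /dev/null
    end=$(date +%s)
    echo "Elapsed:   $((end - start)) s"
}

# Report on the current working directory.
slowdown_report "$PWD"
```

The output can be pasted into the email alongside the remaining answers, such as affected job IDs.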

How do I enable the creation of a coredump file when a program crashes on a compute node?

To enable the creation of a coredump file when a program crashes on a compute node of a Cray XT system such as Kraken or Athena, add the following command to the job script before the aprun call:

Bourne shell: ulimit -c unlimited
C shell: limit coredumpsize unlimited

For example, a Bourne-like job script would look like this:

#PBS -A MY_PROJECT
#PBS -l size=12,walltime=00:05:00
#PBS -S /bin/bash

cd $PBS_O_WORKDIR

ulimit -c unlimited

aprun -n 4 ./helloWorld

In the previous example, if the program 'helloWorld' crashes (for example, due to a segmentation fault), a coredump file named 'core' will be created in the same directory where the program is located.

Note: Using the compiler option '-g' at compile time adds debugging information to the executable, which makes it easier to find the location in the source code where the program crashed.
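
If no core file appears after a crash, one thing worth checking is whether the limit was actually raised. The lines below are a small sketch for a Bourne-like job script, placed before the aprun call; the wording of the messages is an assumption, and the check only reports what the shell granted.

```shell
# Try to raise the core file size limit and report what the shell granted.
if ulimit -c unlimited 2>/dev/null; then
    echo "core file size limit: $(ulimit -c)"
else
    # A hard limit set by the system can prevent raising the soft limit.
    echo "could not raise core file size limit (current: $(ulimit -c))" >&2
fi
```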

What common changes are needed to compile my programs on Kraken and Athena?

  • Replace all compiler commands (mpicc, mpif90, icc, ifort, pgCC, pgf90, etc) with the following: cc (C), CC (C++) or ftn (Fortran).
  • Remove all references to MPI libraries within the makefile.
  • Remove any references to the BLAS, LAPACK, BLACS, and ScaLAPACK libraries from your makefiles. The system will automatically link with the most highly optimized versions of these libraries. (For a complete list of libraries, enter: man libsci)
  • References to MKL can often be removed because their function is replaced by libsci.
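
As an illustration of the changes listed above, the sketch below rewrites a small makefile with sed. The makefile contents, the variable names CC, CXX, FC, and LIBS, and the file names Makefile.orig and Makefile.kraken are all hypothetical; a real makefile will likely need manual review, since these patterns only match the exact flags shown.

```shell
# Hypothetical makefile fragment as it might look on another cluster.
cat > Makefile.orig <<'EOF'
CC = mpicc
CXX = pgCC
FC = mpif90
LIBS = -lmpi -llapack -lblas
EOF

# Point the compiler variables at the Cray wrappers (cc, CC, ftn) and
# drop the MPI, LAPACK, and BLAS link flags from the link line.
sed -e 's/^CC *=.*/CC = cc/' \
    -e 's/^CXX *=.*/CXX = CC/' \
    -e 's/^FC *=.*/FC = ftn/' \
    -e 's/ *-lmpi//' \
    -e 's/ *-llapack//' \
    -e 's/ *-lblas//' \
    Makefile.orig > Makefile.kraken

cat Makefile.kraken
```

Writing to a new file rather than editing in place keeps the original makefile available for other systems.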

Before you compile your code, load any relevant modules for third-party libraries. For example:

module load hdf5-parallel

The documentation will tell you how to use environment variables in your makefile. In the hdf5 example, this is documented in HDF5.

cc -o hdf5example.x hdf5example.c ${HDF5_CLIB}

There are two advantages to using the module with the environment variable instead of the pathname:

  1. If you change versions of hdf5, you only need to load a different module. The makefile does not have to be modified.
  2. If you change to a different compiler and then reload the hdf5 module, the system will load a version of hdf5 that is compiled with the other compiler.

For a list of libraries and other software available for Kraken and Athena, see NICS Software.

Why does xtnodestat fail?

The command xtnodestat can be used to see which jobs are currently running and which cabinets, nodes, and processors they are running on. When the system is busy and many jobs are starting or stopping, a read error may occur that produces the following error:

Error: xtnodestat cannot be run unless apstat is in PATH

This is a known bug that has been reported to Cray. The error will be fixed in a future software release. Until then, please retry the command until the appropriate output is returned.
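
Retrying can be automated with a small shell function. This is a sketch only: the function name retry and the RETRY_DELAY variable are made up for illustration, and it assumes that xtnodestat returns a non-zero exit status when it prints the error above, which should be confirmed before relying on it.

```shell
# retry MAX CMD [ARGS...]: run CMD until it succeeds or MAX attempts are used.
# RETRY_DELAY is a hypothetical knob: seconds to wait after a failed attempt
# (default 2).
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-2}"
    done
    echo "giving up after $max_attempts attempts: $*" >&2
    return 1
}

# On Kraken this might be used as:
#   retry 5 xtnodestat
```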

When should I use the PBS option '-V'?

The -V option tells the batch system to export all of the environment variables from your submission shell to the job. For example, if you set OMP_NUM_THREADS to 4 and then submit the job, you need this flag for OMP_NUM_THREADS to still be set in the batch script. You can pass it on the command line (qsub -V ...) or put it in your batch script:

#PBS -V

While this can be convenient, it is best practice not to use -V. Why?

  • Omitting -V makes jobs more self-contained. If the script itself must set all the environment variables it needs, the script can be shared between people without confusion. Additionally, when debugging an issue, it is clear from looking at the script which variables are set.
  • When used often, -V can create additional load for the scheduler and, in rare cases, cause a crash (particularly if used in jobs that resubmit themselves).

Using -V is not a problem in itself, and it may even be advisable for something like an interactive job, but it is best not to include it in every job script as a matter of habit.
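
As a sketch of the self-contained alternative, the commands below write a job script that sets OMP_NUM_THREADS itself instead of inheriting it through -V. The file name omp_job.pbs, the MY_PROJECT account, the helloWorld executable, and the aprun layout (one task with a depth of four threads) are illustrative assumptions.

```shell
# Write a self-contained job script; nothing is inherited via -V.
cat > omp_job.pbs <<'EOF'
#PBS -A MY_PROJECT
#PBS -l size=12,walltime=00:05:00
#PBS -S /bin/bash

cd $PBS_O_WORKDIR

# Set the variable inside the script so the job does not depend on
# the environment of the shell that submitted it.
export OMP_NUM_THREADS=4

aprun -n 1 -d $OMP_NUM_THREADS ./helloWorld
EOF

# Submit with: qsub omp_job.pbs
```

Anyone reading or reusing omp_job.pbs can see the thread count directly in the script.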

Can I use dynamic shared libraries on the Cray compute nodes?

Dynamic shared libraries are not officially supported by NICS on Kraken. The primary reason is that Kraken is not configured to run DVS servers (separate from the Lustre servers), which are required for the Cray shared root file system that Cray's dynamic shared libraries depend on. Additionally, Kraken is one of the largest Cray XTs in the world, so there is strong concern that a large-scale job (32K cores or more) dynamically accessing one or more shared libraries simultaneously will cause an extreme system slowdown or a crash. This is because the shared libraries must reside in Lustre, which puts additional load on our I/O servers, which are often fully loaded. Even in the best case, dynamic linking will degrade an application's performance; the extent is very application dependent (see Johansen, CUG 2009).

Dynamic shared libraries may still be used on Kraken by placing the shared libraries in Lustre and setting LD_LIBRARY_PATH appropriately; however, this is not officially supported by NICS, and no guarantee of support is provided for this mode of operation. We caution that Lustre striping must be set appropriately (a stripe count of 1) and, more importantly, that LD_LIBRARY_PATH must be set carefully to minimize the number of included paths, with the most-used directories listed first.
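
For users who accept the unsupported status, the cautions above might translate into something like the following sketch. The directory names are hypothetical, the lfs setstripe invocation is shown only as a comment and should be checked against "man lfs" on the system, and replacing LD_LIBRARY_PATH outright assumes the application needs no other entries.

```shell
# Hypothetical Lustre directories holding the application's shared libraries,
# listed with the most frequently used directory first.
APP_LIBS=/lustre/scratch/myuser/app/lib
EXTRA_LIBS=/lustre/scratch/myuser/extra/lib

# Run once per directory, before copying libraries in, so that files
# created there get a stripe count of 1:
#   lfs setstripe -c 1 "$APP_LIBS"
#   lfs setstripe -c 1 "$EXTRA_LIBS"

# Replace, rather than append to, the search path to keep it short.
LD_LIBRARY_PATH="$APP_LIBS:$EXTRA_LIBS"
export LD_LIBRARY_PATH
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```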

Can I use PVM on Kraken?

PVM is a communication interface for parallel programming. While it has been ported to a number of platforms, including some Cray platforms in the past, it has not been ported to the Cray XTs. Thus, we will not install or support PVM on Kraken or Athena. We do allow MPI and Global Arrays on Kraken, as well as pthreads within a single node (this includes OpenMP). It would also be possible to support Unified Parallel C or CoArray Fortran given sufficient demand.

How do I get information about my MPICH/Portals settings?

Cray's MPICH has a number of settings (changed using environment variables) that affect what algorithms are used, buffer space, etc. For a list of these variables and their default settings, you can set the following prior to calling aprun:

export MPICH_ENV_DISPLAY=1

This will print a single list, regardless of the number of MPI tasks. Note that the values may change based on the core count of the job. In particular, some settings, such as MPICH_UNEX_BUFFER_SIZE, scale with the number of MPI tasks. In addition, a job run on a single node can use shared memory rather than Cray's Portals for communication; in that case the Portals-related settings are undefined and are not displayed.

For more information about these settings, please see this workshop presentation or "man intro_mpi", which describes many of them. You can also find that information on Cray's documentation page (under "Introduction to MPI man pages").

How do I find out what macros are predefined by the compiler?

For Kraken consult the “Cray online documentation” (http://docs.cray.com).

For C, search for the Cray “C and C++ Reference Manual” and for Fortran, consult the “Cray Fortran Compiler Commands and Directives Reference Manual”.

Can a user login directly to a compute node?

No, users cannot log in directly to a compute node. However, by submitting an interactive batch job, users can get access to an aprun node, from which they can execute commands as if they were running them directly on a compute node. For more information on how to run interactive batch jobs, please see Interactive Batch Jobs.

How do I get performance counter data for my program?

Use the following process:

  1. Use module load xt-craypat.
  2. Compile code.
    • If Fortran90 with modules, compile with -Mprof=func.

What profiling tools are available?

At least three profiling tools are available on Kraken.

  1. CrayPat is provided by Cray. Follow this link for more information.
  2. fpmpi is an unsupported product that can provide a very concise profile of MPI routines in an application. To use it, simply load the fpmpi (or fpmpi_papi) module and relink. Then rerun your application. There are a few environment variables to control profiling output:
    • MPI_PROFILE_DISABLE : Disables statistic collection until fpmpi_enable is called (#include fpmpi.h).
    • MPI_PROFILE_SUMMARY : Setting this disables creation of individual MPI process statistics files. Set it when running with thousands of processes.
    • MPI_PROFILE_FILE : Name of process statistic file; default is profile.txt.
    • MPI_HWPC_COUNTERS : List of events or event set number as in libhwpc.
  3. A third, unsupported tool is TAU. TAU (Tuning and Analysis Utilities) is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. Basic profiling with TAU can be done in the following steps:
    1. Load the tau module: module load tau
    2. Set the environment variable TAU_MAKEFILE: In tcsh, setenv TAU_MAKEFILE ${TAUROOT}/lib/Makefile.tau-mpi-pdt
    3. Compile code with the tau wrappers (which should be in your path), tau_f90.sh, tau_cc.sh, or tau_cxx.sh.
    4. You will get a regular executable. Submit your job as usual.
    5. After execution, there should be a profile.xxx text file.

TAU can also do MPI profiling and collect hardware performance counter data.
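
The TAU steps above give the tcsh form of the TAU_MAKEFILE setting. For Bourne-like shells, an equivalent might look like the sketch below. It assumes that loading the tau module defines TAUROOT; the fallback path is a placeholder so the lines run outside Kraken, and myprog is a hypothetical program name.

```shell
# On Kraken, load the module first (assumed to define TAUROOT):
#   module load tau

# Placeholder fallback so the sketch runs elsewhere; not a real path.
TAUROOT=${TAUROOT:-/path/to/tau}

# Bourne-shell equivalent of the tcsh setenv line.
TAU_MAKEFILE=$TAUROOT/lib/Makefile.tau-mpi-pdt
export TAU_MAKEFILE
echo "TAU_MAKEFILE=$TAU_MAKEFILE"

# Then compile with one of the wrappers, for example:
#   tau_cc.sh -o myprog myprog.c
```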