This guide is geared toward experienced users and gives an overview of how to use the Kraken system. If you need more information, see the full user guide.
- Kraken System
- Getting an Allocation
- File Systems and Storage
- Data Transfer
- User Environment
- Running Jobs
- Debugging and Optimization
- Other Documentation
Kraken is a Cray XT5 system with 9,408 compute nodes. Each node has:
- two 2.6 GHz six-core AMD Opteron processors (Istanbul)
- 12 cores (system total is 112,896 compute cores)
- 16 GB of memory (system total is 129 TB of compute memory)
Compute nodes run Cray Linux Environment (CLE) 3.1. Each node is connected to a Cray SeaStar router through HyperTransport, and the SeaStars are all interconnected in a 3-D torus topology.
Details on the eligibility requirements for principal investigators, the application process and the proposal deadlines are located on the XSEDE Allocations page.
If you have mailed in the notarized form and received email confirmation that your RSA One Time Password (OTP) token has been activated, use ssh to connect to login.kraken.nics.tennessee.edu. Otherwise, you can use GSI authentication to log in to Kraken.
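A typical OTP login session might look like the following; the username `jdoe` is a placeholder, not a real account:

```shell
# Connect with your NICS username ('jdoe' is a hypothetical example).
ssh jdoe@login.kraken.nics.tennessee.edu
# At the password prompt, enter your PIN followed by the current
# six-digit code shown on your RSA OTP token.
```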
Home directories have a 2 GB quota. These directories are on a Network File System (NFS) and are generally available by logging in to login.nics.utk.edu even when Kraken is down. However, home directories are not accessible from the compute nodes: in a batch job, the current directory and all input and output files must be on Lustre, or the job will fail with a "no such file or directory" error.
Each user has a scratch directory in /lustre/scratch/$USER, which is subject to purging based on the time of last access. Lustre is a highly scalable cluster file system that allows you to increase striping width to improve I/O performance. Programs launched with aprun can only access Lustre, not home directories or other directories.
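Striping width is controlled with the standard Lustre `lfs` utility. A minimal sketch, in which the directory name and the stripe count of 8 are illustrative choices rather than recommendations:

```shell
# Stripe new files in this directory across 8 storage targets (OSTs),
# which can improve bandwidth for large parallel I/O.
lfs setstripe -c 8 /lustre/scratch/$USER/wide_io

# Verify the striping settings that new files will inherit.
lfs getstripe /lustre/scratch/$USER/wide_io
```

The best stripe count depends on file size and I/O pattern; wide striping helps large shared files but can hurt many small files.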
Files may be stored on the mass storage system (HPSS) using the hsi command. You may only access HPSS if you have activated your RSA OTP token. You may access HPSS within a batch job, but only from jobs in the "hpss" queue that are submitted from the RSA SecurID login nodes (login.kraken.nics.tennessee.edu).
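A short `hsi` session might look like this; the file name is hypothetical, and the `local : remote` colon syntax is standard `hsi` usage:

```shell
# Archive a file to HPSS (local name : HPSS name).
hsi put results.tar : results.tar

# List your HPSS home directory.
hsi ls

# Retrieve the file later (HPSS name after the colon).
hsi get results.tar : results.tar
```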
Kraken uses the module command (softenv is not available) to select libraries, compilers, and other software that are not already in the default user environment.
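Common module operations look like the following; the `PrgEnv-pgi`/`PrgEnv-gnu` names follow the usual Cray programming-environment convention and are assumed here:

```shell
module list                        # show modules currently loaded
module avail                       # show everything available
module swap PrgEnv-pgi PrgEnv-gnu  # switch from PGI to GNU compilers
module load fftw                   # add a library (e.g., FFTW)
```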
Users can change their login shell via the NICS User Portal.
By default, Kraken uses the Portland Group (PGI) compilers. GNU and Intel compilers are available via modules. To compile your MPI program, use the ftn (Fortran) wrapper command. These commands create executables to run on the compute nodes. Other MPI compiler wrappers such as mpif90 are not available on Kraken. The Cray scientific libraries (BLAS, LAPACK, ScaLAPACK, BLACS, and others) are in the default library path. Use the module command for access to libraries such as HDF, netCDF, FFTW, and PETSc.
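A compile might look like this; the source and executable names are hypothetical, and the wrapper is assumed to pull in MPI and the Cray scientific libraries automatically, as Cray wrappers conventionally do:

```shell
# Compile a Fortran MPI program for the compute nodes.
ftn -o mycode.x mycode.f90

# With a library module loaded first, its include and link paths are
# typically added by the module environment.
module load fftw
ftn -o spectral.x spectral.f90
```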
The login nodes are not intended for computing, but if you need to run short pre- or post-processing programs on them, you can call the compilers directly (for example, the PGI compilers pgcc and pgf90).
Kraken uses PBS/Torque (qsub, qdel, qstat, etc.) for batch jobs. To specify the number of cores allocated to a batch job, use "#PBS -l size=cores" in the batch script. Because the batch system allocates in units of 12-core nodes, the number of cores must be a multiple of 12.
To specify an account to charge, use "#PBS -A account". The showusage command lists your valid accounts/projects.
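Putting these directives together, a minimal batch script might look like the sketch below. The account string, job name, walltime, and paths are placeholders, not real values:

```shell
#!/bin/bash
#PBS -A UT-EXAMPLE         # project to charge (hypothetical; check 'showusage')
#PBS -l size=24            # cores: must be a multiple of 12 (2 nodes here)
#PBS -l walltime=01:00:00  # requested wall-clock time
#PBS -N myjob              # job name

# Work under Lustre; home directories are not visible to compute nodes.
cd /lustre/scratch/$USER/rundir

# Launch one MPI process per core on the allocated nodes.
aprun -n 24 ./mycode.x
```

Submit it with `qsub myjob.pbs` and monitor it with `qstat`.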
Parallel programs may only be run within a batch job, with aprun -n nprocs executable; mpirun is not available on Kraken. By default, aprun runs 12 MPI processes on each dual-socket node (one per core). To run with fewer MPI processes per socket, use the -S option. For example, "aprun -n nprocs -S 4 executable" runs 4 MPI processes per socket (8 per node).
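The placement arithmetic above can be made concrete with two contrasting launches; `./a.out` is a placeholder executable:

```shell
# Default placement: 12 processes per node, so 24 processes fill 2 nodes.
aprun -n 24 ./a.out

# -S 4 limits placement to 4 processes per socket (8 per node),
# so the same 24 processes now spread across 3 nodes.
aprun -n 24 -S 4 ./a.out
```

Under-populating nodes this way leaves each process more memory and memory bandwidth, at the cost of charging for more nodes.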
Kraken has a license for running TotalView on up to 64 processors. CrayPAT (Cray Performance Analysis Tools) is useful for profiling and collecting hardware performance data to identify performance bottlenecks.