Once you have received an allocation and logged in to Darter, the next step is to familiarize yourself with the system and how it is used. More specific information about particular tasks can be found in the Darter Quick Start Guide. This page guides you through downloading, compiling, and running a simple program; take the opportunity to experiment with the various options. Remember that you can find documentation on most commands by typing "man <command>".
The example uses OpenMP and MPI and prints the node, core, MPI rank, and thread ID of each thread, which is useful for understanding how a job is laid out across nodes. If you are not interested in OpenMP, it is simple to compile without it and check the behavior of an MPI-only application.
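A hybrid code that should also build as MPI-only usually wraps its OpenMP calls so that they compile away when no OpenMP flag is given. The sketch below is not the Cray example program. The hh_ helpers are hypothetical, and `rank` stands in for the value MPI_Comm_rank would supply in the real program. Built without OpenMP, it reports a single thread (ID 0) per rank, which is the MPI-only behavior.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Threads a parallel region would use; 1 when built without OpenMP. */
int hh_max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* ID of the calling thread; always 0 in a serial (MPI-only) build. */
int hh_thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Size of the current thread team; 1 outside a parallel region or without OpenMP. */
int hh_num_threads(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* Print one line per thread. `rank` is a placeholder for the MPI rank
   that the real hybrid program would obtain from MPI_Comm_rank. */
void hh_report(int rank)
{
#ifdef _OPENMP
#pragma omp parallel
#endif
    printf("rank %d: thread %d of %d\n", rank, hh_thread_id(), hh_num_threads());
}
```

Compiled with an OpenMP flag and OMP_NUM_THREADS=2, hh_report prints two lines per rank. Compiled without one, it prints a single "thread 0 of 1" line.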
Getting the files
This tutorial requires three files:
- Source code for a hybrid "Hello World" program in C, courtesy of Cray's User Guide (page 119).
- A simple makefile to build the source
- A PBS batch script (HH.pbs), which is submitted to run the job
There is also a compressed archive containing all three files. You could download the files to your own machine and upload them to Darter using SFTP/SCP. For software kept in a repository, it may be convenient to use a tool like Git. Since these files are online, the easiest way to get them is with wget. From a Darter login node, type:
% wget http://www.nics.utk.edu/sites/default/files/tutorials/HybridHello/HybridHello.tar.gz
Extract the contents with:
% tar zxvf HybridHello.tar.gz
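The "z" flag tells tar to uncompress the archive with gzip. "x" extracts the files, "v" lists each file as it is extracted, and "f" gives the name of the archive.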
Compiling by hand
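To compile the source directly using the Cray compiler wrappers, you might use the following command:
% cc -o HybridHello.x HybridHello.c
This command contains no OpenMP flag because the Cray compiler (in the versions this tutorial targets) enables OpenMP by default. If you are using the GNU compiler, add "-fopenmp". For the Intel compiler, add "-openmp" (newer Intel releases spell it "-qopenmp").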
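Because the OpenMP flag differs by compiler, it can help to confirm what a binary was actually built with. The sketch below relies only on predefined macros:

- _OPENMP is defined whenever OpenMP is enabled and expands to the supported specification date as yyyymm.
- _CRAYC is assumed here to identify the Cray compiler.
- __INTEL_COMPILER, __clang__, and __GNUC__ identify the Intel, Clang, and GNU compilers.

The hh_ function names are hypothetical.

```c
/* Which compiler built this file, based on predefined identification macros.
   Intel and Clang also define __GNUC__, so GNU is checked last. */
const char *hh_compiler(void)
{
#if defined(_CRAYC)
    return "Cray";
#elif defined(__INTEL_COMPILER)
    return "Intel";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GNU";
#else
    return "unknown";
#endif
}

/* 1 if this file was compiled with OpenMP enabled, otherwise 0. */
int hh_openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* The OpenMP specification date (yyyymm) the compiler supports, or 0. */
long hh_openmp_version(void)
{
#ifdef _OPENMP
    return (long) _OPENMP;
#else
    return 0;
#endif
}
```

If a GNU build reports OpenMP as disabled, the "-fopenmp" flag did not reach the compile line.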
Using the makefile
This makefile was written for the Cray XC30 architecture, so it should work as-is with the Cray compilers and with a minor change with GNU. In general, makefiles (or the configure scripts that generate them) will not work without some extra direction. At a minimum, they must be pointed to the Cray compiler wrappers (cc, CC, and ftn).
Before building, check two things:
- Verify that make will use the Cray compiler wrappers
- Verify that make will use the correct OpenMP flags, such as "-fopenmp" for GNU
From the directory containing the makefile, type "make", and it should compile your code.
Real programs generally have more advanced build systems, but they almost always need to be told to use the Cray compiler wrappers. The other common compiling issue is finding the right libraries. Many libraries are available as modules (see available software). The compiler wrappers link them automatically when the corresponding modules are loaded (unless otherwise noted).
Running on Darter
Edit HH.pbs to use your own account (the "#PBS -A" line). Once that is done, you can submit the job as follows:
% qsub HH.pbs
qsub confirms the submission by printing the job's ID. You should then be able to see the job in the queue (for example, with "qstat") for a brief period.
It is rare for every node to be in use, so a short single-node job generally starts right away. It should take only a few seconds before you get an output file in the directory from which you submitted the job.
There are a few common options worth experimenting with until you have them figured out. You may want to refer to our documentation on running jobs.
There are quite a few PBS options you may want to use. For example, you can ask to receive an email when a job fails. You can also set up a job so that it runs only after another job has finished. In the meantime, it sits in a "held" state, which does not count toward the job's wait time.
Keep in mind where files live when a job runs on the compute nodes:
- Files that are read or written by compute jobs must be on Lustre
- A job attempts to start in the directory from which it was launched (via aprun), so that directory must also be on Lustre
The executable as well as standard input and output from a job are handled by the job launcher, so these files can be anywhere.
HH.pbs should take care of all of these issues. It changes to the user's directory on Lustre and calls the executable, assuming that it is in the original directory from which the job was submitted. The output also goes into that original directory, which need not be on Lustre. Try changing the script and moving files around to see what breaks and what error messages you get.
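A program can check its file system at startup if you want it to fail with a clear message rather than an obscure I/O error when started outside Lustre. The sketch below uses Linux's statfs() and compares the reported file-system type with 0x0BD00BD0. That value is assumed here to be Lustre's superblock magic number (LL_SUPER_MAGIC), so confirm it against your system's Lustre headers. The hh_on_lustre name is hypothetical.

```c
#include <sys/vfs.h>   /* statfs(); Linux-specific */

/* Assumed Lustre superblock magic (LL_SUPER_MAGIC); verify locally. */
#define HH_LUSTRE_MAGIC 0x0BD00BD0UL

/* Returns 1 if `path` is on a Lustre file system, 0 if it is on some
   other file system, and -1 if statfs() fails (e.g. the path does not exist). */
int hh_on_lustre(const char *path)
{
    struct statfs fs;
    if (statfs(path, &fs) != 0)
        return -1;
    return (unsigned long) fs.f_type == HH_LUSTRE_MAGIC;
}
```

For example, a job could call hh_on_lustre(".") before opening any files and exit with an explanatory message if the result is not 1.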
HH.pbs requests 16 cores and sets the number of OpenMP threads (OMP_NUM_THREADS) to 2. It automatically works out how many MPI tasks are needed to fill the reservation (16 / 2 = 8, stored as "N_MPI") and launches the job with aprun. It does no error checking, so if you ask it to do something that doesn't make sense, it will blindly try. One example is using more threads per MPI task than there are cores per node.
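The arithmetic HH.pbs performs, plus the sanity checks it skips, can be sketched in C. This is an illustration, not the script's actual logic. The following are all assumptions:

- the helper names;
- the cores_per_node parameter (16 is used below only as an example value);
- the choice to reject core counts that are not a multiple of the thread count.

In the generated command, aprun's -n option gives the number of MPI tasks and -d the number of threads (cores) reserved per task, which should match OMP_NUM_THREADS.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: MPI tasks needed to fill `cores` cores when each
   task runs `threads` OpenMP threads. Unlike HH.pbs, it rejects requests
   that do not make sense and returns -1 for them. */
int hh_mpi_tasks(int cores, int threads, int cores_per_node)
{
    if (cores <= 0 || threads <= 0 || cores_per_node <= 0)
        return -1;
    if (threads > cores_per_node)  /* a task's threads must share one node */
        return -1;
    if (cores % threads != 0)      /* would leave reserved cores idle */
        return -1;
    return cores / threads;
}

/* Writes the matching aprun command line into buf. Returns the task
   count, or -1 for an invalid request or a buffer that is too small. */
int hh_aprun_command(char *buf, size_t len, int cores, int threads,
                     int cores_per_node, const char *exe)
{
    int n = hh_mpi_tasks(cores, threads, cores_per_node);
    if (n < 0 || exe == NULL || strlen(exe) == 0)
        return -1;
    int written = snprintf(buf, len, "aprun -n %d -d %d %s", n, threads, exe);
    if (written < 0 || (size_t) written >= len)
        return -1;
    return n;
}
```

With the values from HH.pbs (16 cores, 2 threads), the helper yields 8 tasks and the command "aprun -n 8 -d 2 ./HybridHello.x". Asking for 32 threads per task on a 16-core node is refused instead of attempted.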
You are welcome to try some of the other aprun options; see the web documentation or "man aprun" for more information.