Beacon User Guide
Beacon is an energy-efficient cluster that utilizes Intel® Xeon Phi™ coprocessors. It is funded by NSF through the Beacon project to port and optimize scientific codes for coprocessors based on Intel's Many Integrated Core (MIC) architecture.
The Beacon system offers access to the following:
Each compute node is equipped with:
Each I/O node provides:
Overall, Beacon has 768 conventional cores and 11,520 accelerator cores, which together deliver over 210 TFLOP/s of combined computational performance. In aggregate, the system also has 12 TB of system memory, 1.5 TB of coprocessor memory, and over 73 TB of SSD storage.
Beacon has access to three file systems: NFS, Lustre, and Local SSD.
NFS
NFS home directories have a 2 GB quota. The path to this directory is:
nics/[a-e]/home/$USER
This file system is only available on the login node and the compute nodes.
Lustre
Each user has a Lustre scratch directory in
/lustre/medusa/$USER
This file system is available to the login node, compute nodes, and all coprocessors. There is no quota on Lustre; however, files older than 30 days are eligible to be purged, so it is recommended that you move your data once the calculation has completed. Any attempt to circumvent this purge policy may lead to deactivation of your account.
Local SSD
A root directory on the local SSD scratch space contains folders named mic0, mic1, etc., and is mounted by the compute nodes. The coprocessors on the compute nodes each mount their respective mic# folder. These unique directories have an absolute path determined by the job ID assigned by the scheduler, and can be accessed through the environment variable TMPDIR.
Given the speed of the SSD drives, using $TMPDIR is preferable to using the Lustre scratch space. The directories bin and lib are automatically created in each mic# folder for the user's convenience. When a native mode application is finished running, all output should be copied out of $TMPDIR/mic# to either the NFS home or Lustre scratch file system.
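The copy-out step described above could be scripted roughly as follows. This is a minimal sketch, not an official Beacon tool: the helper name stage_out and the destination layout are assumptions, and skipping the auto-created bin and lib folders is a design choice made here on the assumption that they hold executables and libraries rather than output.

```shell
# Sketch: copy native-mode output from each mic# folder under a source root
# (normally $TMPDIR) to a destination directory on Lustre or NFS.
# stage_out is a hypothetical helper name, not a Beacon-provided command.
stage_out() {
    local src_root="$1"    # normally $TMPDIR
    local dest_root="$2"   # e.g. /lustre/medusa/$USER/results (assumed layout)
    local mic_dir
    for mic_dir in "$src_root"/mic*; do
        [ -d "$mic_dir" ] || continue           # nothing to do if no mic# folder
        mkdir -p "$dest_root/$(basename "$mic_dir")"
        # Copy everything except the auto-created bin and lib directories.
        find "$mic_dir" -mindepth 1 -maxdepth 1 ! -name bin ! -name lib \
            -exec cp -r {} "$dest_root/$(basename "$mic_dir")/" \;
    done
}

# Example call from inside a job, after the application has finished:
# stage_out "$TMPDIR" "/lustre/medusa/$USER/results"
```

Each mic# folder keeps its own subdirectory at the destination, so output files with the same name on different coprocessors do not overwrite each other.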
Logging into Beacon - OTP
SSH access to the login nodes requires the use of a One Time Password (OTP) token. NICS must receive the notarized NICS Token Activation form before it activates the OTP token. Once your OTP token has been enabled, you will receive an email with instructions to set your Personal Identification Number (PIN). To set up your OTP please visit Setting up your OTP.
Once you have set your PIN, you may log in using your OTP token to ssh to the system. In the example below, userid would be replaced by your NICS username. Users are prompted for their OTP token by the PASSCODE prompt. The PASSCODE is made up of your PIN followed by the number displayed on the OTP token. For example, if your PIN is 1234 and the token code is 159759, enter 1234159759.
% ssh firstname.lastname@example.org
Enter PASSCODE:
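As a small illustration of how the PASSCODE is composed, the sketch below joins the example PIN and token code from the text. The variable names are made up for this example, and a real PIN should never be stored in a script; the PASSCODE is typed at the prompt.

```shell
# Illustration only: the PASSCODE is the PIN immediately followed by the
# number currently shown on the OTP token, with no separator.
pin="1234"            # example PIN from the text above
token_code="159759"   # example token code from the text above
passcode="${pin}${token_code}"
echo "$passcode"      # prints 1234159759
```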
Accounts that are not used for a period of three consecutive months are disabled. If you believe your account has been disabled for inactivity please submit a request to email@example.com or you may call the helpline directly at 865-241-1504.
Access is granted to Beacon using ssh. In order to ssh to Beacon, you must use a one time password (OTP) token. New users will receive a NICS Token Activation form via email. The user must return this form notarized, and the OTP token will then be sent to the user by postal mail. Only then can you log in to Beacon via ssh.
Accessing the MICSMC software
The micsmc software, used for monitoring CPU and thread usage on the Phi/MIC cards, is installed on Beacon and can only be used on the compute nodes after forwarding X. To do this, you must first connect to Beacon using ssh with X forwarding enabled:
ssh -X firstname.lastname@example.org
Next, enable X forwarding again through the queueing system by issuing the following command:
qsub -X -I -A PROJECT_ACCOUNT
Once connected to a compute node, you may issue the following command to bring up the micsmc GUI:
[username@beacon### ~]$ /opt/intel/mic/bin/micsmc &
The status panel will then be launched in the background, allowing you to observe the real time utilization of the Xeon Phis.
Native Mode
- All code runs directly on the coprocessors
- Any libraries used will need to be recompiled for native mode usage
- Use the compiler flag '-mmic' to compile for native mode
Offload Mode
- Code starts running on the CPU host
- Parallel regions of code can be manually specified to run on the coprocessors using pragmas/directives
- Data is copied to the coprocessors either explicitly or implicitly (the implicit model is used for complex data types involving pointers and is only available in C++)
- Automatic Offload (AO) is available for certain Intel Math Kernel Library (MKL) functions: ?GEMM, ?TRSM, ?POTRF, ?GEQRF, and ?GETRF
Current Intel recommended approach for reading files from within an offload section
- Files can be read from the NFS mount of /global, but only in read-only mode
- Files can also be read directly from the MIC's internal memory
- To transfer a file directly to a MIC's internal memory, use micscp file beaconXXX-micX:DEST, where DEST is /User or /tmp
- Be sure to reset the permissions on the file and directory ($TMPDIR or /tmp or /User) such that 'other' has read/write/execute permissions (all I/O in offload region is executed as 'micuser')
- Pass in the file pointer to the offload region in a "nocopy" clause
- Perform open, close, and read operations using the file pointer
- Use the absolute path as the argument to fopen()
- Remember to copy any output files off of the MICs before exiting a job
- Files cannot be read from /lustre/medusa
- $HOME is not mounted on the MICs
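The permission step in the list above can be sketched as a small shell helper. The function name prepare_offload_file is hypothetical; it opens up the file and its containing directory for 'other' and prints the absolute path that would then be passed to fopen() inside the offload region.

```shell
# Sketch: give 'other' read/write/execute on a file and its directory so that
# I/O performed as 'micuser' in an offload region can reach it.
# prepare_offload_file is a hypothetical helper name.
prepare_offload_file() {
    local file="$1"
    local dir
    dir=$(dirname "$file")
    chmod o+rwx "$dir" "$file"
    # Print the absolute path, since fopen() should be given an absolute path.
    echo "$(cd "$dir" && pwd)/$(basename "$file")"
}

# Example (path is illustrative): prepare_offload_file "$TMPDIR/mic0/input.dat"
```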
Heterogeneous MPI Tasks on Hosts With Offload To Cards (Could Include OpenMP)
- Run MPI tasks as you would for Native Mode on the Host
- Use MIC_ENV_PREFIX=MIC_ and MIC_OMP_NUM_THREADS=N so that every card receiving offloaded work uses the same number of threads
- To use a different number of threads for each offload, call omp_set_num_threads in the code and tie the value logically to the device type or number
- This is an important time to set MIC_KMP_AFFINITY, especially if offloading to multiple cards from the same host: http://software.intel.com/en-us/articles/openmp-thread-affinity-control. Also, the pseudocode snippet below will assist with offloading to multiple devices from one host:
#pragma omp parallel
#pragma omp single
{
    #pragma omp task
    #pragma omp target(mic) OR #pragma offload target(mic)
    {
        <various serial code>
        #pragma omp parallel for
        for (int i=0; i < limit; i++)
            <parallel loop body>
    }
    #pragma omp task
    {
        <host code or another offload>
    }
}
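The environment settings named in the bullets above might be placed in a job script along these lines. This is a sketch under stated assumptions: the thread counts and the affinity type are placeholder values chosen for illustration, not recommendations from this guide.

```shell
# With MIC_ENV_PREFIX set, variables that start with the prefix are passed to
# the coprocessors for offloaded code; unprefixed variables apply to the host.
export MIC_ENV_PREFIX=MIC_
export MIC_OMP_NUM_THREADS=120     # assumed value for N; same on every card
export MIC_KMP_AFFINITY=balanced   # assumed affinity type; see the Intel article
export OMP_NUM_THREADS=16          # assumed host-side thread count
```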
Heterogeneous MPI Tasks on Hosts or Cards With OpenMP (No Offload)
- A user is running MPI ranks on the hosts and/or the MICs, with OpenMP pragmas invoked directly from each MPI task (without offload pragmas)
- Use omp_set_num_threads in the code and have it logically tied to the device type or number --OR--
- micmpiexec -n 1 -host beaconXXX -env OMP_NUM_THREADS=N1; -n 1 -host beaconXXX-mic0 -wdir $DIR -env OMP_NUM_THREADS=N2; etc.
- This passes the appropriate number of OMP threads to each MIC/host through its own local environment variable (-env versus the global -genv)
Heterogeneous MPI Tasks on Hosts and Cards With Machine File
- A user is running MPI ranks on the hosts and the MICs with executables named test (for the hosts) and test.MIC (for the coprocessors)
- Set the environment variables I_MPI_MIC=1 and I_MPI_MIC_POSTFIX=.MIC
- Use command: generate-mic-hostlist hybrid X Y > machines, where X is the number of Xeon ranks ON EACH NODE REQUESTED and Y is the number of Xeon Phi ranks ON EACH PHI ON EACH NODE REQUESTED.
- micmpiexec -n NNODES*X+4*NNODES*Y -machinefile machines ./test
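The rank count passed to -n in the last bullet can be worked out with shell arithmetic. The sketch below assumes 2 nodes, 4 Xeon ranks per node, and 2 ranks per Xeon Phi purely for illustration; it prints the resulting commands rather than running them, so it can be tried off the cluster.

```shell
# Assumed example values; substitute your own job parameters.
nnodes=2          # compute nodes requested
x=4               # Xeon ranks on each node
y=2               # Xeon Phi ranks on each Phi
phis_per_node=4   # the factor 4 in the formula above

# Total ranks = NNODES*X + 4*NNODES*Y
total=$(( nnodes * x + phis_per_node * nnodes * y ))

export I_MPI_MIC=1
export I_MPI_MIC_POSTFIX=.MIC

# Print the commands that would be run inside the job.
echo "generate-mic-hostlist hybrid $x $y > machines"
echo "micmpiexec -n $total -machinefile machines ./test"
```

With these values the host ranks contribute 2*4 = 8 and the coprocessor ranks contribute 4*2*2 = 16, for 24 ranks in total.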
By default, each user account is set up with a home directory, Lustre scratch space, and a Unix group (typically beacon-users).
The command-line interpreter (a.k.a. the shell) is the traditional interface to the Unix/Linux operating system. The default shell on Beacon is bash. Other shells are available, such as sh, csh, tcsh, and zsh. Users may change their default shell in the NICS User Portal. To log into the portal, you need to use your RSA SecurID OTP token.
The modules software package allows you to dynamically modify your user environment by using module files. Modules are very useful for compiling and running codes on Beacon.
Here is a short list and description of commonly used module commands. Note that if no version number is given with the package name, the default version is used.
module list
Shows all loaded modules.
module swap <package A> <package B>
Unloads package A and loads package B in its place. Useful for switching between versions of a module.
module avail <package>
If no package is given, it will list all available modules. This command is useful to see which versions of particular software are installed. Try: module avail zlib
module show <package>
This shows information about the installed software, including the setenv commands that will modify your environment if you load that module. This is useful for two main reasons. First, you can confirm which executables the module provides, for example by running ls on the bin directory shown in the output. Second, you can see which environment variables the module introduces. For instance, the FFTW module provides environment variables that point to the library and include directories; use these variables in your makefile instead of hard-coding the full paths. Note that all modules are now complete for Xeon and Xeon Phi.
Beacon has C/C++ and Fortran compilers from GNU, Intel, and CAPS. However, only the Intel compilers can take full advantage of the Intel Xeon Phi coprocessors, unless you use OpenACC and have the CAPS compiler convert it to OpenCL code. Additionally, only the Intel MPI library fully supports the Intel Xeon Phi coprocessors. As such, the module for the Intel programming environment is loaded by default.
One should note that spurious warnings will be given by the Intel 2015.0.024 compiler in offload mode, as it will try the mic libraries first before compiling correctly with the intel64 libraries. Also, since module files have been consolidated, you may see spurious warnings if any compiler tries the mic or intel64 libraries first before finding the correct ones for your chosen mode. Here is an example, so you know to ignore this:
- x86_64-k1om-linux-ld: warning: libimf.so, needed by /global/opt/intel/composer_xe_2015.0.024/compiler/lib/mic/liboffload.so.5, not found (try using -rpath or -rpath-link)
- x86_64-k1om-linux-ld: warning: libsvml.so, needed by /global/opt/intel/composer_xe_2015.0.024/compiler/lib/mic/liboffload.so.5, not found (try using -rpath or -rpath-link)
- x86_64-k1om-linux-ld: warning: libirng.so, needed by /global/opt/intel/composer_xe_2015.0.024/compiler/lib/mic/liboffload.so.5, not found (try using -rpath or -rpath-link)
- x86_64-k1om-linux-ld: warning: libintlc.so.5, needed by /global/opt/intel/composer_xe_2015.0.024/compiler/lib/mic/liboffload.so.5, not found (try using -rpath or -rpath-link)
Also, here is a helpful link for linking MKL with Fortran: Compiling the Intel® Math Kernel Library on the Intel® Xeon Phi™ Coprocessor using Fortran
To use the CAPS compilers for OpenACC, you need to do "module load CAPS" and set the following environment variables:
Now, compile your code:
capsmc --openacc-target=OPENCL icpc acctest.cpp -o ACC
Finally, submit an interactive job and set these two environment variables:
To run your code, simply execute ./ACC and be sure the OpenCL codelet is in the same directory as your executable, as it will be sent over to the Xeon Phi.
**Note that when cross compiling a native mode application/library using configure, the following flag must be used
Custom Beacon scripts
- Any secure communication with a MIC requires unique ssh keys that are automatically generated once the scheduler assigns compute nodes
- Custom scripts have been created to use these ssh keys, which prevents prompts asking users for passwords
|Traditional Command||Custom Beacon Script|
Jobs can be submitted to the queue via the qsub command. Both batch and interactive sessions are available. Batch mode is the typical method for submitting production simulations. If you are not certain how to construct a proper job script, it is beneficial to use the interactive queue first.
Also, Ganglia CPU metrics have been enabled on the Xeon Phis. We have performed extensive benchmarking that indicates that any associated performance penalty in your applications should be negligible. However, if you notice this background process becoming a problem, you can disable Ganglia monitoring altogether by specifying "-l gres=noganglia" in your batch script or in your qsub command line. Please let email@example.com know if you have any further questions.
For interactive jobs, PBS options are passed through qsub on the command line.
qsub -I -A XXXYYY -l nodes=3,walltime=1:00:00
- -I : Start an interactive session
- -A : Charge to the "XXXYYY" project
Putting it together, "-l nodes=3,walltime=1:00:00" will request 3 compute nodes for one hour.
After running this command, you will have to wait until enough compute nodes are available, just as with any other batch job. Once the job starts, the standard input and standard output of this terminal are linked directly to the head node of your allocated resource, and commands may be executed directly instead of through a batch script. Issuing the exit command will end the interactive job.
To run a native OpenMP application on the Xeon Phi in interactive mode, you may:
- run with micmpiexec -n 1 -env OMP_NUM_THREADS=N
- micssh into the Xeon Phi and run from prompt
- micssh micX env LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH executable
Using Intel's VTune Performance Analysis in Interactive Mode
Intel's VTune is a useful tool for performance optimization and profiling, and it is available to all users as part of the Intel Cluster Studio XE suite on Beacon. To reduce X11 display lag, we suggest running it from the command line without the GUI, using the following command:
module load vtune
You can also look for more information about what tests can be run here: http://software.intel.com/sites/products/documentation/doclib/iss/2013/a...
Finally, you can open the GUI with "module load vtune" and then the command "amplxe-gui", load in the sampling files from the command line run, and begin exploring!
NOTE: YOU CANNOT RUN SAMPLING WITH INTEL VTUNE ON THE LOGIN NODE. PLEASE SUBMIT AN INTERACTIVE JOB WITH QSUB -I -X TO USE INTEL VTUNE.
All non-interactive jobs must be submitted on Beacon by a job script via the qsub command. All job scripts start with a series of #PBS directives that describe requirements of the job to the scheduler. The rest is a shell script, which sets up and runs the executable: the micmpiexec command is used to launch one or more parallel executables on the compute nodes and/or coprocessors.
The following example shows a typical job script that submits a parallel job that executes ./a.out on 2 compute nodes, charged to the fictitious account UT-AACE-TEST with a wall clock limit of one hour and 15 minutes:
#PBS -A UT-AACE-TEST
#PBS -l nodes=2,walltime=01:15:00
micmpiexec -n 2 ./a.out
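If you submit many similar jobs, a script like the one above can be generated from a few parameters. The following is a minimal sketch: make_job_script is a hypothetical helper, and it writes only the three lines shown in the example, so any setup your job needs (such as module load statements) would have to be added.

```shell
# Sketch: write a job script with the same three lines as the example above.
# make_job_script is a hypothetical helper, not a Beacon-provided command.
make_job_script() {
    local account="$1" nodes="$2" walltime="$3" out="$4"
    cat > "$out" <<EOF
#PBS -A $account
#PBS -l nodes=$nodes,walltime=$walltime
micmpiexec -n $nodes ./a.out
EOF
}

# Reproduce the example: 2 nodes, 1 hour 15 minutes, account UT-AACE-TEST.
job_file=$(mktemp)
make_job_script UT-AACE-TEST 2 01:15:00 "$job_file"
cat "$job_file"
# On Beacon, the file would then be submitted with: qsub "$job_file"
```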
To run a native OpenMP program on the Xeon Phi in batch mode, you may use one of the following:
- micmpiexec -n 1 -env OMP_NUM_THREADS=N
- Use the following script within a script construct:
micssh $(hostname)-mic0 $TMPDIR/test.sh
Where test.sh is:
The second option is important because a plain micssh does not automatically pass the OpenMP environment or any other environment variables to the card; wrapping the run in a script lets you set what is needed without passing excessive information.
Very important note: Please do not use the PBS -V option. This can propagate large numbers of environment variable settings from the submitting shell into a job which may cause problems for the batch environment. Instead of using PBS -V, please pass only necessary environment variables using -v <comma_separated_list_of_needed_envars>. You can also include "module load" statements in the job script.
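As a sketch of the -v approach, the snippet below builds the comma-separated list from a handful of variable names and prints the resulting qsub command. The variable names chosen and the script name job.pbs are assumptions for illustration only.

```shell
# Build a comma-separated list of only the variables the job needs.
# The names below are examples; job.pbs is a hypothetical script name.
var_list=$(printf '%s,' OMP_NUM_THREADS I_MPI_MIC I_MPI_MIC_POSTFIX)
var_list=${var_list%,}    # drop the trailing comma

# Print the submission command instead of running it.
echo "qsub -v $var_list job.pbs"
```

Listing variables by name keeps the job environment small and makes it obvious which settings the job depends on.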
Please use the following acknowledgement information on publications where Beacon was a resource used:
This material is based upon work supported by the National Science Foundation under Grant Number 1137097 and by the University of Tennessee through the Beacon Project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the University of Tennessee.