The National Institute for Computational Sciences

Beacon

  Beacon User Guide


System Overview

Overview

Beacon is an energy-efficient cluster that utilizes Intel® Xeon Phi™ coprocessors.  It is funded by the NSF through the Beacon Project to port and optimize scientific codes for coprocessors based on Intel's Many Integrated Core (MIC) architecture.



System Configuration

The Beacon system offers access to the following:

  • 48 compute nodes and 6 I/O nodes
  • FDR InfiniBand interconnect providing 56 Gb/s of bi-directional bandwidth

Each compute node is equipped with:

  • 2 8-core Intel® Xeon® E5-2670 processors
  • 256 GB of memory
  • 4 Intel® Xeon Phi™ 5110P coprocessors with 8 GB of memory each
  • 960 GB of SSD storage

Each I/O node provides:

  • Access to an additional 4.8 TB of SSD storage

Overall, Beacon provides 768 conventional cores and 11,520 accelerator cores that deliver over 210 TFLOP/s of combined computational performance, 12 TB of system memory, 1.5 TB of coprocessor memory, and over 73 TB of SSD storage.




    Back to Contents


    File Systems

    Beacon has access to three file systems: NFS, Lustre, and Local SSD.

    NFS

    NFS home directories have a 2 GB quota. The path to this directory is:

    /nics/[a-e]/home/$USER
    This filesystem is only available on the login node and the compute nodes.

    Lustre

    Each user has a Lustre scratch directory in

    /lustre/medusa/$USER
    This file system is available to the login node, the compute nodes, and all coprocessors.  There is no quota on Lustre; however, files older than 30 days are eligible to be purged, so it is recommended that you move your data elsewhere once a calculation has completed. Any attempt to circumvent this purge policy may lead to deactivation of your account.

    Local SSD

    A root directory on the local SSD scratch space contains folders named mic0, mic1, etc., and is mounted by the compute nodes.  The coprocessors on each compute node mount their respective mic# folders.  These unique directories have an absolute path determined by the job ID assigned by the scheduler and can be accessed through the environment variable TMPDIR.

    Visual representation of the Local SSD file system

    Given the speed of the SSD drives, using $TMPDIR is preferable to using the Lustre scratch space.  The directories bin and lib are automatically created in each mic# folder for the user's convenience.  When a native mode application is finished running, all output should be copied out of $TMPDIR/mic# to either the NFS home or Lustre scratch file system.
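
    For example, a job script fragment along these lines copies native mode output off the SSD before the job exits (a minimal sketch; the file name output.dat and the destination directory results are illustrative):

    mkdir -p /lustre/medusa/$USER/results
    cp $TMPDIR/mic0/output.dat /lustre/medusa/$USER/results/    # or copy to $HOME instead (2 GB quota)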

    Back to Contents


    System Access

    Logging into Beacon - OTP

    SSH access to the login nodes requires the use of a One Time Password (OTP) token.  NICS requires that the notarized NICS Token Activation form be returned before the OTP token is activated.  Once your OTP token has been enabled, you will receive an email with instructions for setting your Personal Identification Number (PIN).  To set up your OTP, please visit Setting up your OTP.

    Once you have set your PIN, you may log in by using your OTP token to ssh to the system. In the example below, your_userid would be replaced by your NICS username. Users are prompted for their OTP token at the PASSCODE prompt. The PASSCODE is made up of your PIN followed by the number displayed on the OTP token (see picture). For example, if your PIN is 1234 and the token code is 159759, enter 1234159759.

    Note: No characters will appear when entering your PASSCODE
    % ssh your_userid@resource.nics.tennessee.edu
    Enter PASSCODE: 

    RSA Keyfob

    Accounts that are not used for a period of three consecutive months are disabled. If you believe your account has been disabled for inactivity please submit a request to help@xsede.org or you may call the helpline directly at 865-241-1504.

    ** Do not share your OTP and PIN with anyone. This will lead to immediate deactivation of your account. **

    Accessing Beacon

    Access to Beacon is granted using ssh. In order to ssh to Beacon, you must use a One Time Password (OTP) token. New users will receive a NICS Token Activation form via email; once the notarized form has been returned, the OTP token will be mailed to the user. Only then can you log in to Beacon via ssh:

    ssh <username>@beacon.nics.utk.edu

    Back to Contents

    Accessing the MICSMC software

    The micsmc software, used for monitoring CPU and thread utilization on the Phi/MIC cards, is installed on Beacon and can only be used on the compute nodes with X forwarding enabled. To do this, first connect to Beacon using ssh with X forwarding enabled:

    ssh -X username@beacon.nics.utk.edu

    Next, enable X forwarding again through the queueing system by issuing the following command:

    qsub -X -I -A PROJECT_ACCOUNT

    Once connected to a compute node, you may issue the following command to bring up the micsmc GUI:

    [username@beacon### ~]$ /opt/intel/mic/bin/micsmc & 

    The status panel will then be launched in the background, allowing you to observe the real time utilization of the Xeon Phis.

    Back to Contents


    MIC Programming Models

    Native Mode

    • All code runs directly on the coprocessors
    • Any libraries used will need to be recompiled for native mode usage
    • Use the compiler flag '-mmic' to compile for native mode

    Offload Mode

    • Code starts running on CPU host
    • Parallel regions of code can be manually specified to run on the coprocessors using pragmas/directives
    • Data is either copied explicitly to the coprocessors or implicitly (used for complex data types involving pointers, only available in C++)
    • Automatic Offload (AO) is available for certain Intel Math Kernel Library (MKL) functions: ?GEMM, ?TRSM, ?POTRF, ?GEQRF, and ?GETRF
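
    For example, Automatic Offload can be enabled at run time, with no source changes, by setting an MKL environment variable before launching the application (a minimal sketch; the executable name is illustrative):

    export MKL_MIC_ENABLE=1      # enable MKL Automatic Offload to the coprocessors
    ./my_mkl_app                 # hypothetical executable calling ?GEMM, ?GETRF, etc.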

    Current Intel recommended approach for reading files from within an offload section

    • Files can be read from the NFS mount of /global, but only in read-only mode
    • Files can also be read directly from the MIC's internal memory
    • To transfer a file directly to a MIC's internal memory, use micscp, e.g. micscp file beaconXXX-micX:/User (or /tmp); see the sketch after this list
    • Be sure to reset the permissions on the file and its directory ($TMPDIR, /tmp, or /User) so that 'other' has read/write/execute permissions (all I/O in an offload region is executed as 'micuser')
    • Pass in the file pointer to the offload region in a "nocopy" clause
    • Perform open, close, and read operations using the file pointer
    • Use the absolute path as the argument to fopen()
    • Remember to copy any output files off of the MICs before exiting a job
    • Files cannot be read from /lustre/medusa
    • $HOME is not mounted on the MICs
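
    A minimal sketch of the transfer and permission steps above (the node name beacon001, card mic0, and file name input.dat are illustrative):

    micscp input.dat beacon001-mic0:/tmp                 # copy the file into the card's memory
    micssh beacon001-mic0 chmod o+rwx /tmp/input.dat     # allow 'micuser' to read it in the offload region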

    Heterogeneous MPI Tasks on Hosts With Offload To Cards (Could Include OpenMP)

    • Run MPI tasks as you would for Native Mode on the Host
    • Use MIC_ENV_PREFIX=MIC_ and MIC_OMP_NUM_THREADS=N to have all cards that are offloaded to use the same number of threads (see the environment sketch after the pseudocode below)
    • Use omp_set_num_threads in the code, logically tied to the device type or number, to use a different number of threads for each offload
    • It is especially important to set MIC_KMP_AFFINITY when offloading to multiple cards from the same host: http://software.intel.com/en-us/articles/openmp-thread-affinity-control. Also, the pseudocode snippet below will assist with offloading to multiple devices from one host:

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
            #pragma omp target(mic)   /* or: #pragma offload target(mic) */
            {
                <various serial code>
                #pragma omp parallel for
                for (int i = 0; i < limit; i++)
                    <parallel loop body>
            }

        #pragma omp task
            <host code or another offload>
    }
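
    A minimal launch sketch for the MIC_ENV_PREFIX approach mentioned above (the thread counts and the executable name offload_app are illustrative):

    export MIC_ENV_PREFIX=MIC_          # forward MIC_-prefixed variables to the coprocessor runtime
    export MIC_OMP_NUM_THREADS=120      # every card offloaded to uses 120 threads
    export MIC_KMP_AFFINITY=balanced    # thread affinity on the cards (see the link above)
    micmpiexec -n 2 ./offload_app       # hypothetical MPI executable whose ranks offload to the cards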

    Heterogeneous MPI Tasks on Hosts or Cards With OpenMP (No Offload)

    • A user is running MPI ranks on the hosts and/or the MICs, with OpenMP pragmas invoked directly from each MPI task (without offload pragmas)
    • Use omp_set_num_threads in the code and have it logically tied to the device type or number --OR--
    • micmpiexec -n 1 -host beaconXXX -env OMP_NUM_THREADS=N1; -n 1 -host beaconXXX-mic0 -wdir $DIR -env OMP_NUM_THREADS=N2; etc.
    • This passes the appropriate number of OMP threads to each MIC/host through its own local environment variable (-env versus the global -genv)

    Back to Contents


    Computing Environment

    By default, each user's account is set up with a home directory, Lustre scratch space, and a Unix group (typically beacon-users).

    Unix shell

    The command-line interpreter (a.k.a. shell) is the traditional way of interacting with a Unix/Linux operating system. The default shell on Beacon is bash. Other shells are available, such as sh, csh, tcsh, and zsh. Users may change their default shell in the NICS User Portal. To log into the portal, you need to use your RSA SecurID OTP token.

    Modules

    The modules software package allows you to dynamically modify your user environment by using module files. Modules are very useful for compiling and running codes on Beacon.

    Modules can make your computing experience easier. Here is a short list and description of commonly used module commands. Note that if no version number is specified with a package name, the default version of that package is used.

    module list

    Shows all loaded modules.

    module swap <package A> <package B>

    This will replace package A with package B. Useful for switching between versions of a module.

    module avail <package>

    If no package is given, it will list all available modules. This command is useful to see which versions of particular software are installed. Try: module avail zlib

    module show <package>

    This shows information about the installed software. You will see the setenv commands that would modify your environment if you loaded that module. This is useful for two major reasons. First, you can verify which executable you will be running; for example, you can perform an ls on the listed bin directory. Second, the module may define environment variables. For instance, the FFTW module provides environment variables that point to its library and include directories; simply use these variables in your makefile instead of the full paths. Note that all modules are now complete for both Xeon and Xeon Phi.
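
    For example, a typical session might look like the following (the fftw module name and the FFTW_INC/FFTW_LIB variable names are illustrative; run module show to see the actual names on Beacon):

    module avail fftw                    # list the installed FFTW versions
    module show fftw                     # display the setenv commands the module performs
    module load fftw
    icc -I$FFTW_INC code.c -L$FFTW_LIB -lfftw3 -o code   # hypothetical variables set by the module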

    Back to Contents


    Application Development

    Compiling

    Beacon has C/C++ and Fortran compilers from GNU, Intel, and CAPS. However, only the Intel compilers can take full advantage of the Intel Xeon Phi coprocessors, unless you are using OpenACC and having the CAPS compiler convert it to OpenCL code.  Additionally, only the Intel MPI library fully supports the Intel Xeon Phi coprocessors.  As such, the module for the Intel programming environment is loaded by default. Also, here is a helpful link for linking MKL with Fortran: Compiling the Intel® Math Kernel Library on the Intel® Xeon Phi™ Coprocessor using Fortran


    Language   GNU        Intel
    C          gcc        icc/mpiicc
    C++        g++        icpc/mpiicpc
    Fortran    gfortran   ifort/mpiifort
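
    For example, the same MPI source file can be built for the Xeon host or, with the -mmic flag, natively for the Xeon Phi (the file and executable names are illustrative):

    mpiicc -O2 mycode.c -o mycode.host        # build for the Xeon host
    mpiicc -O2 -mmic mycode.c -o mycode.mic   # native build for the Xeon Phi coprocessor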

    To use the CAPS compilers for OpenACC, you need to do "module load CAPS" and set the following environment variables:

    export OPENCL_INC_PATH=/global/opt/intel/opencl/include/
    export HMPPRT_NO_FALLBACK=1
    export ACC_DEVICE_TYPE=acc_device_opencl
    export HMPPRT_OPENCL_DEVICE_TYPE=CL_DEVICE_TYPE_ACCELERATOR

    Now, compile your code:

    capsmc --openacc-target=OPENCL icpc acctest.cpp -o ACC

    Finally, submit an interactive job and set these two environment variables:

    export HMPPRT_OPENCL_DEVICE_TYPE=CL_DEVICE_TYPE_ACCELERATOR

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/global/opt/intel/opencl/lib64

    To run your code, simply execute ./ACC and be sure that the OpenCL codelet is in the same directory as your executable, as it will be sent over to the Xeon Phi.

    **Note that when cross-compiling a native mode application/library using configure, the following flag must be used: --host=x86_64-k1om-linux
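
    A hedged sketch of such a configure invocation (the CC and CFLAGS settings are illustrative; your package may require additional options):

    CC=icc CFLAGS="-mmic" ./configure --host=x86_64-k1om-linux   # cross-compile for the coprocessor
    make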

    Back to Contents


    Running your applications

    Custom Beacon scripts

    • Any secure communication with a MIC requires unique ssh keys that are automatically generated once the scheduler assigns compute nodes
    • Custom scripts have been created to use these ssh keys, which prevents prompts asking users for passwords

    Traditional Command   Custom Beacon Script
    ssh                   micssh
    scp                   micscp
    mpirun/mpiexec        micmpiexec

    Job schedulers

    Jobs can be submitted to the queue via the qsub command. Both batch and interactive sessions are available. Batch mode is the typical method for submitting production simulations. If you are not certain how to construct a proper job script, it is beneficial to use the interactive queue.

    Also, Ganglia CPU metrics have been enabled on the Xeon Phis. We have performed extensive benchmarking that indicates that any associated performance penalty in your applications should be negligible. However, if you notice this background process becoming a problem, you can disable Ganglia monitoring altogether by specifying "-l gres=noganglia" in your batch script or in your qsub command line. Please let help@nics.utk.edu know if you have any further questions.
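
    For example, either of the following disables Ganglia monitoring for a job:

    qsub -I -A XXXYYY -l nodes=1,walltime=1:00:00 -l gres=noganglia

    or, in a batch script:

    #PBS -l gres=noganglia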

    Interactive submission

    For interactive jobs, PBS options are passed through qsub on the command line.

    qsub -I -A XXXYYY -l nodes=3,walltime=1:00:00

    Options:

    • -I : Start an interactive session
    • -A : Charge to the "XXXYYY" project

    Putting it together, "-l nodes=3,walltime=1:00:00" will request 3 compute nodes for one hour.

    After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, the standard input and standard output of this terminal will be linked directly to the head node of the allocated resource. From there, commands may be executed directly instead of through a batch script. Issuing the exit command will end the interactive job.

    If you want to run a native OpenMP application on the Xeon Phi in interactive mode, you may:

    1. run with micmpiexec -n 1 -env OMP_NUM_THREADS=N
    2. micssh into the Xeon Phi and run from the prompt
    3. micssh micX env LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH executable

    Using Intel's VTune Performance Analysis in Interactive Mode

    Intel's VTune is a useful tool for performance optimization and profiling, and it is available to all users as part of the Intel Cluster Studio XE suite on Beacon. To reduce X11 display lag, we suggest running it from the command line without the GUI:

    module load vtune

    /global/opt/intel/vtune_amplifier_xe_2013/bin64/amplxe-cl -collect <analysis-type> -app-working-dir <working-dir> -- <executable>

    You can also look for more information about what tests can be run here: http://software.intel.com/sites/products/documentation/doclib/iss/2013/a...

    Finally, you can open the GUI with "module load vtune" and then the command "amplxe-gui", load in the sampling files from the command line run, and begin exploring!

    NOTE: YOU CANNOT RUN SAMPLING WITH INTEL VTUNE ON THE LOGIN NODE. PLEASE SUBMIT AN INTERACTIVE JOB WITH QSUB -I -X TO USE INTEL VTUNE.

    Batch submission

    All non-interactive jobs must be submitted on Beacon through a job script via the qsub command. All job scripts start with a series of #PBS directives that describe the requirements of the job to the scheduler. The rest is a shell script that sets up and runs the executable: the micmpiexec command is used to launch one or more parallel executables on the compute nodes and/or coprocessors.

    The following example shows a typical job script that submits a parallel job that executes ./a.out on 2 compute nodes, charged to the fictitious account UT-AACE-TEST with a wall clock limit of one hour and 15 minutes:

    #!/bin/bash
    #PBS -A UT-AACE-TEST
    #PBS -l nodes=2,walltime=01:15:00
    cd $PBS_O_WORKDIR
    micmpiexec -n 2 ./a.out

    If you want to run a native OpenMP program on the Xeon Phi in batch mode, you may use:

    1. micmpiexec -n 1 -env OMP_NUM_THREADS=N
    2. Use the following script-within-a-script construct:

       #!/bin/bash
       .
       .
       .
       micssh $(hostname)-mic0 $TMPDIR/test.sh

       Where test.sh is:

       #!/bin/sh
       source /etc/profile
       export LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH
       executable

    Option 2 is important because a simple micssh will not automatically pass the OpenMP environment or other environment variables to the card without passing excessive information.

    Very important note: Please do not use the PBS -V option. This can propagate large numbers of environment variable settings from the submitting shell into a job which may cause problems for the batch environment. Instead of using PBS -V, please pass only necessary environment variables using -v <comma_separated_list_of_needed_envars>. You can also include "module load" statements in the job script.
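
    For example, instead of -V, pass only the variables the job actually needs (the variable values are illustrative):

    qsub -v OMP_NUM_THREADS=16,MIC_OMP_NUM_THREADS=120 myjob.sh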

    Back to Contents


    References

  • Intel MIC developer website
  • Programming and Compiling for Intel® Many Integrated Core Architecture
  • Intel® C++ Compiler XE 13.1 User and Reference Guide
  • Intel® Fortran Compiler XE 13.0 User and Reference Guides
  • The Heterogeneous Offload Model for Intel® Many Integrated Core Architecture
  • Fortran vs. C offload directives and functions
  • Intel® MPI Library and Process Pinning on Xeon Phi™
  • Using MPI and Xeon Phi™ Offload Together
  • Intel® Math Kernel Library on the Intel® Xeon Phi™ Coprocessor
  • Math Kernel Library Automatic Offload
    Back to Contents


    Beacon Acknowledgement

    Please use the following acknowledgement in publications for which Beacon was used as a resource:

    This material is based upon work supported by the National Science Foundation under Grant Number 1137097 and by the University of Tennessee through the Beacon Project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the University of Tennessee.