
Beacon

MIC Programming Models

Native Mode

  • All code runs directly on the coprocessors
  • Any libraries used will need to be recompiled for native mode usage
  • Use the compiler flag '-mmic' to compile for native mode
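  • For example, a minimal sketch of a native-mode build (the compiler invocation and file names are illustrative assumptions, not Beacon-specific requirements):

      # compile hello.c so the resulting binary runs entirely on a coprocessor
      icc -mmic -o hello.mic hello.c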

Offload Mode

  • Code starts running on the CPU host
  • Parallel regions of code can be manually specified to run on the coprocessors using pragmas/directives (see the sketch after this list)
  • Data is copied to the coprocessors either explicitly or implicitly; implicit copying handles complex data types involving pointers and is only available in C++
  • Automatic Offload (AO) is available for certain Intel Math Kernel Library (MKL) functions: ?GEMM, ?TRSM, ?POTRF, ?GEQRF, and ?GETRF
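  • As an illustration, a minimal sketch of an offload region with explicit data movement (the array names, size, and doubling loop are assumptions; the pragma is Intel's explicit offload directive):

      #include <stdio.h>
      #define N 1000

      int main(void) {
          static float a[N], b[N];
          for (int i = 0; i < N; i++) a[i] = (float)i;

          /* copy a[] to the coprocessor, run the loop there, copy b[] back */
          #pragma offload target(mic) in(a:length(N)) out(b:length(N))
          {
              #pragma omp parallel for
              for (int i = 0; i < N; i++)
                  b[i] = 2.0f * a[i];
          }

          printf("b[%d] = %f\n", N - 1, b[N - 1]);
          return 0;
      }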

Current Intel recommended approach for file I/O from within an offload section

  • Files on the NFS mount /global can be read, but only in read-only mode
  • Files can be directly read from and written to the MIC's internal memory
  • To transfer a file directly to a MIC's internal memory, use: micscp file beaconXXX-micX:/User (or /tmp)
  • Be sure to reset the permissions on the file so that 'other' has read/write permission (all I/O in an offload region is executed as 'micuser')
  • Use the absolute path as the argument to fopen() (see the sketch after this list)
  • Remember to copy any output files off of the MICs before exiting a job
  • Files can also be read from and written to /lustre/medusa/$USER; be sure to reset the file permissions (o+rw) as above and the directory permission (o+x)
  • $HOME is not mounted on the MICs
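  • For instance, a minimal sketch of writing an output file from inside an offload region, following the points above (the file name is hypothetical; replace 'username' with your own directory under /lustre/medusa):

      #include <stdio.h>

      /* beforehand, on the host: chmod o+rw the file and chmod o+x the directory,
         since I/O inside the offload region runs as 'micuser' */
      int main(void) {
          #pragma offload target(mic)
          {
              /* absolute path, per the recommendation above */
              FILE *fp = fopen("/lustre/medusa/username/out.dat", "w");
              if (fp != NULL) {
                  fprintf(fp, "written from the coprocessor\n");
                  fclose(fp);
              }
          }
          return 0;
      }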

Heterogeneous MPI Tasks on Hosts With Offload To Cards (Could Include OpenMP)

  • Run the MPI tasks on the hosts just as you would for native mode on the host
  • Set MIC_ENV_PREFIX=MIC_ and MIC_OMP_NUM_THREADS=N so that every card receiving offloads uses the same number of threads (see the export example after the pseudocode below)
  • Alternatively, call omp_set_num_threads() in the code and tie it logically to the device type or number so that each offload can use a different number of threads
  • Setting MIC_KMP_AFFINITY is especially important when offloading to multiple cards from the same host: http://software.intel.com/en-us/articles/openmp-thread-affinity-control. The pseudocode snippet below will assist with offloading to multiple devices from one host:

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        {
            /* offload this task to a coprocessor; target(mic:0), target(mic:1), ...
               selects a specific card (#pragma omp target is the OpenMP 4.0 alternative) */
            #pragma offload target(mic)
            {
                /* various serial code runs on the card */
                #pragma omp parallel for
                for (int i = 0; i < limit; i++) {
                    /* parallel loop body */
                }
            }
        }
        #pragma omp task
        {
            /* host code, or another offload to a different card */
        }
    }
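  • For example, to have every card that receives offloads use the same thread count (the numeric values are illustrative, not recommendations):

      export MIC_ENV_PREFIX=MIC_
      export MIC_OMP_NUM_THREADS=60
      export MIC_KMP_AFFINITY=balanced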

Heterogeneous MPI Tasks on Hosts or Cards With OpenMP (No Offload)

  • A user is running MPI ranks on the hosts and/or the MICs, with OpenMP pragmas invoked directly from each MPI task (without offload pragmas)
  • Call omp_set_num_threads() in the code and tie it logically to the device type or number (see the sketch after this list) --OR--
  • micmpiexec -n 1 -host beaconXXX -env OMP_NUM_THREADS=N1; -n 1 -host beaconXXX-mic0 -wdir $DIR -env OMP_NUM_THREADS=N2; etc.
  • This passes the appropriate OpenMP thread count to each MIC/host through its own local environment variable (-env, as opposed to the global -genv)
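  • A minimal sketch of tying the thread count to the device type in the code, assuming the Intel compiler (which defines the __MIC__ macro when building with -mmic); the thread counts are illustrative:

      #include <omp.h>

      void set_threads_for_device(void)
      {
      #ifdef __MIC__
          omp_set_num_threads(240);   /* MPI rank running natively on a coprocessor */
      #else
          omp_set_num_threads(16);    /* MPI rank running on a host CPU */
      #endif
      }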

Heterogeneous MPI Tasks on Hosts and Cards With Machine File

  • A user is running MPI ranks on the hosts and the MICs, with executables named test (for the hosts) and test.MIC (for the coprocessors)
  • Set the environment variables I_MPI_MIC=1 and I_MPI_MIC_POSTFIX=.MIC
  • Use the command generate-mic-hostlist hybrid X Y > machines, where X is the number of Xeon ranks on each node requested and Y is the number of Xeon Phi ranks on each Phi on each node requested
  • micmpiexec -n NNODES*X+4*NNODES*Y -machinefile machines ./test
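  • As a worked example (the numbers are illustrative): a job requesting 2 nodes with X=2 Xeon ranks per node and Y=30 ranks per Phi, with four coprocessors on each node, uses 2*2 + 4*2*30 = 244 ranks in total:

      export I_MPI_MIC=1
      export I_MPI_MIC_POSTFIX=.MIC
      generate-mic-hostlist hybrid 2 30 > machines
      micmpiexec -n 244 -machinefile machines ./test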
