Kraken will be officially retired and no longer accessible on August 27, 2014. For more information see Kraken Decommission FAQs.
The National Institute for Computational Sciences

Frequently Asked Questions

Debugging

How do I use the code bisection method to find a bug?

While debugging tools are preferable to print statements, sometimes print statements are the only way to find a bug. In that case, the most effective way to isolate the error in your code is bisection, an iterative process of manually narrowing down the failing region of the program.

Step 1: In the main routine of your code, comment out the second half of the code (or approximately the second half).

Step 2: Compile and run the code. Did it crash as before?

Step 3A: If yes, the bug is in the code that is still active; return to Step 1 and comment out the second half of that remaining active code. Repeat until you have narrowed the problem down to the offending line or routine, which may mean following the same approach inside a subroutine.

Step 3B: If no, the bug is in the half you commented out; swap the halves (uncomment the previously commented code and comment out the half that ran), compile and run again, then go to Step 3A.

Additionally, print statements showing variable values can reveal an earlier piece of code that runs without crashing but produces an errant value that causes a later routine to fail.

Finally, if there is any way to reproduce the error in serial, do so: the print statements will then appear in chronological order rather than being interleaved from different processors' output buffers.

While this might sound like a lot of work, and it is non-trivial, here is a tip to lighten the burden: keep three sessions open on Darter simultaneously.

1. One session to edit the code.
2. Another session to compile the code.
3. A third session in which you run an interactive job, so that you do not have to submit a new batch job and wait in the queue for every test (see the example below).
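For example (the size and walltime values below are only placeholders; adjust them to your test case), an interactive job on Darter can be requested with:

qsub -I -l size=32,walltime=01:00:00

Once it starts, you can recompile in the second session and immediately rerun with aprun in the interactive session.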

How do I use Cray ATP on Darter to determine where and why a code died abnormally?

Sometimes a code will work fine in many cases and circumstances but there will be a bug which only rears its head when a certain perfect storm of case and job size occurs. This causes the code to die in a strange spot and it is not obvious exactly why or where. In cases like this, Cray's ATP (Abnormal Termination Processing) can likely help!

Simply do

module load atp 

and re-compile your code without optimization and with the "-g" debugging flag, using the Cray compiler wrappers (ftn or cc). This both helps ensure that the error was not introduced by compiler optimization and produces the instrumented executable.

Now, you are ready to use ATP to generate a backtrace to the line where the code died.

Add the following to your PBS script to make sure that the ATP module is loaded into your aprun environment and that the ATP environment variable is set to collect information:
module load atp
export ATP_ENABLED=1
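For reference, a minimal PBS script with ATP enabled might look like the following sketch (the size, walltime, task count, and executable name are placeholders):

#PBS -S /bin/bash
#PBS -l size=32,walltime=00:30:00

cd $PBS_O_WORKDIR

module load atp
export ATP_ENABLED=1

aprun -n 32 ./my_app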

If a backtrace file appears in your directory when the run terminates, search it for the line on which your code died. If the code instead completes successfully, the crash was likely caused by compiler optimization; lower the optimization level so the compiler does not optimize your code into incorrect results.

Also, you may go back and add "-traceback", an Intel compiler flag, to the compilation, which may assist in producing a traceback file as well. This only works when "PrgEnv-intel" is loaded, but you can pass the flag to "cc" or "ftn" and it will be forwarded to the underlying Intel compiler.

If you are still unable to find the problem, stepping through with a debugger like DDT or Totalview may be helpful.

Test memory usage on the compute node

In order to determine memory usage for a given process on a compute node, one would normally simply issue the command "top" and look at the memory usage of the process in question. However, this cannot be done on a Darter compute node, since compute nodes are not directly accessible to the user. In addition, OOM (Out of Memory) errors can occur even when a problem has been discretized finely enough to fit in memory, because memory leaks in the code gradually consume the available memory and eventually crash the program.

In this situation the user needs to instrument their code and fix the leaks. The Scientific Computing staff at NICS have created a simple function that can be added to a program at spots where memory usage is suspect. It can help find potential memory leaks as well as diagnose situations where memory grows in a way the user did not expect. While tools like Valgrind and Electric Fence exist, they often slow execution to the point where the memory issue cannot be reproduced within the prescribed wall time, making the run a waste of SUs and user time.

The following C function, "GetMemoryUsage", can be added to the source tree and compiled along with the rest of the user code. It returns the program's memory usage on the compute node at the point in the program where it is called. The idea is to insert "GetMemoryUsage" calls at different places in the source, recompile, and run to observe memory growth. To test whether a function or subroutine has a memory leak, call GetMemoryUsage at the beginning and end of the function and check whether there is a noticeable difference in memory usage. If there is, the function leaks memory, unless it deliberately allocates memory of its own; in that case the user should verify that the growth matches the amount allocated, otherwise a leak still exists. Either way, the user can see how much memory a given function allocates and judge whether that is commensurate with expectations. By repeatedly inserting GetMemoryUsage calls, one can narrow down which part of a large code is contributing to the leak.

The sample program "memusage_test.c" shows how the function can be used; running it should help the user become familiar with the approach before applying it to a larger code base. The sample program intentionally leaks memory, so GetMemoryUsage returns higher and higher usage as the program continues. A sample makefile is also provided for convenience.

GetMemoryUsage.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MEMORY_INFO_FILE "/proc/self/status"
#define BUFFER_SIZE 1024

/* Read the high-water mark (VmHWM) and current resident set size (VmRSS),
   in kB, of the calling process from /proc/self/status.                   */
void GetMemoryUsage ( double *HWM, double *RSS )
  {
  FILE *fp;
  size_t n;
  char buffer [ BUFFER_SIZE ], scratch [ BUFFER_SIZE ];
  char *loc;

  *HWM = 0.0;
  *RSS = 0.0;

  fp = fopen ( MEMORY_INFO_FILE, "r" );
  if ( fp == NULL )
    return;

  while ( fgets ( buffer, BUFFER_SIZE, fp ) )
    {
    if ( strncmp ( buffer, "VmHWM:", 6 ) == 0 )
      {
      /* Copy the numeric field (everything up to the 'k' of "kB")
         into scratch and convert it to a double.                   */
      loc = strchr ( &buffer [ 7 ], 'k' );
      n = loc - &buffer [ 7 ];
      strncpy ( scratch, &buffer [ 7 ], n );
      scratch [ n ] = '\0';
      *HWM = strtod ( scratch, NULL );
      }
    if ( strncmp ( buffer, "VmRSS:", 6 ) == 0 )
      {
      loc = strchr ( &buffer [ 7 ], 'k' );
      n = loc - &buffer [ 7 ];
      strncpy ( scratch, &buffer [ 7 ], n );
      scratch [ n ] = '\0';
      *RSS = strtod ( scratch, NULL );
      }
    }

  fclose ( fp );
  }
memusage_test.c
#include <stdio.h>
#include <stdlib.h>

/* Prototype for the function defined in GetMemoryUsage.c */
void GetMemoryUsage ( double *HWM, double *RSS );

int main ( int argc, char **argv)
  {
  int i, j;
  double HWM, RSS;
  double *Array;
  GetMemoryUsage ( &HWM, &RSS );
  printf ( "Initial Usage: \nHWM : %f kB \nRSS : %f kB\n\n", HWM, RSS );
   // Create leaky code
  for ( j = 1; j < 100; j++ )
    {
    Array = malloc ( sizeof ( double ) * 100000 );
    for ( i = 0; i < 100000; i++ )
      Array [ i ] = 0.0;
    Array = NULL;

    GetMemoryUsage ( &HWM, &RSS );
    printf ( "Usage at j = %d \nHWM : %f kB \nRSS : %f kB\n\n", j, HWM, RSS );
    }
  return 0;
  }
Makefile
all:
        cc -c GetMemoryUsage.c
        cc -o memusage_test.exe memusage_test.c GetMemoryUsage.o

clean:
        rm -f *.o *.exe

How do I enable the creation of a coredump file when a program crashes on a compute node?

In order to enable the creation of a coredump file when a program crashes on a compute node of a Cray XT system like Kraken, the appropriate command below should be added to the job script before the aprun call:

Bourne shell:   ulimit -c unlimited
C shell:        limit coredumpsize unlimited


For example, if using a Bourne-like shell for the job script, the script will look like this:

#PBS -A MY_PROJECT
#PBS -l size=12,walltime=00:05:00
#PBS -S /bin/bash

cd $PBS_O_WORKDIR

ulimit -c unlimited

aprun -n 4 ./helloWorld


In the previous example, if the program 'helloWorld' crashes (for example, due to a segmentation fault), a coredump file named 'core' will be created in the directory where the program is located.


Note: Using the compiler option '-g' at compile time will add debugging information to the executable, which makes it easier to determine the location in the source code where the program crashed.

Lustre

What is the default stripe count on the Lustre Medusa filesystem?

The default stripe count on the Lustre Medusa filesystem is 2. Lustre Medusa has 90 OSTs (Object Storage Targets), so the maximum possible stripe count is 90. You can list the Medusa OSTs with:

lfs osts | grep medusa

What should I do in the event of a lustre slowdown?

In the event of a Lustre slowdown, there are many things to consider, as Lustre has many working parts and is shared by all users on the system. NICS continually monitors Lustre's performance and seeks to improve researchers' data communications. If you notice your code's I/O performance or the Lustre filesystem is slower than usual, please answer the following questions to the best of your knowledge and email your answers to the XSEDE Help Desk.

  • When did you first notice the slowdown? How long did it last?
  • Which login node were you on?
  • Can you estimate the magnitude of the slowdown? (ex - "It took 2 min instead of 3 secs", "batch job exceeded walltime limit of 10 hours, but normally finishes in 8 hours")
  • What were you doing? Interactive command (like "ls")? Batch job?
  • For interactive commands:
    • Which host were you using?
    • Did you see the same behavior on other hosts?
    • Can you provide the exact command that was run and the directory in which it was run?
  • For batch jobs:
    • Can you supply the job IDs for jobs that were affected?
    • Can you provide any details about the IO pattern for your job?

Is there any other faster way to list my files in my Lustre scratch area?

Yes! A basic ls only has to contact the meta-data server (MDS), not the object-storage servers (OSSs), where the bottleneck often occurs. In general, ls is aliased to give additional information, which requires contacting the OSSs. You can bypass this by using /bin/ls. When there are many files in the same directory and you don't need the output sorted, /bin/ls -U is even faster.

You can also use the Lustre utility lfs to look for files. For example, the syntax to emulate a regular ls in any directory is

lfs find  -D 0  *

For convenience, you may want to add an alias definition to your login config files. For example, Bash users can add the following line to their ~/.bashrc to create an alias called lls:

alias lls="/bin/ls -U"

How do I change the striping in Lustre?

A user can change the striping settings for a file or directory in Lustre by using the lfs command. The usage for the lfs command is

lfs setstripe <file|directory> -s <size> -i <index> -c <count>

where

size - the number of bytes on each OST (0 indicating default of 1 MB) specified with k, m, or g to indicate units of KB, MB, or GB, respectively.
index - the OST index of first stripe (-1 indicating default)
count - the number of OSTs to stripe over (0 indicating default of 4 and -1 indicating all OSTs [limit of 160]).
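For example, a hypothetical command that sets a directory (here called "mydir") to use a 1 MB stripe size, the default starting OST, and a stripe count of 4 might look like:

lfs setstripe mydir -s 1m -i -1 -c 4

New files created in that directory will then inherit these settings.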

NOTE: If you change the settings for existing files, the file will get the new settings only if it is recreated.

To change the settings for an existing directory, you will need to rename the directory, create a new directory with the proper settings, and then copy (not move) the files to the new directory to inherit the new settings.

If your application is the type in which each separate process writes to its own file, then we believe that the best option is to not use striping. This can be set by using this command:

> lfs setstripe testdirectory -c 1

Then we see that

> lfs find -v testdirectory
OBDS:
0: ost1_UUID ACTIVE
--snip--
testdirectory/
default stripe_count: 1 stripe_size: 0 stripe_offset: -1

This shows we have a stripe count of 1 (no striping), the stripe size is set to 0 (which means use the default), and the stripe offset is set to -1 (which means to round-robin the files across the OSTs).

NOTE: You should always use -1 for stripe_offset.

The stripe count and stripe size are something you can tweak for performance. If your application writes very large files, then we believe that the best option is to stripe across all or a subset of the OSTs on the file system. Striping across all OSTs can be set by using this command:

> lfs setstripe <directory> -c -1

Caution: Not striping large files may cause a write error if the file's size is larger than the space on a single OST. Each OST has a finite size which is smaller than the total Lustre area of all OSTs.

What is the default striping for my files?

A file's striping is inherited from its parent directory. The lfs getstripe command can be used to determine the striping for a file, or the default striping for a directory. Note that each file and directory can have its own striping pattern, which means a user can set striping patterns for their own files and/or directories. The default stripe count for a user may be 1 or 4; you can determine it by running lfs getstripe /lustre/medusa/$USER.

The following command will also give you striping information for a directory or file:

lfs find -v <directory|file>

What is file striping?

The Lustre file system is made up of an underlying set of file systems called Object Storage Targets (OSTs), which are essentially a set of parallel IO servers. A file is said to be striped when read and write operations access multiple OSTs concurrently. File striping is a way to increase IO performance, since writing to or reading from multiple OSTs simultaneously increases the available IO bandwidth.

Striping will likely have little impact for the following IO patterns:
  • Serial IO, where a single processor or node performs all of the IO for an application.
  • Multiple nodes perform IO but access files at different times.
  • Multiple nodes perform IO simultaneously to different files that are each small.

Lustre allows users to set file striping at the file or directory level. As mentioned above, striping will not improve IO performance for all files. For example, in a parallel application, if each processor writes its own file then file striping will not provide any benefit: each file will already be placed on its own OST and the application will be using OSTs concurrently. File striping, in this case, could lead to a performance decrease due to contention between the processors as they try to write (or read) pieces of their files spread across multiple OSTs.

For MPI applications with parallel IO, multiple processors accessing multiple OSTs can provide large IO bandwidth. Using all the available OSTs on Kraken will provide maximum performance.

There are a few disadvantages to striping. Interactive commands such as ls -l will be slower for striped files. Additionally, striped files are more likely to suffer data loss from a hardware failure, since the file is spread across multiple OSTs.

Please see also: Scratch Space (Lustre) and I/O and Lustre Tips.

Running Jobs

What are the memory limits on the compute nodes of NICS production resources?

Listed below are the limits on the compute nodes of NICS-operated resources, based on basic tests that were run to check the real maximum values for allocatable memory and number of open files:

System   | MaxMem | MaxOpenFiles
---------|--------|-------------
Kraken   | 15.3GB | 1018
Darter   | 31.1GB | 1015
Nautilus | 32.1GB | 48   <- Crashes without warning when passing this limit
KFS      | 32.1GB | >2048
KIDS     | 22.5GB | >2048

I was running my program in an interactive job on Beacon but it didn’t finish. I received the message: qsub: job ####.beacon-mgt.nics.utk.edu completed. How do I request more time for interactive jobs?

Request more time for interactive jobs by providing a specific number of hours/minutes/seconds using
qsub -I -l walltime=hh:mm:ss

Note that 24 hours is the maximum that can be requested. If you need an extension, send an email to help@nics.utk.edu along with any job ids that need to run for more than 24 hours.

I copied all files to $TMPDIR/mic# and ran my program, where are all the output files?

They should be stored at $TMPDIR/mic#, and they need to be copied to either the home or Lustre filesystem before the submitted job completes.

How do I run jobs on Kraken using aprun?

If a user wants to use:

#PBS -l size=144 ### Assuming you want to use 24 MPI tasks

aprun -n 24 -N 2 -S 1

Here's what the above aprun command means. You are asking for 24 MPI tasks, 2 MPI tasks per node, and 1 MPI task per socket.

At 1 task per socket there are 2 tasks per node, so the job will use 12 nodes (24/2), and the size would be 12*12 = 144. It is best to start with the aprun command to figure out how many nodes will be used, then multiply by 12 to get the value of size. If you want to leave one socket empty on each node, use aprun -n 24 -N 1, which places one MPI task per node.

How do I checkpoint with Amber on Kraken?

Amber 11 has a suite of programs. The following refers to "pmemd.MPI".

In the PBS script the user has the following. For now, ignore everything except the ".rst" files and the "-r" and "-c" switch.

module load amber/11

aprun -n 96 pmemd.MPI -O -i prodnve.in -o bacemdnve.out -p baceleap.prmtop -c baceleapcpw.rst -r bacemdnve.rst -x bacemdnve.nc

This pulls in a restart file ("-c baceleapcpw.rst", containing the last time step state from a previous simulation run that you declared in your input file), while "-r bacemdnve.rst" writes out the latest time step state of this run. So, if you want to continue the simulation, your next job script will contain a similar aprun command:

aprun -n 96 pmemd.MPI -O -i prodnve.in -o bacemdnve.out -p baceleap.prmtop -c bacemdnve.rst -r latestrestart1.rst -x bacemdnve.nc

It is a good idea to give the restart files different names so that you don't overwrite your checkpointed results and can track down any discrepancies.

The biggest concern in this workflow is the number of timesteps. You should be able to linearly extrapolate timesteps with respect to walltime: for example, if 500 steps take 12 hours of walltime, then 1000 steps will take 24 hours. Pay attention to this, and start out with separate directories and different filenames so that you can keep track of everything and avoid overwriting results you wanted to keep.

Why is my job being rejected by the scheduler on Kraken?

Each Kraken node has 12 cores and 16 Gbytes of memory: about 1 1/3 GB
per core if all cores are used. Sometimes it is necessary to leave
some cores idle to make more memory available per core. For example,
if you use 8 cores per node, each core has access to about 2 Gbytes of
memory.

#PBS -l walltime=01:00:00,size=1500

aprun -n 1500 -S 4 $PBS_O_WORKDIR/a.out

The above aprun command won't work. The nodes on Kraken have 2
sockets and each socket has 6 cores. That makes a total of 12 cores
per node. Your size should be a multiple of 12. To make a long story
short, use the following formula to get close to a multiple of 12 with
what you want to do.

cores per socket on Kraken (6) * number of MPI tasks / MPI tasks per socket (the -S value)

6 * 1500 / 4 = 2250

The next multiple of 12 is 2256. Change the size in your PBS request to
2256 and you should be fine.
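Putting it together, the corrected request for this example would be:

#PBS -l walltime=01:00:00,size=2256

aprun -n 1500 -S 4 $PBS_O_WORKDIR/a.out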

Why does xtnodestat fail?

The command xtnodestat can be used to see which jobs are currently running and which cabinets, nodes, and processors they are running on. When the system is busy and many jobs are starting or stopping, a read error may occur that produces the following error:

Error: xtnodestat cannot be run unless apstat is in PATH

This is a known bug that has been reported to Cray. The error will be fixed in a future software release. Until then, please retry the command until the appropriate output is returned.

What are the flags to prevent Java code from spawning excessive numbers of garbage-collecting threads?

When trying to run some Java code (a statistical modeling code called maxent) for the NIMBioS project on Nautilus, we saw that one instance of the code would spawn ~1200 threads. I initially thought that maxent was the culprit, until I ran a simple 'hello world' Java program and it too spawned 1200 threads.

It turns out that the Java virtual machine spawns garbage-collecting threads in proportion to the number of processors that it detects. You can control this with the following flags:

-XX:ParallelGCThreads=2
-XX:+UseParallelGC

Adding these flags when running the maxent code brought the thread count down to around 16, which seems to be around the baseline number of startup threads needed by the JVM. I think any Java code run on Nautilus should benefit from using these flags. I haven't done any specific tests on how the value of ParallelGCThreads affects performance, but at least with the maxent code I noticed faster startup times for the JVM.

-Scott

Can I use MPI_Alltoall with MPI_IN_PLACE?

The MPI_IN_PLACE option causes communication on an intra-communicator to happen in place, rather than being copied into buffers. This reduces the required number of operations (it is only possible within a node, not between nodes).

In order to use this option with MPI_Alltoall, you need to disable Cray's optimization for that call:

export MPICH_COLL_OPT_OFF=mpi_alltoall

When should I use the PBS option '-V'?

The -V option tells the batch system to export all of your current environment variables to the job. For example, if you set OMP_NUM_THREADS to 4 and then submit the job, you need this flag for OMP_NUM_THREADS to still be set in the batch script. You can use it as a command-line flag, such as qsub -V ..., or in your batch script like:

#PBS -V

While this can be convenient, it is best practice not to use -V. Why?

  • It makes jobs more self-contained: if the script itself must set all the environment variables it needs, the script can be shared between people without confusion. Additionally, when debugging an issue, it is clear from looking at the script which variables are set.
  • This option, when used often, can create additional load for the scheduler, and in rare cases cause a crash (particularly if used in jobs which resubmit themselves).

Using -V occasionally is not a problem, and it may even be recommended for something like an interactive job, but it is best not to include it in every job script out of habit. A sketch of the alternative is shown below.
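Instead of relying on -V, a job script can set explicitly what it needs (the OMP_NUM_THREADS value and aprun options here are only illustrative):

#PBS -S /bin/bash
#PBS -l size=12,walltime=01:00:00

cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=4

aprun -n 2 -d 4 ./a.out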

Can I use dynamic shared libraries on the Cray compute nodes?

Dynamic shared libraries are officially not supported by NICS on Kraken. The primary reason is that Kraken is not configured to run DVS servers (separate from the Lustre servers), which are required to support the Cray shared root file system that Cray's dynamic shared libraries rely on. Additionally, Kraken is one of the largest Cray XTs in the world, and there is a strong concern that a large-scale job (32K cores or more) dynamically accessing one or more shared libraries simultaneously would cause extreme system slowdown or a crash, because the shared libraries must reside in Lustre, which puts additional load on our IO servers, which are often fully loaded. In the best case, dynamic linking will still cause a performance degradation for an application; the extent is very application dependent (see Johansen, CUG 2009).

Dynamic shared libraries may nevertheless be used on Kraken by placing the shared libraries in Lustre and setting LD_LIBRARY_PATH appropriately; however, this activity is not officially supported by NICS, and no guarantee of support is provided for this mode of operation. We must caution that Lustre striping be set appropriately (stripe count of 1) and, more importantly, that LD_LIBRARY_PATH be carefully set to minimize the number of included paths, with the most used directories first. A rough sketch of this unsupported setup is shown below.
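The directory name, library paths, task count, and executable below are only placeholders:

mkdir /lustre/scratch/$USER/mylibs
lfs setstripe /lustre/scratch/$USER/mylibs -c 1
cp /path/to/your/lib*.so /lustre/scratch/$USER/mylibs/
export LD_LIBRARY_PATH=/lustre/scratch/$USER/mylibs

aprun -n 24 ./my_dynamic_app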

How do I get information about my MPICH/Portals settings?

Cray's MPICH has a number of settings (changed using environment variables) that affect what algorithms are used, buffer space, etc. For a list of these variables and their default settings, you can set the following prior to calling aprun:

export MPICH_ENV_DISPLAY=1

This will print a single list, regardless of the number of MPI tasks. It is important to note that these settings may change based on the core count of the job. In particular, some settings, such as MPICH_UNEX_BUFFER_SIZE, scale with the number of MPI tasks. In addition, a job run on a single node can use shared memory rather than Cray's Portals for communication, in which case Portals-related settings are undefined and not displayed.

For more information about these settings, please see this workshop presentation, or run "man intro_mpi", which describes many of them. You can also find that information on Cray's documentation page (under "Introduction to MPI man pages").

How do I change my default limits for stack size, core file size, etc.?

When you connect to a system, your environment is set up with default limits for stack size, core file size, number of open files, etc. The system sets both soft and hard limits for these parameters. The soft limit is the actual limit imposed by the system. For example, the soft stack limit is the maximum stack size the system will allow a process to use. Users cannot increase their hard limits. Hard limits can be decreased, but this is not recommended.


While it is rarely necessary to change shell limits on Kraken or Nautilus, there may be times when a limit must be increased to get your program to run properly. This is where the hard limit becomes important: the system allows users to raise their soft limits, but it uses the hard limit as the upper bound, so a soft limit can never be set to a value greater than the corresponding hard limit.

The command to modify limits varies by shell. The C shell (csh) and its derivatives (such as tcsh) use the limit command to modify limits. The Bourne shell (sh) and its derivatives (such as ksh and bash) use the ulimit command. The syntax for these commands varies slightly and is shown below. More detailed information can be found in the man page for the shell you are using.

Limit commands

Operation                 | sh/ksh/bash command | csh/tcsh command
--------------------------|---------------------|---------------------
View soft limits          | ulimit -S -a        | limit
View hard limits          | ulimit -H -a        | limit -h
Set stack size to 128 MB  | ulimit -S -s 131072 | limit stacksize 128m


With any shell, you can always reset both soft and hard limits to their default values by logging out and back in.

On the Cray XT, both the RLIMIT_CORE and RLIMIT_CPU limits are always forwarded to the compute nodes. If you wish to set any other user resource limits, you must set the APRUN_XFER_LIMITS environment variable to 1, along with the new limits, within the job script before the aprun call:

export APRUN_XFER_LIMITS=1
# or, for csh/tcsh:
setenv APRUN_XFER_LIMITS 1


Default user resource limits

The default user resource limits on the compute nodes are:

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
coredump(blocks)     0
memory(kbytes)       unlimited
locked memory(kbytes) 512
process              unlimited
nofiles              1024
vmemory(kbytes)      unlimited
locks                unlimited


Can a user login directly to a compute node?

No, users cannot log in directly to a compute node. However, by submitting an interactive batch job, users can get access to an aprun (service) node, from which they can execute aprun commands that run on the compute nodes. For more information on how to run interactive batch jobs, please see Interactive Batch Jobs.

Why do I get the error message: Warning: no access to tty (Bad file descriptor). Thus no job control in this shell.

This message always occurs when running C-shell-style job scripts. It is not really an error; it is a reminder that this is a remote batch job that cannot receive interactive input (such as ^C to interrupt or ^Z to suspend).

What are your guiding principles for configuring the queues on Kraken?

Jobs with large core counts intentionally get the highest priority on Kraken – without a high priority they would never run. Kraken enables capability jobs that cannot be run on other XSEDE systems. Jobs with small core counts can be run on other XSEDE systems, and thus their relative priority is lower on Kraken. NICS does not restrict or discourage jobs with small core counts running on Kraken, but their priority is lower than for large jobs.

Jobs with short wall clock limits sometimes start sooner than jobs with a 24-hour limit. These jobs can be used for back-fill while the system is collecting nodes for a larger job. The scheduler can give those nodes temporarily to short jobs without delaying the start time of the large job.

Why am I getting "could not find *.so"? Or: can I use dynamic libraries?

These files are dynamic libraries, which live on an NFS file system that is not visible to the compute nodes, so when the dynamic linker goes to load the library it cannot find it. In the past, dynamic libraries were not supported on the compute nodes at all. Now, you may be able to use dynamic libraries if you place the files on Lustre, but static executables are recommended regardless. To check whether an executable is dynamically linked, use "ldd executable".

Why does nothing happen when I submit my job?

If you submit your job and it executes for only an instant, terminates without any error messages, and leaves empty output files, you may have a customized login script that changes your shell at login time by explicitly executing another shell. For example, sometimes users whose default shell is Bash will change it to the C shell by doing the following in their .bashrc file:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# User specific aliases and functions

exec csh

If you do want to change your default shell, use the NICS User Portal. To log into the portal, you need to use your RSA SecurID.

What is Optimistic Memory Allocation? How does it affect me?

Linux uses "virtual memory" for each process, which creates the illusion of a contiguous memory block when a process starts, even if physical memory is fragmented, or residing on a hard disk. When a process calls malloc, it is given a pointer to an address in this virtual memory. When the virtual memory is first used, it is then mapped to physical memory.

Optimistic Memory Allocation means that Linux is willing to allocate more virtual memory than there is physical memory, based on the assumption that a program may not need to use all the memory it asks for. When a node has used all its physical memory and there is another call to malloc, instead of returning a null pointer, the program will receive a seemingly good pointer to virtual memory. When that memory is used, the kernel will try to map the virtual memory to physical memory and enter an "Out of Memory" condition. To recover, the kernel will kill one or more processes; on Kraken, this will almost certainly be your executable, and you should see "OOM killer terminated this process."

For more information, see O'Reilly's writeup or man malloc under "Bugs".

Why am I not getting the basic error messages I expect?

Sometimes some of the basic error messages (such as reading past the EOF) are suppressed because a shell interpreter is not specified in the PBS script. Make sure that the first line of the PBS script contains a shell interpreter: #!/bin/bash, for example.

Where should I run serial executables?

If your application does not require interactivity or a rich environment, it can run on the compute nodes.

If your application cannot easily run on the compute nodes, you may want to get access to an analysis machine such as Nautilus. Please do not run serial analysis applications on Kraken's service nodes.
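For example, a serial executable can be launched on a compute node from a batch script with a single-task aprun (the size, walltime, and executable name below are placeholders):

#PBS -l size=12,walltime=00:30:00

cd $PBS_O_WORKDIR
aprun -n 1 ./my_serial_app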

Why do I get the error "qsub: Job exceeds queue resource limits.." when submitting a job?

The queuing system on Kraken does not allow memory requests with the #PBS -lmem= flag. Jobs requesting memory will be rejected with the error message shown above.

Memory on Kraken is not shared between nodes. Each node has access to 16 GB of memory: about 1 1/3 GB per core if all cores are used. Thus, available memory is directly related to the number of processors requested. Because the memory is not shared, it does not make sense to request memory directly via PBS; it is implicitly requested through the #PBS -lsize=... request.

How do I find out what nodes I am using?

There are a couple of easy ways to find out which nodes are assigned to your batch job. The easiest is to issue checkjob <jobid>. Part of the output will be a list of nodes like the following:

Allocated Nodes:      

[84:1][85:1][86:1][87:1][88:1][89:1][90:1][91:1]

This method returns a logical numbering of the nodes. A physical numbering of the nodes, as well as the PID layout, can be obtained by setting the PMI_DEBUG variable to 1.

> setenv PMI_DEBUG 1
> aprun -n4 ./a.out
Detected aprun CNOS interface
MPI rank order: Using default aprun rank ordering
rank 0 is on nid00015 pid 76; originally was on nid00015 pid 76
rank 1 is on nid00015 pid 77; originally was on nid00015 pid 77
rank 2 is on nid00016 pid 69; originally was on nid00016 pid 69
rank 3 is on nid00016 pid 70; originally was on nid00016 pid 70

From within your code, you can call PMI_CNOS_Get_nid to get the physical node number for each process.

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
  int rank,nproc,nid;
  int i;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);
  PMI_CNOS_Get_nid(rank, &nid);
  printf("  Rank: %10d  NID: %10d  Total: %10d \n",rank,nid,nproc);
  MPI_Finalize();
  return 0;
}

The output with four cores would be as follows:

aprun -n4 ./hello-mpi.x
  Rank:          1  NID:         15  Total:          4
  Rank:          0  NID:         15  Total:          4
  Rank:          2  NID:         16  Total:          4
  Rank:          3  NID:         16  Total:          4
Application 13390 resources: utime 0, stime 0

aprun can be used to run Unix commands on the compute nodes that display the node names as shown below.

> aprun -n4 /bin/hostname
nid00015
nid00015
nid00016
nid00016
>

Or

> aprun -n4 /bin/cat /proc/cray_xt/nid

15
15
16
16
>

How can I get more details on what hardware is on Nautilus?

Some details regarding the hardware on Nautilus can be found by using the "cpumap" command.

Runtime Errors

Why does it keep asking for my password when I try to SSH to mic# on beacon#?

Use micssh instead of ssh. The necessary SSH keys are provided through the micssh script.

Why am I asked for my password when trying to run an MPI program on a MIC?

Use micmpiexec instead of mpiexec. The necessary SSH keys are provided through the micmpiexec script.

Why am I getting a 'syntax error: unexpected "(" ' error message?

The binary being run was meant to be run on the host, not the MIC. Copy over the binary that was compiled with -mmic and try again.

Why am I getting an 'execvp error on file' error?

The proper path must be specified when using micmpiexec, i.e. use

micmpiexec -n 1 ./program

instead of

micmpiexec -n 1 program

Why do I get the error "PtlMDBind failed with error : PTL_NO_SPACE"?

This occurs because the application is not pre-posting receives. When receives are not posted, the MPI library does not know how much buffer space to allocate for incoming messages; when numerous messages arrive, the library runs out of buffer memory. One workaround is to tell the MPI library to receive only the message header from the sender, from which it can determine where to put the message in the user's application array. By modifying the MPICH environment variables, users can also give the MPI library more latitude to allocate buffer space. The real solution is to fix the application so that it pre-posts receives before the messages are sent.

My Kraken batch job aborts with the error MPIDI_PORTALSU_REQUEST_FDU_OR_AEP: DROPPED EVENT ON UNEXPECTED RECEIVE QUEUE. What does it mean?

To prevent the error, set MPICH_PTL_SEND_CREDITS=-1, which enables a flow-control mechanism.

For best performance, the number of event queue entries for the MPI unexpected receive queue should be set as high as possible. For example, set MPICH_PTL_UNEX_EVENTS=80000.

Note that this fix does not address unexpected message buffer exhaustion. Thus, the user may still need to adjust MPICH_MAX_SHORT_MSG_SIZE or MPICH_UNEX_BUFFER_SIZE if this buffering overflows.
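For example, these variables can be set in the job script before the aprun call; the values and task count below are only illustrative and should be tuned for your application:

export MPICH_PTL_SEND_CREDITS=-1
export MPICH_PTL_UNEX_EVENTS=80000
export MPICH_UNEX_BUFFER_SIZE=120000000

aprun -n 1024 ./my_app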

See the mpi_intro man page for details.

Accounts

How do I know how many SUs I have used on a NICS system?

Users can use the commands "susage" and "mybalance" to check how many SUs have been used:
mybalance
Project       Machines                Balance
------------- ----------------------- ----------
TG-SEE090006  keeneland.nics.teragrid 0
TG-STA110018S kraken.nics.teragrid    976104804
TG-STA110018S nautilus.nics.teragrid  1080000000

susage -u username

Project       Resource              StartDate  EndDate    Allocation Remaining  Usage
-------------+---------------------+----------+----------+----------+----------+---------
TG-STA110018S kraken.nics.teragrid  09-21-2013 09-21-2014 300000.00  271140.22  28859.78

I received an error when I logged in, and now the system can't find commands. Why?

When the C shell (or one of its derivatives, such as tcsh) is starting up and encounters an error in one of its initialization files, it stops processing its initialization files. So, any aliases, environment settings, etc., that occur after the line that caused the error will not be processed. For help in troubleshooting the startup files, contact the User Assistance Center.

Why do I get the error "init.c(375):ERROR:50: Cannot open file "" for 'append'" when I log in?

This message usually means that you are at or near your home directory quota and that some part of the login process was trying to write there. This is often caused when the modules utility is loaded because it needs to write files to your home directory. You will need to reduce the usage in your home directory to log in successfully.

You may also notice that after getting this message, some commands cannot be found. This is due to the way C shell handles errors.

How do I list all projects for which I am a member?

You can use the showusage utility to view all projects for which you are a member.

How do I view my allocation and usage?

Users can view their allocation and usage on allocated systems using the showusage utility. showusage returns year-to-date usage and allocation for the calling user’s allocated project(s). Usage is calculated from the first day of the fiscal year through midnight of the day before the request.

You can also check charges for individual jobs using glsjob [-u <username>] [-p <project>]. The sum of the job charges within a project should equal the showusage total, within rounding error.

HPSS

How do I check my usage on HPSS?

In order to check your usage on HPSS, use the "du" command in your top-level directory, or the "du -s" option (summary for the entire directory only).

For example: 

O:[/home/username]: du
2137614 4 /home/std00/
305920049 1 /home/std00/directory1/
86648223 1 /home/std00/directory2/
211942420 1 /home/std00/directory3/
156677661 47 /home/std00/directory4/
6455083743 1 /home/std00/directory5/
0 0 /home/std00/
-----------------------
7218409709 total 512-byte blocks, 55 Files (3,695,825,770,765 bytes)

If a file has been overwritten, can the old one be recovered?

Unfortunately, no. There are no backups of HPSS. Even if a file is written with "copies=2", the overwrite will affect both copies (a recovery might technically be possible, but not without significant system interruption).

Is it possible for files to be overwritten in HPSS?

Yes, by default. If the user has "autobackup=on" in their .hsirc file, the existing file is renamed with a "~" appended; otherwise, the file is simply overwritten. Another option is to use "hsi cput" instead of "hsi put": cput will issue a warning if the file already exists, and the file being stored will not be written to HPSS, so the old one is not overwritten. (The user needs to pay careful attention to the output from hsi to notice that the file wasn't stored.)

When storing files with similar names, it is best to append a date (and time if necessary) to the filename, in 4-digit year, 2-digit month, 2-digit day, 2-digit hour, 2-digit minute form. This provides a unique name and also causes the files to be automatically sorted by ls according to the date they describe (which might not always be the date/time they were written). An example might be file.tar.201212032250 for December 3, 2012 at 10:50 PM.
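As a convenience, such a timestamp can be generated automatically when storing a file; the filename below is only an example:

hsi put file.tar : file.tar.$(date +%Y%m%d%H%M)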

Why do I get 'out of space' error when transferring files from HPSS to Kraken?

Your file transfer has caused a Lustre storage server (OST) to become full, resulting in an error like:

ead_cond_timedwait() return error 22, errno=0 OUT OF SPACE condition detected while writing local file

This usually happens because the stripe count is too small (often 1). To solve this issue, remove the partially transferred file and change the stripe count of the directory before transferring the file again. To change the stripe count of the directory, first cd to that directory, then type the following command:

lfs setstripe . -c 8

where 8 is the new stripe count, meaning that any new files in that directory will be striped across 8 OSTs. The larger the stripe count, the more OSTs the file will be striped across. A stripe count that results in the file using less than 100 GB per OST should usually work.

How do I share my files on HPSS with other members of my research team?

To find out what groups you are a member of on HPSS use the groups command.

K:[/home/username]: groups
K:HPSS Group List:
  1045: nsf008       1928: nsf008q4s 

This shows the user is a member of groups nsf008 and nsf008q4s.

If other members of your team are listed in the same group you can simply log into HPSS using HSI and change the group and permissions to share the files or directories.

For example, if both you and other members are all in nsf008q4s you will simply need to do a chgrp.

K:[/home/yourusername]: chgrp nsf008q4s filename

Then you will need to do a chmod to make the file group readable.

K:[/home/yourusername]: chmod 750 filename

The other members of the group should then be able to access your files on HPSS.

If you are unsure of the HPSS group that correlates to the NICS project, or the other members of your group are not members of the same group you will need to submit a ticket to help@xsede.org and request they be added to the group on HPSS. Please reference this FAQ in your request.

How do I verify the contents of an archive during creation?

HTAR provides the "-Hverify=option[,option...]" command line option, which causes HTAR to first create the archive file normally and then go back and check its work by performing a series of checks on the archive file. You choose the types of checks by specifying one or more comma-separated options. The options can be individual items, the keyword "all", or a numeric level of 0, 1, or 2. Each numeric level includes all of the checks of the lower levels and adds additional checks. The verification options are:

  • all - Enables all possible verification options except "paranoid"
  • info - Reads and verifies the tar-format headers that precede each member file in the archive
  • crc - Reads each member file, recalculates the Cyclic Redundancy Checksum (CRC), and verifies that it matches the value stored in the index file
  • compare - Compares each member file in the archive with the original local file
  • paranoid - Only meaningful if "-Hrmlocal" is specified, which causes HTAR to remove any local files or symbolic links that have been successfully copied to the archive file

If "paranoid" is specified, then HTAR makes one last check before removing local files or symlinks to verify that:
  a. For files, the modification time has not changed since the member file was copied into the archive.
  b. The object type has not changed; for example, if the original object was a file, it has not been deleted and recreated as a symlink or directory.
It is also possible to specify a verification option such as "all", or a numeric level such as 0, 1, or 2, and then selectively disable one or more options. In practice this is rarely, if ever, useful, but the following options are provided:
  • 0 - Same as "info"
  • 1 - Same as "info,crc"
  • 2 - Same as "info,crc,compare"
  • nocompare - Disables comparison of member files with their original local files
  • nocrc - Disables CRC checking
  • noparanoid - Disables checking of modification time and object type changes
htar -cvf TEST_VERIFY.TAR /lustre/medusa/$USER -Hcrc -Hverify=2
htar -Hcrc -tvf TEST_VERIFY.TAR
In the example above:
(1) The archive file is created (-c) with verification level 2, which includes CRC generation and checking. The verbose option (-v) causes HTAR to display information about each file as it is added during the create phase and then verified during the verification phase.
(2) The archive file is then listed (-t) using the "-Hcrc" option, which causes HTAR to display the CRC value for each member file.

How do I retrieve a single file from HPSS?

Use "hsi ls -l" to show the tar file in HPSS:

>hsi ls -l file.tar
...
-rw-------   1 username     username          12800 Oct  2  2008 file.tar
Use "htar" to list the contents of the tar file:
> htar -tvf file.tar
HTAR: drwxr-xr-x  username/nicsstaff          0 2008-10-02 10:47  dir2/
HTAR: -rw-r--r--  username/nicsstaff       1492 2008-10-02 10:47  dir2/data.pbs
HTAR: -rw-r--r--  username/nicsstaff       1924 2008-10-02 10:47  dir2/mpi.pbs
Use "htar" to extract a single file (name must match what is listed by the above command):
> htar -xvf file.tar dir2/data.pbs
HTAR: -rw-r--r--  username/nicsstaff       1492 2008-10-02 10:47  dir2/data.pbs

How do I retrieve a single directory from HPSS?

To retrieve a single directory from HPSS use the -R option. For example,

>hsi
>get -R dir1

Has my access to HPSS been disabled?

Administrators may disable users for archiving too many small files at a time. Archiving many small files introduces a lot of overhead, and the archiving system is not designed to handle them. Please use htar to tar your files together. Documentation can be found here.

We should contact you if this happens, but if you are concerned that your access to HPSS has been disabled, contact us at help@xsede.org. We can re-enable your HPSS access provided that it is used correctly.

One easy way to increase file size on HPSS is to use 'htar'. For the most part, this works the same as regular tar. We would prefer that you perform htar on ~10 GB chunks. After you confirm that you will be using htar from now on, we will restore your access to HPSS. Our systems staff would also like you to remove your previously archived small files from HPSS and archive them again using htar.

Is the HPSS system able to be accessed by more than one process at a time?

There is nothing that should prevent you from running a script that creates multiple simultaneous connections to HPSS. However, the HPSS system administrator recommends that you not create more than 1 or 2 connections at a time: every additional instance degrades the performance of the overall system.

Can I run HSI from my workstation?

Because HSI is a third-party package, clients may be available for your system; however, NICS currently supports access to HPSS only through HSI clients on the HPC systems.

Can I use HSI without entering my passcode each time?

If you log into kraken using your passcode from your OTP token, you can run HSI without entering your passcode each time. You can also run batch scripts that use HSI in the "hpss" queue; a sketch is shown below. If you logged into kraken-gsi using GSI authentication, you will be prompted for your passcode each time you use HSI.
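A batch HSI job might look like the following sketch (the walltime and filenames are placeholders; the queue name comes from the answer above):

#PBS -q hpss
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
hsi put results.tar : results.tar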

What is the best way to transfer a large number of small files?

HPSS performance is greatly improved when the transfer size is between 8 GB and 256 GB. For that reason, users with large numbers of relatively small files should combine those files into one or a few 8 GB to 256 GB files and then transfer the larger files. The files can be combined with tar on the HPC system, or they can be created on the fly with a command similar to tar cvf - some_dir | hsi put - : somedir.tar. This command will tar all files in the some_dir subdirectory into a file named somedir.tar on HPSS. HPSS also supports the htar command.

What is HSI?

The HSI utility allows automatic authentication and provides a user-friendly command line and interactive interface to HPSS.

How do I access HPSS?

Users may access HPSS from any NICS high-performance computing (HPC) system with the Hierarchical Storage Interface (HSI) utility. An OTP token is required upon entry. Access to HPSS is enabled by typing the command hsi in your Linux environment. To exit, simply type quit.

Software Environment

Error when running OpenFOAM version 2.1.1 on Kraken in parallel: /lustre/scratch/proj/sw/openfoam/2.1.1/cnl3.1_gnu4.6.2/OpenFOAM-2.1.1/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libgfortran.so.3: cannot open shared

An extra line in your ~/.bashrc may be changing $LD_LIBRARY_PATH. Comment it out and try running OpenFOAM in parallel again.

Why do I get module load errors for software that I used before the CLE 3.1 upgrade?

Default versions have changed for both Cray and 3rd party software, and some software versions are no longer available. Please check the availability and default versions of applications or libraries. You can also check available software with "module avail" on Kraken.

Why does my array job not work (i.e. #PBS -t or qsub -t)?

Array jobs on Kraken are no longer supported. The submission filter will reject jobs which make use of job arrays (i.e. #PBS -t or qsub -t). These jobs (if submitted) will not run and should be deleted.

Why does my submitted job die with strange shell errors?

The shell initiation line in PBS scripts is not guaranteed to be used to determine the interpreting shell. The default behavior is to use the user's default login shell or the value of the PBS option -S (i.e. #PBS -S /bin/bash or qsub -S /bin/bash). If you are using a shell for a PBS script which is different than your default shell, please use the PBS -S option.

Can I use PVM on Kraken?

PVM is a communication interface for parallel programming. While it has been ported to a number of platforms, including some Cray platforms in the past, it has not been ported to the Cray XTs. Thus, we will not install or support PVM on Kraken. We do allow MPI and Global Arrays on Kraken, as well as pthreads within a single node (this includes OpenMP). It would also be possible to support Unified Parallel C or CoArray Fortran given sufficient demand.

How can I set my environment using .modulerc?

Some sites recommend using the .modulerc file to set your default modules. Do not do so on Kraken: the .modulerc file is read every time the module command is invoked. This causes issues with some of the Cray software and the global default module list, and can lead to unexpected results (if you unload a module in .modulerc, it will be re-loaded the next time you use the module command). Instead, set your default environment in your .bashrc file (or its analogue). It is best to send the output (stderr in particular) to a log file or /dev/null so that .bashrc does not print anything, which may cause errors.

How do I use the modules utility?

For information on modules, see the modules page.

How do I remove the Control-M characters in my text file?

Different operating systems use different methods of indicating the end of a line in a text file. UNIX uses only a new line, whereas Windows uses a carriage return and a line feed. If you have transferred text files from your PC to a UNIX machine, you may need to remove the carriage-return characters. (The line-feed character in Windows is the same as the new-line character under UNIX, so it doesn’t need to be changed.) Some systems provide a command dos2unix that can perform the translation. However, it can also be done with a simple perl command. In the following example, win.txt is the file transferred from your PC, and unix.txt is the new file in UNIX text format:

perl -p -e 's/\r$//' win.txt > unix.txt

Why is Vi unresponsive?

If vi appears to hang, but other commands (ls, cat, etc.) work normally, try renaming the .viminfo file:

mv ~/.viminfo ~/.viminfo.bak

This file saves the state of vim, but it can sometimes become corrupted due to incompatibilities between different versions of vim.

How do I change my login shell?

Users may change their default shell in the NICS User Portal. To log into the portal, you need to use your RSA SecurID.

Where can I find more information?

If you haven't already, please check out the other Kraken resource pages on compiling, file systems, batch jobs, open issues, parallel I/O tips, the CrayPAT overview, and other reports and presentations related to Kraken.

Another good resource (without Kraken-specific information) is the documentation that Cray provides at CrayDocs.

How do I get performance counter data for my program?

Use the following process:

  1. Use module load xt-craypat.
  2. Compile code.
    • If Fortran90 with modules, compile with -Mprof=func.
  3. Run pat_build -u -g mpi a.out.
  4. Run a.out+pat as you would a.out, BUT make sure PAT_RT_HWPC is set to 1 in the batch script.
    • If you want just a regular profile, don't set PAT_RT_HWPC.
  5. Run pat_report <datadir>/*.xf, where <datadir> is the directory automatically generated by the instrumented code.

The resulting output will have performance counter results for the entire run and for each subroutine (however, inlining and C++ optimizations may prevent some subroutines from being profiled).
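As a sketch, the batch-script fragment for step 4 might look like this (the task count and executable name are placeholders):

module load xt-craypat
export PAT_RT_HWPC=1

aprun -n 24 ./a.out+pat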

What profiling tools are available?

At least three profiling tools are available on Kraken.

  1. CrayPat is provided by Cray. Follow this link for more information.
  2. fpmpi is an unsupported product that can provide a very concise profile of MPI routines in an application. To use it, simply load the fpmpi (or fpmpi_papi) module and relink. Then rerun your application. There are a few environment variables to control profiling output:
    • MPI_PROFILE_DISABLE : Disables statistic collection until fpmpi_enable is called (#include fpmpi.h).
    • MPI_PROFILE_SUMMARY : When set, disables creation of individual per-process statistics files. Set this when running with thousands of processes.
    • MPI_PROFILE_FILE : Name of process statistic file; default is profile.txt.
    • MPI_HWPC_COUNTERS : List of events or event set number as in libhwpc.
  3. A third tool that is unsupported is TAU. TAU (Tuning and Analysis Utilities) is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, Python. Basic profiling with TAU can be done in the following steps:
    1. Load the tau module: module load tau
    2. Set the environment variable TAU_MAKEFILE. In tcsh: setenv TAU_MAKEFILE $TAUROOT/lib/Makefile.tau-mpi-pdt
    3. Compile code with the tau wrappers (which should be in your path), tau_f90.sh, tau_cc.sh, or tau_cxx.sh.
    4. You will get a regular executable. Submit your job as usual.
    5. After execution, there should be a profile.xxx text file.

TAU can also do MPI profiling and collect hardware performance counter data.

Compiling

Why shouldn’t I use "make -j 12" when compiling my code on Kraken?

Unlike Kraken's compute nodes, its login nodes have modest hardware specs: a single dual-core Opteron processor with 8 gigabytes of memory. However, each of the Kraken login nodes may have up to 30 user login sessions active at any given time. As a result, a single user who runs a very processor- or memory-intensive task on a Kraken login node can affect the work of several dozen other users. NICS therefore recommends that concurrent makes ("make -j N") on Kraken be done with an N of 4 or less.

What do I do if I encounter "/usr/bin/ld: cannot find -lrca"?

You need to do "module load rca". This will most likely affect anyone building NAMD or anyone using CHARM++ for development work.

What common changes are needed to compile my programs on Kraken?

  • Replace all compiler commands (mpicc, mpif90, icc, ifort, pgCC, pgf90, etc.) with the Cray wrappers: cc (C), CC (C++), or ftn (Fortran), as shown in the example after this list.
  • Remove all references to MPI libraries within the makefile.
  • Any references to the BLAS, LAPACK, BLACS, and ScaLAPACK libraries should be removed from your makefiles. The system will automatically link with the most highly optimized versions of these libraries. (For a complete list of libraries, enter: man libsci)
  • References to MKL can often be removed because their function is replaced by libsci.
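
For example, a hypothetical MPI Fortran link line brought over from another system might reduce to just the Cray wrapper on Kraken:

# on another system:
#   mpif90 -O2 -o mycode.x mycode.f90 -llapack -lblas
# on Kraken, with the MPI and math-library references removed:
ftn -O2 -o mycode.x mycode.f90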

Before you compile your code, load any relevant modules for third-party libraries. For example:

module load hdf5-parallel

The library's documentation will tell you which environment variables to use in your makefile or compile line. For the hdf5 example, this is documented in HDF5:

cc -o hdf5example.x hdf5example.c ${HDF5_CLIB}

There are two advantages to using the module with the environment variable instead of the pathname:

  1. If you change versions of hdf5, you only need to load a different module. The makefile does not have to be modified.
  2. If you change to a different compiler and then reload the hdf5 module, the system will load a version of hdf5 that is compiled with the other compiler.

For a list of libraries and other software available for Kraken see NICS Software.

How do I find out what macros are predefined by the compiler?

For Kraken consult the “Cray online documentation” (http://docs.cray.com).

For C, search for the Cray “C and C++ Reference Manual” and for Fortran, consult the “Cray Fortran Compiler Commands and Directives Reference Manual”.

What "endian"ness is the XT4 and XT5? Is there any way to affect it?

The Cray XT4 and XT5 are little-endian. There is a compiler switch, -Mbyteswapio (PGI), that makes the default Fortran unformatted I/O big-endian (for both reads and writes).

Note that this little-endian-to-big-endian conversion feature is intended for Fortran unformatted I/O operations. It enables the development and processing of files with big-endian data organization. The feature also enables processing of the files developed on processors that generate big-endian data (such as IBM, Cray X1, Sun).
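
A hypothetical compile line using the switch (PGI programming environment assumed; file names are illustrative):

# unformatted Fortran reads and writes in mycode.x will use big-endian byte order
ftn -Mbyteswapio -o mycode.x mycode.f90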

I get the error message "OOM killer terminated this process". What is OOM?

This error message indicates that the node ran Out Of Memory (OOM). This could be the result of a bug in the code or of the memory requirements for the given input. Note that, due to optimistic memory allocation, you probably will not get a null pointer even if you are out of memory; the program is killed at the point the memory is actually used.

One quick solution might be to run with only four MPI processes per socket so each process gets a larger share of the memory on the node:

aprun -n <n> -S 4 ./a.out

where <n> is the total number of MPI processes. The above solution uses 4 of the 6 cores on each socket, so naively each MPI task should get 50% more memory (6/4). If this is not enough memory, it is possible to reduce the number of tasks per socket further (-S 2). The best solution may be to identify the memory requirements in the code and make any necessary changes there, in terms of memory parameters, domain decomposition, etc.
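
As a hypothetical worked example: to run 48 MPI tasks with only 4 tasks per socket on 12-core XT5 nodes, the job needs 6 nodes (8 tasks per node) instead of 4 fully packed nodes, so the batch request grows accordingly:

#PBS -l size=72
# 48 tasks spread across 6 twelve-core nodes: 4 tasks per socket, 8 per node
aprun -n 48 -S 4 ./a.out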

How can I compile a C program that uses C99 Variable Length Arrays with the PGI compiler?

There is a bug that affects all previous PGI compilers that support the C99 standard. The problem is related to the handling of an array type added in C99 called the "variable length array" (VLA). Bug number 16741 has been assigned to it, and the fix is expected to be included in version 10.4. A standard workaround for previous and current compiler versions is to use the "old style" (K&R) form of the function header, e.g.:

void poisson_solver(myid_w, nprocs_w, nx, ny, nz, data)
int *myid_w; int *nprocs_w; int *nx; int *ny;
int *nz; double data[*nz][*ny][*nx];
Why do I get the error: "/usr/include/c++/4.1.2/backward/backward_warning.h:32:2"?

#include <iostream> is the standard C++ way to include this header. The identifier iostream maps to the header file; in older C++ versions you had to specify the file name itself, hence #include <iostream.h>. Older compilers may not recognize the modern method, but newer compilers accept both, even though the old method is obsolete.

fstream.h became fstream, vector.h became vector, string.h became string, and so on.

So although the <iostream.h> library has been deprecated for years, many C++ users still use it in new code instead of the newer, standard-compliant <iostream> library. What are the differences between the two? First, the .h notation of the standard header files was deprecated more than 5 years ago, and using deprecated features in new code is never a good idea. Second, in terms of functionality, <iostream> contains a set of templatized I/O classes that support both narrow and wide characters, whereas the <iostream.h> classes are confined to char exclusively. Third, the C++ standard specification of iostream's interface was changed in many subtle aspects; consequently, the interfaces and implementation of <iostream> differ from those of <iostream.h>. Finally, <iostream> components are declared in namespace std, whereas <iostream.h> components are declared in the global scope. Because of these substantial differences, you cannot mix the two libraries in one program. As a rule, use <iostream> in new code and stick to <iostream.h> only in legacy code that is incompatible with the new library.
Why do I see the message: SEEK_SET is #defined but must not be for the C++ binding of MPI?

The following error message:

#error "SEEK_SET is #defined but must not be for the C++ binding of MPI" 

is the result of a name conflict between stdio.h and the MPI C++ binding. Users should place the mpi.h include before the stdio.h and iostream includes.

Users may also see the following error messages as a result of including stdio or iostream before mpi:

#error "SEEK_CUR is #defined but must not be for the C++ binding of MPI" 
#error "SEEK_END is #defined but must not be for the C++ binding of MPI"

When profiling with TAU, you may get this message regardless of the order. In this case, you can add -DMPICH_IGNORE_CXX_SEEK to the compile line to remove the error (this fix should work generally).
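
A hypothetical compile line using this flag (wrapper and file names are illustrative):

CC -DMPICH_IGNORE_CXX_SEEK -o mycode.x mycode.cpp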

How do I link a C++ object with ftn?

Under the 1.5 programming environments used under Catamount, ftn linked in libC.a. Under the 2.0 programming environments used under CNL, ftn does not link in libC.a. Fortran codes that link in libraries containing C++ objects will need to add -lC to the link line.
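
A minimal sketch of such a link line (file names are illustrative):

# main.o comes from Fortran source; cpp_routines.o contains C++ objects
ftn -o mycode.x main.o cpp_routines.o -lC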

My code compiles without any trouble, but fails in the link step.

Internally, the build system uses several variables/macros even if they're not specified on the command line. These include F90FLAGS, FFLAGS, CFLAGS, and others. If your makefile defines these variables with flags not intended for the link step, the link may fail. For example, if they contain the -c flag, which tells the compiler to skip the link step, the link will fail.
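
A hypothetical illustration of the failure and the fix (the flag contents are made up):

# FFLAGS="-c -O2"               # intended for compilation only
# ftn $FFLAGS -o mycode.x *.o   # link fails: -c suppresses the link step
ftn -O2 -o mycode.x *.o         # link line with compile-only flags removed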

How do I link a C program with Fortran routines?

Use the pgf90 compiler (via the ftn wrapper) to link and provide the -Mnomain option.

If you are receiving the “multiple definition of main” error, you probably have a C program that calls Fortran, and you are linking with the Portland Group Fortran compiler. The Fortran compiler has its own default “main,” and now there is a second main from the C source. You may need to add the -Mnomain flag during link time to fix this.

In the other case, where you get an "undefined reference to `main'" error, another option is to use the C/C++ compilers to link. Then 'main' may be defined manually: -Wl tells pgcc/pgCC to pass the following comma-delimited list to the linker, and --defsym defines a symbol. Thus, the following should allow your Fortran-with-C program to compile and link (assuming the PGI modules are loaded):

cc -Wl,--defsym,main=MAIN_ ...
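
Alternatively, a minimal sketch of the first approach, linking with the Fortran wrapper and -Mnomain (file names are illustrative; assumes the PGI programming environment):

cc -c cmain.c
ftn -c fsubs.f90
ftn -Mnomain -o mycode.x cmain.o fsubs.o
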
Why does my compile fail with the message "relocation truncated to fit: R_X86_64_PC32"?

The “relocation truncated” error message occurs when an object file or executable is too large for the memory model. The default memory model for the PGI compilers is the “small” model. This requires that the object be smaller than 2 GB in size. The PGI compilers support the “medium” memory model, which allows objects to be larger than 2 GB. Unfortunately, for a code to use the medium memory model, all objects and static libraries must be compiled under the medium memory model. Several system libraries are not, so in general, executables on Kraken must use the small memory model.

To work around this error, you should reduce the static memory usage for your code. Common ways to do this include the following:

  • Remove (either by deleting or via compiler directives) subroutines that are not used on the XT platform.
  • Remove static variables (especially large arrays) that are not used on the XT platform.
  • Use allocatable arrays instead of static arrays. Because the memory-model limit applies only to statically allocated data, allocatable arrays can be larger than 2 GB even with the small memory model.
Why does my compile fail with "/usr/bin/ld: can not find -lsma"?

This error message occurs when using the mpi* compiler wrappers (mpicc, mpif90, etc.). These are intermediate wrappers that should not be called directly by users. Instead, users should compile with either ftn, cc, or CC. The ftn, cc, and CC scripts will do the necessary setup and then automatically call the appropriate intermediate scripts and ultimately the compilers.

What compilers do you support?

We support the PGI, GNU, Intel, and Cray compilers. These should be more than sufficient; it is unlikely that we will add other compilers such as Borland. More information on compiling can be found at Compiling; see Modules for assistance with modifying your environment and changing compilers. When compiling code for the compute nodes, do not use the compilers directly; instead, use the Cray wrappers (cc, CC, ftn).

We are investigating some new profiling features for standard MPI programs and are unlikely to purchase other compilers unless there is a strong demand for them. Please let us know if you would like to request a compiler (or any other software) by sending us an email at help@xsede.org.

Access

How do I log into NICS resources via the XSEDE Portal?

NICS provides different methods for logging into their resources. To log in via the XSEDE portal, please click on the link below for step-by-step instructions.

XSEDE login instructions

How do I share files between my project members on NICS resources?

To share read access to home directories and top-level scratch directories, each member of the group should enter the following commands:

chmod 750 $HOME
chmod 750 /lustre/scratch/$USER

To provide write access to the members of the group, use "chmod 770". This should be used on a subdirectory and not the top-level directory.
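
For example, to give your group a writable working area without opening up the entire top-level directory (the subdirectory name is illustrative):

chmod 750 /lustre/scratch/$USER
mkdir /lustre/scratch/$USER/shared
chmod 770 /lustre/scratch/$USER/shared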

Why can't I forward X11 connections with GSI on Mac OSX?

Apple handles the local "DISPLAY" variable necessary for X11 connections differently than Linux/UNIX; as a result, older versions of gsissh had trouble parsing the variable, yielding this error:

$ xlogo
/tmp/launch-xZ1piK/: unknown host. (no address associated with name)
X connection to localhost:14.0 broken (explicit kill or server shutdown).

Note that if the remote DISPLAY variable had been broken, it would have given the error "Can't open display".

Updating to the most recent release of gsissh should resolve this issue.

Using GridFTP/GSISSH, I get the error: "Untrusted self-signed certificate in chain with hash..." What is wrong?

Globus is unable to find the correct certificate to authenticate: most likely, you have a ~/.globus/certificates directory which is overriding the system defaults. If this directory exists, rename it and try again. Within the TeraGrid, these certificates are managed for you, so you should not need a certificates directory.

If you do need it for regular transfers to non-TeraGrid sites, you can generally get the certificate from the /etc/grid-security/certificates directory; the file name is based on the hash reported in the error message ("Untrusted self-signed certificate in chain with hash <hash>"):

cp /etc/grid-security/certificates/<hash>.* ~/.globus/certificates/

These certificates may be changed without notice, so you will periodically have to remove and replace expired certificates.

Why does my SSH connection fail? Why does SSH report that no authentication methods are available?

Your SSH client may not be set up to use the keyboard-interactive authentication method. You will need to use a client that supports the keyboard-interactive authentication method to connect to the NICS computers. Different SSH clients will have different ways of setting the preferred authentication methods, so you may need to contact your system administrator to get your client set correctly.

If your ssh client seems to be set up correctly, it may be that the resource you are trying to connect to is unavailable. You may want to check our announcements.

How do I activate and use my RSA SecurID?

For instructions on activating and using your RSA SecurID, see the connecting page.

Why doesn't the backspace key work as expected?

If backspace produces ^? instead of what you expect, use the following to fix it at the command prompt:

stty erase '^?'

You can put this in your .profile (ksh) or .login (csh) file so that it is set automatically when you log in. The stty command should be executed only for interactive shells, not batch jobs.

Another tactic is to change the configuration of your SSH client. For instance, if you are using PuTTY SSH on a Windows system, the default backspace key is Control-? (127). This can be changed by going to the Keyboard category and setting the backspace key to Control-H.

File Transfer

Why do I get 'no space left on device' error when writing from Fortran?

Your Fortran program is probably writing a large file with a stripe count of 1, resulting in an error like:

forrtl: No space left on device, forrtl: severe (38): error during write, unit 12, file /lustre/scratch/$USER/...


Move the partially transferred file elsewhere or delete it. Then, cd to the directory where the partially transferred file once was. Issue the following command to change the striping of the directory:

lfs setstripe -c 8 .
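
You can confirm the directory's new striping with lfs getstripe; files subsequently written there will inherit it:

lfs getstripe .
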
Why does SFTP exit with the error "Most likely the sftp-server is not in the path of the user on the server-side"?

Examples of this error are

File transfer server could not be started or it exited unexpectedly.
Exit value 0 was returned. Most likely the sftp-server is not in the path of the user on the server-side.

or

Received message too long 1500476704

These errors are usually caused by commands in a shell run-control file (.cshrc, .profile, .bashrc, etc.) that produce output to the terminal. This output interferes with the communication between the SSH daemon and the SFTP-server subsystem. Examples of such commands might be date or echo. If you use the mail command to check for mail, it can also cause the error.

You can check to see if this is likely the problem. If you are unable to SFTP to a machine, try to connect via SSH. If you are able to SSH, and you receive output to your terminal other than the standard login banner (for example, “You have mail”), then you need to check your run-control files for commands that might be producing the output.

To solve this problem, you should place any commands that will produce output in a conditional statement that is executed only if the shell is interactive. For C shell users, a sample test to put in your .cshrc file would be

if ($?prompt) then
  date
endif

The equivalent command for your .profile file (ksh/bash) would be

if [[ -n $PS1 ]]; then
 date
fi

How do I transfer data between the NICS and other UNIX-based systems?

The SSH-based SCP and SFTP utilities can be used to transfer files to and from NICS systems.

For larger files, the multistreaming transfer utility BBCP may be used. BBCP can break a transfer into multiple simultaneous streams, thereby transferring data faster than single-stream utilities such as SCP and SFTP.
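
As a rough sketch (the host name, paths, and stream count are placeholders; see the data management page for the recommended settings):

# -s 8 asks bbcp to use 8 parallel streams for the transfer
bbcp -s 8 mydata.tar username@remote.host.example:/lustre/scratch/username/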

For more information on data transfers, see the remote data section of the data management page.

NFS

I accidentally deleted some files. Can I get them back?

It depends on where the files were and how recently they were created. Scratch directories (/lustre/scratch/$USER) are not backed up at all, so any files deleted from those directories cannot be recovered. Home directories are different. Please contact NICS support if you have inadvertently deleted a file in your home area.