The National Institute for Computational Sciences

Frequently Asked Questions

HPSS (Darter and Nautilus): Splitting a HPSS archive into multiple files

You can use the split command to split an archive into multiple files. Please follow the steps and examples provided below.

"Cd" into your /lustre/medusa/ directory where your data is temporary stored and run the following command. Make sure the file striping (https://www.nics.tennessee.edu/computing-resources/file-systems/lustre-s...) in the directory is appropriate for what is being done.

NOTE: The syntax is very important. Please pay close attention to the "." at the end of the filename (i.e. myarchive.tar.).

If you want to combine multiple files into an archive, then split them into 1 GB files, do the following:

$ tar -cvf - file1 file2 file3 | split --bytes=1G --suffix-length=4 --numeric-suffixes - myarchive.tar. 

When the files need to be recombined and untarred:

$ cat myarchive.tar.* | tar xvf - 

If you already have a single tar file and you want to split it into 10 GB files, do the following:

$ split --bytes=10G --suffix-length=4 --numeric-suffixes lustre.scratch.Cray_Tests.tar lustre.scratch.Cray_Tests.tar.split.

If you have a directory you want to tar up, then split into 10MB files (in this case an "applications" directory) you would do the following:

$ tar -cvf - applications | split --bytes=10M --suffix-length=4 --numeric-suffixes - applications.tar.

The size of the split files is determined by the --bytes option.

When the command finishes executing (which could be a while), you will end up with files applications.tar.0000, applications.tar.0001, and so on. See example output below.

$ ls -l applications.tar* 
-rw-r--r-- 1 you 10485760 Jul 24 13:49 applications.tar.0000 
-rw-r--r-- 1 you 10219520 Jul 24 13:49 applications.tar.0001 

After splitting your archives, type hsi put *.tar.*. This will start uploading the files to HPSS. This could also take a while, so feel free to use the nohup command with this.

When you are ready to retrieve the files for use, type hsi get *.tar.*. After all the files have been transferred to your /lustre/medusa/$USER area, if you want to combine the split files and extract their contents run the following command:

 $ cat applications.tar.* | tar xvf - 

Wait a bit and the split files will be recombined and the contents of the archive extracted.

Beacon: How do I switch from Intel MPI to another MPI implementation?

If the Intel compilers and programming environment are still desired, you need only execute:

module swap intel-mpi $otherMPIModule

However, if you wish to completely remove the Intel programming environment in order to use another compiler, then you must remove the mpi module first:
module unload intel-mpi

Then, you can unload the compilers, which will automatically unload the Programming Environment (PE-intel):
module unload intel-compilers
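
For example, a minimal sketch of switching to a GNU/OpenMPI stack (the module names gcc and openmpi are assumptions and may differ on Beacon; check "module avail"):

module unload intel-mpi
module unload intel-compilers
module load gcc
module load openmpi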

Darter: How do I use the code bisection method to find a bug?

While using tools is a preferable method of debugging to simply using print statements, sometimes the latter option is the only method to find the bug. In this case, the most effective way to isolate the error in your code is through the method of bisection, which is an iterative process for tracing the program manually.

Step 1: In the main routine of your code, comment out the second half of the code (or approximately the second half).

Step 2: Compile and run the code. Did it crash as before?

Step 3A: If yes, return to step one and comment out the second half of the part of the main routine that ran successfully. Repeat until you have narrowed it down to the line/routine causing issues, which may include following this same tack within a subroutine.

Step 3B: If no, then swap out the half which was commented and try compiling and running again. Then, go to Step 3A.

Additionally, the use of print statements to see variable values can give insight into some earlier piece of code that might have been run through just fine but is creating an errant, unacceptable value that causes a later routine to crash.

Finally, if there is any way to duplicate the error in serial, this makes the print statements more consistent (as far as being ordered chronologically, since they are not all coming from different processors' buffers).

Now, while this might sound like a lot of work, and it is non-trivial, here is a tip to make your burden lighter: Have three sessions open on Darter simultaneously.

1. One session to edit the code.
2. Another session to compile the code.
3. Another session in which you submit an interactive job so that you do not have to resubmit your job and wait in the queue every time (see the example below).
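
As a sketch, an interactive job on Darter could be requested as follows; the project name, size, and walltime are placeholders to adjust for your own work:

qsub -I -A your_project -l size=16,walltime=01:00:00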

Darter: How do I use Cray ATP to determine where and why a code died abnormally?

Sometimes a code will work fine in many cases and circumstances but there will be a bug which only rears its head when a certain perfect storm of case and job size occurs. This causes the code to die in a strange spot and it is not obvious exactly why or where. In cases like this, Cray's ATP (Abnormal Termination Processing) can likely help!

Simply do

module load atp 

and re-compile your code without optimization (use the "-g" flag for debugging) using any backend compiler (PrgEnv) with the Cray wrappers (ftn, cc, or CC). This simultaneously helps assure that the error was not brought on by compiler optimization mistakes and creates the instrumented executable.

Now, you are ready to use ATP to generate a backtrace to the line where the code died.

Add the following to your PBS script to make sure that the ATP module is loaded into your aprun environment and that the ATP environment variable is set to collect information:
module load atp
export ATP_ENABLED=1
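
Putting it together, an illustrative batch-script skeleton might look like the following (the size, walltime, and executable name are placeholders):

#PBS -l size=16,walltime=00:30:00
#PBS -S /bin/bash

cd $PBS_O_WORKDIR

module load atp
export ATP_ENABLED=1

aprun -n 16 ./a.out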

If a backtrace file appears in your directory upon run termination, search through it to find the line on which your code died. If the code completes successfully without optimization, the error was likely brought on by optimization, and you need to lower the compiler optimization level so that the compiler does not optimize your code into incorrect results.

Also, you may go back and add "-traceback", an Intel compiler flag, to the compilation, which may assist in producing a traceback file as well. This only works when "PrgEnv-intel" is loaded, but you can pass it to the Cray wrappers "cc", "CC", or "ftn" and it will be passed to the backend Intel compiler.

If you are still unable to find the problem, stepping through with a debugger like DDT or Totalview may be helpful.

General: What is the default stripe count on the Lustre Medusa filesystem?

The default stripe count on the Lustre Medusa filesystem is 2. Lustre Medusa has 90 OSTs (Object Storage Targets), therefore the maximum stripe count possible is 90.

lfs osts | grep medusa

Darter: How to determine memory usage on the compute node

In order to determine memory usage for a given process on a compute node, one would normally issue the command "top" and look at the memory usage of the process in question. However, this cannot be done on a Darter compute node, since compute nodes are not accessible to the user. Also, OOM (Out of Memory) errors can occur even when a problem has been discretized finely enough, because memory leaks in the code cause usage to grow until, in the worst case, the program crashes.

This crashing behavior means that the user needs to instrument their code and fix the memory leaks. The Scientific Computing staff at NICS have created a simple function that can be added to a program at spots where memory usage is suspect. It can assist with finding potential memory leaks as well as diagnosing situations where memory grows in a manner not commensurate with what the user expected. While tools like Valgrind and Electric Fence exist, they often slow code execution to the point where the memory issue cannot be found within the prescribed wall time, making the run a waste of SUs and user time.

The following is a C function, "GetMemoryUsage", which can be added into the source tree and compiled along with the rest of the user code. This function returns the program's memory usage on the compute node at the point at which it is called. The idea is that one can insert "GetMemoryUsage" calls at different places in the source, recompile, and run to observe memory leaks. To test whether a function or subroutine has a memory leak, call GetMemoryUsage at the beginning and end of the function and check whether there is a noticeable difference in memory usage. If there is, some memory is leaking in that function, unless it is allocating memory of its own. If the latter is true, the user should be able to confirm that the growth was by the exact amount allocated; otherwise a memory leak still exists. Regardless, the user should be able to see how much memory is allocated for a given function and determine whether that is commensurate with what they were expecting. Through repeated insertion of the GetMemoryUsage call, one can narrow down which part of a large code is contributing to the memory leak.

The sample program "memusage_test.c" shows how the function can be used; running it should help the user become familiar with how it works before using it in a larger code base. In the sample program, a memory leak is created intentionally, and therefore GetMemoryUsage will keep returning higher and higher memory usage as the program continues. A sample makefile is also provided for convenience.

GetMemoryUsage.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MEMORY_INFO_FILE "/proc/self/status"
#define BUFFER_SIZE 1024

void GetMemoryUsage ( double *HWM, double *RSS )
  {
  FILE *fp;
  size_t n;
  char buffer [ BUFFER_SIZE ], scratch [ BUFFER_SIZE ];
  char *loc;

  fp = fopen ( MEMORY_INFO_FILE, "r" );
  if ( fp == NULL )
    return;
  while ( fgets ( buffer, BUFFER_SIZE, fp ) )
    {
    /* VmHWM: peak resident set size ("high water mark"), reported in kB */
    if ( strncmp ( buffer, "VmHWM:", 6 ) == 0 )
      {
      loc = strchr ( &buffer [ 7 ], 'k' );
      n = loc - &buffer [ 7 ];
      strncpy ( scratch, &buffer [ 7 ], n );
      scratch [ n ] = '\0';
      *HWM = strtod ( scratch, NULL );
      }
    /* VmRSS: current resident set size, reported in kB */
    if ( strncmp ( buffer, "VmRSS:", 6 ) == 0 )
      {
      loc = strchr ( &buffer [ 7 ], 'k' );
      n = loc - &buffer [ 7 ];
      strncpy ( scratch, &buffer [ 7 ], n );
      scratch [ n ] = '\0';
      *RSS = strtod ( scratch, NULL );
      }
    }
  fclose ( fp );
  }
memusage_test.c
#include <stdio.h>
#include <stdlib.h>

void GetMemoryUsage ( double *HWM, double *RSS );   /* defined in GetMemoryUsage.c */

int main ( int argc, char **argv )
  {
  int i, j;
  double HWM, RSS;
  double *Array;
  GetMemoryUsage ( &HWM, &RSS );
  printf ( "Initial Usage: \nHWM : %f kB \nRSS : %f kB\n\n", HWM, RSS );
   // Create leaky code
  for ( j = 1; j < 100; j++ )
    {
    Array = malloc ( sizeof ( double ) * 100000 );
    for ( i = 0; i < 100000; i++ )
      Array [ i ] = 0.0;
    Array = NULL;

    GetMemoryUsage ( &HWM, &RSS );
    printf ( "Usage at j = %d \nHWM : %f kB \nRSS : %f kB\n\n", j, HWM, RSS );
    }
  return 0;
  }
Makefile
all:
        cc -c GetMemoryUsage.c
        cc -o memusage_test.exe memusage_test.c GetMemoryUsage.o

clean:
        rm -f *.o *.exe
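
After building with make, the test can be exercised on a compute node; a hedged job-script line (the task count is illustrative) would be:

aprun -n 1 ./memusage_test.exe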

General: What are the memory limits on the compute nodes of NICS production resources?

Listed below are the results of some basic tests run on the compute nodes of NICS-operated resources to check the real maximum values for allocatable memory and number of open files:

System    | MaxMem | MaxOpenFiles
----------+--------+-------------
Darter    | 31.1GB | 1015
Nautilus  | 32.1GB | 48
Keeneland | 32.1GB | >2048

Nautilus: How can I get more details about hardware configuration?
Some details regarding the hardware on Nautilus can be found by using the "cpumap" command.
Beacon: I was running my program in an interactive job but it didn’t finish. I received the message: qsub: job ####.beacon-mgt.nics.utk.edu completed. How do I request more time for interactive jobs?

Request more time for interactive jobs by providing a specific number of hours/minutes/seconds using
qsub -I -l walltime=hh:mm:ss

Note that 24 hours is the maximum that can be requested. If you need an extension, send an email to help@nics.utk.edu along with any job ids that need to run for more than 24 hours.

Beacon: I copied all files to $TMPDIR/mic# and ran my program, where are all the output files?

They should be stored at $TMPDIR/mic# and need to be copied to either your home directory or the Lustre filesystem before the submitted job completes (see the example below).
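
For instance, a hedged copy at the end of the job script might look like the following; "mic0" and the destination directory are illustrative:

mkdir -p /lustre/medusa/$USER/results
cp -r $TMPDIR/mic0/. /lustre/medusa/$USER/results/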

Beacon: Why does it keep asking for my password when I try to SSH to mic# on beacon#?

Use micssh instead of ssh. The necessary SSH keys are provided through the micssh script.

Beacon: Why is my password being asked for when trying to run an MPI program on a MIC?

Use micmpiexec instead of mpiexec. The necessary SSH keys are provided through the micmpiexec script.

Beacon: Why am I getting a 'syntax error: unexpected "(" ' error message?
The binary being run was meant to be run on the host, not the MIC. Copy the binary that was compiled with -mmic and try again.
Beacon: Why am I getting an 'execvp error on file' error?
The proper path must be specified when using micmpiexec, i.e. use
micmpiexec -n 1 ./program
instead of
micmpiexec -n 1 program
General: How do I know how many SUs I have used on a NICS system?

Users can use commands "showusage" and "mybalance" to check how many SUs have been used.

$ mybalance
Project       Machines                Balance
------------- ----------------------- ----------
TG-SEE090006  keeneland.nics.teragrid          0
TG-STA110018S darter.nics.teragrid     976104804
TG-STA110018S nautilus.nics.teragrid  1080000000


$ showusage

   Project              Resource        StartDate   EndDate    Allocation   Remaining      Usage
-------------+-------------------------+----------+----------+------------+------------+------------
TG-STA110018S darter.nics.teragrid      09-21-2013 09-21-2014    300000.00    271140.22     28859.78

HPSS (Darter and Nautilus): How do I check my usage on HPSS?

In order to check one's usage on HPSS, enter the "hsi" command. Then, use the HPSS "du" command in the top-level directory, or the "du -s" option (summary for the entire directory only).

For example:

O:[/home/username]: du
2137614 4 /home/std00/
305920049 1 /home/std00/directory1/
86648223 1 /home/std00/directory2/
211942420 1 /home/std00/directory3/
156677661 47 /home/std00/directory4/
6455083743 1 /home/std00/directory5/
0 0 /home/std00/
-----------------------
7218409709 total 512-byte blocks, 55 Files (3,695,825,770,765 bytes)
Darter: Why do I get the error "PtlMDBind failed with error : PTL_NO_SPACE"?

This occurs because the application is not pre-posting its receives. When receives are not posted, the MPI library does not know how much buffer space to allocate for incoming messages; as numerous messages arrive, the library runs out of memory for the buffer allocation. One way to work around the problem is to tell the MPI library to receive only the header of each message from the sender, so that it can then determine where to put the message in the application's arrays. By modifying the MPICH environment variables (see the example below), users can give the MPI library more latitude to allocate buffer space. The real solution is to fix the application so that it pre-posts its receives before the messages are sent.
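
As a sketch, the unexpected-message settings mentioned in the related entries further down in this FAQ can be raised in the batch script before the aprun call; the values shown are illustrative only:

export MPICH_PTL_UNEX_EVENTS=80000
export MPICH_UNEX_BUFFER_SIZE=157286400   # illustrative value (bytes)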

Darter: Why shouldn’t I use "make -j 12" when compiling my code?

Unlike Darter's compute nodes, its login nodes have modest hardware specs: a single quad-core processor with 8 gigabytes of memory. However, each of the Darter login nodes may have up to 30 user login sessions active at any given time. As a result, a single user who runs a very processor- or memory-intensive task on a Darter login node can affect the work of several dozen other users. For this reason, NICS recommends that concurrent makes ("make -j N") on Darter be done with an N of 2 or less.

General: How do I log into NICS resources via the XSEDE Portal

NICS provides different methods for logging into their resources. To log in via the XSEDE portal, please click on the link below for step-by-step instructions.

XSEDE login instructions

 

General: How do I share files between my project members on NICS resources?

To share read access to home directories and top-level scratch directories, each member of the group should enter the following commands:

chmod 750 $HOME
chmod 750 /lustre/medusa/$USER

To provide write access to the members of the group, use "chmod 770". This should be used on a subdirectory and not the top-level directory.
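
For example, a hedged sketch of setting up a group-writable subdirectory on Lustre (the directory name "shared" is arbitrary):

mkdir /lustre/medusa/$USER/shared
chmod 770 /lustre/medusa/$USER/shared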

Darter: How do I run jobs using aprun?

If a user wants to use:

#PBS -l size=192 ### Assuming you want to use 24 MPI tasks

aprun -n 24 -N 2 -S 1

Here's what the above aprun command means. You are asking for 24 MPI tasks, 2 MPI tasks per node, and 1 MPI task per socket.

At 1 task per socket, that is 2 tasks per node, so the job will use 12 nodes (24/2), and the size would be 12*16 = 192. It is best to start with the aprun command to figure out how many nodes will be used, then multiply by 16 to get the value of size. Now, if you want to leave one socket empty on each node (use every other socket), you would use aprun -n 24 -N 1, which tells it to put one MPI process per node.

HPSS (Darter and Nautilus): If a file in HPSS has been overwritten, can the old one be recovered?

Unfortunately, no. There are no backups of HPSS. Even if a file is written with "copies=2", the overwrite will affect both files (a recovery might technically be possible, but not without significant system interruption).

HPSS (Darter and Nautilus): Is it possible for files to be overwritten in HPSS?

Yes. The existing file is renamed with a "~" appended only if the user has "autobackup=on" in their .hsirc file; otherwise, the file is simply overwritten. Another option is to use "hsi cput" instead of "hsi put". Using cput causes hsi to give a warning message if the file already exists: the file the user is attempting to store won't be written to HPSS, but the old one won't be overwritten. (The user also needs to pay careful attention to the output from hsi so that they notice the file wasn't stored.)

When storing files with similar names, it is best to append a date (and time if necessary) in 4-digit year, 2-digit month, 2-digit day, 2-digit hour, 2-digit minute form to the filename. This provides a unique name but also causes the files to be automatically sorted by ls based on the date for which they contain information (which might not always be the date/time they were written). An example might be file.tar.201212032250 for a date at 10:50 PM.
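
A hedged way to generate such a timestamped name automatically when storing a file (the archive name is a placeholder):

hsi put myarchive.tar : myarchive.tar.$(date +%Y%m%d%H%M)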

Darter: Why is my job being rejected by the scheduler?

Each Darter node has 16 cores and 32 Gbytes of memory: about 2 GB per core if all cores are used. Sometimes it is necessary to leave some cores idle to make more memory available per core. For example, if you use 8 cores per node, each core has access to about 4 Gbytes of memory.

#PBS -l walltime=01:00:00,size=1500

aprun -n 1500 -S 4 $PBS_O_WORKDIR/a.out

The above aprun command won't work. The nodes on Darter have 2 sockets and each socket has 8 cores. That makes a total of 16 cores per node. Your size should be a multiple of 16. To make a long story short, use the following formula to get close to a multiple of 16 with what you want to do.

cores per socket on Darter * number of MPI tasks / tasks per socket (the -S value)

8*(1500)/4 = 3000

The next number that is a multiple of 16 is 3008. Change size to 3008 in your PBS option and you should be fine (see the corrected example below).
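
Putting it together, the corrected request would be:

#PBS -l walltime=01:00:00,size=3008

aprun -n 1500 -S 4 $PBS_O_WORKDIR/a.out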

Darter: My batch job aborts with the error MPIDI_PORTALSU_REQUEST_FDU_OR_AEP: DROPPED EVENT ON UNEXPECTED RECEIVE QUEUE. What does it mean?

To prevent the error, set MPICH_PTL_SEND_CREDITS=-1, which enables a flow control mechanism.

For best performance, the number of event queue entries for the MPI unexpected receive queue should be set as high as possible. For example, set MPICH_PTL_UNEX_EVENTS=80000.
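
For example, in the batch script before the aprun call:

export MPICH_PTL_SEND_CREDITS=-1
export MPICH_PTL_UNEX_EVENTS=80000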

Note that this fix does not address unexpected message buffer exhaustion. Thus, the user may still need to adjust MPICH_MAX_SHORT_MSG_SIZE or MPICH_UNEX_BUFFER_SIZE if this buffering overflows.

See the mpi_intro man page for details.

Darter: What do I do if I encounter "/usr/bin/ld: cannot find -lrca"?
You need to do "module load rca". This will most likely affect anyone building NAMD or anyone using CHARM++ for development work.
General: Why does my array job not work (i.e. #PBS -t or qsub -t)?

Array jobs are not supported on NICS systems. The submission filter will reject jobs which make use of job arrays (i.e. #PBS -t or qsub -t). These jobs (if submitted) will not run and should be deleted.

General: Why does my submitted job die with strange shell errors?

The shell initiation line in PBS scripts is not guaranteed to be used to determine the interpreting shell. The default behavior is to use the user's default login shell or the value of the PBS option -S (i.e. #PBS -S /bin/bash or qsub -S /bin/bash). If you are using a shell for a PBS script which is different than your default shell, please use the PBS -S option.

General: Why do I get 'no space left on device' error when writing from Fortran?

Your Fortran program may be writing a large file of stripe size 1, resulting in an error like:

forrtl: No space left on device, forrtl: severe (38): error during write, unit 12, file /lustre/medusa/$USER/...


Move the partially transferred file elsewhere or delete it. Then, cd to the directory where the partially transferred file once was. Issue the following command to change the striping of the directory:

lfs setstripe . -c 8
Darter: Why do I get 'out of space' error when transferring files from HPSS to Darter?

Your file transfer has caused a Lustre storage server (OST) to become full, resulting in an error like:

ead_cond_timedwait() return error 22, errno=0 OUT OF SPACE condition detected while writing local file

This usually happens because the stripe count is too small (often 1). To solve this issue, remove the partially transferred file and change the stripe count of the directory before transferring the file. To change the stripe count of the directory, first cd to that directory. Second, type the following command:

lfs setstripe . -c 8

where 8 is the new stripe count, meaning that any new files in that directory will be striped across 8 OSTs. The larger the stripe count, the more OSTs the file will be striped across. A stripe count that results in the file using less than about 100 GB per OST should usually work.

General: What should I do in the event of a lustre slowdown?

In the event of a lustre slowdown, there are many things to consider, as lustre has many working parts and is shared by all users on the system. NICS continually monitors lustre's performance and seeks to improve researchers' data communications. If you notice that your code's I/O performance or the lustre filesystem is slower than usual, please answer the following questions to the best of your knowledge and email your answers to the XSEDE Help Desk.

  • When did you first notice the slowdown? How long did it last?
  • Which login node were you on?
  • Can you estimate the magnitude of the slowdown? (ex - "It took 2 min instead of 3 secs", "batch job exceeded walltime limit of 10 hours, but normally finishes in 8 hours")
  • What were you doing? Interactive command (like "ls")? Batch job?
  • For interactive commands:
    • Which host were you using?
    • Did you see the same behavior on other hosts?
    • Can you provide the exact command that was run and the directory in which it was run?
  • For batch jobs:
    • Can you supply the job IDs for jobs that were affected?
    • Can you provide any details about the IO pattern for your job?
Darter: How do I enable the creation of a coredump file when a program crashes in the compute node?

In order to enable the creation of a coredump file when a program crashes in the compute node of a CRAY system like Darter, the following command should be added to the job script before the aprun call:

Bourne shell: ulimit -c unlimited
C shell:      limit coredumpsize unlimited

 

For example, if using a Bourne-like job script, the script will look like:

#PBS -A MY_PROJECT
#PBS -l size=12,walltime=00:05:00
#PBS -S /bin/bash

cd $PBS_O_WORKDIR

ulimit -c unlimited

aprun -n 4 ./helloWorld

 

In the previous example, if the program 'helloWorld' crashes (for example, due to a segmentation fault), a coredump file named 'core' will be created in the same directory where the program is located.

 

Note: Using the compiler option '-g' at compile time will add debugging information to the executable that will facilitate figuring out the location in the source code where the program crashed.

Darter: What common changes are needed to compile my programs?
  • Replace all compiler commands (mpicc, mpif90, icc, ifort, pgCC, pgf90, etc) with the following: cc (C), CC (C++) or ftn (Fortran).
  • Remove all references to MPI libraries within the makefile.
  • Any references to the BLAS, LAPACK, BLACS, and ScaLAPACK libraries should be removed from your makefiles. The system will automatically link with the most highly optimized versions of these libraries. (For a complete list of libraries, enter: man libsci)
  • References to MKL can often be removed because their function is replaced by libsci.

Before you compile your code, load any relevant modules for third-party libraries. For example:

module load hdf5-parallel

The documentation will tell you how to use environment variables in your makefile. In the hdf5 example, this is documented in HDF5.

cc -o hdf5example.x hdf5example.c ${HDF5_CLIB}

There are two advantages to using the module with the environment variable instead of the pathname:

  1. If you change versions of hdf5, you only need to load a different module. The makefile does not have to be modified.
  2. If you change to a different compiler and then reload the hdf5 module, the system will load a version of hdf5 that is compiled with the other compiler (see the sketch below).
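
As a sketch (the PrgEnv module names are typical on Cray systems and may differ on Darter; check "module avail"), switching compilers and reloading a library module might look like:

module swap PrgEnv-cray PrgEnv-gnu
module unload hdf5-parallel
module load hdf5-parallel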

For a list of libraries and other software available for Darter see Darter Software.

HPSS (Darter and Nautilus): How do I share my files on HPSS with other members of my research team ?

To find out what groups you are a member of on HPSS use the groups command.

K:[/home/username]: groups
K:HPSS Group List:
  1045: nsf008       1928: nsf008q4s 

This shows the user is a member of groups nsf008 and nsf008q4s.

If other members of your team are listed in the same group you can simply log into HPSS using HSI and change the group and permissions to share the files or directories.

For example, if both you and other members are all in nsf008q4s you will simply need to do a chgrp.

K:[/home/yourusername]: chgrp nsf008q4s filename

Then you will need to do a chmod to make the file group readable.

K:[/home/yourusername]: chmod 750 filename

The other members of the group should then be able to access your files on HPSS.

If you are unsure of the HPSS group that corresponds to the NICS project, or the other members of your team are not members of the same group, you will need to submit a ticket to help@xsede.org and request that they be added to the group on HPSS. Please reference this FAQ in your request.

Nautilus: What are the flags to prevent Java code from spawning excessive numbers of garbage collecting threads

When trying to run some java code (a statistical modeling code called maxent) for the Nimbios project on Nautilus, we were seeing that one instance of the code would spawn ~1200 threads. I thought initially that maxent was the culprit--until I ran a simple 'hello world' java program and it too spawned 1200 threads.

Turns out that the java virtual machine spawns garbage collecting threads in accordance with the number of processors that it detects. It also turns out that you can have a say in this process with the following flags:

  -XX:ParallelGCThreads=2
  -XX:+UseParallelGC

Adding these flags when running the maxent code brought the thread count down to around 16, which seems to be around the baseline of the number of startup threads needed by the jvm. I think any java code run on Nautilus should benefit from using these flags. I haven't done any specific tests on how the value of ParallelGCThreads affects performance. At least with the maxent code, I did notice faster startup times for the jvm.
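
As an illustration (the jar name is hypothetical), the flags are simply added to the java command line:

java -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar maxent.jar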

Darter: Can I use MPI_Alltoall with MPI_IN_PLACE?

The MPI_IN_PLACE option causes communication on an intra-communicator to happen in place, rather than being copied into buffers. This reduces the required number of operations (it is only possible within a node, not between nodes).

In order to use this option with MPI_Alltoall, you need to disable Cray's optimization for that call:

export MPICH_COLL_OPT_OFF=mpi_alltoall
General: When should I use the PBS option '-V'?

The -V option tells the batch system to remember all of your environmental variables. For example, if I want to set OMP_NUM_THREADS to 4 and then submit the job, I need this flag so that OMP_NUM_THREADS is still set in the batch script. You can use it as a flag such as qsub -V ... or in your batch script like:

#PBS -V

While this can be convenient, it is best practice not to use -V. Why?

  • It makes jobs more self contained. If the script itself must set all the environment variables it needs, the script can be shared between people without confusion. Additionally, when debugging an issue, it is clear from looking at the script what variables are set.
  • This option, when used often, can create additional load for the scheduler, and in rare cases cause a crash (particularly if used in jobs which resubmit themselves).

If you do use -V it is not a problem, and may be recommendable for something like an interactive job, but it is best not to include it in every job script as a matter of habit.
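
For example, instead of relying on -V, a self-contained script might set what it needs explicitly (the values and aprun options below are illustrative):

#PBS -l size=16,walltime=01:00:00
#PBS -S /bin/bash

export OMP_NUM_THREADS=4
cd $PBS_O_WORKDIR
aprun -n 4 -d 4 ./a.out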

General: Why can't I forward X11 connections with GSI on Mac OSX?

Apple handles the local "DISPLAY" variable necessary for X11 connections differently than Linux/Unix; therefore, older versions of gsissh had trouble parsing the variable, yielding this error:

$ xlogo
/tmp/launch-xZ1piK/: unknown host. (no address associated with name)
X connection to localhost:14.0 broken (explicit kill or server shutdown).

Note that if the remote DISPLAY variable had been broken, it would have given the error "Can't open display".

Updating to the most recent release of gsissh should resolve this issue.

Darter: Can I use dynamic shared libraries on the Cray compute nodes?

Dynamic shared libraries are supported on Darter, but using them may have a performance impact. See Dynamic Linking on Darter.

Darter: How do I get information about my MPICH/Portals settings?

Cray's MPICH has a number of settings (changed using environment variables) that affect what algorithms are used, buffer space, etc. For a list of these variables and their default settings, you can set the following prior to calling aprun:

export MPICH_ENV_DISPLAY=1

This will print a single list, regardless of the number of MPI tasks. It is important to note that these may change based on the core count of the job. In particular, some settings, such as MPICH_UNEX_BUFFER_SIZE, scale with the number of MPI tasks. In addition, a job run on a single node can use shared memory rather than Cray's Portals for communication; in that case, Portals-related settings are undefined and not displayed.

For more information about some of these settings, please see this workshop presentation, or run "man intro_mpi", which contains a description of many of these settings. You can also find that information on Cray's documentation page (under "Introduction to MPI man pages").

HPSS (Darter and Nautilus): How do I verify the contents of an archive during creation?

HTAR provides the “-Hverify=option[,option...]” command line option, which causes HTAR to first create the archive file normally, and then go back and check its work by performing a series of checks on the archive file. You choose the types of checks to be performed by specifying one or more comma-separated options. The options can be individual items, the keyword “all”, or a numeric level of 0, 1, or 2. Each numeric level includes all of the checks for lower-valued levels and adds additional checks. The verification options are:

all Enables all possible verification options except “paranoid”
info Reads and verifies the tar-format headers that precede each member file in the archive
crc Reads each member file and recalculates the Cyclic Redundancy Checksum (CRC), and verifies that it matches the value that is stored in the index file.
compare This option directs HTAR to compare each member file in the archive with the original local file.
paranoid This option is only meaningful if “-Hrmlocal” is specified, which causes HTAR to remove any local files or symbolic links that have been successfully copied to the archive file.

If “paranoid” is specified, then HTAR makes one last check before removing local files or symlinks to verify that:
a. For files, the modification time has not changed since the member file was copied into the archive
b. The object type has not changed, for example, if the original object was a file, it has not been deleted and recreated as a symlink or directory, etc.
It is also possible to specify a verification option such as “all”, or a numeric level, such as 0, 1 or 2, and then selectively disable one or more options. In practice, this is rarely, if ever, useful, but the following options are provided:
0          Same as “info”
1          Same as “info,crc”
2          Same as “info,crc,compare”
nocompare  Disables comparison of member files with their original local files
nocrc      Disables CRC checking
noparanoid Disables checking of modification time and object type changes
htar -cvf TEST_VERIFY.TAR /lustre/medusa/$USER -Hcrc -Hverify=2
htar -Hcrc -tvf TEST_VERIFY.TAR
In the example above,
(1) the archive file is created (-c) with verification level 2, including CRC generation and checking. The verbose output option (-v) is used to cause HTAR to display information about each file that is added during the create phase, and then verified during the verification phase.
(2) the archive file is then listed (-t) using the "-Hcrc" option to cause HTAR to display the CRC value for each member file.
HPSS (Darter and Nautilus): How do I retrieve a single file from HPSS?

Use "hsi ls -l" to show the tar file in HPSS:

>hsi ls -l file.tar
...
-rw-------   1 username     username          12800 Oct  2  2008 file.tar
Use "htar" to list the contents of the tar file:
> htar -tvf file.tar
HTAR: drwxr-xr-x  username/nicsstaff          0 2008-10-02 10:47  dir2/
HTAR: -rw-r--r--  username/nicsstaff       1492 2008-10-02 10:47  dir2/data.pbs
HTAR: -rw-r--r--  username/nicsstaff       1924 2008-10-02 10:47  dir2/mpi.pbs
Use "htar" to extract a single file (name must match what is listed by the above command):
> htar -xvf file.tar dir2/data.pbs
HTAR: -rw-r--r--  username/nicsstaff       1492 2008-10-02 10:47  dir2/data.pbs

HPSS (Darter and Nautilus): How do I retrieve a single directory from HPSS?

To retrieve a single directory from HPSS use the -R option. For example,

>hsi
>get -R dir1

HPSS (Darter and Nautilus): Has my access to HPSS been disabled?

Administrators may disable users for archiving too many small files at a time. Archiving too many small files introduces a lot of overhead on the system, and this archiving system is not designed to handle a lot of small files. Please use htar to tar together your files. Documentation can be found here.

We should contact you if this happens, but if you are concerned that your access to HPSS has been disabled, contact us at help@xsede.org. We can re-enable your HPSS access provided that it is used correctly.

One easy way to increase file size on HPSS is to use 'htar'. For the most part, this works the same as the regular tar. We would prefer that you perform htar on ~10GB chunks. After you confirm that you will be using htar from now on, we will proceed to provide you access to HPSS. Our system staff would like you to remove all of your archived small files from HPSS and archive them again using htar.
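
For example, a hedged htar invocation that bundles a directory of small files into a single archive on HPSS (the names are placeholders):

htar -cvf small_files_2013.tar ./small_files_dir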

HPSS (Darter and Nautilus): Is the HPSS system able to be accessed by more than one process at a time?

There is nothing that should prevent you from running a script that creates multiple simultaneous connections to HPSS. The HPSS system administrator recommends that you should not create more than 1 or 2 connections at a time. Every time you introduce a new instance, the performance of the overall system is degraded.

HPSS (Darter and Nautilus): Can I run HSI from my workstation?

Because HSI is a third-party package, clients may be available for your system; however, NICS currently supports access to HPSS only through HSI clients on the HPC systems.

HPSS (Darter and Nautilus): Can I use HSI without entering my passcode each time?

If you log into Darter or Nautilus using your passcode from your OTP token, you can run HSI without entering your passcode each time. You can also run batch scripts that use HSI in the "hpss" queue. If you logged in using GSI authentication, you will be prompted for your passcode each time you use HSI.

HPSS (Darter and Nautilus): What is the best way to transfer a large number of small files to HPSS?

HPSS performance is greatly improved when the transfer size is between 8 GB and 256 GB. For that reason, users with large numbers of relatively small files should combine those files into one or a few 8 GB to 256 GB files and then transfer the larger files. The files can be combined with tar on the HPC system, or they can be created on the fly with a command similar to tar cvf - some_dir | hsi put - : somedir.tar. This command will tar all files in the some_dir subdirectory into a file named somedir.tar on HPSS. HPSS also supports the htar command.

HPSS (Darter and Nautilus): What is HSI?

The HSI utility allows automatic authentication and provides a user-friendly command line and interactive interface to HPSS.

HPSS (Darter and Nautilus): How do I access HPSS?

Users may access HPSS from any NICS high-performance computing (HPC) system with the Hierarchical Storage Interface (HSI) utility. An OTP token is required upon entry. Access to HPSS is enabled by typing the command hsi in your linux environment. To exit, simply type quit.

General: Why does SFTP exit with the error "Most likely the sftp-server is not in the path of the user on the server-side"?

Examples of this error are

File transfer server could not be started or it exited unexpectedly.
Exit value 0 was returned. Most likely the sftp-server is not in the path of the user on the server-side.

or

Received message too long 1500476704

These errors are usually caused by commands in a shell run-control file (.cshrc, .profile, .bashrc, etc.) that produce output to the terminal. This output interferes with the communication between the SSH daemon and the SFTP-server subsystem. Examples of such commands might be date or echo. If you use the mail command to check for mail, it can also cause the error.

You can check to see if this is likely the problem. If you are unable to SFTP to a machine, try to connect via SSH. If you are able to SSH, and you receive output to your terminal other than the standard login banner (for example, “You have mail”), then you need to check your run-control files for commands that might be producing the output.

To solve this problem, you should place any commands that will produce output in a conditional statement that is executed only if the shell is interactive. For C shell users, a sample test to put in your .cshrc file would be

if ($?prompt) then
  date
endif

The equivalent command for your .profile file (ksh/bash) would be

if [[ -n $PS1 ]]; then
 date
fi

General: How do I transfer data between the NICS and other UNIX-based systems?

The SSH-based SCP and SFTP utilities can be used to transfer files to and from NICS systems.

For larger files, the multistreaming transfer utility BBCP may be used (not available on Darter or Beacon). The BBCP utility is capable of breaking up your transfer into multiple simultaneously transferring streams, thereby transferring data faster than single-streaming utilities such as SCP and SFTP.

For more information on data transfers, see the remote data section of the data management page.

General: Using GridFTP/GSISSH, I get the error: "Untrusted self-signed certificate in chain with hash..." What is wrong?

Globus is unable to find the correct certificate to authenticate: most likely, you have a ~/.globus/certificates directory which is overriding the system defaults. If this directory exists, rename it and try again. Within the TeraGrid, these certificates are managed for you, so you should not need a certificates directory.

If you do need it for regular transfers to non-TeraGrid sites, you can generally get the certificate from the /etc/grid-security/certificates directory—the name of the file is given by the error message: Untrusted self-signed certificate in chain with hash .

cp /etc/grid-security/ ~/.globus/certificates

These certificates may be changed without notice, so you will periodically have to remove and replace expired certificates.

General: I received an error when I logged in, and now the system can't find commands. Why?

When the C shell (or one of its derivatives, such as tcsh) is starting up and encounters an error in one of its initialization files, it stops processing its initialization files. So, any aliases, environment settings, etc., that occur after the line that caused the error will not be processed. For help in troubleshooting the startup files, contact the User Assistance Center.

General: Why do I get the error "init.c(375):ERROR:50: Cannot open file "" for 'append'" when I log in?

This message usually means that you are at or near your home directory quota and that some part of the login process was trying to write there. This is often caused when the modules utility is loaded because it needs to write files to your home directory. You will need to reduce the usage in your home directory to log in successfully.

You may also notice that after getting this message, some commands cannot be found. This is due to the way C shell handles errors.

General: Why does my SSH connection fail? Why does SSH report that no authentication methods are available?

Your SSH client may not be set up to use the keyboard-interactive authentication method. You will need to use a client that supports the keyboard-interactive authentication method to connect to the NICS computers. Different SSH clients will have different ways of setting the preferred authentication methods, so you may need to contact your system administrator to get your client set correctly.

If your ssh client seems to be set up correctly, it may be that the resource you are trying to connect to is unavailable. You may want to check our announcements.

General: How do I activate and use my RSA SecurID?

For instructions on activating and using your RSA SecurID, see the connecting page.

General: Why doesn't the backspace key work as expected?

If backspace produces ^? instead of what you expect, use the following to fix it at the command prompt:

stty erase 

You can put this in your .profile (ksh) or .login (csh) file so upon logging it automatically will be set. This stty command should also be executed only for interactive shells, not batch.

Another tactic is to change the configuration of your SSH client. For instance, if you are using PuTTY SSH from a Windows system, the default backspace key is Control-?. This can be changed by going to the Keyboard category and changing backspace to Control-H.

General: How can I set my environment using .modulerc?

Some sites recommend using the .modulerc file to set your default modules. Do not do so on NICS systems: the .modulerc file is read every time module is called. This causes issues with some of the Cray software on Darter and with the global default module list, and can lead to unexpected results (if you unload a module in the .modulerc file, it will be re-loaded the next time you use the module command). Instead, set your default environment in your .bashrc file (or its analogue). It is best to send the output (stderr in particular) to a log or /dev/null so that .bashrc does not print anything, which may otherwise cause errors.
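
For example, a hedged line in ~/.bashrc (the module name and redirection target are illustrative):

module load hdf5-parallel > /dev/null 2>&1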

General: How do I use the module utility?

For information on modules, see the modules page.

Darter: How do I find out what macros are predefined by the compiler?

For Darter, consult the “Cray online documentation” (http://docs.cray.com).

For C, search for the Cray “C and C++ Reference Manual” and for Fortran, consult the “Cray Fortran Compiler Commands and Directives Reference Manual”.

General: How do I change my default limits for stack size, core file size, etc.?

When you connect to a system, your environment is set up with default limits for stack size, core file size, number of open files, etc. The system sets both soft and hard limits for these parameters. The soft limit is the actual limit imposed by the system. For example, the soft stack limit is the maximum stack size the system will allow a process to use. Users cannot increase their hard limits. Hard limits can be decreased, but this is not recommended.

While it is rarely necessary to change shell limits on Darter or Nautilus, there may be times when limits must be changed to get your program to run properly, typically by increasing a default soft limit. This is where the hard limit becomes important. The system allows users to increase their soft limits, but it uses the hard limit as the upper bound, so users cannot increase a soft limit to a value greater than the corresponding hard limit.

The command to modify limits varies by shell. The C shell (csh) and its derivatives (such as tcsh) use the limit command to modify limits. The Bourne shell (sh) and its derivatives (such as ksh and bash) use the ulimit command. The syntax for these commands varies slightly and is shown below. More detailed information can be found in the man page for the shell you are using.

Limit commands

Operation                 | sh/ksh/bash command | csh/tcsh command
--------------------------+---------------------+----------------------
View soft limits          | ulimit -S -a        | limit
View hard limits          | ulimit -H -a        | limit -h
Set stack size to 128 MB  | ulimit -S -s 131072 | limit stacksize 128m

 

With any shell, you can always reset both soft and hard limits to their default values by logging out and back in.

On the Cray XT, both RLIMIT_CORE and RLIMIT_CPU limits are always forwarded to the compute nodes. If you wish to set any other user resource limits, you must set the APRUN_XFER_LIMITS environment variable to 1, along with the new limits, within the job script before the aprun call:

export APRUN_XFER_LIMITS=1
 #or
setenv APRUN_XFER_LIMITS 1
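
A minimal sketch of the corresponding job-script fragment (the stack-size value and aprun options are illustrative):

export APRUN_XFER_LIMITS=1
ulimit -S -s 131072        # raise the soft stack limit to 128 MB
aprun -n 16 ./a.out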

 

Default user resource limits

The default user resource limits in the compute nodes are:

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
coredump(blocks)     0
memory(kbytes)       unlimited
locked memory(kbytes) 512
process              unlimited
nofiles              1024
vmemory(kbytes)      unlimited
locks                unlimited

 

 

General: How do I remove the Control-M characters in my text file?

Different operating systems use different methods of indicating the end of a line in a text file. UNIX uses only a new line, whereas Windows uses a carriage return and a line feed. If you have transferred text files from your PC to a UNIX machine, you may need to remove the carriage-return characters. (The line-feed character in Windows is the same as the new-line character under UNIX, so it doesn’t need to be changed.) Some systems provide a command dos2unix that can perform the translation. However, it can also be done with a simple perl command. In the following example, win.txt is the file transferred from your PC, and unix.txt is the new file in UNIX text format:

perl -p -e 's/\r$//' win.txt > unix.txt
General: I accidentally deleted some files. Can I get them back?

It depends on where the files were and how recently they were created. Scratch directories (/lustre/medusa/$USER) are not backed up at all, so any files deleted from those directories cannot be recovered. Home directories are different. Please contact NICS support if you have inadvertently deleted a file in your home area.

General: Why is vi unresponsive?

If vi appears to hang, but other commands (ls, cat, etc.) work normally, try renaming the .viminfo file:

mv ~/.viminfo ~/.viminfo.bak

This file saves the state of vim, but it can sometimes become corrupted due to incompatibilities between different versions of vim.

General: How do I change my login shell?

Users may change their default shell in the NICS User Portal. To log into the portal, you need to use your RSA SecurID.

General: How do I list all projects for which I am a member?

You can use the showusage utility to view all projects for which you are a member.

General: How do I view my allocation and usage?

Users can view their allocation and usage on allocated systems using the showusage utility. showusage returns year-to-date usage and allocation for the calling user’s allocated project(s). Usage is calculated from the first day of the fiscal year through midnight of the day before the request.

You can also check charges for individual jobs using glsjob [-u <username>] [-p <projectname>]. The sum of jobs within a project should equal the showusage total, within rounding error.

Darter: Where can I find more information?

If you haven't already, please check out the other Darter resource pages at Darter resources on compiling, file systems, batch jobs, open issues, parallel I/O tips, CrayPAT overview, and other reports and presentations related to Darter.

Another good resource (without Darter-specific information) is the documentation that Cray provides at CrayDocs.

Darter: Can a user login directly to a compute node?

No, users cannot login directly to a compute node, but by submitting an interactive batch job, users can get access to an aprun node, from where they can execute aprun commands to run on a compute node. For more information on how to run interactive batch jobs, please view the information found at Interactive Batch Jobs

Darter: How do I get performance counter data for my program?

Use the following process:

  1. Use module load xt-craypat.
  2. Compile code.
    • If Fortran90 with modules, compile with -Mprof=func.
  3. Run pat_build -u -g mpi a.out.
  4. Run a.out+pat as you would a.out, BUT make sure PAT_RT_HWPC is set to 1 in the batch script.
    • If you want just a regular profile, don't set PAT_RT_HWPC.
  5. Run pat_report <experiment-directory>/*.xf, where <experiment-directory> is automatically generated by the instrumented code.

The resulting output will have performance counter results for the entire run and for each subroutine (however, inlining and C++ optimizations may prevent some subroutines from being profiled).
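
As a consolidated sketch of the workflow (the compiler command, task count, and experiment-directory name are illustrative):

module load xt-craypat
ftn -o a.out myprog.f90              # compile as usual
pat_build -u -g mpi a.out            # produces the instrumented a.out+pat
# in the batch script:
export PAT_RT_HWPC=1
aprun -n 16 ./a.out+pat
# after the run:
pat_report <experiment-directory>/*.xf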

General: Is there any other faster way to list my files in my Lustre scratch area?

Yes! A basic ls only has to contact the meta-data server (MDS), not the object-storage servers (OSS), where the bottleneck often occurs. In general, ls is aliased to give additional information, which requires the OSS's. You can bypass this by using /bin/ls. When there are many files in the same directory, and you don't need the output to be sorted, /bin/ls -U is even faster.

You can also use the Lustre utility lfs to look for files. For example, the syntax to emulate a regular ls in any directory is

lfs find  -D 0  *

For convenience, you may want to add an alias definition to your login config files. For example Bash users can add to their ~/.bashrc the following line to create an alias called lls.

alias lls="/bin/ls -U"
General: How do I change the striping in Lustre?

A user can change the striping settings for a file or directory in Lustre by using the lfs command. The usage for the lfs command is

lfs setstripe <directory|filename> -s <size> -i <index> -c <count>

where

size - the number of bytes on each OST (0 indicating default of 1 MB) specified with k, m, or g to indicate units of KB, MB, or GB, respectively.
index - the OST index of first stripe (-1 indicating default)
count - the number of OSTs to stripe over (0 indicating default of 4 and -1 indicating all OSTs [limit of 160]).

NOTE: If you change the settings for existing files, the file will get the new settings only if it is recreated.

To change the settings for an existing directory, you will need to rename the directory, create a new directory with the proper settings, and then copy (not move) the files to the new directory to inherit the new settings.
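
A hedged sketch of that procedure (the directory names are placeholders):

mv mydir mydir.old
mkdir mydir
lfs setstripe mydir -c 8
cp -r mydir.old/* mydir/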

If your application is the type in which each separate process writes to its own file, then we believe that the best option is to not use striping. This can be set by using this command:

> lfs setstripe <directory> -c 1

Then we see that

> lfs find -v testdirectory
OBDS:
0: ost1_UUID ACTIVE
--snip--
testdirectory/
default stripe_count: 1 stripe_size: 0 stripe_offset: -1

This shows we have a stripe count of 1 (no striping), the stripe size is set to 0 (which means use the default), and the stripe offset is set to -1 (which means to round-robin the files across the OSTs).

NOTE: You should always use -1 for stripe_offset.

The stripe count and stripe size are something you can tweak for performance. If your application writes very large files, then we believe that the best option is to stripe across all or a subset of the OSTs on the file system. Striping across all OSTs can be set by using this command:

> lfs setstripe <directory> -c -1

Caution: Not striping large files may cause a write error if the file's size is larger than the space on a single OST. Each OST has a finite size which is smaller than the total Lustre area of all OSTs.

General: What is the default striping for my files?

A file's striping is inherited from its parent directory. The lfs getstripe command can be used to determine the striping for a file, or the default striping for a directory. Note that each file and directory can have its own striping pattern, which means a user can set striping patterns for their own files and/or directories. The default stripe width for a user may be 1 or 4; you can determine it by running lfs getstripe /lustre/medusa/$USER.

This command will also give you information on the striping information for a directory/file.

lfs find -v <directory|filename>
General: What is file striping?

The Lustre file system is made up of an underlying set of file systems called Object Storage Targets (OST's), which are essentially a set of parallel IO servers. A file is said to be striped when read and write operations access multiple OST's concurrently. File striping is a way to increase IO performance since writing or reading from multiple OST's simultaneously increases the available IO bandwidth.

Striping will likely have little impact for the following codes:
  • Serial IO where a single processor or node performs all of the IO for an application.
  • Multiple nodes perform IO but access the files at different times.
  • Multiple nodes perform IO simultaneously to different files that are small.

Lustre allows users to set file striping at the file or directory level. As mentioned above, striping will not improve IO performance for all files. For example, in a parallel application, if each processor writes its own file then file striping will not provide any benefit. Each file will already be placed in its own OST and the application will be using OST's concurrently. File striping, in this case, could lead to a performance decrease due to contention between the processors as they try to write (or read) pieces of their files spread across multiple OST's.

For MPI applications with parallel IO, multiple processors accessing multiple OST's can provide large IO bandwidths. Using all the available OST's will provide maximum performance.

There are a few disadvantages to striping. Interactive commands such as ls -l will be slower for striped files. Additionally, striped files are more likely to suffer data loss from a hardware failure since the file is spread across multiple OST's.

Please see also: Scratch Space (Lustre) and I/O and Lustre Tips.

General: Why do I get the error message: Warning: no access to tty (Bad file descriptor). Thus no job control in this shell.

This message always occurs when running C-shell style job scripts. It is not really an error message; it is a friendly reminder that this is a remote batch job which cannot be acted upon interactively (such as with ^C, or ^Z for suspension).

Darter: Why am I getting "could not find *.so"? Or: can I use dynamic libraries?

These files are dynamic libraries, which exist on an NFS file system that is not visible to the compute nodes. Thus, when the dynamic linker goes to add the library, it cannot find it. In the past, dynamic libraries were not supported on the compute nodes. You may be able to use dynamic libraries if you have the files on Lustre, but it is recommended that you use static executables regardless. To check whether an executable is dynamically linked, use: ldd <executable>

General: Why does nothing happen when I submit my job?

If your job executes for only an instant, is terminated without any error messages, and leaves empty output files, it may be that a customized login script changes your shell interpreter at login time by explicitly executing another shell. For example, users whose default shell is Bash sometimes switch to the C shell by doing the following in their .bashrc file:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
   . /etc/bashrc
fi

# User specific aliases and functions

exec csh

If you do want to change your default shell, do it through the NICS User Portal rather than by executing another shell in your startup files. To log in to the portal, you need to use your RSA SecurID.
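
If you do keep an exec csh line, one possible workaround (a sketch only, not an officially supported configuration) is to guard it so that it runs for interactive logins but leaves batch jobs alone:

# In ~/.bashrc: only switch shells for interactive sessions
case $- in
  *i*) exec csh ;;
esac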

General: What is Optimistic Memory Allocation? How does it affect me?

Linux uses "virtual memory" for each process, which creates the illusion of a contiguous memory block when a process starts, even if physical memory is fragmented, or residing on a hard disk. When a process calls malloc, it is given a pointer to an address in this virtual memory. When the virtual memory is first used, it is then mapped to physical memory.

Optimistic memory allocation means that Linux is willing to allocate more virtual memory than there is physical memory, on the assumption that a program may not use all of the memory it asks for. When a node has used all of its physical memory and there is another call to malloc, the program does not receive a null pointer; it receives a seemingly valid pointer to virtual memory. When that memory is actually used, the kernel tries to map the virtual memory to physical memory and enters an "Out of Memory" condition. To recover, the kernel kills one or more processes; on Darter, this will almost certainly be your executable, and you should see "OOM killer terminated this process."
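
On a standard Linux system you can inspect the kernel's overcommit policy through the /proc interface (shown purely for illustration; the compute-node kernel configuration may differ). A value of 0 means the usual heuristic overcommit:

> cat /proc/sys/vm/overcommit_memory
0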

For more information, see O'Reilly's writeup or man malloc under "Bugs".

Darter: I get the error message "OOM killer terminated this process". What is OOM?

This error message indicates that the node ran Out Of Memory. This could be the result of a bug in the code or of the memory requirements for the given input. Note that, due to optimistic memory allocation, you probably will not get a null pointer even when you are out of memory; the program is instead killed at the point the memory is actually used.

One quick solution might be to run with only four MPI processes per socket so each process gets a larger share of the memory on the node:

aprun -n <numprocs> -S 4 ./a.out

where <numprocs> is the total number of MPI processes. The above solution uses 4 of the 8 cores on each socket, so naively each MPI task should get twice as much memory. If this is still not enough, the number of tasks per socket can be reduced further (for example, -S 2). The best solution, however, may be to identify the memory requirements in the code and make any necessary changes there, in terms of memory parameters, domain decomposition, etc.
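
For example, a hypothetical 64-task job launched this way would place four tasks on each socket (eight per node, assuming two sockets per node):

> aprun -n 64 -S 4 ./a.out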

General: Why am I not getting the basic error messages I expect?

Sometimes basic error messages (such as reading past the end of a file) are suppressed because a shell interpreter is not specified in the PBS script. Make sure that the first line of the PBS script specifies a shell interpreter, for example #!/bin/bash.
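
For example, a minimal script skeleton with the interpreter on the first line (the task count and executable name are illustrative, and the #PBS resource directives for your job are omitted):

#!/bin/bash
#PBS -N myjob
cd $PBS_O_WORKDIR
aprun -n 16 ./a.out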

Darter: Where should I run multiple serial executables?

If you need to run many instances of a serial code (as in a typical parameter-sweep study), we highly recommend using Eden, a simple script-based master-worker framework for running multiple serial jobs within a single PBS job.

Darter: How do I find out what nodes my batch job is using?

There are a couple of easy ways to find out what nodes are assigned to your batch job. The easiest is to use the checkjob command; part of its output lists the allocated nodes, for example:

Allocated Nodes:      

[84:1][85:1][86:1][87:1][88:1][89:1][90:1][91:1]

This method returns a logical numbering of the nodes. A physical numbering of the nodes, as well as the PID layout, can be obtained by setting the PMI_DEBUG environment variable to 1:

> setenv PMI_DEBUG 1
> aprun -n4 ./a.out
Detected aprun CNOS interface
MPI rank order: Using default aprun rank ordering
rank 0 is on nid00015 pid 76; originally was on nid00015 pid 76
rank 1 is on nid00015 pid 77; originally was on nid00015 pid 77
rank 2 is on nid00016 pid 69; originally was on nid00016 pid 69
rank 3 is on nid00016 pid 70; originally was on nid00016 pid 70

From within your code, you can call PMI_CNOS_Get_nid to get the physical node number for each process.

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
  int rank,nproc,nid;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);
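  /* Query the physical node id (nid) of this rank through the Cray PMI interface */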
  PMI_CNOS_Get_nid(rank, &nid);
  printf("  Rank: %10d  NID: %10d  Total: %10d \n",rank,nid,nproc);
  MPI_Finalize();
  return 0;
}

The output with four cores would be as follows:

aprun -n4 ./hello-mpi.x
  Rank:          1  NID:         15  Total:          4
  Rank:          0  NID:         15  Total:          4
  Rank:          2  NID:         16  Total:          4
  Rank:          3  NID:         16  Total:          4
Application 13390 resources: utime 0, stime 0

aprun can also be used to run Unix commands on the compute nodes that display the node names, as shown below.

> aprun -n4 /bin/hostname
nid00015
nid00015
nid00016
nid00016
>

Or

> aprun -n4 /bin/cat /proc/cray_xt/nid

15
15
16
16
>

General: Why do I get the error: "/usr/include/c++/4.1.2/backward/backward_warning.h:32:2"?

#include <iostream> is the standard C++ way to include this header; 'iostream' is an identifier that maps to the header file. In older C++ versions you had to specify the file name of the header itself, hence #include <iostream.h>. Older compilers may not recognize the modern method, but newer compilers accept both, even though the old method is obsolete; that obsolescence is exactly what the backward_warning.h message is warning about.

fstream.h became <fstream>, vector.h became <vector>, string.h became <string>, and so on.

Although the <iostream.h> library was deprecated years ago, many C++ users still use it in new code instead of the newer, standards-compliant <iostream> library. What are the differences between the two? First, the .h notation of the standard header files was deprecated more than five years ago, and using deprecated features in new code is never a good idea. Second, in terms of functionality, <iostream> contains a set of templatized I/O classes that support both narrow and wide characters, whereas the <iostream.h> classes are confined to char exclusively. Third, the C++ standard specification of iostream's interface was changed in many subtle aspects, so the interfaces and implementation of <iostream> differ from those of <iostream.h>; in particular, <iostream> components are declared in namespace std, whereas <iostream.h> components are declared in the global scope. Because of these substantial differences, you cannot mix the two libraries in one program. As a rule, use <iostream> in new code and stick to <iostream.h> only in legacy code that is incompatible with the new library.

Darter: Why do I see the message: SEEK_SET is #defined but must not be for the C++ binding of MPI?

The following error message:

#error "SEEK_SET is #defined but must not be for the C++ binding of MPI" 

This is the result of a name conflict between stdio.h and the MPI C++ bindings. Users should place the MPI include (mpi.h) before the stdio.h and iostream includes.

Users may also see the following error messages as a result of including stdio.h or iostream before mpi.h:

#error "SEEK_CUR is #defined but must not be for the C++ binding of MPI" 
#error "SEEK_END is #defined but must not be for the C++ binding of MPI"

When profiling with TAU, you may get this message regardless of the order. In this case, you can add -DMPICH_IGNORE_CXX_SEEK to the compile line to remove the error (this fix should work generally).
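
For example, using the Cray C++ compiler wrapper (the source file name is illustrative):

> CC -DMPICH_IGNORE_CXX_SEEK my_code.cpp -o my_code.x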

Darter: How do I link a C++ object with ftn?

Under the 1.5 programming environments used with Catamount, ftn linked in libC.a. Under the 2.x programming environments used with CNL, ftn does not link in libC.a, so Fortran codes that link in libraries containing C++ objects will need to add -lC to the link line.
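
For example, linking a Fortran main program against an object file that contains C++ code (the file names are illustrative):

> ftn main.f90 cxx_helpers.o -lC -o myapp.x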

Darter: Why does my compile fail with "/usr/bin/ld: can not find -lsma"?

This error message occurs when using the mpi* compiler wrappers (mpicc, mpif90, etc.). These are intermediate wrappers that should not be called directly by users. Instead, users should compile with either ftn, cc, or CC. The ftn, cc, and CC scripts will do the necessary setup and then automatically call the appropriate intermediate scripts and ultimately the compilers.
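
For example, instead of invoking mpicc directly, compile with the cc wrapper, which performs the necessary setup automatically (the file name is illustrative):

> cc my_mpi_code.c -o my_mpi_code.x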