Running Jobs on Kraken
General Information
When you log into Kraken, you are placed on one of the several login nodes. Login nodes should only be used for basic tasks such as file editing, code compilation, data backup, and job submission.
The login nodes should not be used to run production jobs. Production work should be performed on the system's compute resources. The serial jobs (pre- and post-processing, etc.) may be run on the compute nodes as long as they are statically linked. For one or more single-processor jobs please refer to the Job Execution section for more information. Access to compute resources is managed by the Portable Batch System (PBS). Job scheduling is handled by Moab, which interacts with PBS and the XT system software.
This page provides information for getting started with the batch facilities of PBS with Moab as well as basic job execution. If you want to chain your submissions to complete a full simulation without resubmitting by hand, you can read about this here (please read it carefully).
Notice: Compute nodes can see only the Lustre scratch directories.
Batch scripts are run on service nodes that have access to the home, project and software directories. Executables launched with the aprun command do not have access to these directories; they have access only to the Lustre scratch directories. In your batch script, make sure to cd to the Lustre scratch directory before the aprun command is issued. If this is not done, you may see an error like:
aprun: [NID 94]Exec /lustre/scratch/userid/a.out failed: chdir /nics/b/home/userid No such file or directory
For the program launched by aprun, all input and output files must reside in the Lustre scratch directories.
Batch Scripts
Batch scripts can be used to run a set of commands on a system's compute partition. Batch scripts allow users to run non-interactive batch jobs, which are useful for submitting a group of commands, allowing them to run through the queue, and then viewing the results. However, it is sometimes useful to run a job interactively (primarily for debugging purposes). Please refer to the Interactive Batch Jobs section for more information on how to run batch jobs interactively.
All non-interactive jobs must be submitted on Kraken using job scripts that are submitted via the qsub command. The batch script is a shell script containing PBS flags and commands to be interpreted by a shell. The batch script is submitted to the batch manager, PBS, where it is parsed. Based on the parsed data, PBS places the script in the queue as a job. Once the job makes its way through the queue, the script will be executed on the head node of the allocated resources.
All job scripts start with an Interpreter line, followed by a series of #PBS declarations that describe requirements of the job to the scheduler. The rest is a shell script, which sets up and runs the executable.
Batch scripts are divided into the following three sections:
-
Shell interpreter (one line)
- The first line of a script can be used to specify the script's interpreter.
- This line is optional.
- If not used, the submitter's default shell will be used.
-
The line uses the syntax #!/path/to/shell, where the path to the shell may be
- /usr/bin/csh
- /usr/bin/ksh
- /bin/sh
-
PBS submission options
- The PBS submission options are preceded by #PBS, making them appear as comments to a shell.
- PBS will look for #PBS options in a batch script from the script's first line through the first non-comment line. A comment line begins with #.
- #PBS options entered after the first non-comment line will not be read by PBS.
-
Shell commands
- The shell commands follow the last #PBS option and represent the executable content of the batch job.
- If any #PBS lines follow executable statements, they will be treated as comments only. The exception to this rule is shell specification on the first line of the script.
- The execution section of a script will be interpreted by a shell and can contain multiple lines of executables, shell commands, and comments.
- During normal execution, the batch script will end and exit the queue after the last line of the script.
The following example shows a typical job script that includes the minimal requirements to submit a parallel job that executes ./a.out on 192 cores, charged to the fictitious account UT-NTNL0121 with a wall clock limit of one hour and 35 minutes:
#!/bin/bash
#PBS -A UT-NTNL0121
#PBS -l size=192,walltime=01:35:00
cd $PBS_O_WORKDIR
aprun -n 192 ./a.out
Jobs should be submitted from within a directory in the Lustre file system. It is best to always execute cd $PBS_O_WORKDIR as the first command. Please refer to the PBS Environment Variables section for further details.
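As a hedged sketch of this advice, a job script could check that its working directory is on Lustre before calling aprun. The helper name is_on_lustre is hypothetical, and the /lustre/scratch prefix is taken from the error message shown earlier; adjust it if your scratch path differs.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given path is under /lustre/scratch.
is_on_lustre() {
    case "$1" in
        /lustre/scratch|/lustre/scratch/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use inside a job script (PBS sets PBS_O_WORKDIR at run time):
#   cd "$PBS_O_WORKDIR"
#   is_on_lustre "$PWD" || { echo "Not on Lustre scratch: $PWD" >&2; exit 1; }
#   aprun -n 192 ./a.out
```

Failing early in the script gives a clearer message than the chdir error that aprun would otherwise report.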
On Kraken you must request size=cores to be a multiple of 12 - there are 12 cores per node, and it is not possible to allocate part of a node. If you want to run on 8 cores (-n 8), for example, you still need to request 12 cores (size=12), otherwise you will receive the following error:
Notice: Your job was NOT submitted Core requests on Kraken must be a multiple of twelve. You have requested an invalid number of cores ( 8 ). Please resubmit the job requesting an appropriate number of cores.
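The rounding rule can be expressed as a small calculation. The following is only a sketch; the helper name round_up_to_node is hypothetical, and the value 12 is the cores-per-node figure stated above.

```shell
#!/bin/bash
CORES_PER_NODE=12   # Kraken XT5: 12 cores per node

# Hypothetical helper: round a core count up to the next multiple of 12.
round_up_to_node() {
    local cores=$1
    echo $(( (cores + CORES_PER_NODE - 1) / CORES_PER_NODE * CORES_PER_NODE ))
}

round_up_to_node 8     # prints 12
round_up_to_node 192   # prints 192
```

The printed value is what would go into #PBS -l size=..., while aprun -n can still be given the smaller number of processes.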
There is online documentation that describes the many PBS options that can be used for more complex job scripts.
Unless otherwise specified your default shell interpreter will be used to execute shell commands in job scripts. In some cases it may even try to guess what interpreter to use. If the job script should use a different interpreter, then specify the correct interpreter using:
#PBS -S /bin/XXXX
Altering Batch Jobs
This section shows how to remove or alter batch jobs.
Remove Batch Job from the Queue
Jobs in the queue in any state can be stopped and removed from the queue using the command qdel.
For example, to remove a job with a PBS ID of 1234, use the following command:
> qdel 1234
More details on the qdel utility can be found on the qdel man page.
Hold Queued Job
Jobs in the queue in a non-running state may be placed on hold using the qhold command. Jobs placed on hold will not be removed from the queue, but they will not be eligible for execution.
For example, to move a currently queued job with a PBS ID of 1234 to a hold state, use the following command:
> qhold 1234
More details on the qhold utility can be found on the qhold man page.
Release Held Job
Once on hold the job will not be eligible to run until it is released to return to a queued state. The qrls command can be used to remove a job from the held state.
For example, to release job 1234 from a held state, use the following command:
> qrls 1234
More details on the qrls utility can be found on the qrls man page.
Modify Job Details
Only jobs that are not running (queued or on hold) can be modified with the qalter PBS command. For example, this command can be used to:
Modify the job's name,
$ qalter -N <newname> <jobid>
Modify the number of requested cores,
$ qalter -l size=<NumCores> <jobid>
Modify the job's wall time,
$ qalter -l walltime=<hh:mm:ss> <jobid>
Set the job's dependencies,
$ qalter -W depend=type:argument <jobid>
Remove a job's dependency (omit :argument):
$ qalter -W depend=type <jobid>
Notes:
- Use qstat -f <jobid> to gather all the information about a job, including job dependencies.
- Use qstat -a <jobid> to verify the changes afterward.
- You cannot specify a new walltime that exceeds the maximum walltime of the queue your job is in.
- If you need to modify a running job, please contact us. Certain alterations can only be performed by NICS operators.
Interactive Batch Jobs
Interactive batch jobs give users interactive access to compute resources. A common use for interactive batch jobs is debugging. This section demonstrates how to run interactive jobs through the batch system and provides common usage tips.
Users are not allowed to run interactive jobs on compute resources from the login nodes. Running a batch-interactive PBS job is done by using the -I option with qsub.
Interactive Batch Example
For interactive batch jobs, PBS options are passed through qsub on the command line.
qsub -I -A UT-NTNL0121 -X -l size=12,walltime=1:00:00
| Option | Description |
| -I | Start an interactive session |
| -A | Charge to the “UT-NTNL0121” project |
| -X | Enables X11 forwarding which is necessary for interactive GUIs. Note that you must have X11 forwarding enabled when you log in to Kraken |
| -l size=12,walltime=1:00:00 | Request 12 compute cores for one hour |
After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, the standard input and standard output of this terminal will be linked directly to the head node of your allocated resource. From there, commands may be executed directly instead of through a batch script. Issuing the exit command will end the interactive job.
Using Interactive Batch Jobs to Debug
A common use of interactive batch jobs is debugging (see the Debugging page). The tips below may be useful while interactively debugging code through PBS. To help a job run quickly rather than sit in the queue, it is important to choose the job size appropriately. You can use the showbf command (for “show backfill”) to see immediately available resources that would allow your job to be backfilled (and thus started) by the scheduler. There are 30 (unavailable) service nodes that need to be subtracted from the available nodes returned. For example, the snapshot below shows that there are 61 (91 minus 30) nodes available, so a job requesting four compute nodes would run immediately.
% showbf
Partition  Tasks  Nodes  StartOffset  Duration      StartDate
---------  -----  -----  ------------ ------------  --------------
ALL        3488   91     00:00:00     6:17:26:42    14:33:18_04/01
ALL        3000   30     00:00:00     INFINITY      14:33:18_04/01
xt5        3488   91     00:00:00     6:17:26:42    14:33:18_04/01
xt5        3000   30     00:00:00     INFINITY      14:33:18_04/01
The following command would then take advantage of this window for an interactive session:
qsub -I -A UT-NTNL0121 -X -l size=12,walltime=1:00:00
See showbf -help for additional options. For more information, see the online user guide for the Moab Workload Manager.
Common PBS Options
This section gives a quick overview of common PBS options.
Necessary PBS options
| Option | Use | Description |
| A | #PBS -A <account> | Causes the job time to be charged to <account>. The account string UT-NTNL0121 is typically composed of three letters followed by three digits and optionally followed by a subproject identifier. The utility showusage can be used to list your valid assigned project ID(s). This is the only option required by all jobs. |
| l | #PBS -l size=<cores> | Maximum number of compute cores. Must request an entire node (multiples of 12). |
| | #PBS -l walltime=<time> | Maximum wall-clock time. <time> is in the format HH:MM:SS. Default is 1 hour. |
Other PBS Options
| Option | Use | Description |
| o | #PBS -o <name> | Writes standard output to <name> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier. |
| e | #PBS -e <name> | Writes standard error to <name> instead of <job script>.e$PBS_JOBID. |
| j | #PBS -j {oe,eo} | Combines standard output and standard error into the standard error file (eo) or the standard out file (oe). |
| m | #PBS -m a | Sends email to the submitter when the job aborts. |
| | #PBS -m b | Sends email to the submitter when the job begins. |
| | #PBS -m e | Sends email to the submitter when the job ends. |
| M | #PBS -M <address> | Specifies email address to use for -m options. |
| N | #PBS -N <name> | Sets the job name to <name> instead of the name of the job script. |
| S | #PBS -S <shell> | Sets the shell to interpret the job script. |
| q | #PBS -q <queue> | Directs the job to the specified queue. This option is not required to run in the general production queue. |
Note: Please do not use the PBS -V option. It can propagate large numbers of environment variable settings from the submitting shell into a job, which may cause problems for the batch environment. Instead of using PBS -V, please pass only the necessary environment variables using -v <comma_separated_list_of_needed_envars>. You can also include module load statements in the job script.
Example:
#PBS -v PATH,LD_LIBRARY_PATH,PV_NCPUS,PV_LOGIN,PV_LOGIN_PORT
Further details and other PBS options may be found using the man qsub command.
PBS Environment Variables
This section gives a quick overview of useful environment variables set within PBS jobs.
-
PBS_O_WORKDIR
- PBS sets the environment variable PBS_O_WORKDIR to the directory from which the batch job was submitted.
- By default, a job starts in your home directory. Often, you would want to do cd $PBS_O_WORKDIR to move back to the directory you were in. The current working directory when you start aprun must be on Lustre Scratch Space.
Include the following command in your script if you want it to start in the submission directory:
cd $PBS_O_WORKDIR
-
PBS_JOBID
- PBS sets the environment variable PBS_JOBID to the job's ID.
- A common use for PBS_JOBID is to append the job's ID to the standard output and error file(s).
Include the following line in your script to append the job's ID to the standard output and error file(s):
#PBS -o scriptname.o$PBS_JOBID
-
PBS_NNODES
- PBS sets the environment variable PBS_NNODES to the number of cores requested (not nodes). Given that Kraken has 12 cores per node, the number of nodes would be given by $PBS_NNODES/12.
- For example, a standard MPI program is generally started with aprun -n $PBS_NNODES ./a.out. See the Job Execution section for more details.
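Because PBS_NNODES holds a core count, a job script that needs the node count has to derive it. The following is a minimal sketch; the fallback value of 192 is only an assumption so that the snippet also runs outside the batch system, where PBS_NNODES is not set.

```shell
#!/bin/bash
# Inside a job, PBS sets PBS_NNODES to the requested core count.
# The default of 192 is an assumed value for trying this outside PBS.
PBS_NNODES=${PBS_NNODES:-192}
CORES_PER_NODE=12   # Kraken XT5: 12 cores per node

node_count=$(( PBS_NNODES / CORES_PER_NODE ))
echo "cores=$PBS_NNODES nodes=$node_count"
```

Since size must be a multiple of 12, the integer division is exact for any valid request.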
Monitoring Job Status
This section lists some ways to monitor jobs in the batch queue. PBS and Moab provide multiple tools to view the queues, system, and job status. Below are the most common and useful of these tools.
qstat
Use qstat -a to check the status of submitted jobs.
> qstat -a
nid00004: NICS
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS Tasks Memory Time S Time
------- -------- ------ -------- ------ ---- ----- ------ ----- - -----
29668 user1 batch job2 21909 1 256 -- 08:00 R 02:28
29894 user2 batch run128 -- 1 128 -- 02:30 Q --
29895 user3 batch STDIN 15921 1 1 -- 01:00 R 00:10
29896 user2 batch jobL 21988 1 2048 -- 01:00 R 00:09
29897 user4 batch STDIN 22367 1 2 -- 00:30 R 00:06
29898 user1 batch job1 25188 1 1 -- 01:10 C 00:00
>
The qstat output shows the following:
| Job ID | The first column gives the PBS-assigned job ID. |
| Username | The second column gives the submitting user's login name. |
| Queue | The third column gives the queue into which the job has been submitted. |
| Jobname | The fourth column gives the PBS job name. This is specified by the PBS -N option in the PBS batch script. If the -N option is not used, PBS will use the name of the batch script. |
| SessID | The fifth column gives the associated session ID. |
| NDS | The sixth column gives the PBS node count. Not accurate; will be one. |
| Tasks | The seventh column gives the number of cores requested by the job's -l size option. This number may be different for Nautilus. Please see the Nautilus job accounting page for more information. |
| Req’d Memory | The eighth column gives the job's requested memory. This number may be different for Nautilus. Please see the Nautilus job accounting page for more information. |
| Req’d Time | The ninth column gives the job's requested wall time. |
| S | The tenth column gives the job's current status. See the status listings below. |
| Elap Time | The eleventh column gives the job's time spent in a running status. If a job is not currently or has not been in a run state, the field will be blank. |
The job's current status is reported by the qstat command. The possible values are listed in the table below.
| Status value | Meaning |
| E | Exiting after having run |
| H | Held |
| Q | Queued |
| R | Running |
| S | Suspended |
| T | Being moved to new location |
| W | Waiting for its execution time |
| C | Recently completed (within the last 5 minutes) |
showq
The Moab showq utility gives a different view of jobs in the queue. The utility will show jobs in the following states:
| Active | These jobs are currently running. |
| Eligible | These jobs are currently queued awaiting resources. A user is allowed five jobs in the eligible state. |
| Blocked | These jobs are currently queued but are not eligible to run. Common reasons for jobs in this state are jobs on hold and the owning user currently having five jobs in the eligible state. |
checkjob
The Moab checkjob utility can be used to view details of a job in the queue. For example, if job 736 is currently in a blocked state, the following can be used to view the reason:
> checkjob 736
The return may contain a line similar to the following:
BLOCK MSG: job 736 violates idle HARD MAXIJOB limit of 5 for user <your_username> partition ALL (Req: 1 InUse: 5)
This line indicates the job is in the blocked state because the owning user has reached the limit of five jobs currently in the eligible state.
showstart
The Moab showstart utility gives an estimate of when the job will start.
> showstart 100315
job 100315 requires 16384 procs for 00:40:00
Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12
Estimated Rsv based completion in 16:06:41 on Sat Sep 27 00:21:12
The start time may change dramatically as new jobs with higher priority are submitted, so you need to periodically rerun the command.
showbf
The Moab showbf utility gives the current backfill. This can help you create a job which can be backfilled immediately. As such, it is primarily useful for short jobs.
xtnodestat
The utility xtnodestat can be used to see which jobs are currently running and which cabinets, nodes, and processors they are running on.
Queues
Queues are used by the batch scheduler to aid in the organization of jobs. This section lists the available queues on Kraken. Nautilus queue information can be found at the Nautilus queues page. An individual user may have up to 5 jobs eligible to start at any one time (regardless of how many jobs may already be running), while an account may have a total of 10 jobs eligible to run across all the users charging against that account. Jobs in excess of these limits will not be considered for execution. Note that these limits apply to the number of jobs eligible to run, not the number of jobs running.
For example, if you submit 12 jobs, 5 would be eligible and 7 would be blocked (with an "Idle" state). If three of the jobs start running, some blocked jobs will be released so that there are still 5 eligible jobs and 4 blocked jobs. This continues until all jobs have run. This is done to make it easier to schedule the jobs (there are fewer jobs to consider), and to prevent a single user from dominating the system with many small jobs.
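The arithmetic in this example can be sketched as follows. The helper name split_waiting_jobs is hypothetical, and the sketch considers only the per-user limit of five; it ignores the separate limit of ten eligible jobs per account.

```shell
#!/bin/bash
MAX_ELIGIBLE=5   # per-user limit on eligible jobs

# Hypothetical helper: print "eligible blocked" for a number of waiting
# (not yet running) jobs owned by one user.
split_waiting_jobs() {
    local waiting=$1
    local eligible=$(( waiting < MAX_ELIGIBLE ? waiting : MAX_ELIGIBLE ))
    echo "$eligible $(( waiting - eligible ))"
}

split_waiting_jobs 12   # prints "5 7"
split_waiting_jobs 9    # prints "5 4" (after three of the twelve have started)
```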
Job priority on Kraken is based on the number of cores and wall clock time requested. Jobs with large core counts (over 32K processors) intentionally get the highest priority on Kraken. Many jobs with small core counts may be run on other XSEDE systems, therefore their priority is lower on Kraken. Jobs with smaller core counts do run effectively on Kraken as backfill. While the scheduler is collecting nodes for larger jobs, those with short wall clock limits and small core counts may use those nodes temporarily without delaying the start time of the larger job. For a better explanation of backfilling jobs and NICS scheduling policies point your browser to Kraken Scheduling Policies.
Capability and Dedicated jobs on Kraken
Users are encouraged to submit capability or dedicated jobs on Kraken at any time. However, capability and dedicated jobs are only executed at specific times at the discretion of NICS. Capability jobs are generally executed after preventative maintenance periods or by demand, if preventative maintenance is not performed. Users who plan on running or have questions concerning capability jobs are encouraged to contact help@xsede.org or their NICS point of contact.
Kraken Capability Jobs Discount Policy
The capability job charging policy on Kraken has been slightly refined to the following: Capability jobs (over 49,536 and up to 98,352 cores) will be charged at the flat rate of 49,536 cores. Dedicated jobs (anything over 98,352 cores) will be charged at the flat rate of 56,498 cores. Disclaimer: NICS retains the right to alter or discontinue any discount at any time. Users are encouraged to submit proposals for computational time without respect to any particular discount.
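As an illustration of the flat rates quoted above, the number of cores a job is billed for could be computed as below. This is only a sketch of the stated thresholds; the helper name charged_cores is hypothetical, and the actual accounting is done by NICS and may change.

```shell
#!/bin/bash
# Hypothetical helper: cores billed for a job of the given size, using the
# thresholds quoted in the discount policy.
charged_cores() {
    local size=$1
    if [ "$size" -gt 98352 ]; then
        echo 56498          # dedicated job: flat rate
    elif [ "$size" -gt 49536 ]; then
        echo 49536          # capability job: flat rate
    else
        echo "$size"        # below capability size: no discount
    fi
}

charged_cores 60000    # prints 49536
charged_cores 112896   # prints 56498
charged_cores 8184     # prints 8184
```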
Kraken Queues
Jobs on Kraken are sorted into queues based on size and walltime.
| Kraken Queue | Min Size | Max Size | Max Wall Clock Limit |
| small | 0 | 512 | 24:00:00 |
| medium | 513 | 8192 | 24:00:00 |
| large | 8193 | 49536 | 24:00:00 |
| capability | 49537 | 98352 | 48:00:00 |
| dedicated | 98353 | 112896 | 48:00:00 |
| hpss | | | 24:00:00 |
* Requests for jobs on Kraken must be multiples of 12. For example, the largest "small" job on Kraken would request 504 cores.
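The size ranges in the table can be turned into a quick lookup. The following sketch only mirrors the table; the helper name queue_for_size is hypothetical, it ignores walltime and the hpss queue, and treating non-positive sizes as invalid is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: name of the queue whose size range contains the
# requested core count, or "invalid" if the request is not a positive
# multiple of 12 or exceeds the largest queue.
queue_for_size() {
    local size=$1
    if [ "$size" -le 0 ] || [ $(( size % 12 )) -ne 0 ]; then
        echo invalid
    elif [ "$size" -le 512 ]; then
        echo small
    elif [ "$size" -le 8192 ]; then
        echo medium
    elif [ "$size" -le 49536 ]; then
        echo large
    elif [ "$size" -le 98352 ]; then
        echo capability
    elif [ "$size" -le 112896 ]; then
        echo dedicated
    else
        echo invalid
    fi
}

queue_for_size 504     # prints small
queue_for_size 516     # prints medium
queue_for_size 49548   # prints capability
```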
HPSS Queue
The HPSS queue can be used to transfer files or directories to HPSS using a batch file. Jobs running in this queue are not allocated compute nodes, so the aprun command will fail and should not be added to batch files submitted to this queue on Kraken.
More information can be found on the HPSS queue page.
Job Execution
Once access to compute resources has been allocated through the batch system, users have the ability to execute jobs on the allocated resources. This section gives examples of job execution and provides common tips.
The PBS script is executed on the aprun node (or login node for interactive jobs). If executables are called directly (e.g., ./a.out), they will be run serially on the service node. This may be useful for record keeping, staging data, etc. Please run any memory- or computationally-intensive programs using aprun; otherwise they bog down the service node and may cause system problems. You may run non-MPI programs on a compute node using aprun; see the Single-Processor (Serial) Jobs and Multiple Single-Processor Programs sections below.
To launch parallel jobs on one or more compute nodes, use the aprun command. System specifications for Kraken should be kept in mind when running a job using aprun. A Kraken XT5 node consists of two sockets, each with 6 cores, so there are 12 cores per node. The PBS size option requests compute cores. This is not necessarily the number of cores that will be used, but rather the number of cores that will be made unavailable (idle cores are still inaccessible to other users). The easiest way to determine this number may be to calculate the number of nodes that will be occupied (even partially) and multiply that number by 12 cores/node.
The following options are commonly used with aprun:
| Commonly used options for aprun |
| -n | Total number of MPI processes (default: 1) |
| -N | Number of MPI processes per node (XT5: 12) |
| -S | Number of MPI processes per socket (XT5: 1-6) |
| -d | Specifies number of cores per MPI process (for use with OpenMP, XT5: 1-12) |
The best way to understand the effects of these options is to try them yourself. Please see our tutorial on the subject.
MPI examples
aprun -n $PBS_NNODES ./a.out
This uses all cores, with one MPI process on each core. The environment variable PBS_NNODES is the number of cores requested at the top of the PBS script. In most cases, it is unnecessary to do anything beyond this.
aprun -n 15 ./a.out
If for some reason you want to use a number of cores that is not a multiple of 12, that is valid. Round up to the next multiple of 12 for the resource request; the extra cores will remain idle. This example would require #PBS -l size=24.
aprun -n 8 -N 4 ./a.out
This will cause the XT5 to emulate the 4 cores/node layout of the XT4: there will be four MPI processes per node, all on one socket. This example would require you to request 24 cores on the XT5 for the cores that are left idle.
aprun -n 8 -S 2 ./a.out
On the XT5, this is similar to the previous example in that it runs 4 MPI processes per node; however, they now run two on each socket. This ensures that both sockets are used and that L3 cache and memory are evenly distributed among the sockets (a process can access memory on the other socket, but not as quickly as its own memory).
MPI/OpenMP
Kraken supports threaded programming within a node. The aprun -d flag is used to specify the number of cores per MPI process, so with OpenMP, aprun -d $OMP_NUM_THREADS uses one thread per core. When using every core, this would require at least n*d cores to be requested. The following examples assume that three nodes have been requested (#PBS -l size=36).
export OMP_NUM_THREADS=2
aprun -n12 -N4 -S2 -d2 ./a.out
Here, each MPI process has two OpenMP threads, and the 12 processes are spread over three nodes. For some codes, two OpenMP threads per MPI process may be optimal. If the reason for using OpenMP is instead to increase the available memory, you may want to use 6 or even 12 threads per MPI process instead, though there is some performance penalty for using OpenMP across sockets in Kraken's current configuration (using HyperTransport 1).
export OMP_NUM_THREADS=5
aprun -n6 -N2 -S1 -d5 ./a.out
The -d flag specifies the depth, or number of cores to assign to each MPI process (when the MPI process spawns an OpenMP thread, it has a dedicated core to put it on). The -S option causes the second process on each node to be placed on the second socket, rather than filling out the first socket first.
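To work out the size to request for a given aprun layout, count the nodes that will be occupied and multiply by 12. The following is a rough sketch; the helper name size_for_layout is hypothetical, and its three arguments correspond to the aprun -n, -N, and -d values.

```shell
#!/bin/bash
# Hypothetical helper: PBS size needed for "aprun -n N_TOTAL -N PER_NODE -d DEPTH".
size_for_layout() {
    local n=$1 per_node=$2 depth=$3
    # Each node has 12 cores, so PER_NODE * DEPTH cannot exceed 12.
    if [ $(( per_node * depth )) -gt 12 ]; then
        echo "layout needs more than 12 cores per node" >&2
        return 1
    fi
    local nodes=$(( (n + per_node - 1) / per_node ))   # round up
    echo $(( nodes * 12 ))
}

size_for_layout 12 4 2   # prints 36
size_for_layout 6 2 5    # prints 36
size_for_layout 8 4 1    # prints 24
```

The first two calls correspond to the MPI/OpenMP examples above, and the third to the aprun -n 8 -N 4 example.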
Single-Processor (Serial) Jobs
Serial programs which are memory or computationally intensive should never be run on the service nodes (anything outside of aprun). Service nodes have limited resources shared between all users, and when they run out, system problems may result. To run serial programs on the compute nodes, the program must be compiled with the compiler wrappers (cc, CC or ftn). You would then request one node (12 cores) with PBS (#PBS -l size=12). Use the following line to run a serial executable on a compute node:
aprun -n 1 ./a.out
Running Multiple Single-Processor Programs on a Compute Node
The following batch script shows how to run multiple copies of a serial program on a compute node:
#!/bin/csh
#PBS -A TG-XXXXXXXXX
#PBS -N run_serial
#PBS -l walltime=00:30:00,size=12
#PBS -j oe
set echo
cd /lustre/scratch/$USER/serial_job
# Use aprun to start a shell script which runs 12 copies of the
# same executable on a compute node
# Note: all aprun options specified below are required
# -n 1 # run on a single node
# -d 12 # allows the script to access all the cores on a node
# -cc none # allows each serial process to run on its own core
# -a xt # required by aprun to run a script instead of a program
aprun -n 1 -d 12 -cc none -a xt ./run_serial
The run_serial script looks like this:
#!/bin/sh
# This must be /bin/sh (other shells do not work)
# Run 12 copies of serial_code in the background
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
./serial_code &
# Wait until all copies of serial_code have finished
wait
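The twelve repeated lines could also be written as a loop. The following is an untested sketch in POSIX sh, not the script NICS provides; the function name run_copies is hypothetical, and the guard simply skips the launch when ./serial_code is not present.

```shell
#!/bin/sh
# Hypothetical helper: start COPIES background copies of a command,
# then wait for all of them to finish.
run_copies() {
    copies=$1
    shift
    i=1
    while [ "$i" -le "$copies" ]; do
        "$@" &               # start one copy in the background
        i=$(( i + 1 ))
    done
    wait                     # block until every copy has exited
}

# Run 12 copies of serial_code, one per core on the node
if [ -x ./serial_code ]; then
    run_copies 12 ./serial_code
fi
```

Keeping the count as an argument makes it easy to run fewer copies when each one needs more memory.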
Job Accounting
Projects are charged based on usage of compute resources. This section gives details on how each job’s usage is calculated. PBS allocates cores to batch jobs in units of the number of cores available per node. A node cannot be allocated to multiple jobs, so a job is charged for the entire node whether or not it uses all its cores. The PBS -l size option specifies the number of cores to allocate to a job. For example on Kraken a multiple of 12 must be requested.
Getting Accounting Information
This section illustrates the usage of two commonly used utilities for obtaining accounting information.
showusage
The showusage utility can be used to view your project allocation and overall usage through the last job accounting posting (usually the previous night).
glsjob
More detailed accounting information can be obtained using the glsjob command, specifying Kraken with the -m option:
- glsjob -m kraken.nics.xsede -u <username>
- Prints current accounting information for a particular user.
- glsjob -m kraken.nics.xsede -J <jobid>.xt5
- Can be used to find information for a particular job.
- glsjob -m kraken.nics.xsede -p <project>
- Prints current accounting information for all jobs charged to a particular project account.
- glsjob --man
- Displays documentation for glsjob
Note: Without the -m option, glsjob will show information for jobs across all NICS resources.
On Kraken the service unit charge for each job is:
PBS 'size' * walltime
where walltime is the number of wall clock hours used by the job.
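As a worked example of this formula, the charge for a job can be estimated from its size and the wall clock time it used. This is a minimal sketch; the helper name service_units is hypothetical, and the authoritative figures are those reported by showusage and glsjob.

```shell
#!/bin/bash
# Hypothetical helper: service units for a job, given the PBS size and the
# wall clock time used in HH:MM:SS.
service_units() {
    local size=$1 walltime=$2
    awk -v size="$size" -v t="$walltime" 'BEGIN {
        split(t, p, ":")
        hours = p[1] + p[2] / 60 + p[3] / 3600
        printf "%.2f\n", size * hours
    }'
}

service_units 192 01:35:00   # prints 304.00
service_units 12 00:30:00    # prints 6.00
```

Note that the charge uses the requested size, so a job that runs 8 processes on a 12-core node is still charged for 12 cores.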
Job Refund Policy
NICS will provide refunds for user jobs which are adversely impacted by system issues beyond the control of the user. Refund requests must be made within two calendar weeks of a job’s completion date by submitting a ticket to help@xsede.org. Please provide: username, machine name, jobID, reason for refund request.
Examples of refund requests that will not be approved include: jobs run on projects that have a negative balance, jobs that started and completed after the project’s end date, and jobs that failed because they reached the user-specified wallclock limit.
NICS strongly encourages the use of application checkpoint restart files. Users should only request refunds from the time of the last successful checkpoint. The refund limit for eligible jobs is six hours. Exceptions to the maximum refund will only be considered for cases where appropriate checkpointing cannot effectively mitigate loss due to the nature of the underlying machine problem.
Scheduling Policy
Kraken uses TORQUE and Moab to schedule jobs. NICS is constantly reviewing the scheduling policies in order to adapt and better serve users. This section gives details of the scheduling policies on Kraken.
Because of the unique large computing capacity of Kraken, the scheduler will give preference to large core count jobs up to the size of capability (see the Queues section). Moab is configured to do “first fit” backfill. Backfilling allows smaller, shorter jobs to use otherwise idle resources.
Users can alter certain attributes of queued jobs until they start running. The order in which jobs are run is dependent on the following factors:
- number of cores requested - jobs that request more cores get a higher priority.
- queue wait time - a job's priority increases with the time it has waited to run.
- account balance - jobs that use an account with a negative balance will have significantly lowered priority.
- number of jobs - a maximum of five jobs per user, at a time, will be eligible to run. The rest will be blocked.
In certain special cases, the priority of a job may be manually increased upon request. To request priority change you may contact NICS User Support. NICS will need the job ID and reason to submit the request.
More detailed information can be found in the Queues section.

