Queues are used by the batch scheduler to organize jobs. An individual user may have up to 5 jobs eligible to run at any one time, while an account may have a total of 10 jobs eligible to run across all the users charging against that account. Jobs in excess of these limits are blocked and are not considered for execution until an eligible slot opens up. Note that these limits apply to the number of jobs eligible to run, not the number of jobs running.
For example, if you submit 12 jobs, 5 are eligible and 7 are blocked (listed with an "Idle" state). When 3 of the eligible jobs start running, 3 blocked jobs are released, so there are still 5 eligible jobs and 4 blocked jobs. This continues until all of the jobs have run. These limits make the jobs easier to schedule (the scheduler has fewer jobs to consider) and prevent a single user from dominating the system with many small jobs.
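The eligibility rule can be modeled in a few lines of Python. This is a simplified sketch, not the scheduler's actual code: it assumes idle jobs are considered in submission order, and the `classify` function and the (job_id, user, account) tuple are hypothetical. Running jobs are left out of the input because, as noted above, the limits count eligible jobs rather than running ones. It reproduces the 12-job example.

```python
from collections import Counter
from typing import List, Tuple

USER_LIMIT = 5      # max eligible jobs per user (from the text)
ACCOUNT_LIMIT = 10  # max eligible jobs per account, summed over all its users

Job = Tuple[str, str, str]  # hypothetical representation: (job_id, user, account)

def classify(idle_jobs: List[Job]) -> Tuple[List[str], List[str]]:
    """Split idle (not yet running) jobs into eligible and blocked lists.

    Assumption: jobs are considered in submission order. Running jobs are
    not passed in, because they do not count against the limits.
    """
    per_user: Counter = Counter()
    per_account: Counter = Counter()
    eligible, blocked = [], []
    for job_id, user, account in idle_jobs:
        if per_user[user] < USER_LIMIT and per_account[account] < ACCOUNT_LIMIT:
            per_user[user] += 1
            per_account[account] += 1
            eligible.append(job_id)
        else:
            blocked.append(job_id)
    return eligible, blocked

# The 12-job example: one user charging one account.
jobs = [(f"job{i}", "alice", "acct1") for i in range(12)]
eligible, blocked = classify(jobs)
print(len(eligible), len(blocked))  # 5 7

# Three eligible jobs start running, so they leave the idle list.
running = set(eligible[:3])
idle = [job for job in jobs if job[0] not in running]
eligible, blocked = classify(idle)
print(len(eligible), len(blocked))  # 5 4
```

With several users on one account, the account limit of 10 caps the total even when each user is under 5.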
Job priority on Kraken is based on the number of cores and the wall clock time requested. Jobs with large core counts (more than 32K cores) intentionally get the highest priority on Kraken. Jobs with small core counts can be run on other TeraGrid systems; therefore, their priority is lower on Kraken. NICS does not, however, restrict or discourage jobs with small core counts. While the scheduler is collecting nodes for a large job, jobs with short wall clock limits and small core counts may run on those nodes temporarily without delaying the large job's start time (this is called backfilling). For a fuller explanation of backfilling and NICS scheduling policies, see the NICS Scheduling Policies page.
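The backfilling condition can be illustrated with a simplified check. This is a generic sketch of the idea, not the scheduler's actual algorithm; the `Reservation` class and `can_backfill` function are hypothetical. It assumes the scheduler knows how many reserved cores are sitting idle and how long until the large job is expected to start. A small job fits only if it needs no more than those idle cores and its requested wall clock limit guarantees it finishes before the large job starts.

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    """Hypothetical view of nodes being collected for a large job."""
    start_in_s: int   # seconds until the large job is expected to start
    idle_cores: int   # cores already set aside for it and currently idle

def can_backfill(req_cores: int, req_walltime_s: int, res: Reservation) -> bool:
    """True if a job fits in the idle reserved cores and, judging by its
    requested wall clock limit, is certain to end before the large job starts."""
    return req_cores <= res.idle_cores and req_walltime_s <= res.start_in_s

res = Reservation(start_in_s=2 * 3600, idle_cores=4096)
print(can_backfill(256, 3600, res))      # True: small and short
print(can_backfill(256, 3 * 3600, res))  # False: would delay the large job
print(can_backfill(8192, 1800, res))     # False: needs more cores than are idle
```

Because the check uses the requested wall clock limit rather than the actual run time, requesting a limit close to what a job really needs improves its chances of being backfilled.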
By default, jobs are sorted into queues based on their size and, for the longsmall queue, their walltime. Long jobs (i.e., those in the longsmall queue) can prevent the machine from being scheduled efficiently, so the longsmall queue is limited to 256 cores shared among all users. To get your jobs through faster, we strongly recommend breaking long jobs into segments of 24 hours or less instead.
| Queue | Min Size (cores) | Max Size (cores) | Max Wall (hh:mm:ss) |
|---|---|---|---|
| small | 0 | 512 | 24:00:00 |
| longsmall* | 0 | 256 | 60:00:00 |
| medium | 513 | 8192 | 24:00:00 |
| large | 8193 | 32768 | 24:00:00 |
| capability | 32769 | 99072 | 24:00:00 |
* The longsmall queue is limited to about 256 cores for all running jobs.
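The table can be read as a routing rule from (cores, walltime) to a queue. The sketch below is an interpretation of the table, not the scheduler's actual routing logic; the `route` and `segments` functions are hypothetical. In particular, it assumes a job of 256 cores or fewer goes to longsmall only when it asks for more than 24 hours, and to small otherwise. The `segments` helper counts how many jobs of at most 24 hours a longer run would need if split as recommended above.

```python
def parse_walltime(text: str) -> int:
    """Convert an hh:mm:ss wall clock string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

DAY = parse_walltime("24:00:00")
LONGSMALL_MAX_WALL = parse_walltime("60:00:00")
LONGSMALL_MAX_CORES = 256

# (name, min cores, max cores) for the queues with a 24-hour limit, from the table
SIZE_QUEUES = [
    ("small", 0, 512),
    ("medium", 513, 8192),
    ("large", 8193, 32768),
    ("capability", 32769, 99072),
]

def route(cores: int, walltime: str) -> str:
    """Pick a queue for a job (assumed interpretation of the table)."""
    seconds = parse_walltime(walltime)
    if seconds > DAY:
        # Only longsmall accepts more than 24 hours, and only for small jobs.
        if cores <= LONGSMALL_MAX_CORES and seconds <= LONGSMALL_MAX_WALL:
            return "longsmall"
        raise ValueError("exceeds every queue's limits; split into 24-hour segments")
    for name, lo, hi in SIZE_QUEUES:
        if lo <= cores <= hi:
            return name
    raise ValueError(f"{cores} cores is outside every queue's size range")

def segments(total_walltime: str) -> int:
    """Number of jobs of at most 24 hours needed to cover a longer run."""
    return -(-parse_walltime(total_walltime) // DAY)  # ceiling division

print(route(512, "12:00:00"))    # small
print(route(256, "48:00:00"))    # longsmall
print(route(32769, "01:00:00"))  # capability
print(segments("60:00:00"))      # 3
```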
HPSS Queue
The "hpss" queue can be used for jobs that access HPSS. Jobs running in this queue are not allocated compute nodes, so the aprun command is not available. The wall clock limit for the hpss queue is 24 hours. You may submit jobs to this queue only if you logged in to a node using your RSA SecurID one-time password (OTP) token. To submit a job to the hpss queue, use the qsub -q option (#PBS -q hpss). Because no compute nodes are needed, you must request zero compute cores (#PBS -l size=0). Using job dependencies, you can schedule an hpss job to stage data before and/or after a normal production job.
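A stage-in, compute, stage-out chain can be scripted as below. This is a hedged sketch: it assumes the PBS/Torque qsub dependency syntax -W depend=afterok:JOBID and that qsub prints the new job ID on standard output. The function names, script file names, and the htar commands are hypothetical placeholders. The submit step is passed in as a function, so the chain can be checked without a batch system.

```python
import subprocess
from typing import Callable, List, Optional

def hpss_script(commands: List[str], walltime: str = "01:00:00") -> str:
    """Build an hpss-queue job script: zero compute cores, so no aprun."""
    lines = [
        "#!/bin/bash",
        "#PBS -q hpss",
        "#PBS -l size=0",
        f"#PBS -l walltime={walltime}",  # must not exceed 24 hours
        "cd $PBS_O_WORKDIR",
        *commands,  # placeholder HPSS transfer commands supplied by the caller
    ]
    return "\n".join(lines) + "\n"

def qsub_args(script_path: str, after: Optional[str] = None) -> List[str]:
    """qsub command line; `after` makes the job wait for another job to succeed."""
    args = ["qsub"]
    if after is not None:
        args += ["-W", f"depend=afterok:{after}"]
    return args + [script_path]

def real_submit(args: List[str]) -> str:
    """Run qsub and return the job ID it prints (assumed output format)."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout.strip()

def submit_chain(stage_in: str, compute: str, stage_out: str,
                 submit: Callable[[List[str]], str] = real_submit) -> List[str]:
    """Stage data in, run the production job after it, then stage results out."""
    first = submit(qsub_args(stage_in))
    second = submit(qsub_args(compute, after=first))
    third = submit(qsub_args(stage_out, after=second))
    return [first, second, third]

# Example (hypothetical file names), writing a stage-in script:
# with open("stage_in.pbs", "w") as f:
#     f.write(hpss_script(["htar -xf inputs.tar"]))
```

Using afterok rather than a plain ordering dependency means the production job does not start if staging fails, and the stage-out job does not archive results from a failed run.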

