• National Institute for Computational Sciences is a UT/ORNL Partnership

Monitoring Job Status

PBS and Moab provide multiple tools to view queue, system, and job status. Below are the most common and useful of these tools.

qstat

Use qstat -a to check the status of submitted jobs.

> qstat -a 

nid00004: NICS
                                                   Req'd  Req'd   Elap
Job ID  Username Queue  Jobname  SessID NDS  Tasks Memory Time  S Time
------- -------- ------ -------- ------ ---- ----- ------ ----- - -----
29668    user1   batch   job2     21909   1    256   --   08:00 R 02:28
29894    user2   batch   run128    --     1    128   --   02:30 Q   --
29895    user3   batch   STDIN    15921   1      1   --   01:00 R 00:10
29896    user2   batch   jobL     21988   1   2048   --   01:00 R 00:09
29897    user4   debug   STDIN    22367   1      2   --   00:30 R 00:06
29898    user1   batch   job1     25188   1      1   --   01:10 C 00:00
>

The qstat output shows the following:

Job ID
The first column gives the PBS-assigned job ID.
Username
The second column gives the submitting user’s login name.
Queue
The third column gives the queue into which the job has been submitted.
Jobname
The fourth column gives the PBS job name, which is set with the PBS -N option in the batch script. If the -N option is not used, PBS uses the name of the batch script.
SessID
The fifth column gives the associated session ID.
NDS
The sixth column gives the PBS node count. This value is not accurate and will always be one.
Tasks
The seventh column gives the number of cores requested with the job’s size option (-l size).
Req’d Memory
The eighth column gives the job’s requested memory.
Req’d Time
The ninth column gives the job’s requested wall time.
S
The tenth column gives the job’s current status. See the status listings below.
Elap Time
The eleventh column gives the time the job has spent in the running state. If the job is not running and has never run, no time is shown (the field appears as -- in the sample output above).

Status value  Meaning
E             Exiting after having run
H             Held
Q             Queued
R             Running
S             Suspended
T             Being moved to new location
W             Waiting for its execution time
C             Recently completed (within the last 5 minutes)
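
The column layout described above lends itself to simple filtering. The following is a minimal sketch, not a site-provided tool: the function names count_by_state and jobs_in_state are hypothetical, and the code assumes the qstat -a layout shown in the sample (job ID in the first field, status in the tenth).

```shell
# Sketch only: helper names are hypothetical and the field positions are
# assumed from the sample `qstat -a` output (job ID = field 1, S = field 10).

# Print the number of jobs in each state, one "STATE COUNT" pair per line.
count_by_state() {
    awk '$1 ~ /^[0-9]+/ && NF >= 11 { n[$10]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Print the IDs of jobs whose status column equals the given letter.
jobs_in_state() {
    awk -v want="$1" '$1 ~ /^[0-9]+/ && NF >= 11 && $10 == want { print $1 }'
}
```

Both functions read from standard input, so they would typically be used as qstat -a | count_by_state or qstat -a | jobs_in_state Q. Header and separator lines are skipped because their first field does not begin with a digit.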

showq

The Moab utility showq gives a different view of jobs in the queue. The utility will show jobs in the following states:

Active
These jobs are currently running.
Eligible
These jobs are currently queued awaiting resources. A user is allowed five jobs in the eligible state.
Blocked
These jobs are currently queued but are not eligible to run. Common reasons for jobs in this state are jobs on hold, the owning user currently having five jobs in the eligible state, and running jobs in the longsmall queue.
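
showq can also be limited to one of these states. The sketch below maps each state name to the corresponding showq flag (-r for active, -i for eligible, -b for blocked). The helper name showq_flag is hypothetical, and it is an assumption that the installed Moab version accepts these flags; check showq --help on the system.

```shell
# Sketch only: showq_flag is a hypothetical helper. The flags are the usual
# Moab showq options, but their availability on this system is an assumption.
showq_flag() {
    case "$1" in
        active)   echo "-r" ;;   # running jobs
        eligible) echo "-i" ;;   # idle jobs waiting for resources
        blocked)  echo "-b" ;;   # jobs not currently eligible to run
        *)        echo "unknown state: $1" >&2; return 1 ;;
    esac
}

# Example (requires Moab on the login node):
#   showq "$(showq_flag blocked)" -u "$USER"
```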

checkjob

The Moab utility checkjob can be used to view details of a job in the queue. For example, if job 736 is currently in a blocked state, the following can be used to view the reason:

> checkjob 736

The return may contain a line similar to the following:

BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 for user  (Req: 1 InUse: 2)

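This line indicates that the job is blocked because the owning user has reached the limit on jobs in the eligible (idle) state. The limit printed in the message is the one configured on the system; this sample shows a limit of 2, while the policy described above allows five.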
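
Because checkjob prints many lines, it can help to pull out just the numbers from the BlockMsg line. The following is a rough sketch that assumes the message has the same shape as the sample above; blockmsg_summary is a hypothetical helper name, and other block reasons will not match.

```shell
# Sketch only: blockmsg_summary is a hypothetical helper. It assumes a line of
# the form "BlockMsg: ... limit of N for user ... (Req: R InUse: U)".
# Lines that do not match this pattern produce no output.
blockmsg_summary() {
    sed -n 's/^.*BlockMsg:.*limit of \([0-9][0-9]*\) .*(Req: \([0-9][0-9]*\) *InUse: \([0-9][0-9]*\)).*$/limit=\1 requested=\2 in_use=\3/p'
}
```

A typical use would be checkjob 736 | blockmsg_summary, which for the sample line prints limit=2 requested=1 in_use=2.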

showstart

The Moab utility showstart gives an estimate of when the job will start.

> showstart 100315
job 100315 requires 16384 procs for 00:40:00

Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12
Estimated Rsv based completion in 16:06:41 on Sat Sep 27 00:21:12

The estimated start time may change dramatically as new jobs with higher priority are submitted, so rerun the command periodically.
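
When rerunning showstart from a script, the wait estimate can be turned into seconds for comparison between runs. This is a minimal sketch that assumes the output format shown above with an HH:MM:SS estimate; start_delay_seconds is a hypothetical helper name, and estimates printed in any other form are ignored.

```shell
# Sketch only: start_delay_seconds is a hypothetical helper. It reads showstart
# output on standard input and prints the "start in HH:MM:SS" estimate as a
# number of seconds. Estimates that are not exactly HH:MM:SS print nothing.
start_delay_seconds() {
    awk '/Estimated Rsv based start in/ {
        for (i = 1; i < NF; i++) {
            if ($i == "in") {
                n = split($(i + 1), t, ":")
                if (n == 3) print t[1] * 3600 + t[2] * 60 + t[3]
                exit
            }
        }
    }'
}
```

For example, showstart 100315 | start_delay_seconds would print 55601 for the sample output above. Where the standard watch utility is installed, watch -n 600 showstart 100315 is a simpler way to refresh the estimate every ten minutes.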

showbf

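The Moab utility showbf shows the resources currently available for backfill, that is, idle resources that a job could use immediately without delaying higher-priority jobs. This can help you size a job so that it can be backfilled immediately. As such, it is primarily useful for short jobs.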

xtshowcabs

The utility xtshowcabs can be used to see what jobs are currently running and which nodes they are running on.