As previously stated, the qstat command displays the status of the
queue(s); its most common use is to retrieve information about the jobs
on the scheduling system. Depending on the cluster and its configuration,
this command may return only the jobs you have submitted, or all submitted
jobs, as seen in the examples below. Axiom is configured to report only jobs
submitted by the user, while umms-amino reports all jobs currently submitted
to the system.
Note: Once a job is complete, it will no longer be reported by
qstat; completed jobs will be discussed in another section.
Example:
[jdpoisso@axiom ~]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1027.axiom                15-34            jdpoisso               0 Q first
1028.axiom                16-34            jdpoisso               0 Q first
1029.axiom                17-34            jdpoisso               0 Q first
1030.axiom                18-34            jdpoisso               0 Q first
1031.axiom                19-34            jdpoisso               0 Q first
1032.axiom                20-34            jdpoisso               0 Q first
1033.axiom                21-34            jdpoisso               0 Q first
1034.axiom                22-34            jdpoisso               0 Q first
[jdpoisso@axiom ~]$
Example:
[jdpoisso@umms-amino ~]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1036842.umms-amino        T10372_2_AMF_Z   yzhang          00:05:55 R casp
1036843.umms-amino        T10372_2_A_closc yzhang          00:05:53 R casp
1036850.umms-amino        d1l5ja2          jinrui          00:03:28 R default
1036852.umms-amino        S46334_2F_1_run  zhanglabs       00:03:01 R urgent
1036853.umms-amino        S46334_3F_1_run  zhanglabs       00:02:51 R urgent
1036854.umms-amino        d1l5ja3          jinrui          00:02:43 R default
[jdpoisso@umms-amino ~]$
The qstat command (when run with no arguments) returns
information broken down into six columns. The first column is the Job
ID: the unique job number of the job and the scheduling server that the
job was submitted to. Each submitted job is given a unique number that
the scheduler uses to reference the job. This job number is required by
many other TORQUE commands, and can even be given to the
qstat command to get more information about a particular
job.
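To make the column layout concrete, here is a minimal Python sketch that splits one row of the default qstat output (taken from the Axiom example) into its six columns, and then splits the Job ID into its job number and server name. It assumes the columns are separated by whitespace and that no field contains spaces; QstatRow, parse_row and split_job_id are hypothetical helpers, not part of TORQUE.

```python
from typing import NamedTuple


class QstatRow(NamedTuple):
    """One data row of the default qstat table (six columns)."""
    job_id: str
    name: str
    user: str
    time_use: str
    state: str
    queue: str


def parse_row(line: str) -> QstatRow:
    """Split a data row on whitespace; assumes no field contains spaces."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    return QstatRow(*fields)


def split_job_id(job_id: str) -> tuple[int, str]:
    """Split a Job ID such as '1027.axiom' into (1027, 'axiom')."""
    number, _, server = job_id.partition(".")
    return int(number), server


row = parse_row("1027.axiom                15-34            jdpoisso               0 Q first")
print(row.name, row.user, row.state, row.queue)  # 15-34 jdpoisso Q first
print(split_job_id(row.job_id))                  # (1027, 'axiom')
```

The full Job ID (for example 1027.axiom) is the value to pass back to qstat when asking about a single job.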
The second column, Name, is the job's defined
name. Defining names will be covered when discussing the qsub command.
For now it is enough to know that each job may optionally be given a non-unique
name to describe the job in the queue. This is useful because it lets you
describe the job with a word, phrase, or input name that means more to a human
reader than a job number. Also, for future reference, job outputs returned by
the scheduler are written to files using this defined name.
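As an illustration of how the name is reused, the sketch below builds the file names that, under the usual PBS/TORQUE default, hold a job's standard output and standard error: the job name followed by .o or .e and the job number. Treat this convention as an assumption for your site; qsub's -o and -e options can send output elsewhere, and default_output_files is a hypothetical helper.

```python
def default_output_files(job_name: str, job_id: str) -> tuple[str, str]:
    """Return the assumed default (stdout, stderr) file names for a job.

    Assumes the common PBS/TORQUE convention <name>.o<number> and
    <name>.e<number>, where <number> is the numeric part of the Job ID.
    """
    number = job_id.split(".", 1)[0]
    return f"{job_name}.o{number}", f"{job_name}.e{number}"


print(default_output_files("15-34", "1027.axiom"))  # ('15-34.o1027', '15-34.e1027')
```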
The third column, User, lists the username that submitted the job.
This lets you see who submitted each job to the queue, and search for jobs
submitted by collaborators or by yourself. The system does not allow you to
submit on behalf of another user, so this field accurately reflects
who used the qsub command.
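On TORQUE, qstat -u username asks the server to list only that user's jobs. The Python sketch below does the same filtering on captured qstat output instead, which can be handy when post-processing a saved listing. SAMPLE is an excerpt of the umms-amino example above, and jobs_for_user is a hypothetical helper that assumes the default six-column layout.

```python
SAMPLE = """\
[jdpoisso@umms-amino ~]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1036842.umms-amino        T10372_2_AMF_Z   yzhang          00:05:55 R casp
1036850.umms-amino        d1l5ja2          jinrui          00:03:28 R default
1036854.umms-amino        d1l5ja3          jinrui          00:02:43 R default
"""


def jobs_for_user(qstat_text: str, user: str) -> list[str]:
    """Return the Job IDs whose User column (third field) equals `user`."""
    job_ids = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Data rows have exactly six fields, which skips the prompt and
        # header lines; the dashed separator row is skipped explicitly.
        if len(fields) != 6 or set(fields[0]) == {"-"}:
            continue
        if fields[2] == user:
            job_ids.append(fields[0])
    return job_ids


print(jobs_for_user(SAMPLE, "jinrui"))  # ['1036850.umms-amino', '1036854.umms-amino']
```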
The fourth column, Time Use, lists how long your job has been
running. Until the job actually starts running, this value is 0.
Depending on how the job is run and the arguments used, it is
displayed either as CPU time (the default) or as walltime.
CPU time is the aggregate amount of time each CPU in the system has devoted to
running your job, so if one of your jobs used two CPUs for thirty seconds
(00:00:30) each, its CPU time will be one minute (00:01:00).
Walltime, on the other hand, is how long the job took to run
according to the clock on the wall, i.e. a normal clock. So that same job,
using both CPUs at exactly the same time, will finish in thirty
seconds, and its walltime will be thirty seconds (00:00:30).
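The arithmetic in the two-CPU example can be checked with a short Python sketch. It assumes the CPUs start together and run without gaps, so walltime is simply the longest per-CPU busy time; a real job may also spend wall-clock time on setup or waiting, so its walltime can be longer. The helper names are illustrative only.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to HH:MM:SS."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical job: two CPUs, each busy for thirty seconds, started together.
per_cpu_busy = ["00:00:30", "00:00:30"]

# CPU time adds up the time every CPU spent on the job.
cpu_time = sum(to_seconds(t) for t in per_cpu_busy)
# Walltime is elapsed clock time; with the CPUs running side by side, that is
# the longest single CPU's busy time (ignoring startup and idle gaps).
walltime = max(to_seconds(t) for t in per_cpu_busy)

print("CPU time:", to_hms(cpu_time))  # CPU time: 00:01:00
print("Walltime:", to_hms(walltime))  # Walltime: 00:00:30
```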
Note: Walltime is also considered a resource. As
briefly mentioned before, the system runs jobs based on the availability of
resources. If you expect your job to run for a long period, say twenty hours,
you must request a walltime resource of twenty hours (20:00:00),
so that the cluster schedules your job when it can best allocate
twenty hours according to its queue and configuration. If a
walltime resource is not specified when you submit your job, the
default walltime allotment is used; the exact amount
varies by system and by queue. Exceeding the walltime allotment may
cause the scheduler to forcibly terminate your job, whether it is complete or
not.
Walltime will be covered further in the sections about advanced job
submission.
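With TORQUE, a walltime request is passed to qsub as a resource, for example qsub -l walltime=20:00:00 myjob.sh, where myjob.sh is a hypothetical script name. The Python sketch below formats such a request and checks whether an elapsed walltime has gone past the allotment. Whether and exactly when an over-limit job is terminated depends on the system, so exceeds_allotment is only an illustration, and all helper names are hypothetical.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def walltime_request(hours: int, minutes: int = 0, seconds: int = 0) -> str:
    """Format a walltime resource request in HH:MM:SS form."""
    return f"walltime={hours:02d}:{minutes:02d}:{seconds:02d}"


def exceeds_allotment(elapsed: str, allotment: str) -> bool:
    """True if the elapsed walltime is past the requested allotment."""
    return to_seconds(elapsed) > to_seconds(allotment)


# Request twenty hours for the hypothetical job script myjob.sh.
print("qsub -l", walltime_request(20), "myjob.sh")  # qsub -l walltime=20:00:00 myjob.sh
print(exceeds_allotment("19:59:59", "20:00:00"))    # False
print(exceeds_allotment("20:00:01", "20:00:00"))    # True
```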
The fifth column, S, represents the status of the job. This is a
one-letter code for your job's current state. The job states you
will most commonly see are:
Q - Queued, waiting to be run
R - Running, the job has been assigned resources and is running
E - Exiting, the job is complete and is cleaning up and copying files
H - Held, the job has been held, either by the submitter or an
administrator
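The state codes above can be turned into readable labels, and counted, with a small lookup table. This Python sketch covers only the four codes listed; any other code your scheduler reports falls through to "unknown". SAMPLE is an excerpt of the Axiom example, and describe_state and count_states are hypothetical helpers that assume the default six-column layout.

```python
from collections import Counter

# The four state codes described above; other codes are reported as unknown.
JOB_STATES = {
    "Q": "Queued",
    "R": "Running",
    "E": "Exiting",
    "H": "Held",
}

SAMPLE = """\
1027.axiom                15-34            jdpoisso               0 Q first
1028.axiom                16-34            jdpoisso               0 Q first
1029.axiom                17-34            jdpoisso               0 Q first
"""


def describe_state(code: str) -> str:
    """Map a one-letter S column code to a readable name."""
    return JOB_STATES.get(code, f"unknown ({code})")


def count_states(qstat_text: str) -> Counter:
    """Count jobs per state code (fifth field) in default qstat output."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Only six-field data rows count; the dashed separator is skipped.
        if len(fields) == 6 and set(fields[0]) != {"-"}:
            counts[fields[4]] += 1
    return counts


for code, n in count_states(SAMPLE).items():
    print(describe_state(code), n)  # Queued 3
```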
The sixth column, Queue, lists the queue that the job was
submitted to. As each queue has its own resources and rules, you can
use this information to help deduce what resources the job is using, or what
resources the job is waiting for.
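Because the queue hints at a job's resources, it can help to see which jobs sit in which queue. A minimal Python sketch, grouping Job IDs by the Queue column of captured qstat output; SAMPLE is an excerpt of the umms-amino example and jobs_by_queue is a hypothetical helper. Which resources each queue actually provides is site-specific and is not visible in this output.

```python
from collections import defaultdict

SAMPLE = """\
1036842.umms-amino        T10372_2_AMF_Z   yzhang          00:05:55 R casp
1036850.umms-amino        d1l5ja2          jinrui          00:03:28 R default
1036852.umms-amino        S46334_2F_1_run  zhanglabs       00:03:01 R urgent
1036854.umms-amino        d1l5ja3          jinrui          00:02:43 R default
"""


def jobs_by_queue(qstat_text: str) -> dict[str, list[str]]:
    """Group Job IDs by the Queue column (sixth field)."""
    queues = defaultdict(list)
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip anything that is not a six-column data row, including the
        # dashed separator under the header.
        if len(fields) != 6 or set(fields[0]) == {"-"}:
            continue
        queues[fields[5]].append(fields[0])
    return dict(queues)


for queue, job_ids in jobs_by_queue(SAMPLE).items():
    print(queue, job_ids)
# casp ['1036842.umms-amino']
# default ['1036850.umms-amino', '1036854.umms-amino']
# urgent ['1036852.umms-amino']
```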