Job queues

When you submit a job, the job scheduler places it into a queue. A queue is simply a list of submitted jobs that share scheduling and resource requirements.

To take a look at which queues are available, you can use the bqueues command.


In [ ]:
bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
system          1000 Open:Active       -    -    -    -     0     0     0     0
yesterday       500  Open:Active      20    8    -    -     0     0     0     0
small            31  Open:Active       -    -    -    -     0     0     0     0
normal           30  Open:Active       -    -    -    -    35    13     1     0
long              3  Open:Active      50    -    -    - 31686 31636    46     0
basement          1  Open:Active      20   10    -    -   180   170    10     0

This will return information about the queues which are available and how busy they are. Here, we can see information about six queues into which jobs can be submitted on the cluster.

By default, bqueues will give you the following information:

  • QUEUE_NAME - the name of the queue
  • PRIO - the priority of the queue
  • STATUS - the status of the queue
  • MAX - the maximum number of job slots available
  • JL/U, JL/P and JL/H - the job slot limit per user, per processor and per host respectively
  • NJOBS - the total number of tasks for all jobs in the queue
  • PEND - the number of pending jobs in the queue
  • RUN - the number of running jobs in the queue
  • SUSP - the number of suspended jobs in the queue

How busy is the cluster?

For each queue, you can see the total number of tasks scheduled (NJOBS) and a breakdown of how many of those jobs are waiting to be dispatched (PEND), are running (RUN) or are suspended (SUSP).
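As a small sketch of working with this output, the columns can be pulled out with awk. Here the sample bqueues output from above is stored in a shell variable so the example runs without access to an LSF cluster; on a real cluster you would pipe the live command instead (bqueues | awk ...).

```shell
# Sample output from the bqueues command above, saved as a variable so the
# example is self-contained and runnable without an LSF cluster
sample='QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
system          1000 Open:Active       -    -    -    -     0     0     0     0
yesterday       500  Open:Active      20    8    -    -     0     0     0     0
small            31  Open:Active       -    -    -    -     0     0     0     0
normal           30  Open:Active       -    -    -    -    35    13     1     0
long              3  Open:Active      50    -    -    - 31686 31636    46     0
basement          1  Open:Active      20   10    -    -   180   170    10     0'

# QUEUE_NAME is column 1 and PEND is column 9; NR > 1 skips the header line
echo "$sample" | awk 'NR > 1 { print $1, $9 }'
```

This prints each queue name alongside its pending-job count, making it easy to spot that the long queue has by far the largest backlog.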

Queue priority

You may have some jobs which are more urgent than others and that you would like to be run sooner. In these instances, the priority of the queue is important.

Jobs submitted to higher priority queues are run first. You can check the queue priority by looking at the PRIO column: the larger the value, the higher the priority of the queue. In this example, the yesterday queue has a much higher priority than the normal queue, so a job submitted to the yesterday queue will usually be run before a job on the normal queue, provided the resources requested for that job are available.

For more information on priority and how this works, please see priority and fairshare.
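A quick way to see the queues ranked by priority is to sort on the PRIO column. This sketch again uses a saved copy of a few lines of the sample output so it is runnable anywhere; on a cluster you would pipe bqueues directly into the same awk/sort pipeline.

```shell
# A few lines of the sample bqueues output from above, saved for illustration
sample='QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
normal           30  Open:Active       -    -    -    -    35    13     1     0
yesterday       500  Open:Active      20    8    -    -     0     0     0     0
long              3  Open:Active      50    -    -    - 31686 31636    46     0'

# Print PRIO (column 2) then the queue name, and sort numerically,
# highest priority first
echo "$sample" | awk 'NR > 1 { print $2, $1 }' | sort -rn
```

The first line of the result is the highest-priority queue (here, yesterday at 500).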

Queue status

Sometimes a queue might not be available. You can check the status of the queue by looking at the STATUS column, which has two parts: whether the queue is accepting new job submissions (Open or Closed) and whether it will dispatch jobs (Active or Inactive).

  • Open - the queue is able to accept jobs
  • Closed - the queue is not able to accept jobs
  • Active - jobs in the queue will be allowed to start when resources are available
  • Inactive - jobs in the queue won't be started for the time being
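To find only the queues you can actually use right now, you can filter on the STATUS column. In this sketch the "maintenance" queue and its Closed:Inact status are invented for illustration (all the queues in the real output above are Open:Active).

```shell
# Sample output with a mix of statuses; the maintenance line is hypothetical,
# added here only to give the filter something to exclude
sample='QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
yesterday       500  Open:Active      20    8    -    -     0     0     0     0
maintenance      10  Closed:Inact     -     -    -    -     0     0     0     0
normal           30  Open:Active       -    -    -    -    35    13     1     0'

# Keep only queues that are both accepting and dispatching jobs
echo "$sample" | awk 'NR > 1 && $3 == "Open:Active" { print $1 }'
```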

Getting more information about a particular queue

You can get more detailed information by using the -l option with bqueues.

Let's try getting some information about the queues on our cluster.


In [ ]:
bqueues -l

This will give you the requirements and limits for all of the queues on the cluster. You can also get this information for a specific queue by specifying the name of the queue.

bqueues -l <queue_name>

In the example command below, we are asking for detailed information about a queue called yesterday.

bqueues -l yesterday

The -l option will give us a lot more information, such as the resource limits for the yesterday queue (e.g. maximum memory usage or run time).

QUEUE: yesterday
  -- As in I needed it yesterday highest priority (all nodes)

PARAMETERS/STATISTICS
PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV
500   20  Open:Active      20    8    -    -     0     0     0     0     0    0
Interval for a host to accept two jobs is 0 seconds

DEFAULT LIMITS:
 MEMLIMIT
    100 M

MAXIMUM LIMITS:
 RUNLIMIT
     2880.0 min of BL465c_G8

 CORELIMIT MEMLIMIT
      0 M     250 G

SCHEDULING PARAMETERS
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -

              poe nrt_windows adapter_windows ntbl_windows  uptime
 loadSched     -           -               -            -       -
 loadStop      -           -               -            -       -

SCHEDULING POLICIES:  FAIRSHARE
USER_SHARES:  [default, 1]

SHARE_INFO_FOR: yesterday/
USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME   ADJUST
user1            1       0.302      0        0        47.0     1590       0.000
user2            1       0.301      0        0       590.3     1634       0.000

USERS: all
HOSTS:  pcs5a pcs5b+1 others+2
RES_REQ:  select[type==any]
Maximum slot reservation time: 14400 seconds
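Long listings like this can be filtered down to just the values you care about. As a sketch, the limits section above is saved to a variable here so the example runs anywhere; on a cluster you could pipe the live command through the same filter (bqueues -l yesterday | grep -A 1 RUNLIMIT).

```shell
# The maximum-limits section from the `bqueues -l yesterday` output above,
# saved as a variable so this sketch runs without an LSF cluster
limits='MAXIMUM LIMITS:
 RUNLIMIT
     2880.0 min of BL465c_G8

 CORELIMIT MEMLIMIT
      0 M     250 G'

# Print the line after the RUNLIMIT header, i.e. the run limit itself
echo "$limits" | grep -A 1 'RUNLIMIT' | tail -n 1
```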

Below is an example for three queues which have different resource limits. Here, jobs in the normal queue will automatically be terminated (killed) by LSF if they run for more than 12 hours (RUNLIMIT = 720.0 min), jobs in the long queue after 2 days (RUNLIMIT = 2880.0 min) and jobs in the hugemem queue after 15 days (RUNLIMIT = 21600.0 min). The hugemem queue also has a much larger memory limit (727.5 G) than the normal or long queues (250 G).

normal:

 RUNLIMIT
 720.0 min of BL465c_G8

 CORELIMIT MEMLIMIT
      0 M     250 G

long:

 RUNLIMIT
 2880.0 min of BL465c_G8

 CORELIMIT MEMLIMIT
      0 M     250 G

hugemem:

 RUNLIMIT
 21600.0 min of HS21_E5450_8

 CORELIMIT MEMLIMIT
      0 M   727.5 G
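Since RUNLIMIT is always reported in minutes, it can help to convert it into hours and days. This small sketch checks the figures quoted above (there are 60 minutes in an hour and 1440 minutes in a day):

```shell
# Convert each queue's RUNLIMIT from minutes into hours and days
printf '%s %s\n' normal 720 long 2880 hugemem 21600 |
awk '{ printf "%s: %g min = %g hours = %g days\n", $1, $2, $2 / 60, $2 / 1440 }'
```

This confirms that 720 minutes is 12 hours, 2880 minutes is 2 days and 21600 minutes is 15 days.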

For more information, please see the working with queues section of the LSF user guide.


What's next?

For an overview of the key concepts, you can go back to the introduction. Otherwise, let's take a look at submitting jobs.