Priority and fairshare

Priority and fairshare make sure that everyone can get at least some jobs running and that everyone gets an equal share of the cluster to use over time.

Using an example from the Sanger Farm training course, we'll consider a "first come, first served" situation where user A submits 1000 jobs and then some time shortly after, user B submits another 1000 jobs. In this situation, user B would have to wait for all of the jobs submitted by user A to finish before the first of user B's jobs can begin. That just wouldn't be very fair.

However, with fairshare, the user's priority changes over time based on their recent usage. This means that users which have used the cluster recently (i.e. in the last 48 hours) will have a lower priority than new users when they submit their next job.

Consider two users, User A and User B, who start off with the same priority. User A starts running some jobs and gets the full share of available CPUs. User B now starts submitting some jobs. Initially, User B gets a smaller proportion of CPUs than User A. As the jobs submitted by User A start to finish, User B starts to get a larger share of the CPUs. This is because User A now has a lower priority. Eventually, over a period of 48 hours or so, User A and User B will have equal priorities again and will have used an equal proportion of the cluster resources.

At Sanger we use hierarchical fairshare which means that the initial priority for each user will also depend on their group membership. This is because the system needs to accommodate the needs of both the pipelines and the users.

Priority will only affect the decision on which jobs run next. It will not affect jobs that are already running. Sanger users can find more information about fairshare here.

You can check your priority in a particular queue using bqueues and the -r option followed by the name of the queue that you want to look at.

bqueues -r <queue_name>
USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME   ADJUST
userA           1       0.333      0        0         0.0        0       0.000
userB           1       0.211      0        0      7450.8     8924       0.000
userC           1       0.104      0        0     11218.3    34168       0.000
userD           1       0.044      0        0     72918.9   100394       0.000
userE           1       0.003     16        0    205565.6  1369196       0.000

What's next?

For an overview of jobs arrays and dependencies, you can go back to job arrays and dependencies. Otherwise, let's take a look at troubleshooting.