Core reservation in PBS

Sep 2, 2015 | application, hpc, Memory, torque

Researchers who bother to read the Acceptable Usage Policy may have noticed there is a clause which states:

Users must abide by the ratio of 2GB of RAM (memory) per allocated core. Usage of the pmem directive should also be observed.

This post attempts to address the rationale behind this in simple, accessible language. Most people are familiar with the process of making a restaurant reservation. Apart from the date and time, one critical piece of information is the table size. If you make a reservation for 4 people but arrive with 50, you will not be seated. Equally, if you reserve 50 seats but arrive with only 4 people, you will meet with a certain hostility on the part of the restaurant manager.

This gastronomic analogy may be a bit simplistic, so here is the actual problem. Our main series servers, the Dell C6145s that constitute the 600 series on Hex, have 64 cores and 128GB RAM each, which is a ratio of 1 core to 2GB of RAM. Suppose a user's job consumes 10GB of RAM and they submit 12 of these jobs to a single server. At some point 120GB of the 128GB of RAM will be consumed. If more jobs are added, the server will have to start swapping rapidly (writing pages of RAM out to disk and reading others back in), repeating this as more and more jobs come in until the server most likely starts to thrash and die. How do we avoid this problem?

In PBS only cores are considered consumable resources. When a user submits a 1 core job the scheduler subtracts this core from the total available cores on the server, leaving 63 cores for other jobs. As soon as this job uses more than 2GB of RAM, the RAM left for the remaining cores falls below 2GB per core. To ensure that 2GB of RAM remains for every available core, the user must consume cores at a rate of 1 for every 2GB of RAM. So in the previous example the user sets the required cores per job to 5 (ppn=5) and submits 12 jobs, which reserves 60 cores and uses 120GB, leaving only 8GB of RAM and 4 of the 64 cores. When they attempt to submit a 13th 5 core job it queues, as there are insufficient cores (and RAM), which means that the server will not crash.
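The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this post, and the node figures are the 600 series numbers quoted above.

```python
import math

GB_PER_CORE = 2      # ratio from the Acceptable Usage Policy
NODE_CORES = 64      # 600 series node, as described above
NODE_RAM_GB = 128

def cores_for_job(ram_gb, gb_per_core=GB_PER_CORE):
    """Smallest whole number of cores (ppn) that covers ram_gb at the ratio."""
    return max(1, math.ceil(ram_gb / gb_per_core))

def jobs_that_fit(ram_gb, node_cores=NODE_CORES):
    """Jobs of this size that fit on one node when only cores are counted."""
    return node_cores // cores_for_job(ram_gb)

ppn = cores_for_job(10)                  # 10GB job -> ppn=5
n_jobs = jobs_that_fit(10)               # 64 // 5 -> 12 jobs
cores_left = NODE_CORES - n_jobs * ppn   # 4 cores left, too few for a 13th job
ram_left = NODE_RAM_GB - n_jobs * 10     # 8GB left
print(ppn, n_jobs, cores_left, ram_left)  # prints: 5 12 4 8
```

Because each job reserves enough cores to cover its RAM, the node runs out of cores before it runs out of memory.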

There is a pmem directive which specifies that a job must be directed to a server with a specified amount of free RAM. However, if the job's RAM grows beyond this specified amount the job is not killed. The pmem directive is also not mandatory and most of our users do not bother using it.

We have therefore implemented a wrapper script in the scheduler which will kill any job which exceeds the specified core to RAM ratio and email the user to tell them that they need to increase the core (ppn) value for the job. It should be noted that it is the user's responsibility to know in advance how much RAM their job will consume. This can be calculated theoretically, by summing the largest variables or arrays the job will create in RAM simultaneously, or empirically, by running a small job initially and observing the RAM consumed as the job size is increased.
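The wrapper script itself is not reproduced in this post. The following Python sketch only illustrates the kind of check it describes; the function name `check_job` and the wording of the message are hypothetical, not the scheduler's actual code.

```python
import math

GB_PER_CORE = 2  # ratio from the Acceptable Usage Policy

def check_job(ppn, ram_used_gb, gb_per_core=GB_PER_CORE):
    """Return None if the job is within its RAM allowance, else a notice.

    Hypothetical helper: a real wrapper would also kill the job and
    email the notice to the user.
    """
    allowed_gb = ppn * gb_per_core
    if ram_used_gb <= allowed_gb:
        return None
    needed_ppn = math.ceil(ram_used_gb / gb_per_core)
    return (f"Job used {ram_used_gb:g}GB but ppn={ppn} allows only "
            f"{allowed_gb:g}GB; resubmit with ppn={needed_ppn}")

print(check_job(5, 10))   # within the allowance, so no notice
print(check_job(1, 10))   # over the allowance, notice suggests ppn=5
```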

For example, if a user has an R job which creates a 10000 x 10000 matrix (left unfilled here, so each element is a 4-byte NA, the same size as an integer):
> mat = matrix(nrow=10000, ncol=10000)

One can display the RAM used with this command:
> object.size(mat)
400000200 bytes
Or 0.4GB

Increasing this to a 20000×20000 matrix now gives:
> mat = matrix(nrow=20000, ncol=20000)
> object.size(mat)
1600000200 bytes
Or 1.6GB
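The same figures can be estimated before running anything, as rows × columns × bytes per element. Below is a minimal Python sketch of that theoretical estimate. The helper names are illustrative, the 4-byte element size matches the R output above, and 1GB is taken as 10^9 bytes as in the figures quoted.

```python
import math

def matrix_gb(rows, cols, bytes_per_element=4):
    """Approximate size of a dense matrix in GB (1GB = 1e9 bytes)."""
    return rows * cols * bytes_per_element / 1e9

def cores_needed(ram_gb, gb_per_core=2):
    """Cores (ppn) to reserve so that ram_gb fits the 2GB per core ratio."""
    return max(1, math.ceil(ram_gb / gb_per_core))

for n in (10000, 20000, 40000):
    gb = matrix_gb(n, n)
    print(f"{n} x {n}: {gb:.1f}GB -> ppn={cores_needed(gb)}")
```

A matrix of 8-byte doubles needs twice as much, so `matrix_gb(20000, 20000, bytes_per_element=8)` comes to 3.2GB and would call for ppn=2.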

Note, other software such as Python, Perl, C etc. will have other methods of displaying object size, and it is the responsibility of the user to be familiar with their use. For users that need more than 128GB of RAM for any one job we have a small number of high-memory servers on the 800 series, two of which have 1000GB of RAM and 24 cores, giving a ratio of approximately 40GB/core. However, the rule of reserving the correct number of cores per job remains the same.
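As one illustration of the "other methods" mentioned above, here is a rough Python counterpart to R's object.size, using only the standard library. It is a sketch rather than a recommendation: the array is kept small on purpose, and the exact byte counts depend on the platform and Python version.

```python
import array
import sys

# One million C ints held in a compact typed array (typically 4 bytes each).
values = array.array("i", [0]) * 1_000_000

payload_bytes = values.itemsize * len(values)   # raw data only
total_bytes = sys.getsizeof(values)             # data plus object overhead

print(f"payload: {payload_bytes / 1e9:.4f}GB, total: {total_bytes / 1e9:.4f}GB")
```

Note that for containers such as lists, `sys.getsizeof` counts only the container itself and not the objects it refers to, so it can badly underestimate the real footprint.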

High Performance Computing in the early 20th Century.

UCTHPC