Worker job submission
Most clusters allow jobs to be submitted from inside other jobs. Until recently this was disabled on hex, but we have now enabled it on the series600 nodes. You can now call qsub from inside your job script, although…
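As a rough sketch, a PBS job script that submits a follow-up job from inside itself might look like the following (the script names and resource requests are hypothetical, for illustration only):

```shell
#!/bin/bash
#PBS -N parent-job
#PBS -l nodes=1:ppn=1

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Do this job's own work first (hypothetical script).
./stage-one.sh

# Then submit a dependent follow-up job from inside this one.
# This only works where in-job submission is enabled,
# e.g. on the series600 nodes.
qsub stage-two.sh
```

This pattern is useful for pipelines where one stage decides, at run time, what the next stage should be.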
Say hello to HAL
Over the past few months we’ve been referring to a new type of scheduler that we’ve been testing. We decided to move away from PBS as MAUI is no longer maintained. The scheduler we have selected, SLURM, is …
Berkeley Lab Checkpoint/Restart
Berkeley Lab Checkpoint/Restart (BLCR) has been installed on the SLURM cluster. It allows users to checkpoint a job, cancel it, and resume it at a later date. The executable is started with the cr_run wrapper: cr_run /home/andy/ram.pl >> /home/andy/ramtest.out…
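As a sketch of the full checkpoint/restart cycle, the BLCR command-line tools can be used along these lines (the context filename is hypothetical; this assumes BLCR's kernel modules are loaded on the node):

```shell
# Start the program under BLCR's wrapper, as in the example above.
cr_run /home/andy/ram.pl >> /home/andy/ramtest.out &
PID=$!

# Later: checkpoint the running process to a context file.
# -f names the output file; --term stops the process afterwards.
cr_checkpoint -f ram.context --term $PID

# At a later date: resume the job from the saved context file.
cr_restart ram.context
```

Note that cr_run works by preloading the BLCR library, so only dynamically linked executables can be checkpointed this way.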
SLURM and memory management
SLURM allows a DefMemPerCPU and a MaxMemPerCPU to be set. If a user does not set a memory limit, the default is used; this is normally set to MaxMem/NumCores. As memory is a consumable resource (SelectTypeParameters=CR_Core_Memory), MaxMemPerCPU serves not
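For illustration, the relevant slurm.conf settings might look like this (the figures are hypothetical, for a node with 64 GB of RAM and 16 cores, so MaxMem/NumCores ≈ 4000 MB):

```shell
# slurm.conf fragment (hypothetical values)
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory   # treat memory as a consumable resource
DefMemPerCPU=4000                     # MB; applied when a job sets no limit
MaxMemPerCPU=4000                     # MB; per-core ceiling a job may request
```

With these settings a job asking for more memory per core than MaxMemPerCPU will have its core allocation scaled up to match, rather than being allowed to oversubscribe the node's RAM.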
…SLURM job preemption
SLURM provides a preemption mechanism to deal with situations where the cluster becomes overloaded. This can be configured in several ways:
FIFO:
This is the simplest method of queueing: there is no preemption; jobs come in, queue and …
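As a hedged sketch, partition-priority preemption (one of the alternatives to plain FIFO) could be configured in slurm.conf along these lines; the partition names, priorities and node list here are hypothetical:

```shell
# slurm.conf fragment (hypothetical partitions)
PreemptType=preempt/partition_prio   # higher-priority partitions preempt lower
PreemptMode=REQUEUE                  # preempted jobs are requeued, not killed
PartitionName=batch  Priority=1  Nodes=node[01-16] Default=YES
PartitionName=urgent Priority=10 Nodes=node[01-16]
```

With this in place, a job submitted to the urgent partition can displace running batch jobs, which go back into the queue to be rescheduled.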
Where are all the HPC servers disappearing to?
Those users still making use of hpc.uct.ac.za will have noticed that a few worker nodes have vanished. As mentioned previously, we’re investigating a new scheduler, SLURM (the Simple Linux Utility for Resource Management). SLURM is a very different animal to
…