Users still working on hpc.uct.ac.za will have noticed that a few worker nodes have vanished. As mentioned previously, we’re investigating a new scheduler, SLURM (Simple Linux Utility for Resource Management). SLURM is a very different animal to PBS. Gone are the groups and queues; in their place are accounts and partitions. Resource management and reservation are extremely granular and can also be very complex. Consider an example in which two maths users each have a GrpCPU limit of 24 and have each submitted two 4-core jobs. One of these jobs is pending. Why?
The answer is that an overriding core limit (GrpCPU) of 12 has been set on their common partition (maths), which means that only three 4-core jobs can run at once (3 × 4 = 12 cores); the fourth stays pending until cores are freed. This allows finer control of group behaviour.
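The arithmetic behind this can be sketched in a few lines of Python. This is only a toy model of the two limits described above, not how SLURM itself schedules jobs; the function `schedule`, the user names `userA` and `userB`, the job names, and the submission order are all assumptions made for illustration.

```python
def schedule(jobs, user_limit, group_limit):
    """Split jobs into (running, pending) under a per-user and a shared core cap.

    Toy model only: jobs are considered strictly in submission order and
    never finish, so a job that does not fit simply stays pending.
    """
    running, pending = [], []
    used_by_user = {}   # cores currently held by each user
    used_total = 0      # cores currently held by the whole group

    for name, user, cores in jobs:
        user_used = used_by_user.get(user, 0)
        fits_user = user_used + cores <= user_limit
        fits_group = used_total + cores <= group_limit
        if fits_user and fits_group:
            running.append(name)
            used_by_user[user] = user_used + cores
            used_total += cores
        else:
            pending.append(name)
    return running, pending


# Two hypothetical maths users, each submitting two 4-core jobs.
jobs = [
    ("job1", "userA", 4),
    ("job2", "userA", 4),
    ("job3", "userB", 4),
    ("job4", "userB", 4),
]

# Per-user limit of 24 cores, shared limit of 12 cores.
running, pending = schedule(jobs, user_limit=24, group_limit=12)
print("running:", running)
print("pending:", pending)
```

Neither user comes close to the individual limit of 24 cores, yet the fourth job is held back because the first three already consume all 12 cores of the shared limit.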
There are many more features that we want to add to this new cluster, and each requires a significant amount of testing, so we have no release date yet.