This month marks 10 years since we started providing HPC services to UCT. The landscape has changed considerably since then: we no longer have to rely on “hand-me-down” servers and cobbled-together file services. Our new cluster has been running for 6 months now and has seen very good uptake among new researchers. Hex has been scaled down to just 6 nodes, and researchers are being moved across prior to its decommissioning. Our new disk system is working well: BeeGFS is extremely fast, and the quota system protects the volume from rogue jobs, making our system administrators’ lives a lot easier.
The new scheduler, SLURM, has taken a bit of getting used to: although similar to Torque/PBS, it has some behavioural differences that can trip up anyone used to Torque. However, it’s super reliable and very fast. We’ve also moved away from SLES and now run CentOS 7, which makes deploying applications much easier.
In those 10 years we’ve run almost 26 million CPU hours, completed just under 3 million jobs and been acknowledged in 129 publications that we’re aware of.