On Monday 5 June at 13:45, an environmental event in the Upper Campus Data Center caused damage to the HPC rack. Currently 5 worker nodes are offline. We have ordered replacement parts from our suppliers; however, the implication is that…
We have added another GPU server to our a100 partition. This server was purchased with funding from several groups as well as ICTS, and additional resources will be dedicated to the shared a100free account. The server contains four a100-80GB cards…
Being able to analyze the energy usage of every core in every CPU of the cluster enables us to detect jobs that are not making good use of allocated cores over time. Here is a node that is using 1…
We have moved away from Cacti/Nagios for graphing and now make use of Grafana. Unfortunately there is no public-facing portal for Grafana; however, there is a way to export graphs as static PNG files, so we have set up…
The HPC cluster has been moved to the new upper campus data centre. The new data centre provides more electrical power and cooling and also has a new UPS and generators in order to better withstand load shedding. In addition…
We have migrated our WordPress website to https://ucthpc.uct.ac.za. The old domain, hpc.uct.ac.za, redirects here. The current cluster dashboard remains at hpc.uct.ac.za/db.…
Dear colleagues, As part of the process of ongoing improvement, ICTS will be migrating the High-Performance Computing cluster from its current location to the new data centre. This will result in some downtime for the cluster. How does this affect…
After more than a decade we finally got around to redesigning the HPC dashboard. Initially created as a way for sysadmins to monitor MPI software, the dashboard used simplistic images strung together in HTML. This worked fine while there were…
It’s been a while since the HPC cluster has had a major update. Over the last few weeks we’ve been planning and constructing a new test environment. One of the major issues on the horizon is the impending demise of…
The UCTHPC cluster will be down on the weekend of the 25th/26th of June for scheduled network maintenance. We will place the cluster in draining mode on Friday the 24th of June at 17:00, at which point new jobs may…