The HPC cluster will be unavailable during the weekend of the 20th of April due to a network upgrade.
We will be placing the cluster into draining mode at 17:00 on Friday the 19th, at which point no new jobs …
On Monday 5 June at 13:45, an environmental event in the Upper Campus Data Center damaged the HPC rack. Five worker nodes are currently offline. We have ordered replacement parts from our suppliers; however, the implication is that …
We have added another GPU server to our a100 partition. This server was purchased with funding from several groups as well as ICTS, and additional resources will be dedicated to the shared a100free account.
The server contains four a100-80GB cards…
Being able to analyze the energy usage of every core in every CPU of the cluster enables us to detect jobs that are not making good use of allocated cores over time.
Here is a node that is using 1 …
Our new cluster will use cgroups to control RAM and thread allocation. One of the biggest hassles we’ve faced over the years is code not adhering to the scheduler reservation; in other words, grabbing more cores and more RAM than …
We have moved away from Cacti/Nagios for graphing and now use Grafana. Unfortunately, there is no public-facing portal for Grafana; however, there is a way to export graphs as static PNG files, so we have set up …
The HPC cluster has been moved to the new upper campus data centre. The new data centre provides more electrical power and cooling, and has a new UPS and generators to better withstand load shedding. In addition …
We have migrated our WordPress website to https://ucthpc.uct.ac.za. The old domain, hpc.uct.ac.za, redirects here. The current cluster dashboard remains at hpc.uct.ac.za/db.…
Dear colleagues,
As part of the process of ongoing improvement, ICTS will be migrating the High-Performance Computing cluster from its current location to the new data centre. This will result in some downtime for the cluster.
How does this affect …
After more than a decade, we finally got around to redesigning the HPC dashboard. Initially created as a way for sysadmins to monitor MPI software, the dashboard used simplistic images strung together in HTML. This worked fine while there were …