Over the past few days the HPC and networks team relocated three racks' worth of servers to a new location in the upper campus data center. This will free up space for the continuation of the refurbishment project. The end …
HPC moving to new racks in UCDC
The UCT HPC cluster is being migrated to its final location in the newly refurbished ICTS data center. The move will take place in the first week of June, during which time the cluster will be unavailable.
The cluster will …
HPC unavailable 20-21 April
The HPC cluster will be unavailable during the weekend of the 20th of April due to a network upgrade.
We will be placing the cluster into draining mode at 17:00 on Friday the 19th, at which point no new jobs …
HPC degraded
On Monday 5 June at 13:45 an environmental event in the Upper Campus Data Center caused damage to the HPC rack. Currently, five worker nodes are offline. We have ordered replacement parts from our suppliers; however, the implication is that …
New GPU server
We have added another GPU server to our a100 partition. This server was purchased with funding from several groups as well as ICTS, and additional resources will be dedicated to the shared a100free account.
The server contains four a100-80GB cards…
Time-based analysis of core/energy usage
Being able to analyze the energy usage of every core in every CPU of the cluster enables us to detect jobs that are not making good use of allocated cores over time.
Here is a node that is using 1 …
Future resource management
Our new cluster will use cgroups to control RAM and thread allocation. One of the biggest hassles we’ve faced over the years is code not adhering to the scheduler reservation; in other words, grabbing more cores and more RAM than …
Performance graphs
We have moved away from Cacti/Nagios for graphing and now make use of Grafana. Unfortunately, there is no public-facing portal for Grafana; however, there is a way to export graphs as static PNG files, so we have set up …
Cluster migration
The HPC cluster has been moved to the new upper campus data centre. The new data centre provides more electrical power and cooling, and has a new UPS and generators to better withstand load shedding. In addition …
New domain
We have migrated our WordPress website to https://ucthpc.uct.ac.za. The old domain, hpc.uct.ac.za, redirects here. The current cluster dashboard remains at hpc.uct.ac.za/db.…