On Monday 5 June at 13:45, an environmental event in the Upper Campus Data Centre damaged the HPC rack. Five worker nodes are currently offline. We have ordered replacement parts from our suppliers; however, the implication is that …
New GPU server
We have added another GPU server to our a100 partition. This server was purchased with funding from several groups as well as ICTS, and additional resources will be dedicated to the shared a100free account.
The server contains four a100-80GB cards…
Time-based analysis of core/energy usage
Being able to analyze the energy usage of every core in every CPU of the cluster enables us to detect jobs that are not making good use of allocated cores over time.
Here is a node that is using 1 …
Future resource management
Our new cluster will use cgroups to control RAM and thread allocation. One of the biggest hassles we’ve faced over the years is code that does not adhere to its scheduler reservation, in other words grabbing more cores and more RAM than …
Performance graphs
We have moved away from Cacti/Nagios for graphing and now use Grafana. Unfortunately there is no public-facing portal for Grafana; however, graphs can be exported as static PNG files, so we have set up …
Cluster migration
The HPC cluster has been moved to the new upper campus data centre. The new data centre provides more electrical power and cooling, and has a new UPS and generators to better withstand load shedding. In addition …
New domain
We have migrated our WordPress website to https://ucthpc.uct.ac.za. The old domain, hpc.uct.ac.za, redirects here. The current cluster dashboard remains at hpc.uct.ac.za/db.…
Data Centre migration
Dear colleagues,
As part of the process of ongoing improvement, ICTS will be migrating the High-Performance Computing cluster from its current location to the new data centre. This will result in some downtime for the cluster.
How does this affect …
New dashboard
After more than a decade, we finally got around to redesigning the HPC dashboard. Initially created as a way for sysadmins to monitor MPI software, the dashboard used simplistic images strung together in HTML. This worked fine while there were …
Jupyter Notebook
If you want to run Jupyter Notebook on the cluster and work with it from your desktop, the following may help.
As you do not have write access to the software volume, you will need to do a local install, …