Over the past few days the HPC and networks team relocated three racks' worth of servers to a new location in the upper campus data center. This will free up space for the continuation of the refurbishment project. The end…
We have added another GPU server to our a100 partition. This server was purchased with funding from several groups as well as ICTS, and additional resources will be dedicated to the shared a100free account. The server contains four A100 80GB cards…
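For reference, a job targeting the new cards might look something like the sketch below. The a100 partition and shared a100free account names are taken from the post; the script contents and the one-GPU request are illustrative only.

#!/bin/bash
# request a single GPU from the shared pool
# (partition and account names per the post above)
#SBATCH --partition=a100
#SBATCH --account=a100free
#SBATCH --gres=gpu:1
nvidia-smi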
The new GPU nodes have been installed in our POC cluster and are currently running jobs. We installed the servers into our SLURM cluster to gain hands-on experience in provisioning SLURM partitions with GPU resources, something …
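As a rough sketch of what that provisioning involves: GPU devices are exposed to SLURM as generic resources (GRES) in two config files. The node name gpu001 and the device paths below are hypothetical, not our actual configuration.

# /etc/slurm/gres.conf on the GPU node (hypothetical names)
NodeName=gpu001 Name=gpu Type=a100 File=/dev/nvidia[0-3]

# /etc/slurm/slurm.conf additions on the controller
GresTypes=gpu
NodeName=gpu001 Gres=gpu:a100:4 State=UNKNOWN
PartitionName=a100 Nodes=gpu001 State=UP

Jobs then request devices with --gres=gpu:1 (or --gres=gpu:a100:1 to pin the type), and SLURM sets CUDA_VISIBLE_DEVICES for each allocation so jobs only see their assigned cards.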
We noticed a while back that several of our GPU cards retained high utilization even though no processes were running on them.

nvidia-smi
Fri Mar 23 13:52:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name…
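For anyone chasing a similar symptom, the standard nvidia-smi queries below are a reasonable first check. Whether persistence mode was the culprit in our case is not covered in this excerpt; it is simply a commonly reported cause of phantom utilization.

# show per-GPU utilization alongside persistence mode
nvidia-smi --query-gpu=index,name,persistence_mode,utilization.gpu --format=csv
# with persistence mode off, the driver unloads whenever no client is attached,
# and the resulting reinitialization can produce misleading utilization
# readings; enabling it (as root) is a common first remedy
nvidia-smi -pm 1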
We have completed the upgrade to the GPU portion of the hex cluster:
– Installed a new GPU004 server with two NVIDIA K40 cards.
– Added two additional NVIDIA K40 cards to GPU003.
This brings the number of GPU cards in…
The new GPU server, srvslsgpu004, is up and running. Still to be configured are the InfiniBand card and the BeeGFS volume. The server is being tested and will remain offline until next week. In the server are 2 x 10…
One of our top researchers, Professor Michelle Kuttel, recently made the cover of the Journal of Computational Chemistry. The computational work was performed on the UCT HPC cluster using our Supermicro GPU nodes. The cover image depicts a highly-branched …
Our current cluster, hex, runs Torque with Maui as the scheduler. While Maui is GPU-aware, it does not allow GPUs to be scheduled. In other words, you can list the nodes with GPUs, but you cannot submit a job …
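To illustrate the gap, the commands below use standard Torque syntax; gpu003 is one of our GPU nodes (lowercase form assumed), and the job script name is a placeholder.

# Torque can report the GPUs attached to a node...
pbsnodes gpu003
# ...and will accept a GPU count in a resource request:
qsub -l nodes=1:ppn=4:gpus=2 myjob.sh
# but Maui does not track GPUs as a schedulable resource, so nothing
# prevents two such jobs from landing on the same pair of cards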
The HPC rack was neatened up. This involved moving and consolidating servers, making space for PDUs, and removing redundant cables that were impeding airflow. New HPC servers were installed. This task took two entire days as other…