There will be a full shutdown of the Hex HPC cluster this coming weekend. We will be placing Hex into draining mode at 16:00 on Friday 14 September, and no new jobs will be accepted. The cluster will be shut…
The data center where Hex will reside is still being worked on. The spider's nest of network cables under the raised floor has been cleaned out to allow better airflow. All networking will now run in overhead…
It has been a long time coming, several years in fact, but the first of our new worker nodes went live today on our POC cluster. There is still much to be done: Infiniband is not yet ready, the…
The new GPU nodes have been installed in our POC cluster and are currently running jobs. We installed the servers into our SLURM cluster to gain hands-on experience in provisioning SLURM partitions with GPU resources, something …
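As a rough illustration of what provisioning GPU resources in SLURM involves (the node name, GPU count and partition settings below are hypothetical, not taken from the post), a GPU node is typically described in gres.conf and slurm.conf along these lines, after which jobs can request GPUs with --gres:

    # gres.conf on the GPU node: map the "gpu" GRES to the NVIDIA device files
    NodeName=gpu01 Name=gpu File=/dev/nvidia[0-1]

    # slurm.conf: declare the GRES type, the node's GPUs, and a GPU partition
    GresTypes=gpu
    NodeName=gpu01 Gres=gpu:2 CPUs=32 RealMemory=128000 State=UNKNOWN
    PartitionName=gpu Nodes=gpu01 Default=NO MaxTime=72:00:00 State=UP

    # A job can then request a single GPU with, for example:
    #   sbatch --gres=gpu:1 job.sh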
We noticed a while back that several of our GPU cards retained high utilization even though no processes were running on them. [nvidia-smi output, Fri Mar 23 13:52:55 2018, driver version 375.26] …
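For context, the same check can be reproduced with nvidia-smi's query interface; the snippet below is only an illustrative sketch (it assumes nvidia-smi is on the PATH) and is not from the original post. A GPU that shows high utilization in the first listing but no entries in the second matches the symptom described above.

    #!/usr/bin/env python3
    # Compare per-GPU utilization with the compute processes nvidia-smi reports.
    import subprocess

    def smi(*args):
        # Run nvidia-smi with the given query flags and return its CSV output.
        return subprocess.run(["nvidia-smi", *args],
                              capture_output=True, text=True, check=True).stdout

    gpus = smi("--query-gpu=index,name,utilization.gpu,memory.used",
               "--format=csv,noheader")
    procs = smi("--query-compute-apps=gpu_uuid,pid,process_name,used_gpu_memory",
                "--format=csv,noheader")

    print("GPU utilization:\n" + gpus)
    print("Compute processes:\n" + (procs if procs.strip() else "(none reported)"))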
The BeeGFS cluster was updated to 2015.03.r23. The fhgfs volume is back up and mounted on all nodes. An error was encountered during the Infiniband switch firmware upgrade; we have logged a support call with Mellanox regarding this. The switch …
The BeeGFS (fhgfs) cluster will be offline on Monday 13th March from 09:00 to 17:00 for a major update. Please ensure that all jobs referencing the BeeGFS volume, /researchdata/fhgfs, are completed before 09:00. The firmware on the Mellanox Infiniband switch …
This is an updated entry for the issue we encountered last year when upgrading our HPC servers and Infiniband drivers. An updated installation ISO needs to be created that supports the newly updated kernel. To create the ISO…