On Sunday the 16th of September Hex was moved from the Upper Campus data centre to the Bremner data centre. The move went smoothly and Hex was back up and running by Monday morning. There were some minor issues: the BeeGFS scratch volume would not mount at first due to a nodeID mismatch, and hard disks in nodes 614 and 615 died. The CPU temperature sensors in gpu001 also failed, putting the cooling fans into permanent high mode. Unfortunately node 614 lost a second disk 24 hours later, but hopefully we can source a replacement.
Hex was moved to free up power, rack space and cooling in the Upper Campus data centre so that we can start building our new cluster. The move took a great deal of planning and hard work, and necessitated a minor rebuild of the Bremner data centre. Thanks are due to our project manager, Robert Lefebure; facilities managers Randolph Thompson and Bongani Quwe; networking experts Edgar Chikwete and Moegamat Alexander; and Dimension Data, for assisting with the physical move of the servers.
In the coming weeks we will be racking the new Dell servers in the Upper Campus data centre, building a new HPC cluster and BeeGFS scratch volume, and deploying software. We hope that the new cluster will be ready in early October.