Yesterday we moved 8 HPC nodes to an alternate rack to alleviate pressure on the UDC HVAC. The implication of this move is that nodes 608–615 are connected to the FHGFS storage via a second InfiniBand switch.…
This week our HPC cluster passed two milestones: over 16 million total computational hours and over 500,000 current computational hours. The latter is the number of CPU hours our clusters are handling right now.…
Thank you to all our readers who voted for our blog in the Best Science and Technology Blog category at the SA Blog Awards 2015. We managed to scoop the runner-up award. Once again, thank you for the …
So we had a weird error after updating our cluster nodes with the latest kernel available in the SLES11 SP3 repositories. Restarting our InfiniBand services caused the following error: “Module mlx4_core belong to kernel-default which is not a part…
Today is System Administrator Appreciation Day. System administration is often a thankless job, but today ICTS management bought all the department’s system administrators pizza for lunch. That was no small task, as evidenced by a 31U rack of pizza boxes. …