Our users may have noticed that the HPC cluster dashboard is reflecting some infrastructure changes. Please note that this post refers to the older HPC cluster, not hex. The 200 series are going to be decommissioned soon, and this is…
This weekend the head node suffered an unexpected reboot. We're still not sure what caused it. However, it looks as if the running jobs were not affected.…
We're currently investigating a memory issue with some of the worker nodes. Memory is not being freed up after jobs complete.
UPDATE - 4 July:
Turns out it's not a memory error. The problem is the way that net-snmp monitors…
Patched kernels on the HPC servers to 2.6.18-238.1.1.el5. All went fine except for the head node, which has an issue with the latest kernel (it dies at boot with a kernel panic), so we are booting it into the older version 2.6.18-194.1.1.el5 until we can sort…