operating system Archives - Page 2 of 2

Head node reboot

May 2, 2012

This weekend the head node suffered an unexpected reboot. We're still not sure what caused it. However, it looks as if the running jobs were not affected.…

Memory issues

We're currently investigating a memory issue with some of the worker nodes. Memory is not being freed up after jobs complete. UPDATE - 4 July: Turns out it's not a memory error. The problem is the way that net-snmp monitors…

A stressful day

Feb 11, 2011

hardware, hpc, MPI, operating system

Patched kernels on the HPC servers to 2.6.18-238.1.1.el5. All went fine except for the head node, which has an issue with the latest kernel (it dies at boot with a kernel panic), so we're booting it into the older version 2.6.18-194.1.1.el5 until we can sort…


UCTHPC