Both clusters are back up after the upgrade. We'd like to thank our users for their cooperation and patience over the weekend. Apart from numerous software patches, we also transferred various data volumes to the new NetApp, as our old HP EVA 8000 SAN is due for decommissioning. One minor hiccup during the upgrade was that the InfiniBand drivers required a change to the Open MPI configuration. MPI jobs still ran to completion, but they printed the following cosmetic error:
--------------------------------------------------------------------------
Open MPI failed to open the /dev/knem device due to a local error. Please check with your system administrator to get the problem fixed, or set the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem support.
Local host: srvslshpc601
Errno: 2 (No such file or directory)
--------------------------------------------------------------------------
[srvslshpc601:24784] 3 more processes have sent help message help-mpi-btl-sm.txt / knem fail open
[srvslshpc601:24784] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
The fix was to add btl_sm_use_knem = 0 to /usr/mpi/gcc/openmpi-1.6.5/etc/openmpi-mca-params.conf, which tells Open MPI to run without /dev/knem support, as the error message itself suggests.
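For reference, the entry in openmpi-mca-params.conf looks roughly like the fragment below. The comment line is added here for illustration and is not necessarily in the file on the clusters.

```
# Disable KNEM in the shared-memory BTL, since /dev/knem is missing on the affected nodes
btl_sm_use_knem = 0
```

Because this is set in the system-wide file, users should not need to change anything. The same parameter can also be passed for a single job on the command line, for example mpirun --mca btl_sm_use_knem 0 -np 4 ./my_mpi_app, where the process count and ./my_mpi_app are placeholders.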