This week we installed five new servers, bringing our core count to 208. In 2009 we were asked to develop a five-year road map for HPC at UCT. Given our inexperience, we focused on the two most obvious resources, cores and disk space, and completely ignored RAM. Over the past 18 months our researchers have taught us that large-memory machines are a critical component of HPC, and our next provisioning strategy will be designed around this requirement. Below is a graph of our original predicted growth path versus our actual implementation:
We were able to deliver pretty much what we'd planned for, as we knew the hardware re-provisioning strategy would make these resources available. What was more difficult to provide for was rapid storage growth, especially storage protected by reliable backups.
Our current disaster recovery system was spec'd for our operational infrastructure, not for the rapid geometric growth of research data, which is an order of magnitude larger than our email, file services and database systems combined. Solving this problem will require new technologies such as snapshots, block-level replication and NDMP (the Network Data Management Protocol). We're hoping that our new NetApp will assist us in addressing these challenges.
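For readers unfamiliar with these features, here is a rough sketch of what they look like on a NetApp filer (Data ONTAP 7-mode syntax; the filer and volume names are purely hypothetical, and any real deployment would need its own schedules and sizing):

```
# Keep 0 weekly, 2 nightly and 6 hourly snapshots of a research volume
snap sched vol_research 0 2 6

# Block-level replication of that volume to a second filer with SnapMirror
snapmirror initialize -S filer1:vol_research filer2:vol_research_mirror

# NDMP-based copy of a volume between filers
ndmpcopy filer1:/vol/vol_research filer2:/vol/archive
```

The attraction of this approach is that snapshots and SnapMirror operate at the block level, so protecting many terabytes of research data doesn't require streaming full copies through the backup servers the way our operational backups do.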
Over the next three years we'd like to bring down the number of servers, reduce the core growth rate and instead focus on more powerful cores with very large RAM footprints, probably in excess of 200 GB. We're anticipating disk growth towards 50 to 75 TB, although some of this will be data copied from other institutes and will not fall within our disaster recovery strategy.
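The storage estimate above is essentially a compound-growth projection. A minimal sketch of the arithmetic in Python, where the starting size and annual growth rate are purely hypothetical figures chosen to land in the anticipated range, not measurements from our environment:

```python
def projected_storage(initial_tb, annual_growth, years):
    """Project storage under compound (geometric) annual growth.

    initial_tb:    current data footprint in TB (hypothetical)
    annual_growth: fractional growth per year, e.g. 0.7 for 70% (hypothetical)
    years:         projection horizon
    """
    return initial_tb * (1 + annual_growth) ** years

# Example: 12 TB today growing at 70% per year for three years
# lands near the middle of a 50-75 TB range.
print(round(projected_storage(12.0, 0.7, 3), 1))  # → 59.0
```

The point of modelling it this way is that geometric growth, unlike linear growth, compounds: small changes in the assumed annual rate move the three-year figure dramatically, which is why we quote a range rather than a single number.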