The new GPU nodes have been installed in our POC cluster and are already running jobs. We added the servers to our SLURM cluster so that we can gain hands-on experience in provisioning SLURM partitions with GPU resources, something our current Torque/Maui production cluster cannot do.
We chose Dell PowerEdge C4140 servers, which can house up to four GPU cards each. Each server is populated with four Tesla P100 cards and also has 32 cores and 128GB of RAM. Together, the new servers increase our GPU count by 16.
The cards were funded by a grant for computer science/chemistry research and are reserved for that work. At a later stage this partition may be made available to other research groups on a limited basis. Even so, the new cards will free up other GPU cards in the cluster.
The servers' interconnect is currently limited to 1Gb Ethernet, but later this month we will connect them to our InfiniBand fabric.
For those who have been using our cluster for a while: we now have more GPU cards than we had CPU cores when we started.