We recently deployed RAxML to our cluster. RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel maximum-likelihood inference of large phylogenetic trees. It has four modes of operation: Sequential, which runs on one CPU only; MPI, where jobs distribute work across multiple bootstraps (instantiations); Pthreads, which is the most consistent and is well load balanced (the more threads, the faster the job); and Hybrid, which gives better options than MPI but still depends on the number of bootstraps.
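To make the four modes concrete, here is a minimal Python sketch that builds the command line for each one. It is illustrative only, not our actual job scripts. The binary names follow the stock RAxML Makefiles (local builds often carry a suffix such as -SSE3 or -AVX). The `mpirun -np` launcher, the GTRGAMMA model, the seed values and the helper name `raxml_command` are all assumptions.

```python
from typing import List

# Binary names as produced by the stock RAxML Makefiles. Local builds often
# add a suffix such as -SSE3 or -AVX, so treat these names as assumptions.
BINARIES = {
    "sequential": "raxmlHPC",
    "pthreads": "raxmlHPC-PTHREADS",
    "mpi": "raxmlHPC-MPI",
    "hybrid": "raxmlHPC-HYBRID",
}


def raxml_command(mode: str, alignment: str, run_name: str,
                  bootstraps: int = 2, threads: int = 1,
                  ranks: int = 1) -> List[str]:
    """Build an argument list for a bootstrap run in the given mode.

    Hypothetical helper: the model (GTRGAMMA) and the seeds (12345) are
    placeholder choices, not values taken from our benchmark.
    """
    if mode not in BINARIES:
        raise ValueError(f"unknown mode: {mode}")

    # Arguments shared by every mode: alignment file, run name, model,
    # parsimony seed, bootstrap seed and number of bootstrap replicates.
    args = [BINARIES[mode], "-s", alignment, "-n", run_name,
            "-m", "GTRGAMMA", "-p", "12345", "-b", "12345",
            "-N", str(bootstraps)]

    # Threaded builds take the thread count through -T.
    if mode in ("pthreads", "hybrid"):
        args += ["-T", str(threads)]

    # MPI builds are started through an MPI launcher (mpirun assumed here;
    # a scheduler-specific launcher may be needed on a given cluster).
    if mode in ("mpi", "hybrid"):
        args = ["mpirun", "-np", str(ranks)] + args

    return args


if __name__ == "__main__":
    for mode in BINARIES:
        print(mode, " ".join(raxml_command(mode, "aln.phy", "test",
                                           threads=4, ranks=2)))
```

Only the Pthreads and Hybrid commands carry a thread count, and only the MPI and Hybrid commands go through the launcher, which mirrors the description of the modes above.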
As expected, the hybrid methods converge as the number of threads increases. The Hybrid TxJ run slows down because the number of bootstraps for this job was set to 2, so increasing the number of jobs does not improve performance.
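The effect of the bootstrap count can be illustrated with a deliberately simplified model. It assumes that each bootstrap is handled by exactly one MPI job and that a job without a bootstrap sits idle. It also reads "TxJ" as threads per job times number of jobs, which is an assumption. The function names are hypothetical and the model ignores communication overhead.

```python
def busy_cores(jobs: int, threads_per_job: int, bootstraps: int) -> int:
    """Cores doing useful work under the simplified model.

    Assumption: one bootstrap keeps one MPI job (and all of its threads)
    busy, and jobs cannot share a bootstrap.
    """
    active_jobs = min(jobs, bootstraps)
    return active_jobs * threads_per_job


def idle_cores(jobs: int, threads_per_job: int, bootstraps: int) -> int:
    """Cores that were requested but have no bootstrap to work on."""
    return jobs * threads_per_job - busy_cores(jobs, threads_per_job, bootstraps)


if __name__ == "__main__":
    bootstraps = 2  # the value used for the job described above
    for jobs in (1, 2, 4, 8):
        print(f"4 threads x {jobs} jobs: "
              f"busy={busy_cores(jobs, 4, bootstraps)}, "
              f"idle={idle_cores(jobs, 4, bootstraps)}")
```

Under this model, with 2 bootstraps, going from 2 jobs to 4 or 8 leaves the number of busy cores unchanged. Adding threads per job still helps, while adding jobs only adds idle cores.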
Our best times were produced on the GPU series because RAxML is highly memory intensive: the memory chips in the GPU servers are faster than those in the 600 series (1.6GHz vs 1.3GHz), so the jobs ran faster. Please note that the RAxML jobs ran on the CPUs in the GPU servers, not on the GPU cards. As the graph below shows, this software ranks in our top 10 for both hours per job and memory consumed per job:
NB: CUDA and MATLAB above refer to software written with those applications, not to the applications themselves.