We've been playing about with Amazon Web Services (AWS) cloud-based
infrastructure for the last few weeks and thought we'd share this.
Using StarCluster, and following this very cool demo, we launched a 3-node cluster
on Amazon's Elastic Compute Cloud (EC2) infrastructure in
about 10 minutes.
You'll need an Amazon account (i.e. you give Amazon your credit card
details) and a key pair, which you create via their website. You then
launch a small machine instance to act as the StarCluster coordination
server. Installing StarCluster is a single command. To configure
the 'personality' of the cluster you edit the configuration file
(about 2 lines). To set up permissions you copy your private key to the
coordination server and reference it in the config file.
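To make the configuration step concrete, here is a minimal Python sketch that writes a file in the layout a StarCluster config uses (an `[aws info]` section, a `[key …]` section and a `[cluster …]` template). The section and option names follow StarCluster's sample config as we understand it; the template name `smallcluster`, the key name, the key path, the AMI ID and the instance type are all hypothetical placeholders, so compare the result against the sample config your StarCluster version generates.

```python
import configparser
import io


def build_starcluster_config(access_key, secret_key, user_id,
                             key_name, key_path, node_image_id,
                             cluster_size=3, instance_type="m1.small"):
    """Return INI text laid out like a StarCluster config file.

    Section/option names mirror StarCluster's sample config; every value
    passed in (and the template name "smallcluster") is a hypothetical
    placeholder, not a recommendation.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names upper-case instead of lower-casing them
    cfg["global"] = {"DEFAULT_TEMPLATE": "smallcluster"}
    # AWS credentials from your Amazon account.
    cfg["aws info"] = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_USER_ID": user_id,
    }
    # The key pair created on the AWS website and copied to the coordination server.
    cfg[f"key {key_name}"] = {"KEY_LOCATION": key_path}
    # The cluster "personality": which key, how many nodes, which image and size.
    cfg["cluster smallcluster"] = {
        "KEYNAME": key_name,
        "CLUSTER_SIZE": str(cluster_size),
        "NODE_IMAGE_ID": node_image_id,
        "NODE_INSTANCE_TYPE": instance_type,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

The returned text would normally be saved as `~/.starcluster/config` on the coordination server; only the cluster-template values (size and instance type here) are the kind of thing you would change per cluster.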
Starting the cluster is also one command, as is adding or removing
nodes. In about 5 minutes we had a head node and a worker node; adding
a second worker node took about another 4 minutes. StarCluster uses Oracle
Grid Engine, which behaves just like a standard Grid Engine installation. We were able to
compile and submit an Open MPI job in about 60 seconds, and
it ran perfectly.
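For the MPI step, a minimal Python sketch that builds a Grid Engine job script for an Open MPI binary. It assumes the cluster exposes a parallel environment named `orte` (you can list the available ones with `qconf -spl` on the head node); the job name and the binary `./hello_mpi` are hypothetical.

```python
def make_mpi_job_script(job_name, slots, executable, parallel_env="orte"):
    """Return a Grid Engine job script that launches an Open MPI program.

    `parallel_env` defaults to "orte", an assumption about how the cluster's
    Grid Engine is set up; check `qconf -spl` and adjust if needed.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",                # job name shown by qstat
        "#$ -cwd",                          # run in the submission directory
        "#$ -j y",                          # merge stderr into stdout
        f"#$ -pe {parallel_env} {slots}",   # request `slots` MPI slots
        # Grid Engine sets NSLOTS to the number of slots actually granted.
        f"mpirun -np $NSLOTS {executable}",
    ]
    return "\n".join(lines) + "\n"
```

In practice you would compile the program with `mpicc`, write the returned text to a file such as `hello.sh` on the head node and submit it with `qsub hello.sh`.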
However, the coolest bit is the ability to 'bid' for Amazon spot
instances. StarCluster
lets you look at the spot instance price history over time so that
you can make an informed bid in $/hour. You then launch your cluster
with this bid, and your nodes are provisioned as capacity becomes
available at or below that price. As long as your bid stays (just) above the average price,
you're pretty much guaranteed a reasonably priced compute cluster for as
long as you want (nodes are only reclaimed if the spot price climbs above your bid).
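A rough Python sketch of the bidding idea, assuming you have copied a list of hourly spot prices out of the history StarCluster shows you. The 5% margin and the sample prices are made up, and "bid a little over the mean" is just one simple heuristic, not something StarCluster computes for you.

```python
from statistics import mean


def suggest_bid(price_history, margin=0.05):
    """Suggest a bid ($/hour) slightly above the average historical spot price.

    `margin` is a hypothetical safety factor: 0.05 means bid 5% over the mean.
    """
    if not price_history:
        raise ValueError("price_history must contain at least one price")
    return round(mean(price_history) * (1 + margin), 4)


def fraction_of_hours_covered(price_history, bid):
    """Fraction of past hourly prices at or below `bid`, i.e. the hours in
    which a node bid at this level would have been allowed to run."""
    if not price_history:
        raise ValueError("price_history must contain at least one price")
    return sum(price <= bid for price in price_history) / len(price_history)


# Made-up hourly spot prices in $/hour, for illustration only.
history = [0.50, 0.52, 0.48, 0.55, 0.50, 0.95]
bid = suggest_bid(history)
print(f"bid ${bid}/hour covers {fraction_of_hours_covered(history, bid):.0%} of past hours")
```

The second function makes the trade-off visible: a larger margin raises your hourly cost ceiling but leaves fewer hours in which a price spike would push the spot price above your bid.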
The figure above shows the current hourly price of an 8-core Nehalem server with two NVIDIA Tesla M2050 GPU cards.
We're now looking at ways to scale our current (and pending) infrastructure into this space.