MPI Jobs in Amazon EC2?



Hi All, 

I have been thinking about this for a while. Running serial jobs in a cluster that contains Amazon EC2 servers makes sense, and I can see a good use case for it. But what about parallel jobs running on Amazon EC2 servers?

Some people have already built Beowulf clusters and run MPI on Amazon EC2, for example the Data Wrangling Blog. It seems to work, at least according to the blog, but does it really make sense? For testing or learning purposes I'm sure it is okay, but compared to a 'real cluster' the performance is poor. What I think is needed is enhancements to the virtualization layer (and probably the kernel) on Amazon EC2 so that MPI message passing effectively 'bypasses' the layers of software and the copying currently needed to send a message from one MPI process to another. Removing that overhead should dramatically improve MPI performance in virtual machines, but it will still be slow since, as far as I know, the Amazon EC2 machines are physically connected via Ethernet. Ethernet is fine for small MPI jobs but, once again, terrible for large ones.
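To make the latency argument a bit more concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the simplest possible cost model (time per message = fixed latency + size / bandwidth, with no overlap of communication and computation). The function names and every number in the example are hypothetical placeholders chosen for illustration, not measurements of Amazon EC2 or of any real interconnect.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: fixed latency plus time on the wire."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def step_time(compute_s, messages_per_step, message_bytes,
              latency_s, bandwidth_bytes_per_s):
    """Time for one iteration: local compute plus all messages, with no overlap."""
    comm_s = messages_per_step * transfer_time(
        message_bytes, latency_s, bandwidth_bytes_per_s)
    return compute_s + comm_s


def parallel_efficiency(compute_s, messages_per_step, message_bytes,
                        latency_s, bandwidth_bytes_per_s):
    """Fraction of each iteration spent on useful computation (1.0 is ideal)."""
    total_s = step_time(compute_s, messages_per_step, message_bytes,
                        latency_s, bandwidth_bytes_per_s)
    return compute_s / total_s


if __name__ == "__main__":
    # Hypothetical workload: 1 ms of compute and ten 1 KiB messages per step.
    workload = dict(compute_s=1e-3, messages_per_step=10, message_bytes=1024)

    # Hypothetical networks (placeholder values, NOT measured):
    # a high-latency virtualized Ethernet path vs. a low-latency interconnect.
    networks = {
        "virtualized Ethernet (assumed)": dict(
            latency_s=100e-6, bandwidth_bytes_per_s=100e6),
        "low-latency interconnect (assumed)": dict(
            latency_s=2e-6, bandwidth_bytes_per_s=1e9),
    }

    for name, net in networks.items():
        eff = parallel_efficiency(**workload, **net)
        print(f"{name}: efficiency = {eff:.2f}")
```

With small messages the latency term dominates, which is why tightly coupled MPI jobs suffer far more than data-parallel jobs that rarely communicate. Setting messages_per_step to 0 models the embarrassingly parallel case, where the network barely matters.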

So my hunch is that running simple data-parallel jobs on Amazon EC2 makes a lot of sense, and I am sure some people are already doing it. Running real parallel jobs, though, is pretty disappointing right now. A small 'real' cluster will run circles around a larger Amazon EC2 cluster when running MPI jobs.

 


Comments


Yes, I was also wondering about the interconnects used in this type of cluster. Can UniCluster or Amazon explain what type of interconnect is used between the nodes?