Wednesday, 18 August 2010

Playing with NFS & GlusterFS on Amazon cc1.4xlarge EC2 instance types

I wish I had time to do the kind of work they do at BioTeam:
benchmarking the Amazon cc1.4xlarge EC2 instance.

These are the questions they aimed to answer:

We are asking very broad questions and testing assumptions along the lines of:
  • Does the hot new 10 Gigabit non-blocking networking fabric backing up the new instance types really mean that “legacy” compute farm and HPC cluster architectures, which make heavy use of network filesharing, are now possible?
  • How does filesharing between nodes look and feel on the new network and instance types?
  • Are the speedy ephemeral disks on the new instance types suitable for bundling into NFS shares or aggregating into parallel or clustered distributed filesystems?
  • Can we use the replication features in GlusterFS to mitigate some of the risks of using ephemeral disk for storage? 
  • Should the shared storage built from ephemeral disk be assigned to “/scratch” or other non-critical duties due to the risks involved? What can we do to mitigate the risks?
  • At what scale is NFS the easiest and most suitable sharing option? What are the best NFS server and client tuning parameters to use? 
  • When using parallel or cluster filesystems like GlusterFS, what rough metrics can we use to figure out how many data servers to dedicate to a particular cluster size or workflow profile?

