Wednesday May 13, 2009
EC2 Variability: The numbers revealed
Measuring EC2 system performance
I've been spending a lot of time at Terracotta working on cloud deployments of Terracotta and the cloud in general. People have been asking me what the difference is running apps on the cloud, and specifically EC2. There are a number of differences (m1.small is a uni-processor machine!) but the number one answer is "Variability". You just cannot rely on getting a consistent level of performance in the cloud. At least not that I've been able to observe.
I decided to put EC2 to the test and examine three different areas that were easy to measure: disk I/O, latency, and bandwidth
Disk I/O
- Environment: EC2 m1.small
- File size: 10 MB (mmap() files)
- Mount point: /mnt
- Testing software: iozone
- Duration: almost 24 hours
As you can see the numbers can vary a good deal. This is on an otherwise completely quiescent virtual machine and with a 10 MB filesize, the tests themselves took almost no time to run. Most of the numbers actually look remarkably consistent with the exception of Random Reads. Those numbers are all over the place which you might expect from "random" but this looks to be a bit much. The numbers are actually pretty respectable and compare to about a 7200 RPM SATA drive. Certainly not the kind of machine you would use for performance benchmarks, but if you threw enough instances at a clustering problem, you could certainly get the job done.
Latency
- Environment: EC2 m1.small
- Datacenter: us-east-1b
- Testing software: smokeping
- Duration: about 20 hours
Here, where networking is involved between instances, things start to get a little bit more varied. The median RTT is 0.3506 ms which is about 3 times more latency than you would get on a typical gigabit ethernet network. You can see the numbers hover there for the most part but there is a tremendous amount of variability around that number. Smokeping shows outliers about 2 ms but I have seen numbers as high as 65 ms or worse in ad hoc tests. I don't know what happened at 4 a.m. on this graph but I'm glad I wasn't running a production application at the time. If you look closely, you can also see a few instances of packet loss which is something we don't usually experience on a production network. Again, this is on an otherwise quiescent machine. For comparison's sake, here is the smokeping graph between Terracotta's San Francisco and India office which is actually carrying a fair bit of traffic. This is a LAN to WAN comparison so the numbers are not going to look as exaggerated because they are running on a different scale, but in the EC2 instance, we can see more than 5 times the variability in latency, which we don't see on the WAN segment (or ever on any of my lab switches for that matter).
Bandwidth
- Environment: EC2 m1.small
- Datacenter: us-east-1b
- Testing software: nttcp
- Duration: about 24 hours
In this graph, we can see that the gigabit connection between EC2 instances is hardly gigabit at all. I would say the numbers may trend upwards of 600 Mbps on average but they fluctuate pretty wildly between real gigabit to barely faster than a 100 Mbps connection. This was run on an otherwise quiescent machine. In a real "production" environment we would expect much more consistency, especially if trying to run performance numbers.
Conclusions
It is pretty safe to say that there won't be any vendors publishing performance numbers of what they are able to achieve with their software running on the cloud unless they had no other choice. You can easily get much more consistent, faster numbers running on dedicated hardware. In our tests with Terracotta, we've seen that you just have to throw that much more hardware at the problem to get the same kinds of numbers. It stands to reason however as a uni-processor m1.small instance is just not going to be as powerful as our quad-core Xeons. In throwing more instances at the problem you start to introduce more of the network into the equation, which as you can see, is a rather variable quantity. Thankfully, the amount of latency introduced even in this case is no big deal for the Terracotta server, so I've been having a pretty good time running bigger and bigger clusters in the cloud.
Posted by Dave Mangot in Applications at 20090513 Comments[4]
Search This Site
Recent Entries
- A framework for running anything on EC2: Terracotta tests on the Cloud - Part 1
- A Trade Show Booth: Part 2 - The Puppet Config
- Intstalling Fedora 10 on a Mac Mini
- A Trade Show booth with PF and OpenBSD
- EC2 Variability: The numbers revealed
- Linksys WET54G, a consumer product?
- Choosing Zimbra as told to ex-Taosers@groups.yahoo
- Information Security Magazine Chuckle
- A SysAdmin's impressions of MacOS Leopard
- Worlds collide: RMI vs. Linux localhost
- Hello World