Every time I set up a private installation of WebPageTest, I struggle to figure out what size instances I should use for the WPT agents. If you make them too small, you risk inconsistent or inaccurate data. If you make them too big, you’re wasting resources (and paying more than you need to). I decided to empirically test different Amazon instance sizes so I’d know for sure which one is the most consistent while still being cost-sensitive.
Skip down to the conclusion if you just want my recommendation and don’t care about the details of how I came to it.
Patrick Meenan (who runs WebPageTest.org) states that at least medium instance types should be used (“Medium instances are highly recommended for more consistent test results”). In the past I had experimented with t2.micros and they seemed to do okay, so I decided to test a range of instance types, starting with the t2 tier.
Since WebPageTest is supposed to mimic what real users experience, my hunch was to pick an instance with a configuration similar to the average desktop computer. These days most computers come standard with dual cores and at least 6 GB of memory, so I predicted that would be ideal. However, the testing task only requires running a common browser, which shouldn’t use that many resources. My current browser window is using less than 100 MB of memory, so would the additional GBs of memory even be utilized?
Trying t2.micro, t2.small, t2.medium
I tested by spinning up each instance size one at a time, sending it four URLs to test, and repeating every 5 minutes until I had a couple of hours of data. The sites I tested were a personal site of mine, Google, Yahoo, and Amazon. I tracked the load time (“document complete”) of each run and looked for inconsistencies.
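I judged consistency by eyeballing the graphs below. If you want a single number for comparing instance types, one option is the coefficient of variation (CV): the standard deviation of the load times divided by their mean, computed per instance type and site. This is my own addition, not part of the original test setup. The sample values in the usage block are made up for illustration.

```python
from statistics import mean, stdev


def consistency_report(samples):
    """Summarize load-time consistency for each group of runs.

    samples maps a group label, e.g. (instance_type, site), to a list of
    document-complete times in seconds. Returns label -> (mean, stdev, cv),
    where cv = stdev / mean; a lower cv corresponds to a flatter line.
    """
    report = {}
    for label, times in samples.items():
        if len(times) < 2:
            continue  # a sample standard deviation needs at least two runs
        avg = mean(times)
        spread = stdev(times)  # sample standard deviation
        report[label] = (avg, spread, spread / avg)
    return report


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not the measured data.
    runs = {
        ("t2.micro", "google"): [1.5, 1.5, 1.5],
        ("t2.medium", "google"): [1.4, 1.6],
    }
    for label, (avg, spread, cv) in consistency_report(runs).items():
        print(label, f"mean={avg:.2f}s stdev={spread:.2f}s cv={cv:.1%}")
```

Grouping by (instance type, site) keeps a slow site like Amazon from inflating the spread of a fast one like Google.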
Here are the specs of the EC2 instance sizes I used:
A few notes about my testing process:
- I was using a Chrome/Firefox/IE11 agent image: wpt-ie11-20140912 (ami-561cb13e), deployed from N. Virginia
- After the agent spun up, I discarded the first 2 test results, just in case there was any overhead in getting initialized or warmed up.
- Tests were run with Cable-simulated connectivity and the latest version of Chrome (39 at the time).
- I had seen earlier comments that video capture affects instance performance, so video was enabled for all tests.
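If you want to reproduce this setup, here is a minimal sketch that builds a request to WebPageTest’s runtest.php API with the settings above (Chrome, Cable connectivity, video on). It also drops the first two warm-up results. The server address and the location name `ec2-us-east` are hypothetical placeholders for your own private instance. The sketch only builds the URL; sending it and polling for results are left out.

```python
from urllib.parse import urlencode


def runtest_url(server, page_url, location, runs=1, video=True, api_key=None):
    """Build a runtest.php URL for one page (does not send the request)."""
    params = {
        "url": page_url,
        # "<location>:<browser>.<connectivity>"; the names must match the
        # locations configured on your own WebPageTest server (hypothetical here).
        "location": location,
        "runs": runs,
        "video": 1 if video else 0,  # video capture was on for every test
        "f": "json",                 # ask for a JSON response
    }
    if api_key:
        params["k"] = api_key  # only needed if your server requires a key
    return f"{server.rstrip('/')}/runtest.php?{urlencode(params)}"


def drop_warmup(results, warmup=2):
    """Discard the first `warmup` results from a freshly started agent."""
    return results[warmup:]


if __name__ == "__main__":
    sites = ["http://www.google.com/", "http://www.yahoo.com/", "http://www.amazon.com/"]
    for site in sites:
        print(runtest_url("http://wpt.example.com", site, "ec2-us-east:Chrome.Cable"))
```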
Here are the results of each instance size, over time.
By looking at the data grouped by instance type, we’re hoping to see straight horizontal lines. Jagged lines mean the instances were jumping around with their results. It doesn’t appear to me that mediums are any more consistent than micros at this point.
You can also view these per website:
By looking at one URL at a time, we’re hoping to see all the data grouped together as closely as possible. For the most part, we do. There is a jump for Micro on the personal site, but since we see something similar for Medium on Yahoo, I’m not going to penalize the Micro. Another interesting finding is that larger sites show more variability (e.g. Google never strays outside 1.4 – 1.6, while Amazon ranges from 6 – 10).
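Some of that is just scale: a slower page has more room to vary in absolute terms. Dividing each range by its midpoint tests whether the larger site is also more variable in relative terms. The ranges below are the ones quoted above, which were read off the graphs and are approximate.

```python
def relative_spread(low, high):
    """Width of a load-time range divided by its midpoint."""
    midpoint = (low + high) / 2
    return (high - low) / midpoint


if __name__ == "__main__":
    print(f"Google: {relative_spread(1.4, 1.6):.0%}")  # about 13%
    print(f"Amazon: {relative_spread(6, 10):.0%}")     # 50%
```

Even after accounting for its longer load time, Amazon’s results swing nearly four times as widely as Google’s.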
Considering all the data, I was surprised to see a Micro instance performing about as consistently as a Medium. I got excited about the savings I’d get by downgrading my existing agents (it would have been several hundred dollars a month!). But then I realized my test was flawed.
Testing Secure Sites
After running my first set of tests I unfortunately came across a comment by someone claiming that HTTPS leads to a bottleneck on the agents. None of my tests were accessing secure sites. Darn.
The reasoning was that the extra processing required to encrypt and decrypt SSL traffic would limit your tests. Was it true that we’d be maxing out the processing power on these machines?
I ran a couple of tests and monitored CPU usage directly from Task Manager. I didn’t like looking at the CPU statistic from getTesters.php since it didn’t seem to be real-time.
CPU Usage on Micro:
CPU Usage on Small:
CPU Usage on Medium:
If we were maxing out the micro and small agents on insecure traffic, there’s no way they’d be able to handle HTTPS.
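Reading Task Manager by eye works. If you log CPU samples instead (for example with Windows’ typeperf tool), a small helper can report how much of a test run the CPU spent pinned. This is a sketch of my own. The 95% threshold is an arbitrary assumption, and the sample values are made up.

```python
def saturation_fraction(cpu_samples, threshold=95.0):
    """Return the fraction of CPU-percentage samples at or above threshold."""
    if not cpu_samples:
        return 0.0
    pinned = sum(1 for pct in cpu_samples if pct >= threshold)
    return pinned / len(cpu_samples)


if __name__ == "__main__":
    # Hypothetical one-second samples taken while a test was running.
    samples = [100.0, 100.0, 98.5, 62.0, 100.0, 40.0]
    print(f"CPU pinned for {saturation_fraction(samples):.0%} of samples")
```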
I repeated my original tests using only secure sites on t2.small and t2.medium sizes.
By instance type, over time:
Secure Sites, Small:
Secure Sites, Medium:
Remember, by looking at the data grouped by instance type, we’re hoping to see straight horizontal lines. Jagged lines mean the instances were jumping around with their results.
Per secure website:
Personal Site (secure):
Remember, by looking at each URL at a time, we’re hoping to see all the data grouped together as close as possible.
Looking at the graphs, you can see a huge performance difference between small and medium on my personal site over HTTPS. There is a big blip on Yahoo-Medium, but I’m going to disregard it since it was a single outlier. The personal site graphs prove that smalls are being bottlenecked by the secure processing, and you need an agent with more power.
From this data I’d recommend that if you’re going to be testing any traffic over HTTPS you should use a medium. However, I was about to be wrong once again.
Can’t use t2’s long term
I had just decided to use t2.medium sized agents and let them run over the weekend. When I got back I realized something dramatic had happened. Here’s a plot of the load time, speed index, and TTFB on a single page over a few days:
All of a sudden tests started taking about 3x as long to load pages. What happened??
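I only noticed this by looking at the graph after the weekend. A rough way to catch such a jump automatically is to compare a rolling median of recent load times against the median of the first few runs. This is a sketch of my own, not a WebPageTest feature. The window size and the 2x factor are arbitrary assumptions.

```python
from statistics import median


def find_slowdown(values, window=12, factor=2.0):
    """Return the index of the run at which a sustained slowdown is detected.

    The baseline is the median of the first `window` runs. After that, each
    later window of `window` runs is checked; the first one whose median is
    at least `factor` times the baseline triggers detection, and the index of
    that window's last run is returned. Returns None if nothing is detected
    or there are fewer than 2 * window runs.
    """
    if len(values) < 2 * window:
        return None
    baseline = median(values[:window])
    for end in range(2 * window, len(values) + 1):
        if median(values[end - window:end]) >= factor * baseline:
            return end - 1
    return None
```

Using a median rather than a mean means a single outlier, like the Yahoo-Medium blip, does not trigger it. The cost is that detection lags the actual jump by about half a window.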
I reached out to Patrick to see if he’d seen anything like that, and after going back and forth we realized it was because I was using a t2 tiered instance type. From Amazon:
“T2 instances are Burstable Performance Instances that provide a baseline level of CPU performance with the ability to burst above the baseline. Instances in this family are ideal for applications that don’t use the full CPU often or consistently, but occasionally need to burst (e.g. web servers, developer environments, and small databases).”
In other words, t2s are by definition unreliable for sustained testing, since their extra performance comes only in bursts. When I started my tests, they were running on their burst allowance. Once it ran out, their real (baseline) performance kicked in. It was obvious that t2s were not going to work for me.
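The burst behavior is governed by CPU credits. Per AWS’s T2 documentation, one CPU credit equals one vCPU running at 100% for one minute. A t2.medium earns 24 credits per hour and can bank up to 24 hours’ worth (576 credits). These figures come from AWS’s t2 docs and may change, so check the current documentation. Here is a back-of-the-envelope sketch of how long a fully banked agent lasts under sustained load. The average number of busy vCPUs is an assumed input, not something I measured.

```python
def hours_until_throttled(balance, earn_per_hour, busy_vcpus):
    """Estimate hours until a T2 instance's CPU credit balance reaches zero.

    balance: credits currently banked
    earn_per_hour: credits the instance earns each hour
    busy_vcpus: average number of vCPUs kept at 100% (e.g. 1.5)

    One CPU credit = one vCPU at 100% for one minute, so the load spends
    60 * busy_vcpus credits per hour. Returns None if earnings cover the spend.
    """
    spend_per_hour = 60 * busy_vcpus
    net_drain = spend_per_hour - earn_per_hour
    if net_drain <= 0:
        return None  # load fits within the baseline; no throttling expected
    return balance / net_drain


if __name__ == "__main__":
    # t2.medium: 576 banked credits, 24 earned per hour.
    for busy in (2.0, 1.0):
        print(busy, "vCPUs busy ->", hours_until_throttled(576, 24, busy), "hours")
```

Even at an average of one busy vCPU, a full balance lasts well under a weekend. That is consistent with the slowdown appearing partway through the weekend.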
The bad news is that going up to a fixed-performance tier (i.e., m3) was going to be more expensive, especially since m3s don’t even come in the same specs as the t2 tiers. Check out the difference in specs between t2 and m3:
Onto the m3 tier
Time to spin up some m3 instances and test all over again. I didn’t have time to test several sites again, so I set up tests against one URL and swapped out agents in the middle, starting with an m3.medium and then transitioning to an m3.large:
Wow. Now there’s a big difference. The m3.large has an incredible consistency that I haven’t seen in any of my testing yet. It’s obvious that large is the way to go. I hate to accept that because they’re expensive. At $0.266/hour it’s going to increase my costs quite a bit!
Looking at the specs of the m3.large, I wondered whether another tier of instances, i.e. compute- or memory-optimized, would be cheaper. Good news: there is. A c3.large instance has about the same specs (minus a little RAM) and is cheaper:
But let’s put it to the test:
Can you see the moment I switched from an m3.large to a c3.large? I can’t either. They perform identically as WebPageTest agents, and the c3 is less expensive.
Conclusion
I hope you just skipped down to this section, because that was a lot to go through. The simple answer to which EC2 agent size you should use: it depends.
- If you’re running just a couple tests per hour, on small HTTP sites, a t2.micro will be okay ($13/month)
- If you’re running just a couple tests per hour, on large or secure sites, you’ll need to use a t2.medium ($52/month)
- If you’re running lots of tests per hour, you can’t use t2’s – the most efficient agent will be a c3.large ($135/month)
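The m3.large price quoted earlier is hourly, while these are monthly. Converting with roughly 730 hours in a month (8,760 hours a year divided by 12) puts them on the same footing. That conversion assumes the agent runs around the clock. The sketch uses only the prices quoted in this post at the time of writing.

```python
HOURS_PER_MONTH = 8760 / 12  # 730 hours, assuming the agent runs 24/7


def monthly_cost(hourly_rate):
    """Convert an hourly on-demand price to an approximate monthly cost."""
    return hourly_rate * HOURS_PER_MONTH


def hourly_rate(monthly):
    """Convert a monthly cost back to an approximate hourly price."""
    return monthly / HOURS_PER_MONTH


if __name__ == "__main__":
    m3_large = monthly_cost(0.266)  # quoted hourly price for m3.large
    c3_large = 135                  # quoted monthly price for c3.large
    print(f"m3.large ~ ${m3_large:.2f}/month")
    print(f"c3.large ~ ${hourly_rate(c3_large):.3f}/hour")
    print(f"c3.large saves ~ ${m3_large - c3_large:.2f}/month per agent")
```

That works out to roughly $59 a month saved per agent by choosing the c3.large over the m3.large.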
I don’t know the magic number of tests per hour you can get away with on a t2 instance (before it runs out of burst capacity), so maybe someone else can run those tests. But if you try to get away with a burstable tier, be warned that your results may eventually become inconsistent after enough tests.
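For anyone who wants to take a stab at that number, here is a rough model of my own, not something I measured. A t2 agent can keep going indefinitely only if the credits it spends per hour stay under the credits it earns. The per-test cost (test duration times average busy vCPUs) is a hypothetical input you would need to measure on your own agent. CPU used between tests is ignored.

```python
def credits_per_test(test_minutes, avg_busy_vcpus):
    """CPU credits one test spends: one credit = one vCPU at 100% for one minute."""
    return test_minutes * avg_busy_vcpus


def max_sustainable_tests_per_hour(earn_per_hour, cost_per_test):
    """Tests per hour a T2 agent can run without draining its credit balance.

    Ignores CPU used between tests, so treat the result as an upper bound.
    """
    return earn_per_hour / cost_per_test


if __name__ == "__main__":
    # Hypothetical: a 1-minute test keeping 1.5 vCPUs busy on a t2.medium
    # (24 credits earned per hour per AWS's t2 documentation).
    cost = credits_per_test(1.0, 1.5)
    print(max_sustainable_tests_per_hour(24, cost), "tests/hour")
```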
I hate to suggest that everyone spend money on a c3.large instance, as it’s not cheap, but it has proven to be the best in every case I tested. I’ve been running one for months now, and it’s incredibly stable.
Note: Since running these tests, I see that a new generation, the c4.large, has come out. However, it’s more expensive than the c3, and since the c3s are amazing, I’m not going to bother upgrading.
Hope that helps everyone decide! Feel free to ask questions/comment below.