Migrating your servers to Amazon EC2: Instance sizing

One of the central tenets of cloud computing it's a cheap way to run large-scale compute jobs. If you're more concerned about starting small, and want to tackle the problem of growing big when you get to it¹, then there's still a solution for you, though it might not be quite like the one you're used to.

If you're currently running on a hosted, virtualized platform, you are probably in one of two situations:

Your hosting provider buys servers for you, and runs something like VMware ESX on them
You're dealing with a VPS provider

If you're in the former bucket, as we were, you have a pretty fine-grained control over your instance (virtual server) scaling. You can add more CPU power (or weight one instance to be allowed to burst at the expense of others), and control, sometimes to the megabyte, how much memory is available to your application.

When you're in the latter bucket, you tend to get a number of discrete plans (such as the ones Linode offer), but your provider has a human element, and if you ask nicely, you can probably get a plan with very low disk but very high memory, by paying a little bit extra (RimuHosting tends towards the confusing with the amount of choice they offer!)

Amazon EC2, being an entirely automated provider, doesn't give you the option to customize your plans. They offer a selection of instance sizes, at various prices. Those are the choices, take or leave them.² Because of the ease of creating and using multiple machines, and the relatively low extra cost,³ you have to consider if the cost of scaling up is best for you, compared to the cost of scaling out.

Our applications ran almost exclusively on 32-bit machines. There are a number of reasons, in both theory and practice, why 64-bit may not be for you: lack of vendor support, having to maintain software packages for both 32- and 64-bit architectures, slower performance/more memory use for simple tasks, etc. I prefer to stay with 32-bit across the board, which also suggests horizontal scaling. If your application benefits from 64-bit computing, then you have a slightly different problem to the one I had, and your mileage will vary.

Some figures

Consider, for example, the 'default' instance for new users, the m1.small:

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)

This instance costs 8.5c/hour to run.

Side note: With the launch of Canonical's new Ubuntu Server 10.10, they're announcing a "Try Ubuntu Server on our dime" promotion. It's worth noting that they get 1.5c change for that dime. 🙂

The next option up gives you about four times the performance, for about four times the cost. However, you don't get too much insight into what four times "Low" IO performance is, vs "High", and you don't get any redundancy. We decided that we'd rather have two small instances in one AZ, and two in another, to build resilience into our infrastructure for free.

It soon dawned on us that 1 "EC2 Compute Unit", which they claim is currently roughtly equivalent to a "1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor", is roughly equivalent to "not good enough for PHP".

The stolen generation

When you use VMware, you get given a virtual CPU, with a speedo that goes from 0 to 100. With Xen (which is the hypervisor used by Amazon EC2), you can be given a certain percentage of the cycles on the parent CPU, but the gauge you see goes up to the number of cycles you are allowed on the parent CPU, not a percentage of a virtual CPU.

The practical upshot of this is that you end up seeing your CPU maxing out at a certain value (for us, around 40%) - but the other 60% of the cycles are stolen from you to feed other, more deserving, instances. This blog post from Axibase neatly sums up the issues, with graphs. You will see stolen CPU cycles in a new column in 'top':

Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 96.1%id,  0.1%wa,  0.0%hi,  0.0%si,  2.4%st

Not all tools are aware of steal time: you will see stolen ticks in vmstat -s, but not in the tabular vmstat output. You must have Xen-aware tools in order to get this information; Ubuntu provides them out of the box.

Thankfully, there happens to be a suitable instance for us. Doubling the price from 8.5c to 17c/hour, we get the c1.medium instance:

1.7 GB memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)

This one is twice the price of the m1.small, but has 5 times the CPU. A worthwhile upgrade, and all of a sudden our Apache benchmarks are back up where we expect them.

You might have noticed that both the previous instances have a relatively small 1.7 GB of memory. Want more? You're moving up to 7GB, minimum. If you want to stay with a small application, on 32-bit platform, the c1.medium instance is about where the line ends. We would love an instance type with 4GB of RAM; if you agree, please make yourself known to Amazon. The more customer demand they get, the more likely they are to implement it.

If we get to the point where it suits us, for example, to run two Size 4 machines, rather than eight Size 1 machines, we may consider moving to larger instances; we would save a little on EBS disks and inter-AZ transfer costs, but then a failure on one machine will mean we lose half of our hosting potential, rather than one eighth.

Planning for growth

You don't need to know all this up-front. If an instance is lacking in resource, upgrade it for a bigger/better one. Right?

Earlier in the history of EC2, you couldn't upgrade an instance, because root disks was on the instance-only, or ephemeral, store. If you step back and think of EC2 as actually being a room full of servers, each machine has some (presumably) local hard disk space. That is the ephemeral ("short-lived"; from the Greek for "one day") store. It's big, and it's free. However, when you turn your instance off, it's wiped.

In contrast, EBS is permanent, paid, network-attached storage (think iSCSI).

Before late 2009, your only option was to turn off your current instance, and then spin up a new one, from your template image (AMI). Then, AWS announced an upgrade, which allows you to boot an instance from an EBS disk. This means you can turn your instance off, and the root file system stays there waiting. You can use the web services to instruct that instance to be bigger, or smaller, when it returns. Because of the obvious usefulness of this, and the relatively low cost of a 10GB root disk, we're running all our instances on an EBS root.

When you upgrade your EBS-root instances, you are causing them to change class, which generally means, bring them up on a new physical host machine.

This means one obvious thing:

Your ephemeral disk (/mnt) is wiped

And two "less obvious" things:

Your internal IP address will change
Your internal IP address will change

Technically speaking that's only one "less obvious" thing, but I thought it was such a big one, I thought it was worth mentioning twice.

If you have an elastic IP address attached to that instance, your external IP address will remain the same. However, your instance is now on a different physical host, with a different physical NIC in its host, so it will get a new IP address. For someone who is running a traditional application without cloud-awareness, changing IP can be something which requires thought. For example, if you are upgrading your private DNS server, you will have a problem here. You can't know what the IP address will be before you upgrade, so make very sure you have moved all the services off this machine before you upgrade it. Then, get the new connection details from the console, and reconnect.

As every machine needs an internal IP address, and they are not scarce (Amazon provides them from 10.0.0.0/8, meaning there should be no problem up to about 16 million instances), something that is really missing from EC2 for the "always on" use case we run is static internal IP addresses. Fire up your request e-mails.⁴

I think I pretty much wrote Rework there! ↩
Amazon do often add new instances sizes, so the information in this article may one day be superseded. ↩
In the case of non-EBS instances in the same AZ, there should be no extra cost. ↩
I even offer a suggestion on how to implement them: 5 for free per customer, and one more for every reserved instance you buy. Then, they're issued by DHCP to any instance ID they are registered to. ↩

This entry was posted on Monday, October 11th, 2010 at 3:37 pm and is filed under Amazon Migration, Technical, Work. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Off the record - Craig Box