

Migrating your servers to Amazon EC2: Initial design considerations

[Image: "Cloud architecture!" - from Powerhouse Museum on Flickr]

Even without making major changes to your application, you can make Amazon EC2 work for you.

Here are some things that I considered when designing our new setup:

No single point of failure

Any one machine should be able to go down - as Amazon CTO Werner Vogels says, "everything fails, all the time". Designing for guaranteed failure makes you think. The parts of the site identified as most important should keep running even if an entire datacentre fails.

Thankfully, EC2 makes this simple. Availability Zones (AZs) have been described to me as far enough apart that a disaster at one will not affect the other, but close enough that an engineer can drive between them in a reasonable time.

In my experience, the difference in ping times between our eu-west-1a instances and our eu-west-1b instances is less than 1ms. You do pay a "regional data transfer" rate of $0.01/GB for transfer between instances in different AZs in the same region, but at that price it is cost-effective for us to run the system across two AZs. Our load balancing doesn't care which zone the machines are in, so even if one zone fails, the site is still reachable.
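
To illustrate, here's a minimal sketch of spreading a fleet across two zones using boto3, the current AWS SDK for Python (our original setup predates it); the AMI ID and instance type are placeholders:

    # A minimal sketch, assuming boto3; the AMI ID and instance type are
    # placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Launch one web server in each Availability Zone, so that a zone failure
    # still leaves half of the fleet serving traffic.
    for zone in ("eu-west-1a", "eu-west-1b"):
        ec2.run_instances(
            ImageId="ami-12345678",        # placeholder AMI
            InstanceType="m1.small",
            MinCount=1,
            MaxCount=1,
            Placement={"AvailabilityZone": zone},
        )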

No wasted cycles

You can turn a machine on and off as you see fit; assuming you have an EBS-root instance (and you should), you only pay for the disk while the machine is off. You can also attach that disk to a more powerful instance, should you need a short-term boost of computing power!
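
One way that stop-and-borrow pattern might look with boto3 - the instance IDs, volume ID and device name are all placeholders:

    # A minimal sketch, assuming boto3; the instance IDs, volume ID and device
    # name are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Stop an EBS-root instance: compute charges stop, and you only pay for
    # the EBS volume while it sits idle.
    ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=["i-0123456789abcdef0"])

    # The root volume can then be detached and attached to a more powerful
    # instance for a short-term boost of computing power.
    ec2.detach_volume(VolumeId="vol-0123456789abcdef0")
    ec2.attach_volume(
        VolumeId="vol-0123456789abcdef0",
        InstanceId="i-0fedcba9876543210",  # the more powerful instance
        Device="/dev/sdf",
    )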

Further to that, if we have a second machine running for failover purposes, it should be serving traffic too, so that when we're in our good state, we have twice the performance available to us.

No private networking

Amazon network access is controlled by security groups. Instances are assigned to a security group at startup. You can then do things like say "proxy servers may access web servers on port 80", "the public may access proxy servers on port 443", "my office may access everything on port 22".
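
Those three rules, expressed as a hedged boto3 sketch; the group names and the office address range are placeholders:

    # A sketch of the rules described above, assuming boto3; group names and
    # the office CIDR are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # "proxy servers may access web servers on port 80"
    ec2.authorize_security_group_ingress(
        GroupName="web-servers",
        IpPermissions=[{
            "IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
            "UserIdGroupPairs": [{"GroupName": "proxy-servers"}],
        }],
    )

    # "the public may access proxy servers on port 443"
    ec2.authorize_security_group_ingress(
        GroupName="proxy-servers",
        IpPermissions=[{
            "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        }],
    )

    # "my office may access everything on port 22"
    for group in ("web-servers", "proxy-servers"):
        ec2.authorize_security_group_ingress(
            GroupName=group,
            IpPermissions=[{
                "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                "IpRanges": [{"CidrIp": "203.0.113.0/24"}],  # placeholder office range
            }],
        )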

While Amazon instances know about security groups, your applications don't. You can't allow access to something from the public Internet and also grant a nominated network range more access to it on the same port. I'll touch on this more when talking about security and mail servers later in this series.

Amazon offers a Virtual Private Cloud, which allows you to put more machines behind your firewall via an IPsec VPN. It comes with an important proviso that many first-time readers miss: you can't access a VPC instance directly from the Internet. There's no way to use VPC as a management VPN while keeping the instances on the public Internet - unless you want to accept traffic for those instances on your own servers, in which case you had better have more redundant network connectivity than Amazon has, and you now pay for traffic in two places.

You can, of course, run a VPN server on your EC2 instances, or require your users to have a VPN connection to your office, in order to get trusted access to your EC2 servers.

Size your instances as necessary

We started by trying to run as many of our instances as we could on the smallest type (the m1.small), and quickly hit its limitations. However, remember that resizing instances isn't difficult; I'll touch on this later as well.
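
For an EBS-root instance, a resize is just a stop, an attribute change and a start. A minimal boto3 sketch, with a placeholder instance ID and target type:

    # A minimal sketch, assuming boto3; the instance ID and target type are
    # placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")
    instance = "i-0123456789abcdef0"

    # An EBS-root instance can be stopped, given a bigger instance type, and
    # started again with its root disk intact.
    ec2.stop_instances(InstanceIds=[instance])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance])

    ec2.modify_instance_attribute(InstanceId=instance,
                                  InstanceType={"Value": "m1.large"})

    ec2.start_instances(InstanceIds=[instance])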

Use the right levels of redundancy

You can get a lot of benefits if you rethink your application and build it with the cloud in mind, but you can still get a great cost saving and a faster application just by treating EC2 as a big VM farm. For example, we're not using S3 at all, and barely using EBS.

Our root disks are on EBS, but our data is mostly replicated across multiple nodes, so using the ephemeral store - which is otherwise wasted - was perfect for us. Why pay extra to store a Mercurial repository that has to be in sync across four machines, when every other machine already has a consistent copy by default?

Automate everything

You can register your own disk image (AMI) from which to create instances. By using a combination of configuration management and locally developed deployment scripts, we haven't yet needed to do this.

For us, firing up a new instance involves running a script with the desired hostname and the instance ID we're given when we create it. The script adds the machine to the DNS, SSHes to it, installs Puppet, registers it with our puppetmaster and brings the machine up to the current spec. Our machines also auto-register with our monitoring servers.
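
A rough outline of what such a script might look like in Python with boto3 - the DNS step and the Puppet invocation are stand-ins for whatever your own tooling does, and puppetmaster.example.com is a placeholder:

    # A rough outline only, assuming boto3 and plain ssh; the DNS step and the
    # Puppet invocation are stand-ins, and puppetmaster.example.com is a
    # placeholder.
    import subprocess
    import sys

    import boto3

    hostname, instance_id = sys.argv[1], sys.argv[2]

    ec2 = boto3.client("ec2", region_name="eu-west-1")
    reservation = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]
    public_dns = reservation["Instances"][0]["PublicDnsName"]

    # 1. Add the machine to DNS (left abstract here).
    print(f"TODO: point {hostname} at {public_dns}")

    # 2. SSH in, install Puppet and register with the puppetmaster, which then
    #    brings the machine up to the current spec.
    subprocess.run(
        ["ssh", public_dns,
         "sudo apt-get -y install puppet && "
         "sudo puppet agent --server puppetmaster.example.com "
         f"--certname {hostname} --test"],
        check=True,
    )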

Once something is totally automated, it can be done automatically, as a result of an external stimulus. For example, when your ELB detects a spike of traffic to the site, you can auto-scale and create new instances in response. Even if you don't think you need this now, if you design your system right from the beginning, you're well placed to introduce it later.
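
A hedged sketch of how that trigger could be wired up with boto3: a simple scaling policy plus a CloudWatch alarm on the ELB's request count. The group, policy, alarm and load balancer names and the threshold are placeholders, and the launch configuration and Auto Scaling group are assumed to exist already.

    # A hedged sketch, assuming boto3; names and thresholds are placeholders,
    # and the launch configuration and Auto Scaling group already exist.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="eu-west-1")
    cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

    # A simple policy: add one instance whenever the alarm below fires.
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-asg",
        PolicyName="scale-out-on-traffic",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,
    )

    # Alarm on the ELB's request count; when traffic spikes, the policy runs
    # and new instances are created in response.
    cloudwatch.put_metric_alarm(
        AlarmName="elb-traffic-spike",
        Namespace="AWS/ELB",
        MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancerName", "Value": "web-elb"}],
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=2,
        Threshold=10000,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )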

Employ the principles of structured system management and your EC2 environment will pass the Joel Test for System Administrators.
