Craig Box's journeys, stories and notes...


Archive for October, 2010

Clustering an Amazon Elastic IP address

Wednesday, October 27th, 2010

If you have a problem that Amazon's Elastic Load Balancing can't solve, you might want to do the old fashioned "two machine IP failover" cluster.

Amazon instances only have one internal, and one external, IP address at a time. Consider this:

  • Instance 1: 256.256.256.4 [Elastic IP]
  • Instance 2: 257.257.257.8

If you claim the elastic IP on instance 2, then a new IP will be allocated to instance 1:

  • Instance 1: ¿?
  • Instance 2: 256.256.256.4 [Elastic IP]

You won't know what it is unless you query the web services, or look at the console, for instance 1. Be sure you are aware of the implications of this before proceeding.
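
For example, you can ask the instance metadata service (the same service used below to fetch the instance ID) what instance 1's public address now is; a quick check, run on instance 1 itself, looks something like this:

$ curl -s http://169.254.169.254/latest/meta-data/public-ipv4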

I found a forum post from Alex Polvi which, with some tidying, does the job nicely. When the slave node realises that its master mate has gone offline, it will claim the IP address; when the master returns, you can have the master claim it back, or you can have the slave just become the new master.

Claiming the shared/elastic IP

Your script needs a command that the master machine can call to claim the elastic IP address.  Alex's example uses Tim Kay's 'aws' script, which, unlike the official Amazon ec2-utils, doesn't require Java.

You need /root/.awssecret to contain the Access Key ID on the first line and the Secret Access Key on the second line:

AK47QWERTY7890ASDFG0H
01mM4Rkl4RmArkLArmaRK14rM4rkL4MarKLar
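
These keys give full control of your account, so keep the file private to root, with something like:

$ sudo chmod 600 /root/.awssecret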

You can now test this:

$ export AWS_PARAMS="--region=eu-west-1"
$ export ELASTIC_IP=256.256.256.4
$ export MY_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
$ aws $AWS_PARAMS associate-address "$ELASTIC_IP" -i "$MY_ID"

The MY_ID command uses the instance metadata service to get the instance ID for the machine you're running on, so you can use this script, unedited, on both machines.

This should claim the IP 256.256.256.4 for the instance on which the script is run.

In order for Heartbeat to be able to use this script, we need a simple init script. When run with 'start' it should claim the IP, and when run with 'stop' it should relinquish it. You will need to edit the parameters at the top (or better yet, put them in /etc/default/elastic-ip and source that in your file).  Remember to ensure this script is executable.

/etc/init.d/elastic-ip

#!/bin/bash
DESC="elastic-ip remapper"
MY_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
ELASTIC_IP="256.256.256.4"
AWS_PARAMS="--region=eu-west-1"

if ! [ -f ~/.awssecret ] && ! [ -f /root/.awssecret ]; then
    echo "$DESC: cannot find ~/.awssecret or /root/.awssecret"
    exit 1
fi

case $1 in
    start)
        aws $AWS_PARAMS associate-address "$ELASTIC_IP" -i "$MY_ID" > /dev/null
        [ $? -eq 0 ] && echo $DESC: IP $ELASTIC_IP associated with $MY_ID || echo $DESC: Could not map IP $ELASTIC_IP to $MY_ID
        ;;
    stop)
        aws $AWS_PARAMS disassociate-address "$ELASTIC_IP" > /dev/null
        [ $? -eq 0 ] && echo $DESC: IP $ELASTIC_IP disowned || echo $DESC: Could not disown $ELASTIC_IP
        ;;
    status)
        aws $AWS_PARAMS describe-addresses | grep "$ELASTIC_IP" | grep "$MY_ID" > /dev/null
        # grep will return true if this ip is mapped to this instance
        [ $? -eq 0 ] && echo $DESC: I have $ELASTIC_IP || echo $DESC: I do not have $ELASTIC_IP
        ;;
esac
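
Once the script is saved, make it executable and give it a quick manual run before handing it over to Heartbeat; the output should show your own instance ID claiming the address:

$ sudo chmod +x /etc/init.d/elastic-ip
$ sudo /etc/init.d/elastic-ip start
$ sudo /etc/init.d/elastic-ip status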

Heartbeat

Each server needs the heartbeat package installed:

$ apt-get install heartbeat

Allow heartbeat traffic between your instances:

$ ec2-authorize $group -P udp -p 694 -u $YOURUSERID -o $group # heartbeat

Heartbeat is configured by three files, all in /etc/ha.d, and in our case, all identical on both servers:

authkeys

auth 1
1 sha1 foobarbaz

The authkeys page on the heartbeat wiki offers a script to help generate a key.
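
If you'd rather not fetch the wiki's script, something along these lines (my own rough equivalent, not the wiki's version) generates a random key and sets the permissions heartbeat insists on:

$ ( echo "auth 1"; echo -n "1 sha1 "; \
    dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1 ) \
    | sudo tee /etc/ha.d/authkeys > /dev/null
$ sudo chmod 600 /etc/ha.d/authkeys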

ha.cf

# Log to syslog as facility "daemon"
logfacility daemon 

# List of cluster members by short hostname (uname -n)
node server1 server2

# Send one heartbeat each second
keepalive 1 

# Declare nodes dead after 10 seconds
deadtime 10 

# internal IP of the peer
ucast eth0 10.256.256.4
ucast eth0 10.257.257.8

# Fail back, so we're normally running on the primary server
auto_failback on

All pretty self-explanatory: set your own 'node' and 'ucast' entries with your hostnames and internal IP addresses. Even when the external IPs are bouncing around, the internal IPs should stay the same. auto_failback is optional, as mentioned above. Read the docs for more options.

haresources

server1 elastic-ip

Here, we set up a link between the primary server (server1) and the script we want to run (elastic-ip). The wiki shows you what else you can do.

Putting it all together

Start heartbeat on both nodes, and server1 should claim the IP address. Stop heartbeat on server1 (or let server1 crash), and server2 will notice after 10 seconds and claim the IP address. As soon as server1 is back up, it should claim it back too. You can run /etc/init.d/elastic-ip status to prove this:

server1:~$ sudo /etc/init.d/elastic-ip status
elastic-ip remapper: I have 256.256.256.4
server2:~$ sudo /etc/init.d/elastic-ip status
elastic-ip remapper: I do not have 256.256.256.4

Whatever happens, your elastic IP will always point to a good instance!

Postscript: what Heartbeat will not do

Heartbeat will notice if a server goes away, and claim the IP. However, it will not notice if a service stops running but the machine stays alive. Your good work may all be for nothing!

To solve this, I suggest monit, or if you're a ruby fan, bluepill. These will monitor a service, and restart it if it is not responding.
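
As a sketch of the monit approach (assuming Apache and the stock Ubuntu paths; adjust for whatever service you actually care about), a stanza like this will restart the web server if it stops answering:

check process apache2 with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start"
  stop program  = "/etc/init.d/apache2 stop"
  if failed host 127.0.0.1 port 80 protocol http then restart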

Migrating your servers to Amazon EC2: Load balancing

Monday, October 25th, 2010

When you run a large web site, you probably have a number of machines, across a number of different availability zones, but you need to present a single URL to the user. You distribute the load between your machines with (a redundant pair of) load balancers, and point your DNS to the floating IP of the balancers.

A number of similar options exist for Amazon EC2 users: as a good balance between convenience and performance, we chose to use Amazon's Elastic Load Balancing (ELB) service offering, with a caveat listed below. While a good default position, this may not be for you; check the bottom of this article for some resources to help you choose.

ELB has some great features. As well as the regular load balancer feature of tracking which backend instances are up, it proactively adds extra capacity (which I term 'nodes', so as not to get confused with backend instances) as load increases. You can also set ELB up to spin up more backend instances if there aren't enough to serve your requests. All this for a small per-hour and per-GB cost.

Side note: You may be thinking "Why not use round robin DNS, and put the IPs of more than one server?" This is a trap for young players; you actually make things worse, because any one of N machines failing means there's a 1/N chance a request goes to a broken instance.  There's a good writeup on Server Fault if you want more information.

Then and now

In the old world, our site sat behind a hardware load balancer appliance.  Since we were using a shared device at a co-location provider, I never saw it, and thus can't give you the exact details; but the important part of this story is that when traffic got to our instance, its source IP was still set to the IP of the sender, or at least the last proxy server it went through on its travels. This matters to us, because, just like Phil Zimmermann's brain, some of Symbian's code is export controlled, due to containing cryptographic awesomesauce. We need to know the source IP of all requests, in case they are requesting our restricted areas.

When you're in EC2, you're operating under their network rules which "will not permit an instance to send traffic with a source IP or MAC address other than its own". This also applies to the instances that run the ELB service. If you set up an ELB, your backend servers will see all their traffic coming from the IP addresses of your ELB nodes, telling them nothing about where it came from before that.

The story that is falling into place largely revolves around the X-Forwarded-For header, which is added to HTTP transactions by proxy servers. Our back-end servers are told the packet arrived from the load balancer, but if you tell ELB that it's using the HTTP protocol on this port, it adds the X-F-F header automatically: the backends can then look at the most recently added entry to the X-F-F and learn the source IP as the ELB knew it.1
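
For example (my own sketch, not something ELB configures for you), an Apache backend can log the forwarded address instead of the ELB node's with a custom log format:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" elb
CustomLog /var/log/apache2/access.log elb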

Because the load balancer sits between the client and the server, which sit at either end of an encrypted transaction, it can't rip open an HTTPS packet and add an arbitrary header. So, we had a Heisenproblem: it was not possible to know where something came from, and have that same something happen securely. And, stuff you are only giving to certain allowed people is exactly the sort of stuff you probably want to distribute over SSL.

There were two possible solutions to this:

  1. Direct secure traffic directly to a backend instance
  2. Wait for Amazon to implement SSL termination on ELB

In order to go live, we did #1. It came with a bunch of downsides, such as having to instruct our cache to redirect requests for certain paths to a different URL, such that if you requested site.example.org/restricted, you were taken to https://site-secure.example.org/restricted. "But what happens when that server goes down?", you say! When I planned this article, it was going to include a nice little description of how we got Heartbeat sharing an elastic IP address, so that we always had our "secure" IP pointing to whichever one of (a pair of) our servers which was up. It's a useful trick, so I'll come back to it later.

However, I'm pleased to announce that since then, Amazon have introduced #2: support for SSL termination, so you can upload your certificate to your load balancer, and then it can add the X-F-F header to your secure packets, and you don't need to worry about it any more.2

I was similarly going to have to worry about how to handle database failover in EC2, but they introduced that between me looking and go-live. I surmise that if you wait long enough, Amazon will do everything for you, so I now delay introducing anything! 🙂

Now we know all that, let's dig a little deeper into how ELB works.

A Little Deeper

Amazon is all about the short-TTL DNS. If they want to scale something, they do so, and change what their DNS server returns when you query it.

When you register an ELB, you get given a DNS name such as lb-name-1234567890.eu-west-1.elb.amazonaws.com. You're explicitly warned to set your chosen site name as a CNAME to this; and indeed if you use the IP as it stands now, one day your site will break (for reasons you will learn below).

First oddity with this setup: you can't CNAME the root of a domain, so you have to make example.org a redirect to www.example.org, preferably one hosted somewhere outside the cloud, as example.org needs to be an A record to an IP address. Some DNS providers have a facility for doing redirects using their own servers, which is an option here.
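
In zone-file terms (the names, TTLs and redirect host's address here are placeholders of mine), that arrangement looks something like:

www.example.org.   300  IN  CNAME  lb-name-1234567890.eu-west-1.elb.amazonaws.com.
example.org.       300  IN  A      192.0.2.10   ; a host you control that redirects to www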

If you were to query that DNS record you would find that it has a 60 second TTL; thus if you query it twice, 2 mins apart, and you have more than one ELB node3, you may, at the discretion of Amazon's algorithms, get different results.  Try this:

$ dig lb-name-1234567890.eu-west-1.elb.amazonaws.com
lb-name-1234567890.eu-west-1.elb.amazonaws.com. 60 IN A 256.256.256.4
$ dig lb-name-1234567890.eu-west-1.elb.amazonaws.com @8.8.8.8
lb-name-1234567890.eu-west-1.elb.amazonaws.com. 60 IN A 257.257.257.8

Dude, where's my balancing?

When you register an ELB, you tell it the availability zones it should operate in. Each AZ has at least one ELB node, and that node will route you to instances in its own AZ, unless there are none available. That, along with the fact you are pseudo-randomly given an IP (with a minimum 60 second TTL), leads to a non-obvious conclusion. This actually happened to us - our policy is that odd numbered servers are in -1a, and even numbered servers are in -1b.

external:~$ ab -n 10 http://lb-name-123.eu-west1.elb.amazonaws.com/test.txt
web1:~$ wc -l /var/log/apache2/access.log
10 /var/log/apache2/access.log
web2:~$ wc -l /var/log/apache2/access.log
0 /var/log/apache2/access.log
Lop-sided load

That is to say: if your servers are in multiple availability zones4, a single user doing requests in quick succession isn't load-balanced across your backend instances, so ELB doesn't appear to be working at all. Thankfully, it is; you just can't see it, because you're not looking from enough places at once. ELB is designed to work for a widely distributed client base, and in that case, you should expect about half the traffic on one instance, and half on the other. If you ran this test from a different location, you might see all 10 requests go to web2.

If you ask Amazon5, they can change the DNS for an ELB so that it presents all the IP addresses associated, not just one of them. This means your client has the choice to pick the IP each time it connects, and depending on how your application works, may be better for test servers.

OBEY THE TTL

The prime reason to use an ELB is that Amazon can transparently add more computing power to support your load if needed.  The converse of that is that when it is no longer needed, it will be removed. It bears mention that if they take an IP address out of the DNS, it will last at least 60 minutes before being taken out of service. Not everyone obeys a TTL on a DNS zone!

To reiterate: don't ever take what the name currently resolves to, and use that IP.  It's not yours and one day it will break.

Further reading

For this article, I have touched on some of the interesting parts of ELB. I didn't feel I needed to write a general introduction, as there are already several good resources out there:

Check back later for talk about databases, storage, security, mail and more!

  1. If you're worried about people spoofing the X-F-F, you can trust that the most recently added entry was yours, and throw away all the rest. 
  2. It's like my boss knew I'd been sitting on writing this post, and just had to pip me to the post! 
  3. Due to having more traffic than one node can service, or being hosted in more than one AZ 
  4. A good practice if you're trying to mitigate site failure - see "No single point of failure". 
  5. You may have to have commercial support for them to do this. 

Migrating your servers to Amazon EC2: Instance sizing

Monday, October 11th, 2010

One of the central tenets of cloud computing is that it's a cheap way to run large-scale compute jobs. If you're more concerned about starting small, and want to tackle the problem of growing big when you get to it1, then there's still a solution for you, though it might not be quite like the one you're used to.

If you're currently running on a hosted, virtualized platform, you are probably in one of two situations:

  • Your hosting provider buys servers for you, and runs something like VMware ESX on them
  • You're dealing with a VPS provider

If you're in the former bucket, as we were, you have a pretty fine-grained control over your instance (virtual server) scaling. You can add more CPU power (or weight one instance to be allowed to burst at the expense of others), and control, sometimes to the megabyte, how much memory is available to your application.

When you're in the latter bucket, you tend to get a number of discrete plans (such as the ones Linode offer), but your provider has a human element, and if you ask nicely, you can probably get a plan with very low disk but very high memory, by paying a little bit extra (RimuHosting tends towards the confusing with the amount of choice they offer!)


Amazon EC2, being an entirely automated provider, doesn't give you the option to customize your plans. They offer a selection of instance sizes, at various prices. Those are the choices, take or leave them.2 Because of the ease of creating and using multiple machines, and the relatively low extra cost,3 you have to consider if the cost of scaling up is best for you, compared to the cost of scaling out.

Our applications ran almost exclusively on 32-bit machines. There are a number of reasons, in both theory and practice, why 64-bit may not be for you: lack of vendor support, having to maintain software packages for both 32- and 64-bit architectures, slower performance/more memory use for simple tasks, etc. I prefer to stay with 32-bit across the board, which also suggests horizontal scaling.  If your application benefits from 64-bit computing, then you have a slightly different problem to the one I had, and your mileage will vary.

Some figures

Consider, for example, the 'default' instance for new users, the m1.small:

  • 1.7 GB memory
  • 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)

This instance costs 8.5c/hour to run.

Side note: With the launch of Canonical's new Ubuntu Server 10.10, they're announcing a "Try Ubuntu Server on our dime" promotion. It's worth noting that they get 1.5c change for that dime. 🙂

The next option up gives you about four times the performance, for about four times the cost. However, you don't get too much insight into what four times "Low" IO performance is, vs "High", and you don't get any redundancy. We decided that we'd rather have two small instances in one AZ, and two in another, to build resilience into our infrastructure for free.

It soon dawned on us that 1 "EC2 Compute Unit", which they claim is currently roughly equivalent to a "1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor", is roughly equivalent to "not good enough for PHP".

The stolen generation


When you use VMware, you get given a virtual CPU, with a speedo that goes from 0 to 100.  With Xen (which is the hypervisor used by Amazon EC2), you can be given a certain percentage of the cycles on the parent CPU, but the gauge you see goes up to the number of cycles you are allowed on the parent CPU, not a percentage of a virtual CPU.

The practical upshot of this is that you end up seeing your CPU maxing out at a certain value (for us, around 40%) - but the other 60% of the cycles are stolen from you to feed other, more deserving, instances. This blog post from Axibase neatly sums up the issues, with graphs. You will see stolen CPU cycles in a new column in 'top':

Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 96.1%id,  0.1%wa,  0.0%hi,  0.0%si,  2.4%st

Not all tools are aware of steal time: you will see stolen ticks in vmstat -s, but not in the tabular vmstat output. You must have Xen-aware tools in order to get this information; Ubuntu provides them out of the box.
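
A quick way to check for steal (the count here is made up):

$ vmstat -s | grep -i stolen
  1234567 stolen cpu ticks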

Thankfully, there happens to be a suitable instance for us.  Doubling the price from 8.5c to 17c/hour, we get the c1.medium instance:

  • 1.7 GB memory
  • 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)

This one is twice the price of the m1.small, but has 5 times the CPU. A worthwhile upgrade, and all of a sudden our Apache benchmarks are back up where we expect them.

You might have noticed that both the previous instances have a relatively small 1.7 GB of memory. Want more? You're moving up to 7GB, minimum.  If you want to stay with a small application, on 32-bit platform, the c1.medium instance is about where the line ends. We would love an instance type with 4GB of RAM; if you agree, please make yourself known to Amazon. The more customer demand they get, the more likely they are to implement it.

If we get to the point where it suits us, for example, to run two Size 4 machines, rather than eight Size 1 machines, we may consider moving to larger instances; we would save a little on EBS disks and inter-AZ transfer costs, but then a failure on one machine will mean we lose half of our hosting potential, rather than one eighth.

Planning for growth

You don't need to know all this up-front. If an instance is lacking in resource, upgrade it for a bigger/better one. Right?

Earlier in the history of EC2, you couldn't upgrade an instance, because root disks were on the instance-only, or ephemeral, store.  If you step back and think of EC2 as actually being a room full of servers, each machine has some (presumably) local hard disk space. That is the ephemeral ("short-lived"; from the Greek for "one day") store. It's big, and it's free. However, when you turn your instance off, it's wiped.

In contrast, EBS is permanent, paid, network-attached storage (think iSCSI).

Before late 2009, your only option was to turn off your current instance, and then spin up a new one, from your template image (AMI). Then, AWS announced an upgrade, which allows you to boot an instance from an EBS disk. This means you can turn your instance off, and the root file system stays there waiting.  You can use the web services to instruct that instance to be bigger, or smaller, when it returns. Because of the obvious usefulness of this, and the relatively low cost of a 10GB root disk, we're running all our instances on an EBS root.
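
As a rough sketch with the EC2 API tools (the instance ID is a placeholder), a resize looks something like this:

$ ec2-stop-instances i-12345678
$ ec2-modify-instance-attribute i-12345678 --instance-type c1.medium
$ ec2-start-instances i-12345678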

When you upgrade your EBS-root instances, you are causing them to change class, which generally means bringing them up on a new physical host machine.

This means one obvious thing:

  • Your ephemeral disk (/mnt) is wiped

And two "less obvious" things:

  • Your internal IP address will change
  • Your internal IP address will change

Technically speaking that's only one "less obvious" thing, but it's such a big one that I thought it was worth mentioning twice.

If you have an elastic IP address attached to that instance, your external IP address will remain the same.  However, your instance is now on a different physical host, with a different physical NIC in its host, so it will get a new IP address. For someone who is running a traditional application without cloud-awareness, changing IP can be something which requires thought. For example, if you are upgrading your private DNS server, you will have a problem here. You can't know what the IP address will be before you upgrade, so make very sure you have moved all the services off this machine before you upgrade it. Then, get the new connection details from the console, and reconnect.
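
If you'd rather not dig through the console, the instance itself can also tell you its new internal address via the metadata service used earlier:

$ curl -s http://169.254.169.254/latest/meta-data/local-ipv4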

As every machine needs an internal IP address, and they are not scarce (Amazon provides them from 10.0.0.0/8, meaning there should be no problem up to about 16 million instances), something that is really missing from EC2 for the "always on" use case we run is static internal IP addresses. Fire up your request e-mails.4

  1. I think I pretty much wrote Rework there! 
  2. Amazon do often add new instances sizes, so the information in this article may one day be superseded. 
  3. In the case of non-EBS instances in the same AZ, there should be no extra cost. 
  4. I even offer a suggestion on how to implement them: 5 for free per customer, and one more for every reserved instance you buy.  Then, they're issued by DHCP to any instance ID they are registered to. 

Migrating your servers to Amazon EC2: Initial design considerations

Friday, October 1st, 2010

Cloud architecture!

Even without making major changes to your application, you can make Amazon EC2 work for you.

Here are some things that I considered when designing our new setup:

No single point of failure

Any one machine should be able to go down - as Amazon CTO Werner Vogels says, "everything fails, all the time".  Guaranteed failure makes you think. The parts of the site that are identified as being most important should be able to run even if an entire datacentre fails.

Thankfully, EC2 makes this simple. Availability Zones (AZs) have been described to me as far enough apart that a disaster at one will not affect the other, but close enough that an engineer can drive between them in a reasonable time.

In my experience, the difference in ping times between our eu-west-1a instances and our eu-west-1b instances is less than 1ms. You do pay a "regional data transfer" rate of $0.01/GB for transfer between instances in different AZs in the same region. At that price, it is cost-effective for us to run the system across two AZs. Our load balancing doesn't care which zone the machines are in, so even if one zone fails, the site is still reachable.

No wasted cycles

You can turn on a machine and turn it off as you see fit; assuming you have an EBS-root instance (and you should), you only pay for the disk while the machine is off.  You can also attach that disk to a more powerful instance, should you have a need for a short-term boost of computing power!

Further to that, if we have a second machine running for failover purposes, it should be serving traffic, so that when we're in our good state, we have twice the performance available to us.

No private networking

Amazon network access is controlled by security groups. Instances are assigned to a security group at startup. You can then do things like say "proxy servers may access web servers on port 80", "the public may access proxy servers on port 443", "my office may access everything on port 22".
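
In ec2-authorize terms (the group names and the office netblock are placeholders of mine), those three rules look something like:

$ ec2-authorize web -P tcp -p 80 -u $YOURUSERID -o proxy
$ ec2-authorize proxy -P tcp -p 443 -s 0.0.0.0/0
$ ec2-authorize web -P tcp -p 22 -s 203.0.113.0/24
$ ec2-authorize proxy -P tcp -p 22 -s 203.0.113.0/24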

While Amazon instances know about security groups, your applications don't.  You can't allow access to something from the public Internet, and allow more access to it from a nominated network range, on the same port. I'll touch on this more when talking about security and mail servers later in this series.

Amazon offers a Virtual Private Cloud, which allows you to put more machines behind your firewall via an IPsec VPN.  It comes with an important proviso that many first-time readers miss: you can't access a VPC instance directly from the Internet. There's no way to use VPC as a management VPN but have the instances on the public Internet - unless you want to accept traffic for those instances on your own servers, in which case you should have more redundant network connectivity than Amazon has, and you now pay for traffic in two places.

You can, of course, run a VPN server on your EC2 instances, or you can require your users have a VPN connection to your office, in order to get trusted access to your EC2 servers.

Size your instances as necessary

We started trying to run as many of our instances as we could on the smallest type (the m1.small), and quickly hit its limitations. However, remember that resizing instances isn't difficult. I'll touch on this later as well.

Use the right levels of redundancy

You can get a lot of benefits if you rethink your application and build it with the cloud in mind, but you can still get a great cost saving and a faster application just by treating EC2 as a big VM farm. For example, we're not using S3 at all, and barely using EBS.

Our root disks are on EBS, but our data is mostly replicated across multiple nodes, so using the ephemeral store - which is otherwise wasted - was perfect for us. Why pay extra to store a Mercurial repository, which has to be in sync across four machines, when every other machine already has a consistent copy by default?

Automate everything

You can register your own disk image (AMI) which you can create instances of.  By using a combination of configuration management and locally-developed deployment scripts, we haven't yet had the need to do this.

For us, firing up a new instance involves running a script with the desired hostname and the instance ID we're given when we create it.  This will add the machine to the DNS, SSH to it, install Puppet, register it with our puppetmaster and install the machine to the current spec. Our machines auto-register with our monitoring servers.

Once something is totally automated, it can be done automatically, as a result of an external stimulus. For example, when our ELB detects a spike of traffic to the site, you can have it auto-scale and create new instances in response. Even if you don't think you need this now, if you design your system right from the beginning, you're well placed to introduce it later.

Employ the principles of structured system management and your EC2 environment will pass the Joel Test for System Administrators.