Craig Box's journeys, stories and notes...

Archive for the ‘Work’ Category

Cloud pricing is hard

Tuesday, September 6th, 2011

One of the many benefits to cloud computing is the pricing model. Following Amazon's lead, any provider worth their salt lists their per-hour pricing on their website, and that is the price you pay, regardless of what you use.1 Gone are the days where you have to call for a custom price list, tailored for you by a man in a suit who is incentivised to charge exactly the maximum he thinks you will pay, no more no less. This means startups can get hold of scalable infrastructure at economies previously only available to the canny corporate negiotiator.

However, even in the automated, API-driven present, there are still different models for pricing which you can choose from. For example, Amazon has an on-demand price, reserved instances (pay up front to buy the right to run a machine for a cheaper rate) and spot instances (an instance market, where you bid a price and if the spot price is below that price, your instance runs). While spot instances sound like a curiosity for people doing queue-based distributed computing that can be started and stopped at will, James Saull points out they turn out to be an oddly cost-effective way to run your always-on infrastructure. You may not like the risk, and you are not getting the guarantee of instance availability that comes with reserved instances.

For the general case, once you understand what your infrastructure requirements look like on Amazon, you buy suitable reserved instances: you then save 34% or 49% on the cost of running the equivalent on-demand instance over 1 or 3 years.

Mull that over for a second. This morning, I came across a comparison of pricing between IBM SmartCloud Enterprise and Amazon EC2 (via Adrian Cockroft). I don't know a lot about the IBM cloud, but I do know bad math when I see it.

Lies, damned lies and estimated usage quotes

Amazon offer an online cost calculator. It's accurate, and always kept up-to-date, but admittedly it can be hard to use. For example, you have a small drop-down box at the top of the page which dictates which region you're in; if you are adding infrastructure in multiple reasons, it's easy to get lost.

The author of the IBM article, Manav Gupta, has obviously lost his way around the AWS calculator. His first estimate comes in at over $10,000 a month, as  "Amazon has included costs for redundant storage and compute in Europe". Amazon do no such thing. No data crosses a region unless you specifically request it - an important thing to note for compliance with data protection law. What is more likely is Gupta has started pricing his infrastructure in Europe, noticed his error, and continued in the US, without realising that AWS offers five global regions (six if you include the new US GovCloud) and you can easily provision infrastructure in all of them. In fairness, the IBM calculator seem to be much simpler; I can't find information on where IBM host their SmartCloud.

Quote 1 is replaced by quote 2, which comes in at $6370.62. Ignoring the obvious-but-insignificant errors (how does an application which does 20GB of inbound data per week do 120GB/week through its load balancer?) However, a quick look at the bill tab shows storage allocated in US-WEST, where everything else is allocated in US-EAST. Gupta's quote includes 7GB of S3 storage which is not mentioned on the post (or accounted for in the IBM quote). Not only that, it's charged twice: once in US-EAST and once in US-WEST! Assuming that's an error, I removed both allocations, and in order to be fair to what has been requested, added 300GB of snapshot storage for the EBS volumes to the correct page of the calculator.

Our new estimate - only correcting for errors, and without touching the compute cost - is $4211.90.

I've already beaten the published IBM price, but why stop there? As I mentioned above, sensible cloud purchasing almost always involves instance reservations. Because the pricing appears to have changed since the IBM article was published (I can't find a way to make IBM instances cost the same as shown in the calculator screenshot), I can't tell what reservation was used (if any) in the initial calculation. However, IBM offer 6- and 12-month reservations on a 64-CPU pool, with the note that "reserved capacity may not be economically attractive with the low monthly usage you have selected above".

Let's go for a 12 month reservation on AWS, in case our habits change. (And if they do, remember that reserved instance pricing can apply to any instance in the same availability zone on the same account.)

Our monthly cost has dropped to $2738.04. We do have an up-front reservation cost to pay, but if we amortize that over 12 months (as IBM does in their calculator) we are down to $3420.54 per month. Why not throw in Gold Premium Support? It's only another $341/month.

With regard to Gupta's criticisms about not having a PDF export on the Calculator, I find it easy enough to hit "Print to PDF" on a web page myself, and the fact I can export these quotes and publish them on this blog, far outweighs that hassle.

On the topic of software licensing

Pricing is even harder when you have to factor in the price of licensing. In fairness to IBM, the quoted Amazon costs do not include Red Hat Linux licenses. However, I suspect the only reason they were included, aside from IBM being a Big Support kind of company, is that commercially licensed software (RHEL, SUSE, Windows) is the only option you have on SmartCloud Enterprise.

If you want to run Oracle applications on EC2, why not run them on the freely-licensed Oracle Enterprise Linux? Or the most popular operating system for the cloud, Ubuntu Server?

Alternatively, if the requirement for Red Hat Linux is hard-and-fast, then there is an option to run Red Hat on-demand with Amazon EC2. Reserved instance pricing is not currently available for RHEL, therefore you would be better advised to bring your own RHEL licenses to the cloud with Red Hat Cloud Access.

In the interests of full disclosure, the on-demand RHEL price is $4519.34/mo, vs the $4211.90 above.

Did I mention the "everything else?"

Amazon have defined the cloud computing marketplace - at least for infrastructure - with EC2. As Adrian Cockroft points out in his excellent write-up on using clouds vs. building them, no-one can even come close to the price and performance, let alone the global scope, of EC2. If I were building Manav Gupta's web application, I would have the benefit of resiliency by balancing the application between multiple Availability Zones, and the benefit of reduced maintenance by using RDS for the database tier. And the price would probably be even lower, too.

The cloud provides great benefits to those who can make their application fit its ways. This is not a trivial task - sometimes even working the calculators can be too hard. If you want help with this, I am the Head of Cloud Services at Stoneburn in London, and I'd love you to get in touch. (And follow me on Twitter.)

Update: Manav Gupta has commented and provided a much neater explanation for why his first quote was vastly over-provisioned: there is a sample 'web application' option in the AWS calculator, which assigns a bunch of sample infrastructure over and above what was included in the IBM sample web application. The moral of the story is to ensure you are comparing like for like (as much as possible with differing size options between cloud providers) when making provider comparisons.


  1. Or, tiered options are clearly laid out, as with AWS data transfer. 

Clustering an Amazon Elastic IP address

Wednesday, October 27th, 2010
Balls of red elastic bands

If you have a problem that Amazon's Elastic Load Balancing can't solve, you might want to do the old fashioned "two machine IP failover" cluster.

Amazon instances only have one internal, and one external, IP address at a time. Consider this:

  • Instance 1: [Elastic IP]
  • Instance 2:

If you claim the elastic IP on instance 2, then a new IP will be allocated to instance 1:

  • Instance 1: ¿?
  • Instance 2: [Elastic IP]

You won't know what it is unless you query the web services, or look at the console, for instance 1. Be sure you are aware of the implications of this before proceeding.

I found a forum post from Alex Polvi which, with some tidying, does the job nicely. When the slave node realises that its master mate has gone offline, it will claim the IP address; when the master returns, you can have the master claim it back, or you can have the slave just become the new master.

Claiming the shared/elastic IP

Your script needs a command that the master machine can call to claim the elastic IP address.  Alex's example uses Tim Kay's 'aws' script, which doesn't require Java like the official Amazon ec2-utils.

You need /root/.awssecret to contain the Access Key ID on the first line and the Secret Access Key on the second line:


You can now test this:

$ export AWS_PARAMS="--region=eu-west-1"
$ export ELASTIC_IP=
$ export MY_ID=$(curl -s
$ aws $AWS_PARAMS associate-address "$ELASTIC_IP" -i "$MY_ID"

The MY_ID command uses the instance data service to get the instance ID for the machine you're running on, so you can use this script, unedited, on both machines.

This should claim the IP for the instance on which the script is run.

In order for Heartbeat to be able to use this script, we need a simple init script. When run with 'start' it should claim the IP, and when run with 'stop' it should relinquish it. You will need to edit the parameters at the top (or better yet, put them in /etc/default/elastic-ip and source that in your file).  Remember to ensure this script is executable.


DESC="elastic-ip remapper"
MY_ID=$(curl -s

if ! [ -f ~/.awssecret ] && ! [ -f /root/.awssecret ]; then
    echo "$DESC: cannot find ~/.awssecret or /root/.awssecret"
    exit 1

case $1 in
        aws $AWS_PARAMS associate-address "$ELASTIC_IP" -i "$MY_ID" > /dev/null
        [ $? -eq 0 ] && echo $DESC: IP $ELASTIC_IP associated with $MY_ID || echo $DESC: Could not map IP $ELASTIC_IP to $MY_ID
        aws $AWS_PARMAS disassociate-address "$ELASTIC_IP" > /dev/null
        [ $? -eq 0 ] && echo $DESC: IP $ELASTIC_IP disowned || echo $DESC: Could not disown $ELASTIC_IP
        aws $AWS_PARAMS describe-addresses | grep "$ELASTIC_IP" | grep "$MY_ID" > /dev/null
        # grep will return true if this ip is mapped to this instance
        [ $? -eq 0 ] && echo $DESC: I have $ELASTIC_IP || echo $DESC: I do not have $ELASTIC_IP


Each server needs the heartbeat package installed:

$ apt-get install heartbeat

Allow heartbeat traffic between your instances:

$ ec2-authorize $group -P udp -p 694 -u $YOURUSERID -o $group # heartbeat

Heartbeat is configured by three files, all in /etc/ha.d, and in our case, all identical on both servers:


auth 1
1 sha1 foobarbaz

The authkeys page on the heartbeat wiki offers a script to help generate a key.

# Log to syslog as facility "daemon"
logfacility daemon 

# List of cluster members by short hostname (uname -n)
node server1 server2

# Send one heartbeat each second
keepalive 1 

# Declare nodes dead after 10 seconds
deadtime 10 

# internal IP of the peer
ucast eth0
ucast eth0

# Fail back, so we're normally running on the primary server
auto_failback on

All pretty self-explanatory: set your own 'node' and 'ucast' entries with your hostnames and internal IP addresses. Even when the external IPs are bouncing around, the internal IPs should stay the same. auto_failback is optional, as mentioned above. Read the docs for more options.


server1 elastic-ip

Here, we set up a link between the  primary server (server1) and the script we want to run (elastic-ip). The wiki shows you what else you can do.

Putting it all together

Start heartbeat on both nodes, and server1 should claim the IP address.  Stop heartbeat on server1 (or if server1 crashes), and server2 will notice after 10 seconds  and claim the IP address. As soon as server1 is back up, it should claim it back too. You can run /etc/init.d/elastic-ip status to prove this:

server1:~$ sudo /etc/init.d/elastic-ip status
elastic-ip remapper: I have
server2:~$ sudo /etc/init.d/elastic-ip status
elastic-ip remapper: I do not have

Whatever happens, your elastic IP will always point to a good instance!

Postscript: what Heartbeat will not do

Heartbeat will notice if a server goes away, and claim the IP. However, it will not notice if a service stops running but the machine stays alive. Your good work may all be for nothing!

To solve this, I suggest monit, or if you're a ruby fan, bluepill. These will monitor a service, and restart it if it is not responding.

Migrating your servers to Amazon EC2: Load balancing

Monday, October 25th, 2010

When you run a large web site, you probably have a number of machines, across a number of different availability zones, but you need to present a single URL to the user. You distribute the load between your machines with (a redundant pair of) load balancers, and point your DNS to the floating IP of the balancers.

A number of options for doing similar exist for Amazon EC2 users: as a good balance between convenience and performance, we chose to use Amazon's Elastic Load Balancing (ELB) service offering, with a caveat listed below. While a good default position, this may not be for you; check the bottom of this article for some resources to help you choose.

ELB has some great features. As well as the regular load balancer feature of tracking of which backend instances are up, it proactively adds extra capacity (which I term 'nodes', so as not to get confused with backend instances) in the event of increasing load. You can also set ELB up to spin up more backend instances in the case of there not being enough to serve your requests. All this for a small per-hour and per-GB cost.

Side note: You may be thinking "Why not use round robin DNS, and put the IPs of more than one server?" This is a trap for young players; you actually make things worse, because any one of N machines failing means there's a 1/N chance a request goes to a broken instance.  There's a good writeup on Server Fault if you want more information.

Then and now

In the old world, our site sat behind a hardware load balancer appliance.  Being that we were using a shared device at a co-location provider; I never saw it, and thus can't give you the exact details: but the important part of this story is that when traffic got to our instance, its source IP was still set to the IP of the sender, or at least the last proxy server it went through on it's travels. This matters to us, because, just like Phil Zimmerman's brain, some of Symbian's code is export controlled, due to containing cryptographic awesomesauce. We need to know the source IP of all requests, in case they are requesting our restricted areas.

When you're in EC2, you're operating under their network rules which "will not permit an instance to send traffic with a source IP or MAC address other than its own". This also applies to the instances that run the ELB service. If you set up an ELB, your backend servers will see all their traffic coming from the IP addresses of your ELB nodes, telling them nothing about where it came from before that.

The story that is falling into place largely revolves around the X-Forwarded-For header, which is added to HTTP transactions by proxy servers. Our back-end servers are told the packet arrived from the load balancer, but if you tell ELB that it's using the HTTP protocol on this port, it adds the X-F-F header automatically: the backends can then look at the most recently added entry to the X-F-F and learn the source IP as the ELB knew it.1

Because the load balancer sits between the client and the server, who are either end of an encrypted transaction, it can't rip open a HTTPS packet and add an arbitrary header. So, we had a Heisenproblem: it was not possible to know where something came from, and have that same something happen securely. And, stuff you are only giving to certain allowed people is exactly the sort of stuff you probably want to distribute over SSL.

There were two possible solutions to this:

  1. Direct secure traffic directly to a backend instance
  2. Wait for Amazon to implement SSL termination on ELB

In order to go live, we did #1. It came with a bunch of downsides, such as having to instruct our cache to redirect requests for certain paths to a different URL, such that if you requested, you were taken to "But what happens when that server goes down?", you say! When I planned this article, it was going to include a nice little description of how we got Heartbeat sharing an elastic IP address, so that we always had our "secure" IP pointing to whichever one of (a pair of) our servers which was up. It's a useful trick, so I'll come back to it later.

However, I'm pleased to announce that since then, Amazon have introduced #2: support for SSL termination, so you can upload your certificate to your load balancer, and then it can add the X-F-F header to your secure packets, and you don't need to worry about it any more.2

I was similarly going to have to worry about how to handle database failover in EC2, but they introduced that between me looking and go-live. I surmise that if you wait long enough, Amazon will do everything for you, and now delay introducing anything! :)

Now we know all that, let's dig a little deeper into how ELB works.

A Little Deeper

Amazon is all about the short-TTL DNS. If they want to scale something, they do so, and change what their DNS server returns when you query it.

When you register an ELB, you get given a DNS name such as You're explicitly warned to set your chosen site name as a CNAME to this; and indeed if you use the IP as it stands now, one day your site will break (for reasons you will learn below.)

First oddity with this setup: you can't CNAME the root of a domain, so you have to make a redirect to, preferably one hosted somewhere outside the cloud, as needs to be an A record to an IP address. Some DNS providers have a facility for doing redirects using their own servers, which is an option here.

If you were to query that DNS record you would find that it has a 60 second TTL; thus if you query it twice, 2 mins apart, and you have more than one ELB node3 you may, at the discretion of Amazon's algorithms, get different results.  Try this:

$ dig 60 IN A
$ dig @ 60 IN A

Dude, where's my balancing?

When you register an ELB, you tell it the availability zones it should operate it. Each AZ has at least one ELB node, and that node will route you to instances in its own AZ, unless there are none available. That, along with the fact you are pseudo-randomly given a IP (with a minimum 60 second TTL), leads to a non-obvious conclusion. This actually happened to us - our policy is that odd numbered servers are in -1a, and even numbered servers are in -1b.

external:~$ ab -n 10
web1:~$ wc -l /var/log/apache2/access.log
10 /var/log/apache2/access.log
web2:~$ wc -l /var/log/apache2/access.log
0 /var/log/apache2/access.log
Lop-sided load

That is to say: if your servers are in multiple availability zones4, a single user doing requests in quick succession isn't load-balanced across your backend instances, so ELB doesn't appear to be working at all. Thankfully, it is, you just can't see it, because you're not looking from enough places at once. ELB is designed to work for a widely distributed client base, and in that case, you should expect about half the traffic on one instance, and half on the other. If you ran this test from a different location, you might see all 10 requests go to web2.

If you ask Amazon5, they can change the DNS for an ELB so that it presents all the IP addresses associated, not just one of them. This means your client has the choice to pick the IP each time it connects, and depending on how your application works, may be better for test servers.


The prime reason to use an ELB is that Amazon can transparently add more computing power to support your load if needed.  The converse of that is that when it is no longer needed, it will be removed. It bears mention that if they take an IP address out of the DNS, it will last at least 60 minutes before being taken out of service. Not everyone obeys a TTL on a DNS zone!

To reiterate: don't ever take what the name currently resolves to, and use that IP.  It's not yours and one day it will break.

Further reading

For this article, I have touched on some of the interesting parts of ELB. I didn't feel I needed to write a general introduction, as there are already several good resources out there:

Check back later for talk about databases, storage, security, mail and more!

  1. If you're worried about people spoofing the X-F-F, you can trust that the most recently added entry was yours, and throw away all the rest. 
  2. It's like my boss knew I'd been sitting on writing this post, and just had to pip me to the post! 
  3. Due to having more traffic than one node can service, or being hosted in more than one AZ 
  4. A good practice if you're trying to mitigate site failure - see "No single point of failure". 
  5. You may have to have commercial support for them to do this. 

Migrating your servers to Amazon EC2: Instance sizing

Monday, October 11th, 2010

One of the central tenets of cloud computing it's a cheap way to run large-scale compute jobs. If you're more concerned about starting small, and want to tackle the problem of growing big when you get to it1, then there's still a solution for you, though it might not be quite like the one you're used to.

If you're currently running on a hosted, virtualized platform, you are probably in one of two situations:

  • Your hosting provider buys servers for you, and runs something like VMware ESX on them
  • You're dealing with a VPS provider

If you're in the former bucket, as we were, you have a pretty fine-grained control over your instance (virtual server) scaling. You can add more CPU power (or weight one instance to be allowed to burst at the expense of others), and control, sometimes to the megabyte, how much memory is available to your application.

When you're in the latter bucket, you tend to get a number of discrete plans (such as the ones Linode offer), but your provider has a human element, and if you ask nicely, you can probably get a plan with very low disk but very high memory, by paying a little bit extra (RimuHosting tends towards the confusing with the amount of choice they offer!)

Social & Business Card Sizes

Amazon EC2, being an entirely automated provider, doesn't give you the option to customize your plans. They offer a selection of instance sizes, at various prices. Those are the choices, take or leave them.2 Because of the ease of creating and using multiple machines, and the relatively low extra cost,3 you have to consider if the cost of scaling up is best for you, compared to the cost of scaling out.

Our applications ran almost exclusively on 32-bit machines. There are a number of reasons, in both theory and practice, why 64-bit may not be for you: lack of vendor support, having to maintain software packages for both 32- and 64-bit architectures, slower performance/more memory use for simple tasks, etc. I prefer to stay with 32-bit across the board, which also suggests horizontal scaling.  If your application benefits from 64-bit computing, then you have a slightly different problem to the one I had, and your mileage will vary.

Some figures

Consider, for example, the 'default' instance for new users, the m1.small:

  • 1.7 GB memory
  • 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)

This instance costs 8.5c/hour to run.

Side note: With the launch of Canonical's new Ubuntu Server 10.10, they're announcing a "Try Ubuntu Server on our dime" promotion. It's worth noting that they get 1.5c change for that dime. :)

The next option up gives you about four times the performance, for about four times the cost. However, you don't get too much insight into what four times "Low" IO performance is, vs "High", and you don't get any redundancy. We decided that we'd rather have two small instances in one AZ, and two in another, to build resilience into our infrastructure for free.

It soon dawned on us that 1 "EC2 Compute Unit", which they claim is currently roughtly equivalent to a "1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor", is roughly equivalent to "not good enough for PHP".

The stolen generation


When you use VMware, you get given a virtual CPU, with a speedo that goes from 0 to 100.  With Xen (which is the hypervisor used by Amazon EC2), you can be given a certain percentage of the cycles on the parent CPU, but the gauge you see goes up to the number of cycles you are allowed on the parent CPU, not a percentage of a virtual CPU.

The practical upshot of this is that you end up seeing your CPU maxing out at a certain value (for us, around 40%) - but the other 60% of the cycles are stolen from you to feed other, more deserving, instances. This blog post from Axibase neatly sums up the issues, with graphs. You will see stolen CPU cycles in a new column in 'top':

Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 96.1%id,  0.1%wa,  0.0%hi,  0.0%si,  2.4%st

Not all tools are aware of steal time: you will see stolen ticks in vmstat -s, but not in the tabular vmstat output. You must have Xen-aware tools in order to get this information; Ubuntu provides them out of the box.

Thankfully, there happens to be a suitable instance for us.  Doubling the price from 8.5c to 17c/hour, we get the c1.medium instance:

  • 1.7 GB memory
  • 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)

This one is twice the price of the m1.small, but has 5 times the CPU. A worthwhile upgrade, and all of a sudden our Apache benchmarks are back up where we expect them.

You might have noticed that both the previous instances have a relatively small 1.7 GB of memory. Want more? You're moving up to 7GB, minimum.  If you want to stay with a small application, on 32-bit platform, the c1.medium instance is about where the line ends. We would love an instance type with 4GB of RAM; if you agree, please make yourself known to Amazon. The more customer demand they get, the more likely they are to implement it.

If we get to the point where it suits us, for example, to run two Size 4 machines, rather than eight Size 1 machines, we may consider moving to larger instances; we would save a little on EBS disks and inter-AZ transfer costs, but then a failure on one machine will mean we lose half of our hosting potential, rather than one eighth.

Planning for growth

You don't need to know all this up-front. If an instance is lacking in resource, upgrade it for a bigger/better one. Right?

Earlier in the history of EC2, you couldn't upgrade an instance, because root disks was on the instance-only, or ephemeral, store.  If you step back and think of EC2 as actually being a room full of servers, each machine has some (presumably) local hard disk space. That is the ephemeral ("short-lived"; from the Greek for "one day") store. It's big, and it's free. However, when you turn your instance off, it's wiped.

In contrast, EBS is permanent, paid, network-attached storage (think iSCSI).

Before late 2009, your only option was to turn off your current instance, and then spin up a new one, from your template image (AMI). Then, AWS announced an upgrade, which allows you to boot an instance from an EBS disk. This means you can turn your instance off, and the root file system stays there waiting.  You can use the web services to instruct that instance to be bigger, or smaller, when it returns. Because of the obvious usefulness of this, and the relatively low cost of a 10GB root disk, we're running all our instances on an EBS root.

When you upgrade your EBS-root instances, you are causing them to change class, which generally means, bring them up on a new physical host machine.

This means one obvious thing:

  • Your ephemeral disk (/mnt) is wiped

And two "less obvious" things:

  • Your internal IP address will change
  • Your internal IP address will change

Technically speaking that's only one "less obvious" thing, but I thought it was such a big one, I thought it was worth mentioning twice.

If you have an elastic IP address attached to that instance, your external IP address will remain the same.  However, your instance is now on a different physical host, with a different physical NIC in its host, so it will get a new IP address. For someone who is running a traditional application without cloud-awareness, changing IP can be something which requires thought. For example, if you are upgrading your private DNS server, you will have a problem here. You can't know what the IP address will be before you upgrade, so make very sure you have moved all the services off this machine before you upgrade it. Then, get the new connection details from the console, and reconnect.

As every machine needs an internal IP address, and they are not scarce (Amazon provides them from, meaning there should be no problem up to about 16 million instances), something that is really missing from EC2 for the "always on" use case we run is static internal IP addresses. Fire up your request e-mails.4

  1. I think I pretty much wrote Rework there! 
  2. Amazon do often add new instances sizes, so the information in this article may one day be superseded. 
  3. In the case of non-EBS instances in the same AZ, there should be no extra cost. 
  4. I even offer a suggestion on how to implement them: 5 for free per customer, and one more for every reserved instance you buy.  Then, they're issued by DHCP to any instance ID they are registered to. 

Migrating your servers to Amazon EC2: Initial design considerations

Friday, October 1st, 2010
From Powerhouse Museum on Flickr

Cloud architecture!

Even without making major changes to your application, you can make Amazon EC2 work for you.

Here are some things that I considered when designing our new setup:

No single point of failure

Any one machine should be able to go down - as Amazon CTO Werner Vogels says, "everything fails, all the time".  Guaranteed failure makes you think. The parts of the site that are identified as being most important should be able to run even if an entire datacentre fails.

Thankfully, EC2 makes this simple. Availability Zones (AZs) have been described to me as far enough apart that a disaster at one will not affect the other, but close enough that an engineer can drive between them in a reasonable time.

In my experience, the difference in ping times between our eu-west-1a instances and our eu-west-1b instances is less than 1ms. You do pay a "regional data transfer" rate of $0.01/GB  for transfer between instances in different AZs in the same region. At that price, it is cost-effective for us to run the system across two AZs. Our load balancing doesn't care which zone the machines are in, so even if one zone fails, then the site is still reachable.

No wasted cycles

You can turn on a machine and turn it off as you see fit; assuming you have an EBS-root instance (and you should), you only pay for the disk while the machine is off.  You can also attach that disk to a more powerful instance, should you have a need for a short-term boost of computing power!

Further to that, if we have a second machine running for failover purposes, it should be serving traffic, so that when we're in our good state, we have twice the performance available to us.

No private networking

Amazon network access is controlled by security groups. Instances are assigned to a security group at startup. You can then do things like say "proxy servers may access web servers on port 80", "the public may access proxy servers on port 443", "my office may access everything on port 22".

While Amazon instances know about security groups, your applications don't.  You can't allow access to something from the public Internet, and allow more access to it from a nominated network range, on the same port. I'll touch on this more when talking about security and mail servers later in this series.

Amazon offers a Virtual Private Cloud, which allows you to put more machines behind your firewall via an IPsec VPN.  It comes with an important proviso that is missing to many first-time readers: you can't access a VPC instance directly from the Internet. There's no way to use VPC as a management VPN, but have the instances on the public Internet - unless you want to accept traffic for those instances on your own servers, in which case you should have more redundant network connectivity than Amazon has, and you now pay for traffic in two places.

You can, of course, run a VPN server on your EC2 instances, or you can require your users have a VPN connection to your office, in order to get trusted access to your EC2 servers.

Size your instances as necessary

We started trying to run as many of our instances as we could on the smallest type (the m1.small), and quickly hit its limitations. However, remember that resizing instances isn't difficult. I'll touch on this later as well.

Use the right levels of redundancy

You can get a lot of benefits if you rethink your application and build it with the cloud in mind, but you can still get a great cost saving and a faster application just by treating EC2 as a big VM farm. For example, we're not using S3 at all, and barely using EBS.

Our root disks are on EBS, but our data is mostly replicated across multiple nodes, so using the ephemeral store - which is otherwise wasted - was perfect for us. Why pay extra to store a Mercurial repository, which has to be in sync across four machines, when each other machine already has a consistent copy by default?

Automate everything

You can register your own disk image (AMI) which you can create instances of.  By using a combination of configuration management and locally-developed deployment scripts, we haven't yet had the need to do this.

For us, firing up a new instance involves running a script with a wanted hostname and the instance ID we're given when we create it.  This will add the machine to the DNS, SSH to it, install Puppet, register with our puppetmaster and install the machine to the current spec. Our machines auto-register with our monitoring servers.

Once something is totally automated, it can be done automatically, as a result of an external stimulus. For example, when our ELB detects a spike of traffic to the site, you can have it auto-scale and create new instances in response. Even if you don't think you need this now, if you design your system right from the beginning, you're well placed to introduce it later.

Employ the principles of structured system management and your EC2 environment will pass the Joel Test for System Administrators.

Clouded House: How we moved our service to Amazon EC2 without it even knowing, and how you can too

Wednesday, September 29th, 2010

At the Symbian Foundation, we run a number of open-source applications and utilities written both in-house and by our member companies to support the Foundation's goals of building and promoting the Symbian platform. We have servers running LAMP1 applications such as Symfony, Drupal, MediaWiki, Bugzilla and Mercurial.  We also run some Tomcat applications such as OpenGrok and Bugzilla Metrics.

When the Foundation was set up we contracted a company to provide hosting services for us, which were provided on VMware ESX, on dedicated servers in a London datacenter. For various reasons, we decided some months ago it was time to part ways with our current hosting provider. This gave us the chance to make some moves we've been wanting to make for some time; a break from the now-unsupported PHP 5.2 to PHP 5.3 for our LAMP applications, a move to Ubuntu for the operating system, introducing some great configuration management and automation tools (more on those later), and removing some closed source components. It also had the side effect of allowing us to repurpose a rather large sum of money!

The goal in moving our infrastructure was to make the move seamless to the end user; I suspect that if you are reading this and you're an active user of Symbian's web applications, you couldn't even tell me what day this move was made. This project was undertaken with only minimal involvement from our web team, so there was no rewriting of the applications to use file storage in S3; we had to provide a system that worked the same as the previous hosting. This was my challenge, and it was a lot of fun.

What is AWS?

In updating the About page for the UK AWS users group, I wrote:

Amazon Web Services (AWS) is a cloud-based infrastructure platform, offering services such as computing capacity , database and storage, available on demand, priced as a utility, and controlled by web services.

People who like hard-to-pronounce acronyms2 have invented the new category of infrastructure as a service (IaaS) in order to separate AWS from other "cloud computing" offerings such as PaaS (think Google App Engine) or SaaS ( Through a web service call, you say "I want you to give me a VM"; it returns you the ID of your VM, and starts charging you for it.  The AWS offerings range from the low-level (VMs, disk, static IPs) to the high-level (message queues, MySQL databases)

Amazon invented this category and are the market leader; there are competitors, such as Rackspace Cloud and, but after some initial experimentation, the decision to go with Amazon was easy to make.

Lock-in isn't too much of a concern when you're just renting infrastructure - you have root access, and you put it all there yourself, so you know you could pick it up and re-implement it anywhere else if you chose.  (If we were using S3, it might be a concern.) However, you know you're the winner when you have an open source implementation of your API; the Eucalyptus project implements the AWS APIs and lets you run your own "private cloud" on your own hardware. Eucalyptus powers Canonical's Ubuntu Enterprise Cloud, which is a great way to get started without needing a credit card.

Does my application need the cloud?

If you are building an application from scratch, it's hard to say no to what cloud computing offers. You can scale from one machine all the way up to Farmville. Netcraft, while they're not busy confirming the death of things, claim that 365,000 web sites are hosted on EC2 as of May this year. It's also much better for your cashflow.

If you're migrating an existing application, your application doesn't run in the cloud today, so it obviously doesn't need what the cloud offers. However, rebuilding an application for a cloud environment (as opposed to just picking the disks up and putting them on virtualised hardware) makes you rebuild things in a way that should, if you do it right, make scaling easy.

Even if you don't need the scalability, think of the potential upsides; you can turn a machine on just to do some processing, and then turn it off again when you're finished.

Because we migrated an existing infrastructure, we aren't really using EC2 as anything more than a VPS provider. Albeit, a super-VPS provider where you can turn instances on and off yourself, clone and snapshot them, and scale them up and down your will. Now we have moved into the cloud, should our developers wish, they can use whatever other services take their fancy.


If you're running an application that runs 24x7, economically, cloud computing isn't necessarily going to be better, as Google's Vijay Gill commented recently. If your business relies on having thousands of servers, then you can probably benefit from the economies of scale of having your own setup, and if it's your core competency, you probably shouldn't outsource that. However, his figures don't consider the cost of reserved instances, which generally drop the per-unit cost of EC2 instances by at least half.

(I suspect Gill would suggest people with smaller needs use Google's PaaS offering, App Engine, which requires rewriting your application, and using either Python or Java. This series suggests that you don't have the ability or remit to do this.)

When I first drafted this article, I wrote "EC2 can't scale as low as Linode's $20/month plan": since then however, Amazon have announced "micro" instances, which cost next-to-nothing (when reserved, about $14/mo). The support is not exactly Fanatical - like many open source things, your support options are best-effort in the forums, or purchasing Premium Support, at 10/20% of your monthly spend.

However, if you believe, as we do, that cloud computing is the future, and you're running a few dozen servers, then even at non-reserved rates, EC2 probably works out cheaper than traditional commercial virtualised hosting, and offers you such benefits such as being able to use different, worldwide datacenters, at no extra cost.3

That there is no capital expenditure is a bonus for your finance department, but they will probably balk at having to pay by credit card. If your spend is enough, Amazon will put you on invoice billing; until then, fly under the radar, or convince your finance director that the savings are worth an exception to the policy.

A small note on open source vs proprietary software in the cloud

If your product costs more to use the more servers you run it on, cloud computing may not be for you. This reminds me of a post from Jeff Atwood; scaling out costs a lot more when you have to pay for your operating system or software licenses

(You can scale up instances - and if you have your root on EBS, you can do it with an existing machine - but the nearer you get to using a full unshared server, the closer Amazon have to charge you to the cost of a full unshared server, and so you may near the point where building your own server is a better deal.)

Stay tuned

Over the next few weeks I will post a series of articles talking about the problems we faced, the challenges we overcame, and how you can take your application or servers and move them into the EC2 cloud.  I will also touch on the things that are not-so-great about each component of the Amazon stack, and make suggestions to how you can work around them, or how Amazon could improve their service.  You'll be the first to hear about the brand new file system we've developed for quickly moving files around EC2 instances. In return, I'd love to hear from anyone who has suggestions on how they think we could have done better.

To make sure you get each post as they come out, subscribe to this blog by RSS, or sign up for updates by e-mail.4 You should follow me on Twitter here, but only if you're prepared for short bursts of awesome.

  1. using the extended definition to include PHP, Perl and Python 
  2. technically, an initialism, unless you want to get quite rude 
  3. Things cost slightly more - around 1c/hour - outside of their largest centre, Washington DC. 
  4. If you want to make ultra-sure you only read posts relating to our Amazon migration, and don't accidentally read a band review or something, there's a category for that

How do I change the DHCP subnet for NAT on VMware Fusion 3.0?

Tuesday, November 10th, 2009

There are a couple of helpful blog posts (Nilesh Kapadia and Max Newell deserve a shout-out here) which help you with changing the DHCP settings given to your NAT or host networks on VMware Fusion. However, it all changes in 3.0.

The file you now need to edit is /Library/Application Support/VMware Fusion/networking. In there, you will find these lines:


I believe the third octet (the 93 part) is selected randomly when you install; in any case, I wanted to give out addresses on, so I changed the configuration like so:


and restarted the network interfaces:

sudo "/Library/Application Support/VMware Fusion/" --restart

Now, make a note of the MAC address of your virtual network adapter in your guest OS, and you can assign an entry in the dhcpd.conf file (/Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf).  Make sure you do it outside of the area that is marked "this will be overwritten"!

host developer-vm {
    hardware ethernet 00:0c:29:cb:dd:72;

and another service restart.

Mirror mirror

Friday, May 22nd, 2009

A colleague of mine just sent me this. Why, I may never know, but it's quite cool.

MMMMMM@@@MMMM@MW0ZaZaaZZa88aZa@MM0@M@0SSaa80BB0ZaZa2SSXSaBW@WW08aa2aaZ8ZZa222222SS22aZa800Z0@MW  XMM
@@M@@@@MM@MM; S@0SS@M@880aSaaZ2SSS22a8ZZ22S2SX2XXS2WB008Z22SXXS2aM2a8aaZ28aZZa2XSSZBB2aXX;7X28Z2ZZMM
@@M@@@@MMMMMM. aMZ;,BMMW08aa22a228Z2aS2SS2Z2Sa8X7XZWZSS2Z808a2aaBWaaaZ8BaZSXSXa0W@@M07Sr7rXXXZZ2aa@M

I sent him a link to this in return.

A more pragmatic, but less common, question about Office Open XML

Sunday, January 25th, 2009

Geek question.

Assume people collaborate using Microsoft Office documents around (probably by e-mail, but that's another discussion).  

The "97-2003" formats are closed, binary formats, but well readable by applications such as, and now documented by their author (albeit under a specification that the GPL people can't yet get behind).    The 2007 format is more interoperable in the sense it's XML and (mostly) documented and went through an infamous "standardization" process, but is still not widely accepted. I would rather see ODF succeed — potentially with patent-free extensions from Microsoft to allow it to support all the features of Microsoft Office — as the 'standard XML document format'.  

Now we've got all that aside, I prefer .doc to .docx, and it's nothing to do with any of that - it's purely pragmatic, with respect to users of previous versions of Office.

Office 2003 and 2007, side by side.

I believe that in the team I work in, maybe 2/3 are on Office 2003 and the other 1/3 are on Office 2007.  Our customers, being that we deal with very large companies, are overwhelmingly all still on Office 2003.  (Personally, I'd rather send a PDF to a customer than a DOC, but that's not a decision I can make company-wide.)

I have the converter pack installed, which makes my Office 2003 installation compatible with Office 2007 documents (though it does prompt me and say that some features may be lost in translation).  I can't assume that everyone does, however, so OOXML files inconvenience anyone who gets sent these documents that does not have the converter installed.  The argument could also be made that PDF files inconvenience people without a PDF reader: everyone just downloads one, so what?

By way of opinions placed in the comments, should I be encouraging colleagues who send me documents in Office 2007 format, to enable saving by default in Office 2003 format?

Two hits

Friday, January 16th, 2009

The question: "why do hash functions use prime numbers?"  I knew I'd seen someone suggest it wasn't necessary in researching this topic last week.  (They cited this, if you care.)

The Google search: "prime hash".

Result #1: Why do hash functions use prime numbers?

Result #2: Prime Hanger Steak with Crispy Hash Browns and Pierre Poivre Sauce

Mmmm, prime hanger steak with crispy hash browns and pierre poivre sauce!

Mmmm, prime hanger steak with crispy hash browns and pierre poivre sauce!

I think it was lunchtime at that point.

Also, did you know the reproductive cycle of cicadas is 17 years?