If you have a problem that Amazon's Elastic Load Balancing can't solve, you might want to build an old-fashioned "two machine IP failover" cluster.
Amazon instances only have one internal, and one external, IP address at a time. Consider this:
- Instance 1: 256.256.256.4 [Elastic IP]
- Instance 2: 257.257.257.8
If you claim the elastic IP on instance 2, then a new IP will be allocated to instance 1:
- Instance 1: ¿?
- Instance 2: 256.256.256.4 [Elastic IP]
You won't know what it is unless you query the web services, or look at the console, for instance 1. Be sure you are aware of the implications of this before proceeding.
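If you only need the answer from the instance itself, the instance metadata service used later in this post for the instance ID can also report the current public address. A minimal sketch, assuming the public-ipv4 metadata path answers plain unauthenticated requests in the same way as the instance-id lookup does; the function name current_public_ip is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print this instance's current public IP by asking
# the EC2 instance metadata service (only reachable from inside EC2).
METADATA_URL="http://169.254.169.254/latest/meta-data"

current_public_ip() {
    # -s: quiet, -f: treat HTTP errors as failure, --max-time: don't hang
    curl -sf --max-time 5 "$METADATA_URL/public-ipv4"
}

# Usage (on the instance):
#   current_public_ip
```

Running this on instance 1 after instance 2 has claimed the elastic IP should print whatever new address was allocated.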
I found a forum post from Alex Polvi which, with some tidying, does the job nicely. When the slave node realises that its master mate has gone offline, it will claim the IP address; when the master returns, you can have the master claim it back, or you can have the slave just become the new master.
Claiming the shared/elastic IP
Your script needs a command that the master machine can call to claim the elastic IP address. Alex's example uses Tim Kay's 'aws' script, which doesn't require Java like the official Amazon ec2-utils.
You need /root/.awssecret to contain the Access Key ID on the first line and the Secret Access Key on the second line:
AK47QWERTY7890ASDFG0H
01mM4Rkl4RmArkLArmaRK14rM4rkL4MarKLar
You can now test this:
$ export AWS_PARAMS="--region=eu-west-1"
$ export ELASTIC_IP=256.256.256.4
$ export MY_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
$ aws $AWS_PARAMS associate-address "$ELASTIC_IP" -i "$MY_ID"
The MY_ID line queries the instance metadata service for the instance ID of the machine you're running on, so you can use this script, unedited, on both machines.
This should claim the IP 256.256.256.4 for the instance on which the script is run.
In order for Heartbeat to be able to use this script, we need a simple init script. When run with 'start' it should claim the IP, and when run with 'stop' it should relinquish it. You will need to edit the parameters at the top (or better yet, put them in /etc/default/elastic-ip and source that in your file). Remember to ensure this script is executable.
/etc/init.d/elastic-ip
#!/bin/bash
# Heartbeat resource script: claims or releases the elastic IP for this instance.
DESC="elastic-ip remapper"
MY_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
ELASTIC_IP="256.256.256.4"
AWS_PARAMS="--region=eu-west-1"

# The 'aws' script reads its credentials from .awssecret
if ! [ -f ~/.awssecret ] && ! [ -f /root/.awssecret ]; then
    echo "$DESC: cannot find ~/.awssecret or /root/.awssecret"
    exit 1
fi

case "$1" in
    start)
        aws $AWS_PARAMS associate-address "$ELASTIC_IP" -i "$MY_ID" > /dev/null
        [ $? -eq 0 ] && echo "$DESC: IP $ELASTIC_IP associated with $MY_ID" || echo "$DESC: Could not map IP $ELASTIC_IP to $MY_ID"
        ;;
    stop)
        aws $AWS_PARAMS disassociate-address "$ELASTIC_IP" > /dev/null
        [ $? -eq 0 ] && echo "$DESC: IP $ELASTIC_IP disowned" || echo "$DESC: Could not disown $ELASTIC_IP"
        ;;
    status)
        # grep succeeds only if this IP is mapped to this instance
        aws $AWS_PARAMS describe-addresses | grep "$ELASTIC_IP" | grep "$MY_ID" > /dev/null
        [ $? -eq 0 ] && echo "$DESC: I have $ELASTIC_IP" || echo "$DESC: I do not have $ELASTIC_IP"
        ;;
esac
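The suggestion above to keep the parameters in /etc/default/elastic-ip could be sketched as follows. This is only an illustration: the load_config function name is made up, and the built-in fallback values are simply the example values used in this post.

```shell
#!/bin/bash
# Sketch: read site-specific settings from a defaults file, falling back
# to built-in values. The load_config name is an assumption.
load_config() {
    local config_file="${1:-/etc/default/elastic-ip}"

    # Built-in defaults, used when the file is missing or omits a value
    ELASTIC_IP="256.256.256.4"
    AWS_PARAMS="--region=eu-west-1"

    # Source the file only if it is readable; it may override either value
    if [ -r "$config_file" ]; then
        . "$config_file"
    fi
}

# Example /etc/default/elastic-ip contents:
#   ELASTIC_IP="256.256.256.4"
#   AWS_PARAMS="--region=eu-west-1"
```

Calling load_config near the top of the init script, in place of the hard-coded assignments, would let both servers share an identical script while keeping the settings in one small file.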
Heartbeat
Each server needs the heartbeat package installed:
$ apt-get install heartbeat
Allow heartbeat traffic between your instances:
$ ec2-authorize $group -P udp -p 694 -u $YOURUSERID -o $group # heartbeat
Heartbeat is configured by three files, all in /etc/ha.d, and in our case, all identical on both servers:
authkeys
auth 1
1 sha1 foobarbaz
The authkeys page on the heartbeat wiki offers a script to help generate a key.
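If you would rather not copy a key by hand, something along these lines can generate the file. This is a rough sketch and not the wiki's script: the write_authkeys name is made up, and it assumes a Linux system with sha1sum available. Heartbeat expects authkeys to be readable by its owner only (mode 600), which the sketch sets explicitly.

```shell
#!/bin/bash
# Sketch: write an authkeys file containing a random sha1 key.
# write_authkeys is a made-up name; pass the target path as $1.
write_authkeys() {
    local target="${1:-/etc/ha.d/authkeys}"
    local key

    # Hash 64 random bytes down to a 40-character hex string
    key=$(head -c 64 /dev/urandom | sha1sum | cut -d' ' -f1)

    # Create the file with a restrictive umask, then fill it in
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}

# Usage (as root, on one node):
#   write_authkeys /etc/ha.d/authkeys
```

Generate the file once and copy the result to the other server, since both nodes must share the same key.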
ha.cf
# Log to syslog as facility "daemon"
logfacility daemon

# List of cluster members by short hostname (uname -n)
node server1 server2

# Send one heartbeat each second
keepalive 1

# Declare nodes dead after 10 seconds
deadtime 10

# internal IP of the peer
ucast eth0 10.256.256.4
ucast eth0 10.257.257.8

# Fail back, so we're normally running on the primary server
auto_failback on
All pretty self-explanatory: set your own 'node' and 'ucast' entries with your hostnames and internal IP addresses. Even when the external IPs are bouncing around, the internal IPs should stay the same. auto_failback is optional, as mentioned above. Read the docs for more options.
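Because the 'node' entries must match what uname -n reports on each machine, a quick sanity check before starting heartbeat can save some head-scratching. A small sketch, with a made-up helper name (node_is_listed):

```shell
#!/bin/bash
# Sketch: check that a hostname appears on a "node" line in ha.cf.
# node_is_listed is a made-up helper name.
node_is_listed() {
    local ha_cf="${1:-/etc/ha.d/ha.cf}"
    local host="${2:-$(uname -n)}"

    # Print every name found on "node" lines, one per line,
    # then look for an exact whole-line match
    awk '$1 == "node" { for (i = 2; i <= NF; i++) print $i }' "$ha_cf" \
        | grep -Fxq -- "$host"
}

# Usage (on each server):
#   node_is_listed /etc/ha.d/ha.cf || echo "this host is not in ha.cf"
```

With no arguments it checks the local hostname against /etc/ha.d/ha.cf and returns non-zero if the name is missing.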
haresources
server1 elastic-ip
Here, we set up a link between the primary server (server1) and the script we want to run (elastic-ip). The wiki shows you what else you can do.
Putting it all together
Start heartbeat on both nodes, and server1 should claim the IP address. If you stop heartbeat on server1 (or server1 crashes), server2 will notice after 10 seconds (the deadtime set in ha.cf) and claim the IP address. As soon as server1 is back up, it should claim the address back, because auto_failback is on. You can run /etc/init.d/elastic-ip status to prove this:
server1:~$ sudo /etc/init.d/elastic-ip status
elastic-ip remapper: I have 256.256.256.4
server2:~$ sudo /etc/init.d/elastic-ip status
elastic-ip remapper: I do not have 256.256.256.4
Whatever happens, your elastic IP will always point to a good instance!
Postscript: what Heartbeat will not do
Heartbeat will notice if a server goes away, and claim the IP. However, it will not notice if a service stops running but the machine stays alive. Your good work may all be for nothing!
To solve this, I suggest monit, or if you're a ruby fan, bluepill. These will monitor a service, and restart it if it is not responding.
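For a service that cannot usefully be restarted in place, another option is to hand the IP over instead. The following is only a crude sketch of that idea, not a replacement for monit or bluepill: it assumes the service answers HTTP on localhost and that heartbeat is controlled through /etc/init.d/heartbeat, and every name in it (SERVICE_URL, service_alive, stop_heartbeat, check_once) is made up for illustration.

```shell
#!/bin/bash
# Sketch of a crude watchdog: if the local service stops answering,
# stop heartbeat so that the peer claims the elastic IP.
# SERVICE_URL is an assumption; adjust it for your own service.
SERVICE_URL="${SERVICE_URL:-http://localhost/}"

service_alive() {
    # Succeeds only if the URL answers without an HTTP error within 5 seconds
    curl -sf --max-time 5 -o /dev/null "$SERVICE_URL"
}

stop_heartbeat() {
    # Assumed init script path for the heartbeat package
    /etc/init.d/heartbeat stop
}

check_once() {
    if service_alive; then
        return 0
    fi
    echo "service at $SERVICE_URL not responding, stopping heartbeat"
    stop_heartbeat
}

# Usage, for example from cron once a minute:
#   check_once
```

Stopping heartbeat runs the elastic-ip script with 'stop' on this node, and the other node takes over after the deadtime, as described above.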
Newer versions of Heartbeat (version 2 and later, when run with the cluster resource manager enabled) can monitor services as well.