Craig Box's journeys, stories and notes...


Archive for September, 2011

Review of "Amazon Web Services: Migrating your .NET Enterprise Application"

Friday, September 23rd, 2011

Amazon Web Services: Migrating your .NET Enterprise Application
Rob Linton, Packt Publishing
2/5

(Review copy supplied by Packt Publishing.)

Amazon Web Services (AWS) is not a small topic. Their 'product summary' page alone lists 28 different topics, most with an entire set of both product and API documentation behind them.

Condensing that into a book is not a trivial task, and it requires establishing a suitable narrative. This book has taken the angle of a ".NET Enterprise Application", and starts off well: a sample application, if a little trivial, is provided, and a goal stated to move the application from traditional server hosting to the Amazon cloud.

Good, but short, consideration is given to why you would put such an application in AWS rather than a platform solution. It then dives into creating instances for deploying the application.

A book that takes you on a journey, as opposed to a general reference book, should not be afraid to make choices. Five pages are dedicated to the Import/Export service, which lets you post Amazon a hard drive. Shipping terabytes of data is a problem that users are unlikely to have up front - the book should acknowledge its existence, but it wastes time and confuses users by going in-depth on a subject which should be an appendix at best.

Similarly, Chapter 6 covers SQL Server, required for the example application, but then also covers Oracle, MySQL (RDS) and Amazon's key-value store SimpleDB, none of which are used or required. It is great to see that the notification (SNS) and queuing (SQS) are discussed in the context of how the application could be enhanced to use them, although using these services means you are "locked in" in much the same way you are on a platform service - somewhat undermining the point the book made in the beginning.

Many statements in this book are just plain wrong (such as Amazon.com not being hosted on AWS, or network (EBS) volumes being faster than instance disks - whole books could be written on this topic alone). Other sections of the book have been made outdated as Amazon has rolled out improvements - the most significant being the new license mobility options allowing the use of SQL Server Enterprise. While there is nothing the author or publisher can do about progress, there are occasions where the book is internally inconsistent - for example, referring to 4 regions in one section and 5 in another. In general, poor editing detracts from the reading experience.

One of the reasons Amazon is so much cheaper than regular datacenter providers is they allow you to build reliable solutions out of commodity hardware. However, this means you need to make allowances that are not at all discussed in this book. Deploying applications across availability zones is absolutely essential - Amazon is up-front in saying that they expect failures, which are widely reported by people who do not understand that AWS is not a traditional, expensive battery-backed-SAN-reliable datacenter. This book mentions availability zones, but doesn't show how to properly use them.

Redundancy is only briefly touched on - SQL mirroring and failover, possibly the most important topic this book could cover, is given two paragraphs and then offloaded to Microsoft. Even though there appear to be enough servers for a redundant architecture, the eventual service is riddled with single points of failure and there is no way that an application built to this model should be allowed into production on AWS.

Further, many best practices, especially those around firewalls, security groups and Active Directory, are described incorrectly, and are likely to lead to insecure or unnecessarily expensive deployments.

The author clearly understands both Windows/SQL Server and the basics of AWS, but taking 28 topics and picking out the important ones is a difficult task, and overall this book does a poor job of it.

Updating a manuscript to include new functionality means it would effectively never be published. The alternative is a 'living document', published online: hard to make money from, but guaranteed to be up-to-date. I am unlikely to bother reading another book on AWS.

 

Cloud pricing is hard

Tuesday, September 6th, 2011

One of the many benefits of cloud computing is the pricing model. Following Amazon's lead, any provider worth their salt lists their per-hour pricing on their website, and that is the price you pay, regardless of what you use.[1] Gone are the days when you had to call for a custom price list, tailored for you by a man in a suit who is incentivised to charge exactly the maximum he thinks you will pay, no more, no less. This means startups can get hold of scalable infrastructure at economies previously only available to the canny corporate negotiator.

However, even in the automated, API-driven present, there are still different models for pricing which you can choose from. For example, Amazon has an on-demand price, reserved instances (pay up front to buy the right to run a machine for a cheaper rate) and spot instances (an instance market, where you bid a price and if the spot price is below that price, your instance runs). While spot instances sound like a curiosity for people doing queue-based distributed computing that can be started and stopped at will, James Saull points out they turn out to be an oddly cost-effective way to run your always-on infrastructure. You may not like the risk, and you are not getting the guarantee of instance availability that comes with reserved instances.
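The spot mechanism described above can be sketched in a few lines. This is a minimal model, not Amazon's actual billing logic: the function name, the price series and the bid are all hypothetical, and it assumes you are billed at the prevailing spot price (not your bid) for each hour the instance runs.

```python
def simulate_spot(hourly_spot_prices, bid):
    """Return (hours_running, total_cost) for a fixed bid over a price series."""
    hours_running = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price <= bid:          # instance runs while the spot price is at or below the bid
            hours_running += 1
            total_cost += price   # assumption: billed at the spot price, not the bid
    return hours_running, total_cost


# Hypothetical price series (USD/hour) with one spike above the bid.
prices = [0.030, 0.031, 0.029, 0.090, 0.030]
hours, cost = simulate_spot(prices, bid=0.05)
print(f"ran {hours} of {len(prices)} hours for ${cost:.3f}")
```

The hour in which the price spikes above the bid is the risk mentioned above: the instance simply does not run.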

For the general case, once you understand what your infrastructure requirements look like on Amazon, you buy suitable reserved instances: you then save 34% or 49% on the cost of running the equivalent on-demand instance over 1 or 3 years.
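To see where savings of that size come from, here is a rough sketch of the arithmetic for an always-on instance. The hourly rates and up-front fees are illustrative assumptions chosen for this example, not figures quoted in this post or checked against a current price list, and `reserved_saving` is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the instance runs all year


def reserved_saving(on_demand_hourly, upfront_fee, reserved_hourly, years):
    """Fraction saved by reserving an always-on instance for `years` years."""
    hours = HOURS_PER_YEAR * years
    on_demand_total = on_demand_hourly * hours
    reserved_total = upfront_fee + reserved_hourly * hours
    return 1 - reserved_total / on_demand_total


# Illustrative rates only (assumed for this sketch, not quoted in the post).
one_year = reserved_saving(0.085, 227.50, 0.03, years=1)
three_year = reserved_saving(0.085, 350.00, 0.03, years=3)

print(f"1-year saving: {one_year:.0%}")
print(f"3-year saving: {three_year:.0%}")
```

With these assumed rates the results land close to the 34% and 49% figures above. The saving shrinks quickly if the instance is not actually running most of the time, because the up-front fee is paid regardless.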

Mull that over for a second. This morning, I came across a comparison of pricing between IBM SmartCloud Enterprise and Amazon EC2 (via Adrian Cockcroft). I don't know a lot about the IBM cloud, but I do know bad math when I see it.

Lies, damned lies and estimated usage quotes

Amazon offer an online cost calculator. It's accurate, and always kept up-to-date, but admittedly it can be hard to use. For example, you have a small drop-down box at the top of the page which dictates which region you're in; if you are adding infrastructure in multiple regions, it's easy to get lost.

The author of the IBM article, Manav Gupta, has obviously lost his way around the AWS calculator. His first estimate comes in at over $10,000 a month, as "Amazon has included costs for redundant storage and compute in Europe". Amazon do no such thing. No data crosses a region unless you specifically request it - an important thing to note for compliance with data protection law. What is more likely is Gupta has started pricing his infrastructure in Europe, noticed his error, and continued in the US, without realising that AWS offers five global regions (six if you include the new US GovCloud) and you can easily provision infrastructure in all of them. In fairness, the IBM calculator seems to be much simpler; I can't find information on where IBM host their SmartCloud.

Quote 1 is replaced by quote 2, which comes in at $6370.62. Leaving aside the obvious-but-insignificant errors (how does an application which does 20GB of inbound data per week do 120GB/week through its load balancer?), a quick look at the bill tab shows storage allocated in US-WEST, where everything else is allocated in US-EAST. Gupta's quote includes 7GB of S3 storage which is not mentioned in the post (or accounted for in the IBM quote). Not only that, it's charged twice: once in US-EAST and once in US-WEST! Assuming that's an error, I removed both allocations, and in order to be fair to what has been requested, added 300GB of snapshot storage for the EBS volumes to the correct page of the calculator.

Our new estimate - only correcting for errors, and without touching the compute cost - is $4211.90.

I've already beaten the published IBM price, but why stop there? As I mentioned above, sensible cloud purchasing almost always involves instance reservations. Because the pricing appears to have changed since the IBM article was published (I can't find a way to make IBM instances cost the same as shown in the calculator screenshot), I can't tell what reservation was used (if any) in the initial calculation. However, IBM offer 6- and 12-month reservations on a 64-CPU pool, with the note that "reserved capacity may not be economically attractive with the low monthly usage you have selected above".

Let's go for a 12 month reservation on AWS, in case our habits change. (And if they do, remember that reserved instance pricing can apply to any instance in the same availability zone on the same account.)

Our monthly cost has dropped to $2738.04. We do have an up-front reservation cost to pay, but if we amortize that over 12 months (as IBM does in their calculator) we are down to $3420.54 per month. Why not throw in Gold Premium Support? It's only another $341/month.
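The arithmetic behind those figures can be checked with a short sketch. The monthly amounts are the ones quoted in this post; the up-front fee is not stated anywhere, so it is backed out here on the assumption that the amortised figure is simply the running cost plus one twelfth of the fee. The variable and function names are hypothetical.

```python
from decimal import Decimal

# Figures quoted in the post (USD per month).
ON_DEMAND_MONTHLY = Decimal("4211.90")   # corrected on-demand estimate
RESERVED_RUNNING = Decimal("2738.04")    # monthly bill with a 12-month reservation
RESERVED_AMORTISED = Decimal("3420.54")  # same, with the up-front fee spread over the term
GOLD_SUPPORT = Decimal("341")            # support cost as quoted in the post
TERM_MONTHS = 12


def implied_upfront(amortised, running, months):
    """Back out the up-front fee from the amortised and running monthly costs."""
    # Assumption: amortised = running + upfront / months, with no interest.
    return (amortised - running) * months


upfront = implied_upfront(RESERVED_AMORTISED, RESERVED_RUNNING, TERM_MONTHS)
monthly_saving = ON_DEMAND_MONTHLY - RESERVED_AMORTISED
with_support = RESERVED_AMORTISED + GOLD_SUPPORT

print(f"implied up-front reservation cost: ${upfront}")
print(f"saving vs on-demand, per month:    ${monthly_saving}")
print(f"reserved plus Gold support:        ${with_support}")
```

Under that assumption, the reserved quote with Gold support added still comes in below the corrected on-demand estimate.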

With regard to Gupta's criticisms about not having a PDF export on the Calculator, I find it easy enough to hit "Print to PDF" on a web page myself, and the fact that I can export these quotes and publish them on this blog far outweighs that hassle.

On the topic of software licensing

Pricing is even harder when you have to factor in the price of licensing. In fairness to IBM, the quoted Amazon costs do not include Red Hat Linux licenses. However, I suspect the only reason they were included, aside from IBM being a Big Support kind of company, is that commercially licensed software (RHEL, SUSE, Windows) is the only option you have on SmartCloud Enterprise.

If you want to run Oracle applications on EC2, why not run them on the freely-licensed Oracle Enterprise Linux? Or the most popular operating system for the cloud, Ubuntu Server?

Alternatively, if the requirement for Red Hat Linux is hard-and-fast, then there is an option to run Red Hat on-demand with Amazon EC2. Reserved instance pricing is not currently available for RHEL, therefore you would be better advised to bring your own RHEL licenses to the cloud with Red Hat Cloud Access.

In the interests of full disclosure, the on-demand RHEL price is $4519.34/mo, vs the $4211.90 above.

Did I mention the "everything else?"

Amazon have defined the cloud computing marketplace - at least for infrastructure - with EC2. As Adrian Cockcroft points out in his excellent write-up on using clouds vs. building them, no-one can even come close to the price and performance, let alone the global scope, of EC2. If I were building Manav Gupta's web application, I would have the benefit of resiliency by balancing the application between multiple Availability Zones, and the benefit of reduced maintenance by using RDS for the database tier. And the price would probably be even lower, too.

The cloud provides great benefits to those who can make their application fit its ways. This is not a trivial task - sometimes even working the calculators can be too hard. If you want help with this, I am the Head of Cloud Services at Stoneburn in London, and I'd love you to get in touch. (And follow me on Twitter.)

Update: Manav Gupta has commented and provided a much neater explanation for why his first quote was vastly over-provisioned: there is a sample 'web application' option in the AWS calculator, which assigns a bunch of sample infrastructure over and above what was included in the IBM sample web application. The moral of the story is to ensure you are comparing like for like (as much as possible with differing size options between cloud providers) when making provider comparisons.

 

  1. Or, tiered options are clearly laid out, as with AWS data transfer.