Should the Amazon Web Services EC2 outage impact Cloud adoption?
By SMBWorldAsia Editors | May 5, 2011
On 21 April 2011, Amazon Web Services Elastic Compute Cloud (EC2) had an outage that impacted multiple Availability Zones. In a recent blog post, Trend Micro highlighted key learnings and what companies need to know while adopting the Cloud.
Amazon issued a status update indicating that the outage stemmed from problems with the re-mirroring of Elastic Block Store (EBS) volumes:
"This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it’s difficult to create new EBS volumes and EBS backed instances."
A certain amount of service outage is to be expected. However, this incident raises two distinct concerns. The first is that the Amazon Availability Zones did not work as represented. Amazon provides computing resources from different geographic regions. Each region in turn offers several Availability Zones, which are supposed to be engineered so that each is insulated from failures in the others. In the recent outage, however, multiple Availability Zones were impacted, showing that they are not acting as advertised.
The second concern is that this incident went beyond just an availability issue. Amazon was not able to recover all of the volumes affected. On April 25, Amazon issued the following in an update: “We’ve determined that a small number of volumes (0.07% of the volumes in our US-East Region) will not be fully recoverable.” This seems like a small amount, but 0.07% of what? Depending on how many volumes Amazon hosts in that region, the absolute number lost could be considerable. And if you’re one of the customers impacted, you don’t care how small the number is overall.
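To make the "0.07% of what?" question concrete, here is a minimal Python sketch of the arithmetic. Only the 0.07% figure comes from Amazon's update. The region sizes are purely hypothetical, since Amazon did not publish the total number of volumes, and the function name is just illustrative.

```python
# Share of US-East volumes Amazon reported as not fully recoverable: 0.07%.
UNRECOVERABLE_FRACTION = 0.07 / 100


def unrecoverable_volumes(total_volumes: int) -> int:
    """Approximate number of unrecoverable volumes for a given region size."""
    return round(total_volumes * UNRECOVERABLE_FRACTION)


# Hypothetical region sizes (assumptions, not figures published by Amazon).
for total in (100_000, 1_000_000, 10_000_000):
    lost = unrecoverable_volumes(total)
    print(f"{total:>10,} volumes -> about {lost:,} unrecoverable")
```

Under these assumed totals, the same percentage ranges from tens of lost volumes to thousands, which is why the headline figure alone says little about the real impact.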
Should this Amazon outage give cause for concern? Should it make businesses limit their cloud adoption? These are two separate questions. Yes, it should give cause for concern, but it should serve as a cautionary tale that influences how companies approach the cloud, not whether they approach it.
This was certainly a significant outage. However, Amazon has generally done a good job of maintaining service availability, and it has built out its infrastructure with failover and load balancing beyond what most businesses are able to deploy in an on-premises data center. Although using an on-premises data center may give companies a feeling of more control, especially when an incident like this occurs, in truth they will most likely get better availability through a provider that is dedicated to offering on-demand computing services.
