An apparent DDoS attack on Amazon Web Services (AWS) over the weekend left Bitbucket, a code-hosting service, down for about 20 hours before the problem was resolved. The attack raises questions about how quickly Amazon responds to its clients during an attack and how much trust customers should place in a single cloud service provider.
Bitbucket hosts everything on Amazon EC2. It also uses Amazon's EBS service to store everything from its database to log files and user data.
Bitbucket first noticed the attack on Friday night, when its network storage became virtually unavailable. According to the detailed account on its blog, the site ground to a halt.
After the service went down, Bitbucket contacted AWS. After more than five hours of back-and-forth about the extent of the issue, the conversation moved to Twitter and, you guessed it, that's when Amazon realized this might be a bigger problem than it had thought.
Some of Bitbucket's large customers contacted Amazon and the problem climbed up the customer support ladder pretty fast.
Up until this point, Amazon maintained that it did not have a problem with the service. That line soon changed as the issue became more severe. By this time, senior executives were on the phone, engineering specialists were being called in, and Bitbucket had Amazon's full attention.
Twenty hours later, the service had been restored. But the aftereffects are still apparent. Jesper Noehr, who created Bitbucket, has been tweeting continuously since the problem began. You can tell he is spending a good bit of his time working with customers who are none too happy. Do a search on Twazzup and the complaints about Bitbucket's problems run down the page.
At this point, the cloud services world is still a trust game. A lag in diagnosis kept Bitbucket from getting back online sooner. They took a big hit. Not surprisingly, they are considering other services.
But for customers out there, it's time to look more deeply at how much faith you put in one cloud services provider.
From The Register:
"The lesson here is: 'Don't bet the farm on a single cloud provider,'" says Craig Balding, founder of cloudsecurity.org and a security practitioner at a Fortune 500 company. "It's common sense really. But people get lulled into thinking their site is always going to be available [when they host with a single provider]."
Comments
This is exactly why I prefer Rackspace or hosting companies that are HOSTING companies.
Sure, Amazon knows how to build an awesome system for serving up their Book catalog, but that doesn't make them a hosting company.
I wouldn't ask Barnes & Noble to hold onto my portable hard drive backup until I need it again, and I won't ask Amazon to host and serve my online files.
It's good that it's back, but such problems should be avoided in the future.
I think it’s irresponsible and alarmist to claim that EC2 “went down.” It was a “set of racks” in a “single availability zone.” The purpose of exposing the ‘availability zone’ concept to developers is to allow them to ensure their own availability even during events such as this.
The S3 outage broke all use of S3; this was a connectivity loss to a fraction of EC2. The two cannot be compared. Everyone was perfectly able to launch new instances to replace the out-of-commission ones.
Is an “EC2 is down” GigaOM post to be expected for every AWS status dashboard update? To suggest that EC2 as-a-whole was ‘knocked out’ or in a ‘nose-dive’ is really quite inaccurate.
@Ben Mc
You sure have no idea what you are talking about!
The entirety of EC2 did not fail. I have servers on EC2 that were doing just fine. Please be more specific about the failure in the future. To state that AWS was down is misleading.
The service did get attacked and the repercussions were significant for Bitbucket. This is as much an issue about customer service as it is about attacks. To be fair, Bitbucket does bear some responsibility. They had everything hosted on AWS. Other companies relying entirely on AWS may want to examine that choice, considering the repercussions that can result.
This is actually one of the biggest challenges with hosted or 'in-the-cloud' services. Within your own infrastructure, you can set up firewalls, IDS systems, etc., to make it easier to identify and respond to security attacks and incidents. Within a hosted environment, it can take a lot longer to realize there is an issue, determine whether it is targeting you or the platform, confirm the issue, and raise an external support ticket to resolve it (rather than mobilizing your own IT). Generally, the response times will be slower unless you do your homework. On the positive side, for a very small shop, the hosted solution benefits from the security expertise and capabilities of the larger hosting organization (Rackspace is a great one, as mentioned by Ben Mc).
Michael Argast, Security Analyst, Sophos
I think it's important to ask whether downtime is more common with Amazon or in one's own infrastructure. I think you'd find that individual companies, especially the smaller companies likely to use the Cloud, have a much greater incidence of downtime from all causes than the EC2 Cloud--and, of course, the Cloud didn't go down here, but merely one small component of it.
We need news, not FUD
Alex, can you clarify whether or not Bitbucket had Amazon Gold support? I'd expect a better SLA from Amazon if they did.
I can relate to Bitbucket. Just go through the AWS EC2 forum to get a feel for the support Amazon gives to its PAYING customers. They even have dedicated trolls on the lists barking at complaining customers (and then peddling their PREMIUM support). As for their EC2 SLA, read it... hell will have to literally freeze over before they will even begin giving rebates.
Thank you for sharing, I like this.
Oh no!!! We were thinking of moving our operation for binfire.com to Amazon EC2 and S3. Need to go back to the drawing board!