ReadWriteWeb

Reaching for the Sky Through The Compute Clouds

Written by Alex Iskold / February 18, 2008 1:20 PM / 17 Comments

On Friday, a massive outage occurred at Amazon Web Services that generated a wave of negativity and criticism in the blogosphere. Not long ago, Rackspace, one of the world's largest hosting companies, experienced an outage that resulted in a similar reaction. When the backbone collapses, so do our favorite services. This makes us mad. It makes us say things like: well, maybe we shouldn't be using the cloud. Or things like: why can't we get 99% uptime? Or: isn't this what an SLA is for?

Software and hardware, like any system, can never be perfect. When power outages happen we get frustrated, but we understand that this is a fact of life. Any sufficiently complex system, like a power grid or Amazon Web Services, is bound to go down. There is little that can be done to assure us that it never will. Because of this, single outages are not good measures of quality of service. Albert Wenger said something to me recently that stuck in my head: We live in a stochastic world, but people fail to grasp it because all they experience is right now.

So is it really true - is cloud computing a bad idea? Of course not. It is a wonderful, powerful idea. In this post, we explore the ideas behind cloud computing and argue that it will be an integral part of our future.

Clouds vs. LAMPs

This generation of web services got their start from LAMP - a stack of simple, yet powerful technologies that to this day is behind a lot of popular web sites. The beauty of LAMP is in its simplicity; it makes it very easy to get a prototype out the door. The problem with LAMP is in its scalability. The first scalability issue is fairly minor - threads and socket connections of the Apache web server. When load increases and configuration is not tuned properly you might run into problems. But the second problem with LAMP is far more significant: the MySQL relational database is the ultimate bottleneck of the system.

Relational databases are just not good at growing beyond a certain capacity because of the way they represent information. And so when you reach a certain scale, they become difficult to manage. A way around this is a technique called data partitioning. If it is possible to split your data into N independent sets, then you can scale with the LAMP approach indefinitely. But if this is not the case, then your only option is to abandon the relational database for a distributed one. And this is the path through which you break into the clouds.
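
To make data partitioning concrete, here is a minimal Python sketch. It assumes records can be keyed by a user ID and that hashing the key is an acceptable way to pick a partition. The names shard_for and PartitionedStore are made up for illustration, and plain dictionaries stand in for what would be N separate MySQL servers.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a key to one of num_shards independent partitions."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class PartitionedStore:
    """Toy store: each dict stands in for one independent database server."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, user_id: str, record: dict) -> None:
        # All data for one user lands on exactly one shard.
        self.shards[shard_for(user_id, len(self.shards))][user_id] = record

    def get(self, user_id: str):
        # A lookup touches only the shard that owns the key.
        return self.shards[shard_for(user_id, len(self.shards))].get(user_id)
```

This only works when no query needs to join data across users. Note also that changing the number of shards in this naive scheme moves most keys to a different shard, which is one reason partitioning gets hard to manage as you grow.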

The Basics of Cloud Computing

The idea behind cloud computing is simple - scale your application by deploying it on a large grid of commodity hardware boxes. Each box has exactly the same system installed and behaves like all other boxes. The load balancer forwards a request to any one box and it is processed in a stateless manner; meaning the request is followed by an immediate response and no state is held by the system. The beauty of the cloud is in its scalability - you scale by simply adding more boxes.
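
As a rough illustration of stateless handling, the Python sketch below models each box as a function from a request to a response and uses round-robin to pick a box. Round-robin is just one possible policy, chosen here for simplicity. The names handle and RoundRobinBalancer are hypothetical.

```python
def handle(request: dict) -> dict:
    """Stateless handler: the response depends only on the request itself."""
    return {"status": 200, "body": "Hello, " + request["user"]}


class RoundRobinBalancer:
    """Forwards each request to the next box in turn."""

    def __init__(self, boxes):
        self.boxes = list(boxes)
        self.counter = 0

    def add_box(self, box) -> None:
        # Scaling out is just appending another identical box.
        self.boxes.append(box)

    def forward(self, request: dict):
        index = self.counter % len(self.boxes)
        self.counter += 1
        # Return which box served the request, plus its response.
        return index, self.boxes[index](request)
```

Because no box remembers anything between requests, it does not matter which one the balancer picks: every box returns the same answer for the same request.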

In the diagram above, the compute cloud consists of three basic elements: a web server/application layer, a distributed storage layer, and a distributed queue layer. Each one of these layers is a cloud itself - meaning that boxes are all identical and perform the same function. In the simplest scenario, the web tier is the same as the bits in the LAMP stack. The web server can still be Apache and it can be running PHP code - the application. The fundamentally different bit is the database, which is no longer MySQL, but instead a distributed storage system like Amazon S3, Amazon SimpleDB, or Amazon Dynamo. The queue piece is optional, but it is needed in cases when real-time handling is impossible or not necessary.
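
The role of the queue layer can be sketched in a few lines of Python. This is only an in-process stand-in, using the standard library's queue module in place of a distributed queue. The thumbnail task and the function names handle_upload and run_worker are invented for the example.

```python
import queue


def handle_upload(job_queue: queue.Queue, photo_name: str) -> dict:
    """Web tier: record the work to be done and respond immediately."""
    job_queue.put({"task": "make_thumbnail", "photo": photo_name})
    return {"status": 202, "body": "accepted"}


def run_worker(job_queue: queue.Queue, results: list) -> None:
    """Worker box: drain whatever jobs are waiting, outside any web request."""
    while True:
        try:
            job = job_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to do for now
        results.append(job["task"] + ":" + job["photo"])
        job_queue.task_done()
```

The web request returns as soon as the job is queued, and any number of identical workers can pull jobs off the queue later.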

The real advantage of the cloud is its ability to support on demand business computing. An application written to run on the cloud scales from 1,000 users to 10,000 and then to 10,000,000 just by expanding the number of boxes. From a business perspective this is very attractive because it is easy to calculate growth and scalability costs.
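
Here is what that calculation might look like, as a small Python sketch. It assumes, as the paragraph above does, that capacity grows linearly with the number of boxes. The per-box capacity and price in the usage note are made-up inputs, not real figures.

```python
import math


def boxes_needed(users: int, users_per_box: int) -> int:
    """Number of identical boxes required, rounding up to a whole box."""
    return math.ceil(users / users_per_box)


def monthly_cost(users: int, users_per_box: int, cost_per_box: float) -> float:
    """Total cost if every box costs the same amount per month."""
    return boxes_needed(users, users_per_box) * cost_per_box
```

With a hypothetical 500 users per box, boxes_needed(1000, 500) is 2 and boxes_needed(10000000, 500) is 20000, so the cost of growth is a simple multiple of the cost of one box.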

Do Clouds Really Work?

You bet! The best example is Google. The king of the web is reigning with a farm of hundreds of thousands, if not millions of boxes. To race along with the web, Google constantly increases the size of its cloud, incorporating new web sites, and expanding its index.

Of course, Google isn't the only one operating in a cloud. All major web players including Amazon, eBay, Yahoo! and Facebook are running some sort of massive computing cloud. Amazon in particular has been perfecting the art for the past fifteen years. The company has world class expertise and top notch talent in distributed computing, led by CTO Werner Vogels. Obviously, it is not an accident that Amazon is making a major bet and launching into the web services infrastructure vertical. They believe that clouds will be the future of computing, that they can make a business out of it, and that they can do it better than you can do it on your own.

You vs. Them

Every time we have an outage, like the one that happened on Friday, people sit back and think: How can I possibly rely on these guys? I bet I can just code this up myself and it will be fine! For decades the software industry has been suffering from the 'I can do this better' disease. We keep re-inventing programming languages, we keep on re-writing the APIs, and we keep thinking that we're smarter than the guys who came before us. 99.9% of the time we are wrong. The truth is that we cannot do it better than Amazon. They spent a massive amount of money, talent and, most importantly, time trying to solve this problem. To think that a startup can replicate this in a matter of months, and have it be cost effective and work properly, is just absurd. Large-scale computing is an enormously complex problem that takes even the best and brightest engineers years to get right.

In this day and age, build vs. buy is a complete no-brainer, especially for startups. Whatever is part of your core business you build. Everything else you buy. If your business really does require a custom cloud solution, then you have to build it. But the chances that the Amazon Web Services stack, once fully built out, would not fit your needs are slim to none. By focusing on what truly makes you unique and different you have the chance to beat the competition. Otherwise, if you keep on reinventing the wheel you won't have the time and resources to advance your real product.

SLA vs. Common Sense

But maybe last week's failure is not about clouds but about SLAs (Service Level Agreements)? If the SLA says that you will be up 99.99% of the time, then how can you go down for 3 hours? But here's the truth about SLAs. Whatever they say, they still don't mean that the service is not going to go down. You can't prevent power grid outages and you cannot prevent cloud outages. You can take all the precautions and make all the backups, but you still cannot be completely certain that a failure will not occur. First order catastrophes happen.

So the problem is that we should not be looking at the SLA, but instead we need to consider common sense. It is not a single failure of the system that is indicative of the performance. It is the frequency of failures that we should look for. If AWS goes down once a year each year for 3 hours, then it is nothing short of cloud computing paradise. If this happens every quarter, it's alarming, every month - unacceptable. The point is, as Albert Wenger explained, we need to think about this stochastically.
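
The arithmetic behind these numbers is easy to check. The Python sketch below converts between an availability percentage and hours of downtime, assuming a 365-day year and 3-hour outages like the one discussed here. The function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down and still meet the percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def availability_percent(outages_per_year: int, hours_per_outage: float) -> float:
    """Yearly availability given how often and how long the service is down."""
    downtime = outages_per_year * hours_per_outage
    return 100 * (1 - downtime / HOURS_PER_YEAR)
```

A 99.99% promise leaves room for about 0.876 hours, or roughly 53 minutes, of downtime a year. One 3-hour outage a year works out to about 99.966% availability, one every quarter to about 99.863%, and one every month to about 99.589%. A single outage says little on its own; the yearly figure depends on how often outages recur.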

Yes, it is difficult for people to be off the grid. Yes, it is difficult to explain to our users why we are down. But we need to be transparent about our abilities here - we are all humans working as hard as we can to make things work. Everybody gets that. The crux of the problem is transparency.

Any company that wants to own our hearts, ignite our imaginations, and power the next generation of our computing infrastructure needs to be transparent. If there is a problem, come out and say it. Put the sign up: "we are working hard on fixing it." Email the developers: "hey guys, something is seriously wrong on our end, we are investigating and will keep you posted." Transparency and openness from infrastructure providers, from the company all the way down to the consumer, is the key to having peace of mind. Because everyone knows that when we apply ourselves in a genuine and passionate way, there are no problems that we cannot solve - the cloud will be up and running shortly.

The Future: Through the Clouds into the Sky

The incident last week is in no way going to deter cloud computing or even Amazon Web Services. We are witnessing a fundamental shift in our ability to compute and this is just the beginning. Amazon is at the forefront of making massively parallel, web scale compute services available to the world. Free from the need to solve the scalability problems, startups are able to focus on the specific problems that their product or service is trying to solve. All of this is happening while the cost of hardware, bandwidth and services overall keeps dropping.

Truly, we are reaching for the sky through the computing clouds.



Comments


  • Meanwhile, Sun is looking to substitute 'LAMP' with 'SAMP' :)

    http://virtualization.com/acquisitions-acquisition-takeover/2008/02/18/sun-aims-to-virtualize-web-2.0-startups.-from-lamp-to-samp/

    Posted by: Robin Wauters | February 18, 2008 2:29 PM


  • You can't scale thousands of small, temporary, transient joins even in a cluster of MySQL boxes, or, for that matter, Oracle, or SQL Server farms. They all write saturate.

    You need distributed file by user ID cluster, and RDBMS references to the clusters, but not the data.

    The latest Object databases from http://www.db4o.com/default.aspx

    Are an area that application developers better get familiar with. Danga.com also has some good thoughts.

    We have inherited a generation of LAMP stack journeymen that are very bright, but have been lulled into a state of complacency regarding the utility of the Relational Database, to the point where scaling has become a cause celebre (Twitter).

    Posted by: Alan Wilensky | February 18, 2008 2:31 PM


  • The "common sense solution" is simple 'black boxes' with the minimum of linkages. In short, less to go wrong. But things will still go wrong, so allow for backup servers / sites / clouds (see how it works? if you use server based computing, you need a backup server; if you use cloud based computing, you need a backup cloud...). Not only must these backup clouds be 'safe', they must be available.

    The implication, for 100% uptime (and true vendor independence), is that your service needs to be able to run on AWS and EMC and..

    This will cost $$ (developer time if nothing else). Someone (i.e. the customer) needs to decide how far down the route you go before the spend outweighs the value.

    Posted by: martin english | February 18, 2008 3:24 PM


  • well you're right i don't know why people got down so hard on amazon like that? but the "You vs. Them" paragraph is total bullshit, if everybody did that nobody would do anything. we wouldn't have seen google because "yahoo spent millions of dollars on their search engine that no startup could do better than them".

    Posted by: yehia | February 18, 2008 3:35 PM


  • i agree with yehia. i think the solution could be found in months, not years, and it doesn't have to be somebody whose core competency lies elsewhere. We need specialized people working on infrastructure like amazon does. we just need someone to do it better.

    Posted by: Coleman Foley | February 18, 2008 4:22 PM


  • You shouldn't lose the entire cloud due to a single failure, or even multiple failures.

    Posted by: Stu | February 18, 2008 4:44 PM


  • SLAs are here to protect the companies, and c'mon 99% is actually a pretty decent percentage when you know that this percentage is often higher in real life - when using cloud computing.

    Posted by: Vincent Cassar | February 18, 2008 5:28 PM


  • Databases are usually the bottleneck. And it's hard to scale them horizontally. I've read some about hypertable, and that seems really cool.

    Posted by: Joris Verschoor | February 18, 2008 11:41 PM


  • Another interesting point of view of the same fact (Amazon WS outage), also stressing the transparency need in clouds:

    http://stage.vambenepe.com/archives/165

    My two cents!

    Regards,

    ------
    Fermín

    Posted by: Fermin | February 19, 2008 6:08 AM


  • Cloud computing enthusiasts, check out Cloud DB:

    http://couchdb.org

    Written for documents, but extrapolate the concept out to whatever.

    Posted by: Todd | February 19, 2008 7:49 AM


  • Nice slides and clear illustrations. Good read.

    Posted by: CloudFans | February 19, 2008 8:30 AM


  • it's good to see cloud computing gaining so much attention. have you looked into force.com? i'm still scratching my head about why salesforce.com isn't getting more attention for their move into the PaaS/cloud solutions space.
    http://www.positionmakers.com/2008/01/18/salesforcecom-expands-its-core-business-paas-and-the-future-of-app-develpment/

    i see salesforce.com's move as more important than amazon's if only because salesforce has a focus on b2b where as amazon's focus is more b2c. i'm sure for all "c" facing applications, amazon will be great. from a b2b perspective though, integrating the CRM as well as customizing the APIs for any internal IT development needs would position salesforce over amazon.
    i'd love to get your feedback and perspective.

    Posted by: messels | February 19, 2008 11:54 AM


  • Alex - great post. You really put things in proper perspective.

    g

    Posted by: GraemeThickins | February 20, 2008 4:00 PM


  • You nailed it on the "I can do this better disease" bit - I shudder when I think of all the things our company's been asked to do over the years when perfectly viable alternatives already existed from established companies.

    This is more from a business than tech angle, but a lot of the time, "better" usually means "cheaper" instead of "more reliable" and I think there's an ego play at work where someone wants to go "ha ha it took company X millions to do this and I did it for $4000." My ego, on the other hand, has shrunk dramatically and I'm happier for it: I've taken to responding to these requests with "I'm simply not smart enough" and not feeling bad at all. I've bookmarked this post for some more sensible counter-points, thanks :)

    Posted by: Jason | February 25, 2008 11:17 AM


  • Good Article.
    It is to remember that without the convergence of grid, virtualization and SOA concepts, the cloud implementation cannot be done. In fact, the Cloud Computing concept is a Grid based business model that provides utility Computing services and/or SaaS services

    Terminology Synch:
    Grid provides the Service-Oriented Infrastructure Virtualization (SOIV) that enables IT scalability and flexibility
    Service Orientation - Service-orientation is a design paradigm that specifies the creation of automation logic in the form of services. It is applied as a strategic goal in developing a service-oriented architecture (SOA).
    Virtualization - a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources.
    Utility Computing – Pay-per-Use for network based Compute and Storage services
    Software as a Service (SaaS) - Pay-per-Use for network based software applications’ services

    Posted by: Avner Algom | March 2, 2008 4:00 PM


  • You're right that some little group can't put in more effort in their technology than Amazon can, but that doesn't mean they can't leap-frog them.

    Small changes in abstraction or technology can jump way past huge efforts. At each new level, only a fraction of the work is required to get the same results. Our technology is so immature that one expects there are plenty of 'big leaps' still remaining to be found. Proof: Amazon has put in less work than Oracle.


    Paul.
    http://theprogrammersparadox.blogspot.com

    Posted by: Paul W. Homer | March 10, 2008 7:50 AM


  • great diagrams about cloud computing. curious if geo server load balancing vs. generic slb helps in the cloud?

    Posted by: Tim | March 10, 2008 8:21 AM



