ReadWriteWeb

Amazon Rents Out MapReduce Power with EC2, S3 and Hadoop

Written by Phil Glockner / April 2, 2009 11:00 AM / 9 Comments

Amazon announced today that it is bridging two of its web computing services, EC2 and S3, with Hadoop, an open-source project that brings the same distributed data processing power as Google's MapReduce. In fact, it is calling the new service Amazon Elastic MapReduce. The new service will allow its EC2 customers to perform distributed MapReduce queries on enormous datasets stored in S3, paying only for the computation time they need.

Hadoop has been an open-source project in the making for the last few years, inspired by Google's white paper on its version of MapReduce. The technology is an almost perfect fit with Amazon's growing web services, matching distributed CPU time with vast data storage requirements, both things that fit well with the cloud model.

MapReduce is a fairly straightforward concept: you take a problem that requires working with a giant dataset (and we're talking massive, sometimes petabytes), distribute the work over thousands of separate processes (called mapping), then take the thousands of results you get back and reduce them into a single master result. For tasks that split cleanly this way, MapReduce can vastly improve efficiency, and adding more computing power gives you a roughly linear improvement in speed.
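
To make the map and reduce steps concrete, here is a minimal single-process sketch in Python that counts words across a few chunks of text. It is not Hadoop's or Amazon's API; the function names (`map_chunk`, `shuffle`, `reduce_counts`, `word_count`) and the sample data are made up for illustration, and a real cluster would run each map call on a separate machine.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the intermediate values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(values):
    """Reduce step: collapse all the values for one key into a single result."""
    return sum(values)


def word_count(chunks):
    # Here the map calls run one after another in a single process;
    # in a distributed setup each chunk would go to a different worker.
    mapped = [map_chunk(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return {word: reduce_counts(values) for word, values in grouped.items()}


# Hypothetical sample input standing in for a much larger dataset.
chunks = ["the cloud is big", "the data is bigger", "big data in the cloud"]
counts = word_count(chunks)
print(counts["the"], counts["big"])
```

Because each chunk is mapped independently and each word is reduced independently, both phases can be spread across as many workers as you care to pay for.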

Yahoo! has been using its own version of Hadoop for a while now. Even before this offering, larger Amazon cloud computing customers had already begun to use Hadoop on EC2. This is from Wikipedia's article on Hadoop:

As an example The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4TB of raw image TIFF data (stored in S3) into 1.1 million finished PDFs in the space of 24 hours at a computation cost of about $240 (not including bandwidth).
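
As a rough back-of-the-envelope check on those figures, the short Python sketch below works out the implied price per instance-hour. It assumes all 100 instances ran for the full 24 hours, which the quote does not state outright; the variable names are illustrative.

```python
# Figures taken from the quoted example.
instances = 100
hours = 24
total_cost_usd = 240

# Assumption: every instance was billed for the whole 24-hour window.
instance_hours = instances * hours
implied_rate = total_cost_usd / instance_hours

print(f"{instance_hours} instance-hours at about ${implied_rate:.2f} each")
```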

As Amazon says on its blog, "After a while [developers] tend to report that they begin to think in terms of the new style, and then see more and more applications for it." We take that to mean that MapReduce is the new big hammer, and as developers start looking around, every dataset starts looking like a nail. This is good news for Amazon, which only stands to profit.


Comments

  1. Does this announcement put Aster Data Systems (www.asterdata.com/), a Sequoia-backed company, out of business?

    Posted by: COP | April 2, 2009 2:08 PM



  2. I think this move was expected.

    Posted by: Anuj Mehta | April 2, 2009 9:06 PM



  3. This is great validation that MapReduce is going mainstream for data processing. Aster Data's vision is to bring the power of MapReduce to a whole new class of developers and mission-critical enterprise systems. Whether that's on-premise or in the cloud is just a question of "platform".

    Aster's core product is a high-performance analytic database for on-premise data warehousing. The nature of the Aster nCluster DBMS makes it very performant in the cloud - and it's available on both Amazon and AppNexus.

    When would you use Aster’s In-Database MapReduce vs. a system like Hadoop? http://www.asterdata.com/blog/index.php/2009/04/02/enterprise-class-mapreduce/

    Posted by: Steve Wooledge | April 3, 2009 11:44 AM



  4. I attended an AWS event in Los Angeles in September and there were many people asking for MapReduce support. As always, Amazon is listening to the community and adding innovative services. We are very excited!

    Posted by: Andy | April 4, 2009 1:23 AM



  7. I agree with you Steve..
    Todd DiRoberto
    http://www.newsguide.us/art-entertainment/movies/Todd-DiRoberto-of-American-Satellite-Hosts-Independence-Day-Charity-Event-for-Operation-Bigs/

    Posted by: amsat | July 31, 2009 1:52 PM



  9. While the benefit of EC2 is elasticity, the benefit of colo is the ability to run your own hardware config and have dedicated support to admin your own cluster.

    We see MUCH better support when we can have in-house admin vs external.

    If I hire someone in-house and they don’t perform I can fire them. ;)

    Further, the quality is much higher when you have a full time admin.

    I think a hybrid model would work better if you can get your data into EC2 quickly. It doesn’t work for everyone of course.

    We have a pretty static configuration so an EC2 migration isn’t on the roadmap until they lower their prices by 4x.

    Posted by: Tea Bags | December 5, 2009 9:28 PM


