hadoop - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/hadoop en Copyright 2012 Richard MacManus readwriteweb@gmail.com Mon, 13 Feb 2012 19:17:22 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss

Top 10 Enterprise Cloud Apps and Services of 2011

It seems like just like summer, Bill Murray used to sing, and that's because it was. This year, for the first time, ReadWriteWeb expanded its coverage of the technologies that change our world through the Web, with new emphasis on cloud-based services to consumers and cloud technologies for businesses. Cloud services are more than just hosts for apps. They're resources that you can provision for your changing needs, and which you can scale up or down as necessary.

Certainly 2011 was dramatically different from 2010 for businesses for one critical reason: In a very short time, true scalability for every IT service suddenly appeared within their reach. A market that was almost non-existent by the end of last year has grown past what many analysts would consider the point of adolescence, with the shakeout subsiding and brand dropouts declining. Someone should remind these cloud service folks there's a recession going on.

For ReadWriteWeb's first list of top choices in enterprise cloud services, we took a cue from one of our winners and evaluated the social data stream, to see which services you talked about - and which ones got the favorable tweets, especially for customer service. Hosting companies aren't covered on this list, even if they're cloud host providers, unless they offer some form of value-add that makes their service more useful to a targeted class of customers.

Not every candidate for this list offers its own cloud platform; we considered services that are offered on others' cloud platforms. And we considered the platforms on which they're offered, too, if they presented a value-add that made them unique. Beta services were not eligible; we stuck with general availability only.


10. Xeround. This service, which premiered in late 2010, quite easily (judging from customers' comments) deploys new or existing MySQL databases on cloud platforms such as Amazon EC2, Rackspace, and Heroku. Rather than just shoving off databases into the cloud space, Xeround manages storage on the back end with built-in scalability.

[Image: Xeround database architecture]

The back-end storage, whose configuration is kept transparent from the user, is managed by a homegrown system that scales itself by means of virtual partitioning as opposed to sharding. Each partition is maintained on independent nodes using shared-nothing architecture (meaning, no single point of failure leads to cascading node elimination). Your database application never sees the partitioning take place, and isn't even involved; and scaling takes place with the goal of zero downtime - which thus far appears to have been achieved.
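Xeround's back end is proprietary, so what follows is only a generic sketch of the idea behind virtual partitioning, not the company's implementation. It assumes a fixed, hypothetical number of virtual partitions and a simple round-robin assignment of partitions to nodes. The point it tries to show is that a key's partition never changes; only the table that maps partitions to nodes does, which is why the application never has to know.

```python
import hashlib
from typing import Dict, List

NUM_VIRTUAL_PARTITIONS = 64  # hypothetical fixed count, chosen only for illustration


def partition_for(key: str) -> int:
    """Map a row key to a virtual partition. This depends only on the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_PARTITIONS


def assign_partitions(nodes: List[str]) -> Dict[int, str]:
    """Spread the virtual partitions across the available nodes round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_VIRTUAL_PARTITIONS)}


def node_for(key: str, assignment: Dict[int, str]) -> str:
    """Look up which node currently holds the partition for this key."""
    return assignment[partition_for(key)]


# Scaling out: the key-to-partition mapping is untouched; only the
# partition-to-node table is rebuilt when a third node joins.
before = assign_partitions(["node-a", "node-b"])
after = assign_partitions(["node-a", "node-b", "node-c"])
moved = [p for p in range(NUM_VIRTUAL_PARTITIONS) if before[p] != after[p]]
print(f"{len(moved)} of {NUM_VIRTUAL_PARTITIONS} partitions changed nodes")
```

Round-robin reassignment moves more partitions than strictly necessary; a production system would presumably minimize that movement, but the application-facing behavior is the same either way.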

Usually a company's own predictions for the coming year are rather self-serving, so Xeround's #1 event for 2012 may come as something of a shock: "The Cloud will disappear," writes Xeround's Itamar Haber, "not in the sense that it will be gone (far from it) but rather in that it will become all-encompassing and omnipresent. We will no longer see The Cloud as a trend or even an alternate model for using compute resources - it will become transparent as it will be the de facto default means of choice for nearly all IT-backed projects across industry sectors." I guess that makes this the last year for this Top 10 list as well, eh?


9. Bime might not have qualified as a cloud service per se, except for the clever way it serves as storage for business intelligence. It's not a particularly sophisticated data store, but it doesn't have to be for its main purpose: visualizing and analyzing data from large databases. Many businesses used to rely on exporting data to Excel for this purpose, in what often became one of the clumsier exercises in database versioning.

[Image: Bime visualization of Doctor Who villains]

Bime Analytics makes itself visible to the world through interesting, if often fairly trivial, sample renderings of data that's fascinating to someone. My daughter, for example, will find interesting this "data cloud"-style subdivision (above) of the various villains that appeared in Doctor Who, organized by relative area in proportion with their appearances throughout the entire franchise history, and organized in blocks by the year of their first episode. It's an example of the Bime visualization engine putting itself through its paces using innocuous data (though I gather the Sontarans could start a cosmic war to avoid being sublimated by the Cybermen).

Essentially, Bime is a data visualization firm, which ordinarily would render it just an app. What makes Bime's business model not only cloud-centric but also more viable are two things: 1) If you're going to visualize your own data, you need a way to get it into the system. So Bime can afford to charge a premium for storage ($60/month for the first gigabyte, $120/month for up to 10 GB) since what you'll really be using it for anyway is visualization. Imagine paying $60 a month for something that only billed itself as a graphics app. 2) As my friend and colleague Pam Baker showed us last August, Bime offers an API for developers to build their own apps that utilize the visualization functionality, that in turn uses the data you're paying Bime a premium to store for you. It is an excellent business concept, and a way to use cloud resources to twist an otherwise point-and-shoot SaaS app into a viable business.


8. Okta. Almost four years ago, this service was founded with the notion of providing identity federation for a handful of familiar SaaS apps like Google Apps and online services like Facebook. Well, Facebook wants to be a major player in the identity game in its own right. Besides, simply federating single-sign-on access to an expanding list of players isn't an exciting business model when that number of players begins to dwindle.

So earlier this year, as David Strom told you in August, Okta put a new cloud-style spin on its service: In adding a new layer of Active Directory integration, it made itself into a kind of all-purpose AD app. Your single sign-on takes place through this app, rather than through some complex federation service on-premise. And because all the users in a business are checking in through one place, Okta becomes the dashboard for delegating privileges and permissions to those users. Through Active Directory, the same password that lets users onto the network in the first place also logs them onto Okta.

Last year at this time, developers and administrators alike were in a quandary over how to manage the identities that users must assume in order to access multiple applications, both on-premise and through the cloud. Okta is not yet the total solution to this problem, but the fact that it has taken a serious bite out of the problem in so short a time points to the possibility of a cloud-based solution to the broader identity management problem.


7. PubNub. One of the hilarious fallacies of "walled garden" service architectures is the illusion of exclusivity. By making users sign an agreement explaining exactly how they must use their services, complete with terms of use, carriers and service providers are betting that users will presume they can't find some way to utilize the same (or better) functions some other way.

PubNub is clearly a different way, and the service it provides is one that many users may have thought they were prohibited from doing themselves: self-service push notification. If you're building an app that needs the capability to alert the user, through his operating system (PC or smartphone), PubNub provides you with a pair of dirt-simple API calls. You implement one in the app that will be listening to your PubNub channel for alerts - this is the subscribe part of the operation. The other goes on the server side, simply to send text alerts through that channel. For the most part, save for a half-page of syntax, that is the instruction manual.
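To make the two-call pattern concrete, here is a minimal in-memory sketch of publish and subscribe over a named channel. It is not PubNub's API: the `Broker` class and its `subscribe` and `publish` methods are hypothetical names invented for this illustration, and everything runs in one process rather than across a network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """A toy stand-in for a hosted notification service (hypothetical)."""

    def __init__(self) -> None:
        # channel name -> callbacks registered by listening apps
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        """The client-side half: start listening on a channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """The server-side half: send a text alert; returns how many listeners got it."""
        listeners = self._subscribers.get(channel, [])
        for callback in listeners:
            callback(message)
        return len(listeners)


broker = Broker()
inbox: List[str] = []
broker.subscribe("alerts", inbox.append)       # the app listens
delivered = broker.publish("alerts", "Server restarted")  # the server sends
print(inbox, delivered)
```

The appeal of the hosted version is that the broker lives in someone else's data center, so the app author only ever writes the two calls at the bottom.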

Through the pay-as-you-go scheme, users pay 1/100 of one cent for each alert sent or received. Of course, with fully deployed apps on a global scale, you could be racking up charges fairly quickly, so alternately PubNub offers paid-in-advance service tiers beginning at $129.99 per month for 1 million alerts per day. The service is so straightforward that it's now being used for active notifications for real-time gaming, as well as for quick-and-dirty chat rooms.
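A quick back-of-the-envelope calculation shows where the prepaid tier starts to pay off. The rate and tier price come from the figures above; the 30-day month is an assumption made for the arithmetic.

```python
from decimal import Decimal

PAY_AS_YOU_GO_RATE = Decimal("0.0001")  # 1/100 of one cent per alert, in dollars
TIER_PRICE = Decimal("129.99")          # dollars per month
TIER_DAILY_ALERTS = 1_000_000           # alerts per day covered by the tier
DAYS_PER_MONTH = 30                     # assumption, for illustration


def pay_as_you_go_cost(alerts: int) -> Decimal:
    """Dollar cost of sending or receiving this many alerts at the metered rate."""
    return PAY_AS_YOU_GO_RATE * alerts


# Monthly volume at which metered billing equals the tier price.
break_even_per_month = int(TIER_PRICE / PAY_AS_YOU_GO_RATE)
break_even_per_day = break_even_per_month // DAYS_PER_MONTH

# What the tier's full allowance would cost at the metered rate.
full_allowance_metered = pay_as_you_go_cost(TIER_DAILY_ALERTS * DAYS_PER_MONTH)

print(break_even_per_month, break_even_per_day, full_allowance_metered)
```

Under these assumptions, an app crosses the break-even point at roughly 43,000 alerts a day, and an app that actually used the full million a day would owe $3,000 a month on the metered plan.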


6. SendGrid. Though Salesforce's Marc Benioff has publicly proclaimed e-mail "dead" (along with Microsoft and something he calls "the false cloud," which bears a curiously red tint to it), e-mail is actually far from dead. If you look at the colossal size of my own inbox, you might proclaim it "resting," but it's not dead.

SendGrid (a frequent sponsor of ReadWriteWeb) is an e-mail distribution service for individuals and companies that utilizes a cloud pricing model, with the idea that you're paying for outgoing bandwidth (measured in messages sent). Its value-add is that it sends marketing e-mails systematically, in a deliberate effort to avoid being tagged by e-mail services as a spammer. It calls its strategy "IP address warming," where all addresses start cold and work their way gradually towards sending higher volumes, as servers establish reputations for them. So while SendGrid's pricing is based on scale, it actively encourages customers to start at the bottom, spacing out monthly distributions of newsletters throughout the month rather than all on the same day.
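The warming idea can be sketched as a ramp: start small and raise the daily volume step by step until you reach the level you actually need. The starting volume and growth factor below are hypothetical numbers picked for illustration; they are not SendGrid's schedule.

```python
from typing import List


def warmup_schedule(target_daily_volume: int,
                    start_volume: int = 50,
                    growth: float = 2.0) -> List[int]:
    """Return daily send volumes that grow geometrically up to the target.

    start_volume and growth are illustrative assumptions, not vendor guidance.
    """
    if growth <= 1:
        raise ValueError("growth must be greater than 1 for the ramp to finish")
    schedule = []
    volume = start_volume
    while volume < target_daily_volume:
        schedule.append(volume)
        volume = int(volume * growth)
    # The final day sends exactly the target, never more.
    schedule.append(target_daily_volume)
    return schedule


print(warmup_schedule(1000))
```

A sender aiming for 1,000 messages a day would, on this toy schedule, take six days to get there, giving receiving servers time to build a reputation for the address along the way.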

"Deliverability is a secret crisis facing any business that relies on e-mail communications," reads a recent SendGrid white paper (PDF available here). "Most companies don't think about deliverability until they have a major issue where thousands or in some cases millions of e-mails fail to arrive. People assume that an e-mail is delivered if they don't receive a bounce notification. The reality is very different: 20% of commercial e-mail sent never arrives as intended. If you're an online business you need to take steps today to increase the reliability of your e-mail communications."


5. Twilio. As many of our long-time readers are well aware, Twilio is essentially a telecommunications back end whose API functions may be called by any app. It's like a chassis for Skype without the baggage of Skype. Granted, Twilio has been growing at a steady pace. But last July, it added what history may record as its breakthrough feature: a JavaScript SDK. This makes a Twilio client available directly from a Web browser, so you can have direct phone communications inside your browser app, as well as your stand-alone app.

[Image: Twilio diagram]

From here, the service started building an ecosystem - which may have just become the necessary ingredient for any cloud service to be competitive going into 2012. In Twilio's case, as Dan Rowinski reported in September, the service announced it's partnering with venture capitalists to fund the development and deployment of an entire community of apps that just happen to communicate over Twilio's platform. As Rowinski wrote, "By moving communications to the cloud and providing pay-as-you-go billing that does not hang over the developer's head, Twilio has a chance to grow rapidly in both the consumer app development community and the enterprise."

As private developer Patrick McKenzie recently put it, "I think Twilio is, far and away, the most exciting technology I've ever worked with... Smartphones aren't smart because of anything on the phones themselves, they're smart because they speak HTTP and thus get always-on access to a universe of applications which are improving constantly. Twilio radically reduces the amount of hardware support a phone needs to speak HTTP - it retroactively upgrades every phone in the world to do so. After that, all you need is the application logic."
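McKenzie's point about phones "speaking HTTP" is easier to see with an example. In Twilio's model, as I understand it, the service fetches a URL on your server when a call arrives, and your application answers with a small XML document (TwiML) telling it what to do. The sketch below only builds such a response using Python's standard library; the function name and greeting text are made up for illustration, and serving the document over HTTP is left out.

```python
import xml.etree.ElementTree as ET


def build_greeting(text: str) -> str:
    """Build a minimal TwiML-style document asking the service to speak some text."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = text
    return ET.tostring(response, encoding="unicode")


twiml = build_greeting("Hello from the cloud")
print(twiml)
```

Everything else, including which database to consult or whether to forward the call, is ordinary web application logic, which is exactly the appeal McKenzie describes.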


4. MongoHQ. In politics of late, when left with few or no other options to slam an opponent, the one way to make a charge stick is to make it so ludicrous or so audacious that folks will conclude it has to be true to have been made in the first place. (For more, see "Kerry, Sen. John.") So you have to wonder how it is that MongoDB made the news recently for having been the subject of a hugely popular discussion thread (to use the phrase loosely) on Hacker News. Here is the lead post for that thread, in its entirety: "Don't use MongoDB." (A lot of research done there.)

Our Joe Brockmeier looked into the matter, and found direct evidence of substance somewhat lacking. MongoDB, by the way, is a scalable, document-oriented data store for loosely structured (as opposed to unstructured) data; it is a NoSQL database in its own right and is not built on MySQL. MongoHQ, which made our list this year, is effectively the monetization engine for MongoDB, making the database platform accessible to literally anyone on self-service terms.
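"Loosely structured" is worth a small illustration. In a document store, records in the same collection share a general shape but need not have identical fields. The sketch below fakes that with plain Python dictionaries and a hypothetical `find` helper; it does not use MongoDB or any driver, and the sample records are invented.

```python
import json
from typing import Any, Dict, List

# Two documents in one "collection": same general shape, different fields.
collection: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "languages": ["COBOL"],
     "address": {"city": "Arlington"}},
]


def find(docs: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose top-level fields match every criterion.

    A document that lacks a field simply does not match; no schema is enforced.
    """
    return [d for d in docs
            if all(d.get(field) == value for field, value in criteria.items())]


matches = find(collection, name="Grace")
print(json.dumps(matches, indent=2))
```

A relational table would force both records into the same columns; here the second record carries a nested address and a list without any change to the first.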

As a developer, you only pay for MongoHQ once your storage size scales up to 256 MB - meaning, once your database becomes legitimately used by others. Large database instances are served up at $49 for 5 GB, although for high availability, apps developers may choose instead to pay $149 per month for a replica set, which includes full failover.


3. Radian6. Back in 2009, RWW's Marshall Kirkpatrick examined Radian6 for the first time - a kind of social media dashboard offering live heuristics for the instances that certain terms (especially brand names) get mentioned on social services like Twitter. At the time, it looked like an interesting curiosity, especially for people whose hobbies are to "keep score at home." What was it really for? A Radian6 product manager told Marshall that the direction for the product was "adding CRM."

[Image: Dreamforce keynote]

Maybe someone at Salesforce.com was using Radian6 at that very moment. Though it might not have seemed so at the time, Salesforce's acquisition of Radian6 ended up being one of the most important enterprise news stories of the year. If anyone has proven it knows how to avoid burying an acquired product under the rug, it's Salesforce. Now the product has become a genuine platform, with apps being built around it and distributed on Salesforce App Exchange.

Now, Radian6 is more than just a dashboard for the curious. Social marketing specialists can utilize the tool to alert them when online chatter around a certain topic has reached a critical threshold. It can trigger actions en masse, such as couponing individuals who seed positive comments on a product. And its instantaneous demographics, which align tweeters and Facebook users with their respective market segments, have already crossed the borderline into scary territory.
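The threshold alert described above can be approximated with a sliding time window: count the mentions seen in the last N seconds and fire when the count reaches a limit. This is a generic sketch, not Radian6's method; the `MentionMonitor` class, its parameters, and the sample timestamps are all hypothetical.

```python
from collections import deque
from typing import Deque


class MentionMonitor:
    """Signal when mentions within a rolling window reach a threshold."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one mention; return True if the threshold is now met."""
        self._timestamps.append(timestamp)
        # Discard mentions that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


monitor = MentionMonitor(threshold=3, window_seconds=60)
results = [monitor.record(t) for t in (0, 10, 20, 100)]
print(results)
```

Three mentions inside a minute trip the alert; a lone mention well after that window has passed does not.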


2. Hortonworks Hadoop. It wasn't quite two years ago when Hadoop was essentially a side project at Yahoo, building on ideas from research published by Google. Today (although it's a terribly mixed metaphor) the Hadoop elephant has leap-frogged over high-availability database projects at Oracle and Microsoft to become the de facto solution for hosting big data in parallel.

It gets confusing when an open source project is stewarded by more than one commercial provider, as our Joe Brockmeier reported in October. The Yahoo engineers largely responsible for its creation have formed Hortonworks, and it is they who launched the Hadoop cloud platform in November. Now, with the assistance of commercial partners such as Informatica, developers can deploy extraordinarily large data sets in a cloud configuration with high reliability - much more reliable than the hardware it runs on.
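How can software be more reliable than the hardware underneath it? Largely through replication: keep several copies of every block of data on different machines, so that losing a machine loses no data. The sketch below is a toy model of that idea, assuming three replicas per block and a naive placement rule; it is not Hadoop's actual placement policy, and the node and block names are invented.

```python
from typing import Dict, List, Set


def place_blocks(block_ids: List[str], nodes: List[str],
                 replication: int = 3) -> Dict[str, List[str]]:
    """Assign each block to `replication` distinct nodes (naive rotation)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(block_ids)
    }


def readable_blocks(placement: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Blocks that still have at least one replica on a healthy node."""
    return {block for block, replicas in placement.items()
            if any(node not in failed for node in replicas)}


nodes = ["n1", "n2", "n3", "n4", "n5"]
blocks = [f"b{i}" for i in range(10)]
placement = place_blocks(blocks, nodes)

# Two machines die; every block still has a surviving copy.
survivors = readable_blocks(placement, failed={"n1", "n2"})
print(len(survivors), "of", len(blocks), "blocks readable")
```

With three copies of everything, any two simultaneous failures are survivable in this model; data is only at risk when every node holding a given block fails at once.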

"Hadoop was built from the bottom up to be built on very low-cost commodity hardware," Hortonworks CEO Eric Baldeschweiler told RWW in October. "It's built with the assumption that the hardware it's running on will break, and that it must continue to work even if the hardware breaks." Which implies a lot, given that one of the other biggest enterprise cloud news stories of the year was Hortonworks' partnership with Microsoft to bring Hadoop into Windows Server, possibly as a role for System Center.


1. Heroku. In little more than a year, the service that had been considered the pioneer platform-as-a-service, Microsoft Windows Azure, was left in the dust as an also-ran by yet another Salesforce acquisition.

Heroku quite literally tries to fill the large, and widening, void left over when computers stopped including built-in BASIC interpreters. Rather than presenting itself as a battery of stand-alone servers built for doing scheduled jobs on complex agendas, Heroku is a managed code interpreter for a growing bouquet of languages, some of which (like Clojure and Scala) are only just now entering the common vernacular. Such languages are clearly better suited for quick deployment of Web-based functionality than the dependency-heavy C++, C#, and Visual Basic which comprise the brunt of the .NET languages Microsoft supports in Azure.

What's more, the Heroku add-ons ecosystem is already in full bloom, even with support for apps and services like MongoHQ that utilize cloud platforms other than Heroku itself. Last October, Heroku CEO Byron Sebastian told me that his goal for the service is simply to enable developers to push code to the cloud using git push and have it run. There's no agenda for the promotion of any single language platform (including Ruby, on which the service was founded, and whose creator is now a Heroku employee). "We differentiate ourselves at the level of our cloud app platform, rather than at the level of languages and frameworks," Sebastian said, "because we believe that openness is an important principle in our industry. It's in our blood."

Well, the enterprise cloud is in our blood now, too. And at this rate, it will probably be in our bones and muscles before the end of 2012.

http://www.readwriteweb.com/archives/top_10_enterprise_cloud_apps_and_services_of_2011.php http://www.readwriteweb.com/archives/top_10_enterprise_cloud_apps_and_services_of_2011.php Best of 2011 Mon, 26 Dec 2011 08:15:00 -0800 Scott M. Fulton, III
Daily Wrap: Scott Berkun's Mindfire Free Until November 3 and more

Scott Berkun's book, Mindfire, is free until November 3, 2011. All of this and more in today's Daily Wrap.

Sometimes it's difficult to catch every story that hits tech media in a day, so we thought it might be helpful to wrap up some of the most talked about stories. Assuming this goes over well, we're going to give you a daily recap of what you missed in the ReadWriteWeb Community, including a link to some of the most popular discussions in our offsite communities on Twitter, Facebook, LinkedIn and Google Plus as well. This is a new feature at ReadWriteWeb so we covet your feedback. If you have suggestions, please leave them in the comments below or reach out to me directly at robyn at readwriteweb.com.

Scott Berkun's "Mindfire" eBook Free Until November 3rd

Scott is a popular speaker, and the author of a favorite book of many of the staff, Confessions of a Public Speaker. You only have a few hours left to grab a copy of his latest book, Mindfire, and all he wants in return is your email address.

From the comments:

[Screenshot of a reader comment]

Here are a few more must read posts, chosen by your fellow community members.

[Screenshots of community-selected posts]

ReadWriteWeb Worldwide Meetup

Make plans to be at the ReadWriteWeb Worldwide Meetup on November 15. Reach out to our community manager, Robyn Tippins, at robyn at readwriteweb.com if you have any questions.

http://www.readwriteweb.com/archives/daily_wrap_scott_berkens_mindfire_free_until_novem.php http://www.readwriteweb.com/archives/daily_wrap_scott_berkens_mindfire_free_until_novem.php Community Wed, 02 Nov 2011 19:30:00 -0800 Robyn Tippins
Amazon Rents Out MapReduce Power with EC2, S3 and Hadoop

Amazon announced today that it is bridging two of its web computing services, EC2 and S3, with Hadoop, an open-source project that brings the same distributed data processing power as Google's MapReduce. In fact, it is calling the new service Amazon Elastic MapReduce. The new service will allow its EC2 customers to perform distributed MapReduce queries on enormous datasets stored in S3, paying only for the computation time they need.

Hadoop has been an open-source project in the making for the last few years, inspired by Google's white paper on its version of MapReduce. The technology is an almost perfect fit with Amazon's growing web services, matching distributed CPU time with vast data storage requirements, both things that fit well with the cloud model.

The way MapReduce works is a fairly straightforward concept: You take a problem that requires working with a giant (and we're talking massive - sometimes petabytes) dataset, distribute the work over thousands of separate processes (called mapping), then take the thousands of results you get back and reduce them into a single master result. For certain tasks, MapReduce can vastly improve efficiency, and adding more computing power gives you a roughly linear improvement in speed.
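The classic teaching example is counting words. The sketch below runs in a single process, so it shows only the shape of the computation, not the distribution across machines that Hadoop or Elastic MapReduce would handle; the function names and sample text are my own.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}


chunks = ["the cat", "the dog", "The end"]  # each chunk could go to a separate worker
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each chunk is mapped independently of the others, the map step is the part that spreads across thousands of processes; the reduce step then only has to combine small per-key lists.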

Yahoo! has been using its own version of Hadoop for a while now. And even before this offering, larger Amazon Cloud Computing customers have already begun to use Hadoop in EC2. This is from Wikipedia's article on Hadoop:

As an example The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4TB of raw image TIFF data (stored in S3) into 1.1 million finished PDFs in the space of 24 hours at a computation cost of about $240 (not including bandwidth).
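The arithmetic behind that figure is worth a moment. Assuming every one of the 100 instances ran for the full 24 hours (the quote does not say so explicitly), the implied rate works out as follows:

```python
from decimal import Decimal

INSTANCES = 100
HOURS = 24                     # assumption: every instance ran the whole time
TOTAL_COST = Decimal("240")    # dollars, excluding bandwidth
PDFS_PRODUCED = 1_100_000

instance_hours = INSTANCES * HOURS
cost_per_instance_hour = TOTAL_COST / instance_hours
cost_per_pdf = TOTAL_COST / PDFS_PRODUCED

print(instance_hours, cost_per_instance_hour, cost_per_pdf)
```

That comes to 2,400 instance-hours at ten cents each, or a small fraction of a cent per finished PDF, which is the whole argument for renting the capacity rather than buying it.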

As Amazon says on its blog, "After a while [developers] tend to report that they begin to think in terms of the new style, and then see more and more applications for it." Which we believe means that MapReduce is the new, big hammer, and as developers start looking around, every dataset starts looking like a nail. This is good news for Amazon as it only stands to profit.

http://www.readwriteweb.com/archives/amazon_rents_out_mapreduce_power_with_ec2_and_hado.php http://www.readwriteweb.com/archives/amazon_rents_out_mapreduce_power_with_ec2_and_hado.php News Thu, 02 Apr 2009 11:00:00 -0800 Phil Glockner
Web Apps are the new black

Rands in Repose has written the best post about web apps I've read this year. I'll pick out the highlights here and finish with some thoughts on re-inventing the page metaphor. Also you may want to check out the Web Apps Compendium v1.0, a great attempt at listing out all the main web apps on the Web today.

What is a web app? Simply defined, it's a software program that runs in a Web browser (proper definition here). What are they good for? Rands explains that there are two main advantages of web applications: 

1) Zero installation and no upgrades for the user.

2) Access anywhere with an Internet connection (which Rands terms "no baggage")

The main benefits of web apps then are: they're cheap to maintain and they empower users. So why, Rands asks, "aren't they everywhere?"

Good question, but then I've met loads of developers who still think web apps are too limited in functionality, compared to desktop apps (applications you install on your PC). And that really is the main drawback of web apps - they're constrained by the limitations of the browser. But wait, Rands might say - this is where Ajax comes in. 

I like Rands' concise definition of Ajax: "improved interactivity within web pages". He believes that due to Ajax, "the interface of web applications can vastly exceed your expectations." That's certainly true of Gmail and Google Maps, still the two quintessential Ajax apps. 

One thing I'm wondering though: with all the current activity around synchronization for desktop apps, is that lessening the gap between desktop apps and web apps in terms of "no baggage"? When I say synchronization, I mean desktop apps that use Internet connectivity to allow users to synch their data over more than one PC or application - which solves the "access anywhere" issue for desktop apps. An example is Newsgator Online. I'd be interested in hearing some developers' opinions on that...

We're Not in Pages Anymore, Toto!

The best part of Rands' essay, for me, was this statement:

"Stop thinking of a web application as a collection of pages.

The back button is not a bug in Ajax, it's a flaw in the browser metaphor."

This was one of the themes of the Web 2.0 for Designers article I co-wrote recently with Josh Porter. We wrote that the Web is no longer a collection of "pages", but a flow of “microcontent” units distributed over dozens of domains. Rands refers to "objects" instead of microcontent, because he's talking about web apps in a programmatic sense. I'm looking at it more from an information unit sense. But we're essentially on the same, er... page.

In summary - web apps today are aggregators, remixers, search interfaces, tagging and bookmarking apps, news services, and much more. It's all microcontent and so I have to agree with Rands and say that the back button is less relevant in web apps today. Often we don't want to go back to the previous page - we want to re-aggregate information, or re-contextualize, or do another search, or remix data, etc. In Web 2.0 we need an interaction framework that overcomes the "page" metaphor and recognizes that we're dealing in much smaller and more fluid units of information.

http://www.readwriteweb.com/archives/web_apps_are_th.php http://www.readwriteweb.com/archives/web_apps_are_th.php Web Development Wed, 25 May 2005 22:19:07 -0800 Richard MacManus