ReadWriteWeb

The danger of running a remix service

Written by Richard MacManus / September 15, 2005 3:52 PM / 6 Comments

Populicio.us was a service that used data from the social bookmarking site del.icio.us to create a site with enhanced statistics and a better variety of 'popular' links. However, the Populicio.us service has just been taken offline, because its developer can no longer get the required information from del.icio.us. The developer of Populicio.us wrote:

"Del.icio.us doesn't serve its homepage as it did and I'm not able to get all needed data to continue Populicio.us. Right now Del.icio.us doesn't show all the bookmarked links in the homepage so there is no way I can generate real statistics."

This plainly illustrates the danger for remix or mash-up service providers who rely on third-party sites for their data. del.icio.us can not only giveth, it can taketh away. Now, it appears that del.icio.us celebrated its second birthday by redesigning its homepage. I'm curious whether they intended to take away the data that Populicio.us needed to operate, or whether it was an unintended consequence.

As Andy Baio pointed out, "the Delicious New Popular is pretty good". Hmmm.

I'm sure it was unintended, but the fact is del.icio.us effectively hobbled Populicio.us with its redesign. Who controls the data is something that is still being explored in the Web 2.0 world, and I bet we'll see some more high-profile examples of this 'giveth, taketh away' in the near future.

UPDATE: Fair points by both Dare Obasanjo and Ian Davis, who say that HTML scraping doesn't have the same level of obligation as an API.

Dare wrote: "An API is a service contract which is unlikely to be broken without warning. A web page can change depending on the whims of the web master or graphic designer behind the site."

Ian wrote: "This shows the distinction between content designed for human consumption and that designed for machine consumption. The human format will change for all kinds of reasons often simply as the site matures and its users become familiar with its workings. Machine interfaces change on a completely different timescale and generally stable over the long term."

Both great points. Populicio.us still lost its service because the del.icio.us data it relied on went away, but the lesson here is that screen-scraping HTML carries those risks by nature. del.icio.us, like any other data silo, still controls its APIs, but one would expect those to be more stable. Thanks Dare and Ian.
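
To make the distinction Dare and Ian draw a little more concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the two HTML snippets, the `post` and `entry` class names, and the JSON feed shape are hypothetical and are not del.icio.us's actual markup or API. It only shows how a scraper tied to one page layout can silently return nothing after a redesign, while a reader of a machine-oriented format does not depend on layout at all.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup, invented for this example (not real del.icio.us HTML).
OLD_HOMEPAGE = (
    '<ul>'
    '<li class="post"><a href="http://example.com/a">A</a></li>'
    '<li class="post"><a href="http://example.com/b">B</a></li>'
    '</ul>'
)
# The same links after an imagined redesign: different tags and class names.
NEW_HOMEPAGE = (
    '<div class="entry"><h4><a href="http://example.com/a">A</a></h4></div>'
    '<div class="entry"><h4><a href="http://example.com/b">B</a></h4></div>'
)
# Hypothetical machine-readable feed carrying the same data.
FEED = '[{"url": "http://example.com/a"}, {"url": "http://example.com/b"}]'


class PostLinkScraper(HTMLParser):
    """Collects hrefs of <a> tags found inside <li class="post"> elements."""

    def __init__(self):
        super().__init__()
        self.in_post = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "post":
            self.in_post = True
        elif tag == "a" and self.in_post and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_post = False


def scrape_links(html):
    """Scrape links using assumptions about the page layout."""
    scraper = PostLinkScraper()
    scraper.feed(html)
    return scraper.links


def feed_links(payload):
    """Read links from the structured feed; no layout assumptions needed."""
    return [item["url"] for item in json.loads(payload)]
```

Calling `scrape_links(OLD_HOMEPAGE)` finds both links, while `scrape_links(NEW_HOMEPAGE)` returns an empty list without raising any error, which is roughly the position a scraper's author is in after a redesign. `feed_links(FEED)` is unaffected by how the homepage looks, though it still depends on the provider keeping the feed format stable.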



Comments


  • Hi there,

    What about an API? Clearly, it's not very wise to rely on a homepage's HTML structure to develop a service. HTML code is meant to be displayed, not to be reused.

    On the other hand, del.icio.us warns that they can change their API at any moment. But I think the balance of the situation is open to debate: if many services rely on an API, and if those services are useful to the provider, the provider can be compelled to negotiate before changing anything.

    Posted by: Piotrr | September 16, 2005 12:48 AM



  • Good point Piotrr, HTML scraping is much riskier than using an API.

    Posted by: Richard MacManus | September 16, 2005 3:35 AM



  • Here is a slightly different twist. Sure, there sometimes is a contract about not changing an API. What there often is no contract about is that the data behind the API is free. Imagine, for example, if Amazon, Google, or craigslist charged those using their API, if they wanted to access their respective databases. There are services sprouting up all over the web that rely heavily on others' data (indeed.com comes to mind). Even if these data giants charged a nominal fee per kilobyte, mash-up and remix services would be in serious trouble.

    Posted by: Ken | September 16, 2005 4:18 PM



  • Populicio.us was a favorite of mine; anyone would be lucky to have Xabi. The del.icio.us team has made a huge mistake by not accommodating him.

    Posted by: John Wehr | September 17, 2005 12:06 AM



  • Hi Ken,


    I think one of the main rules of Web 2.0 is that information is free. We can reasonably assume that Google, Amazon, del.icio.us, flickr, eddb, and so on will never charge for access to their services, because their business models are built on free access to their data.

    Therefore, the question is: how do they make money? There are several ways, I guess. Google's model is advertising (it's a media model). Amazon's model is selling stuff (by the way, A9 is fully Web 2.0, as you can retrieve data from their API but also plug your own search engine into it). Yahoo's model is unclear, in my opinion: we can see perfectly well that their aim is to integrate (Yahoo 360 and the recent attempt to forcibly integrate flickr). I'm not sure Yahoo!'s model is Web 2.0; it looks rather like an old-school media borg model, I think. The future will tell.

    Posted by: Piotrr | September 17, 2005 12:42 AM



  • Piotr, I'd disagree with the premise that a "main rule" of Web 2.0 is that data is free. What makes Google, Google and Amazon, Amazon is the data they bring to the table. One of the reasons people use each of these services is that they like the results they receive - they can find a relevant site or relevant book, for example. Sure, right now, the business model is not to leverage the data itself (the data is a means to an end) but as you can see from Amazon's plans for its Alexa Web Data, not charging for data will not always be the case:

    "The Alexa Web Information Service is free at this time. We anticipate charging for this service once it is officially released. We haven't finalized pricing yet, but we expect to make this information available to you at a very reasonable price."

    (visit the help page for the Alexa Web Information Service)

    Posted by: Ken | September 19, 2005 7:46 AM



