ReadWriteWeb

Could Wikipedia's Future Be as a Development Platform?

Written by Marshall Kirkpatrick / February 23, 2009 1:21 PM / 15 Comments

Content creation at Wikipedia is slowing down. The already small number of active regular editors is on the decline and Jimmy Wales has called for live edits to be held for approval on many pages, a step sure to slow contributions even further.

The tapering of fresh content doesn't have to mean Wikipedia's death, though. The site contains a gargantuan amount of data that is human-created and human-tended, yet largely structured and machine-readable. That's a potential gold mine for innovation. Wikipedia can offer developers opportunities to glean analysis, supplemental content and structured data from its years-old store of collaboratively generated information. All of that is possible, but Wikipedia as a platform can't be taken for granted.

SlumdogMillionaire.jpg

Above: Edit history via the WikiDashboard browser add-on, by Paul Irish.

If the sun is setting on Wikipedia's time as a fast-growing collection of user-contributed knowledge, maybe that part of the site's life was just its adolescence. Wiki inventor Ward Cunningham told us he thinks the moves by Wales to require approval before displaying edits are an "inevitable" maturing of the site, though not one he's necessarily happy about or believes is consistent with The Wiki Way. Nonetheless, the huge mass of knowledge amassed by the world's biggest wiki now offers developers and other websites all kinds of value that has only begun to be explored.

There is no formal Wikipedia Application Programming Interface (API) but the data there is relatively accessible anyway. It can be downloaded and processed locally. This spring a project called WikiXMLDB began offering a thoroughly XML-ified database of Wikipedia. We shouldn't fail to point out DBPedia as well, where people are collaborating to make structured data available from Wikipedia. People are accessing the data in a variety of ways and are beginning to find good uses for it. One or more formal APIs from Wikipedia, though, would be exciting in much the same way that the New York Times opening up a number of APIs is exciting.

What Would People Do With Wikipedia Data?

Wikipedia as a tool to identify key sources of knowledge. Mainstream media coverage of Wikipedia in its early days often focused on the seemingly random contributors to the site. Some old guy with a beard down to his knees and living in a trailer park in New Mexico likes to edit entries about astronomy and the culinary arts. Isn't that quirky?

Wikipedia has managed to set free in a big way the knowledge stashed away in the minds of people all over the world. Identifying those people in a systematic way is just one example of the kind of value that can be built on top of Wikipedia. Identifying key influencers online is a fast-emerging industry and Wikipedia is one more place where that can happen.

The Palo Alto Research Center recently built an application called WikiDashboard, a service to analyze recent changes and editors of any Wikipedia entry. Paul Irish, who incidentally is the editor of one of the best music blogs on the web, turned that data into a Greasemonkey script that gives one-click access to the data from any page on Wikipedia (image above).

That's just the beginning of what could be done with contributor data, though there are so few active participants on Wikipedia that the user data may be more limited than you'd think.
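
As a rough illustration of what mining contributor data might look like, here is a minimal Python sketch that ranks the editors of one entry by edit count. The revision records are made-up sample data and `top_editors` is a hypothetical helper; this is not how WikiDashboard works, just one simple way to approach the idea.

```python
from collections import Counter

# Hypothetical sample data: (article title, editor name) pairs.
# Real revision histories would come from a Wikipedia data dump or a similar source.
SAMPLE_REVISIONS = [
    ("Slumdog Millionaire", "Alice"),
    ("Slumdog Millionaire", "Bob"),
    ("Slumdog Millionaire", "Alice"),
    ("Danny Boyle", "Alice"),
    ("Slumdog Millionaire", "Carol"),
    ("Slumdog Millionaire", "Alice"),
]


def top_editors(revisions, title, limit=3):
    """Return the most active editors of one article as (editor, edit_count) pairs."""
    counts = Counter(editor for article, editor in revisions if article == title)
    return counts.most_common(limit)


if __name__ == "__main__":
    for editor, edits in top_editors(SAMPLE_REVISIONS, "Slumdog Millionaire"):
        print(f"{editor}: {edits} edits")
```

The same tally run across many entries on one topic would start to surface the people who contribute most to that subject area.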

wikiragescreen.jpg

Wikipedia as news radar. Wikipedia puts a great emphasis on current events, but the opposite is true as well - current events are reflected in Wikipedia. The site WikiRage treats Wikipedia edits like signals of significance - its subtitle is "Monitoring the Hive Mind Through Wikipedia Edits."

We've written here about non-advertising-based forms of data mining that could be huge in the future and how big a Facebook sentiment engine could be. Wikipedia edits are far fewer than Twitter and Facebook updates, but they may be of higher value, and at the very least they seem like an important complement to a social media data mining strategy.
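
To sketch how edits could be treated as signals, the Python example below counts edits per entry inside a recent time window and keeps the entries that cross a threshold. The edit log is invented sample data, and `hot_articles`, the 24-hour window and the `min_edits` threshold are all assumptions made for illustration; we don't know how WikiRage actually computes its rankings.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical sample edit log: (article title, edit timestamp).
SAMPLE_EDITS = [
    ("Academy Awards", datetime(2009, 2, 23, 9, 5)),
    ("Academy Awards", datetime(2009, 2, 23, 9, 40)),
    ("Academy Awards", datetime(2009, 2, 23, 10, 15)),
    ("Astronomy", datetime(2009, 2, 23, 9, 50)),
    ("Astronomy", datetime(2009, 2, 20, 14, 0)),
]


def hot_articles(edits, now, window=timedelta(hours=24), min_edits=2):
    """Rank articles by the number of edits made within `window` before `now`.

    Only articles with at least `min_edits` edits in the window are returned.
    """
    cutoff = now - window
    recent = Counter(title for title, when in edits if cutoff <= when <= now)
    return [(title, n) for title, n in recent.most_common() if n >= min_edits]


if __name__ == "__main__":
    print(hot_articles(SAMPLE_EDITS, now=datetime(2009, 2, 23, 12, 0)))
```

A raw count like this favors entries that are always busy; comparing each entry against its own usual edit rate would be a natural refinement.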

The Best Use Case: Leveraging Wikipedia's Structured Data

Last month we wrote here that Google appears to be exposing some semantically structured data in some of its search results. Some of that data may be originally analyzed at Google, but a lot of it is clearly coming in from Wikipedia. That's structured data that many, many companies could take advantage of.

Recommendation service MSpoke has been doing just that. (Disclosure: MSpoke's Sean Ammirati is the long-time producer of our podcast ReadWriteTalk.)

This business news tracking service uses Wikipedia to train its recommendation engines. Ammirati says that Wikipedia's disambiguation pages help the company's technology recognize that there are, for example, two famous Michael Jordans: one a basketball player and the other a statistician. That kind of distinction makes all the difference when you're in the business of recommendations.
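
A toy version of that kind of disambiguation can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `SENSES` table stands in for the entries a disambiguation page would list, the keyword sets are invented, and `disambiguate` uses plain word overlap, which is far simpler than whatever MSpoke actually does.

```python
import re

# Hypothetical stand-in for the senses listed on a disambiguation page,
# each paired with a few invented context keywords.
SENSES = {
    "Michael Jordan (basketball player)": {"basketball", "nba", "bulls", "chicago", "player"},
    "Michael Jordan (statistician)": {"statistics", "machine", "learning", "berkeley", "professor"},
}


def disambiguate(text, senses):
    """Pick the sense whose keywords overlap most with the words in `text`.

    Returns None when no sense shares any keyword with the text.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    best, best_score = None, 0
    for sense, keywords in senses.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = sense, score
    return best


if __name__ == "__main__":
    print(disambiguate("Michael Jordan led the Chicago Bulls to another NBA title", SENSES))
```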

By using a subset of Wikipedia's hierarchy of terms, MSpoke has been able to get an immediate foundation for its own taxonomy and quickly understand the content of articles it finds around the web.

This is the kind of thing that Metaweb and Powerset have tried to do in the past, as well. Powerset was absorbed by the Borg, and we're hearing rumors that things aren't going well at Metaweb. It's one thing to build added value from Wikipedia; it may be another to bet the farm on it.

There could be something here, though. Wikipedia could do quite well for itself becoming less a destination site focused on public editing and more an open database, built up and still maintained after years of formerly frenetic public editing.

There's a chance that Wikipedia still isn't populated enough to be able to make that leap, that its political turbulence and waning enthusiasm are coming too soon. Only time will tell, but we have high hopes.


Comments


  1. I think you're right on this one. I was going to say if they don't do it someone else will, but I'm obviously a little behind current events. It's still true though, I guess. They can be what everyone uses or someone else can. And there's a ton of things their data is nice for.

    In terms of the speed of edits, it might be just the right time to sort of clean up their data with slower more considered edits. They got great value out of being open, but if the benefit of that, fast additions, is going away, then they need to shoot for accuracy. It might not totally 'fit' with their roots but they aren't getting the benefit of that openness anyway.

    Interesting stuff though, nice work.

    Posted by: Morgan | February 23, 2009 4:08 PM



  2. I see one of the problems in the flexibility (!) of the wiki templates. On the one hand it makes things easier for the editors, but there is always a huge backlog of "old ways" of doing things. There are for example around 10 ways to specify coordinates in Wikipedia syntax.

    Efforts around the world for extracting Wikipedia contents are mostly limited to info boxes, special templates or only very simple sentences. The huge amount of text is often ignored. At my project Factolex.com, we try to use relatively simple methods for extracting facts from these long texts.

    Also problematic for programmatic approaches to leveraging the knowledge is the combination of terms onto a single page. For example, the name of a character in a movie redirects to the page of the movie, because somewhere within the page there are more details about him or her. This causes problems.

    The power of the freeform wiki is what defines Wikipedia. People can express themselves in powerful and human ways, as the focus is on making the experience better for people. I think other companies will make knowledge accessible for computers, as I don't see an urge from the Wikipedia community to revamp things. They have to deal with quite an amount of problems already.

    Posted by: Alexander Kirk | February 23, 2009 4:19 PM



  3. Alexander, that's a very well informed and thoughtful comment - thanks.

    Posted by: Marshall Kirkpatrick | February 23, 2009 4:24 PM



  4. A decline in editing participation does not (yet) have the direct causal effect of a slowing in content generation, if you define content generation as growth in number of articles.

    With article creation still happening at a steady rate of thousands per month, and our three millionth article appearing very soon, content creation on Wikipedia isn't slowing down at all. In other words, there is still a huge amount of work being done regularly; it's just that fewer people are doing that work.

    This is a very forward-thinking post, and I agree that even one developer properly leveraging structured data from Wikipedia would be a dream come true. I just don't think that a decline in editing and article creation is necessary for Wikipedia's advent as a development platform. The explosive growth at Twitter certainly hasn't stopped people using the API in fascinating ways (i.e. more than just clients).

    Posted by: Steven Walling | February 23, 2009 4:47 PM



  5. I agree that there is a huge potential to build applications on top of the amazing work the Wikipedia community continues to do. In fact, much of what you've suggested is already possible to do with Freebase data.

    For example, using Metaweb's application development platform (Acre), I was able to build a simple mash-up called Freebase Sets that leverages Wikipedia data to emulate some of the features of Google Sets.

    Alexander's point about Wikipedia community being focused on making the experience better for people versus computers is really important. What's good for programmers is not always good for the community.

    Posted by: Shawn Simister | February 23, 2009 6:20 PM



  6. MyWikiBiz.com is making an attempt at semantic data structures (using Semantic Mediawiki, which features an XML- and RDF-compatible architecture).

    So within that directory, there are pages detailing medieval philosophers (with sortable or query-searchable data for the "year flourished") or listing commercial airliner disasters (with same sortable or searchable data about distance traveled, number of survivors, etc.)

    MyWikiBiz has caught on enough that it is the fourth-largest site deploying Semantic Mediawiki... nearly 39,000 pages now. The business model is workable, too -- the site takes revenues from footer ads, while the contributors take revenues from ads they may embed on their own Directory pages. It's worth checking out.

    Posted by: Gregory Kohs | February 23, 2009 6:57 PM



  7. There is no formal Wikipedia Application Programming Interface (API) but the data there is relatively accessible anyway

    How about http://en.wikipedia.org/w/api.php

    Posted by: Mathias Schindler | February 24, 2009 2:52 AM
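
For readers who want to try the endpoint mentioned in comment 7, a minimal Python sketch of building a revision-history query might look like the following. The helper name `recent_revisions_url` is hypothetical, and the parameter choices are just one reasonable combination of MediaWiki query options. The sketch only constructs the URL; fetching and parsing the response is left out.

```python
from urllib.parse import urlencode

# Endpoint as given in comment 7.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def recent_revisions_url(title, limit=10):
    """Build a query URL asking for the latest revisions of one article.

    Each returned revision should carry the editor's name, the timestamp
    and the edit summary, as requested through `rvprop`.
    """
    params = {
        "action": "query",       # read-only query module
        "prop": "revisions",     # ask for revision history
        "titles": title,
        "rvprop": "user|timestamp|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(recent_revisions_url("Slumdog Millionaire"))
```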



  8. Thanks

    Posted by: izmir matbaa | February 24, 2009 4:34 AM



  9. At Webzzle, we try to make a good use of Wikipedia to build a Web Explorer in 1-Click.
    We try to change the paradigm about Semantic Web.

    Webzzle enables you to explore the knowledge Web in 1-Click and get Webzzle & Wikipedia quality results followed by Google enhanced results.

    From all knowledge from Wikipedia, or more generally from any other information contained at a specified Web address, you can find, in 1-Click, similar information on other Web sites.

    Posted by: Vaucois | February 24, 2009 5:22 AM



  10. http://www.webzzle.com for the english version
    http://www.webzzle.fr for the french version

    Try the plugin (still in the sandbox of Firefox) :
    https://addons.mozilla.org/en-US/firefox/addon/9929

    Posted by: Webzzle | February 24, 2009 5:27 AM



  11. Anytime content is opened up to developers, good things are going to happen. If Wikipedia were to allow its content to be repurposed around the web, I could see this content being appended to news articles and blog posts.

    Posted by: Adam | February 24, 2009 6:12 AM



  12. Since Wikipedia already makes their data freely available for parsing, it will almost certainly be done in multiple competing and/or complementary ways. But having an API come from within Wikipedia would have the benefit of promoting the creation of more semantically parseable content within the site among contributors and editors. Over a few years, that could make the existing content set more valuable in an API by a significant factor.

    Posted by: Mat Wiseley | February 24, 2009 6:48 AM



  13. Marshall -- you wrote "This is the kind of thing that Metaweb and Powerset have tried to do in the past, as well. Powerset was absorbed by the Borg, and we're hearing rumors that things aren't going well at Metaweb."

    Since I'm starting to invest time into working with Metaweb's Freebase, I'd like to hear more about what makes you think that things are not going well at Metaweb.

    Posted by: Raymond Yee | February 24, 2009 7:29 AM



  14. "The already small number of active regular editors"

    According to Special:Statistics, there are around 166,000 active (registered) editors currently. This is not exactly a small number, even if you take into account that "active" means only "at least one edit or action a month". :)

    "There is no formal Wikipedia Application Programming Interface (API)"

    Yes, there is an API, and it's available here. Sure, it could be improved, but it's a solid base that replicates most of the basic functionality.

    --

    Ultimately, I echo Steven Walling's comment above to some degree: slowing of content creation isn't necessary for application development.

    What I'd really like to point out is rather that an active community is beneficial to a development base: they iron out problems, inconsistencies, and other issues that make development easier. You would not believe the number of times that I have fixed "hatnotes", the "other uses" notes at the tops of articles or sections, just to standardize them to a basic, templated form that uses CSS classes rather than direct italicization. If someone out there were to try to leverage hatnotes, it would be much easier for them if all hatnotes were of the same general form, rather than the hodgepodge of different syntax that might be added manually. A community can facilitate development through standardization and maintenance.

    I question, rather, whether the reverse is true. Could use of Wikipedia as a development platform improve or increase community participation?

    Posted by: Nihiltres | February 24, 2009 7:39 AM
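
The hatnote cleanup described in comment 14 can be sketched as a small text transformation. This Python example is a simplified illustration under stated assumptions: it treats any line that starts with `:''` and is wrapped entirely in italics as a hand-formatted hatnote, and it assumes `{{hatnote|...}}` is the templated form to convert to. Real wikitext has many more variants than this pattern covers, and text containing `=` would need extra handling inside a template.

```python
import re

# A hand-formatted hatnote: an indented line wrapped entirely in italics, e.g.
#   :''For other uses, see [[Mercury (disambiguation)]].''
# The [^'] guard skips lines that open with bold markup (''').
MANUAL_HATNOTE = re.compile(r"^:''(?P<text>[^'].*?)''[ \t]*$", re.MULTILINE)


def standardize_hatnotes(wikitext):
    """Rewrite hand-italicized hatnote lines into the assumed {{hatnote|...}} form."""
    return MANUAL_HATNOTE.sub(
        lambda match: "{{hatnote|" + match.group("text") + "}}",
        wikitext,
    )


if __name__ == "__main__":
    sample = (
        ":''For other uses, see [[Mercury (disambiguation)]].''\n"
        "'''Mercury''' is a planet."
    )
    print(standardize_hatnotes(sample))
```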



  15. Nice post.
    PHP is used mainly in server-side scripting, but can be used from a command line interface or in standalone graphical applications. Textual User Interfaces can also be created using PHP.

    Posted by: Perception System - Taufik | July 27, 2009 2:39 AM


