linked data - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/linked data en Copyright 2012 Richard MacManus readwriteweb@gmail.com Wed, 15 Feb 2012 10:45:03 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss

New York Times Longitude: Linked Data + Location

Earlier this month the New York Times launched a beta testing playground called Beta620. It's a site for the news organization to try out new web experiments, some of which may graduate to become full-fledged New York Times products.

An interesting Semantic Web experiment went live this week, called Longitude. As the name suggests, it presents a geographical interface for accessing content from The Times. It uses The Times's large store of metadata, along with Linked Open Data from the Web.

Longitude displays a set of "Times T" pins plotted out in a Google Map. According to an explanatory blog post, the locations for these pins were all derived from Geonames - a worldwide geographical database. Clicking on a pin pops up a balloon containing ten recent Times articles relevant to that location.

Additionally, some locations have one or two additional tabs: "Natives" and/or "Companies." Clicking on those tabs presents you with a list of locally-born people and locally-headquartered organizations.
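To make the pin-and-balloon behavior described above a little more concrete, here is a minimal Python sketch of how a balloon's contents might be assembled: group articles by their tagged location, then keep the ten most recent for each. The `Article` record, its field names and the sample data are all hypothetical. This is an illustration of the idea, not the way The Times built Longitude.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    # Hypothetical record: a headline, a publication date and the name of
    # the location the article has been tagged with.
    headline: str
    published: date
    location: str


def recent_articles_by_location(articles, limit=10):
    """Group articles by location, keeping the `limit` most recent for each."""
    grouped = defaultdict(list)
    for article in articles:
        grouped[article.location].append(article)
    return {
        location: sorted(items, key=lambda a: a.published, reverse=True)[:limit]
        for location, items in grouped.items()
    }


# Made-up sample data, for illustration only.
sample = [
    Article(f"Story {day}", date(2011, 8, day), "Portland")
    for day in range(1, 13)
] + [Article("Harbor news", date(2011, 8, 20), "Auckland")]

balloons = recent_articles_by_location(sample)
```

With twelve sample stories tagged "Portland", the balloon for that pin would hold only the ten newest, while the "Auckland" pin would show its single story.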

It's a relatively small project, but this type of functionality may become a part of your future news reading experience. For a national (indeed, international) publication like The Times, it's often interesting to see which stories local to you have been published, and which local people and companies have been in the news recently.

It's encouraging to see Linked Data continuing its push into commercial areas like this.

Disclosure: ReadWriteWeb is a syndication partner of the NYTimes.

http://www.readwriteweb.com/archives/new_york_times_longitude.php http://www.readwriteweb.com/archives/new_york_times_longitude.php Semantic Web Tue, 30 Aug 2011 22:03:02 -0800 Richard MacManus
Whatever Happened To... Google Base

The dream of creating a structured database of the world's content has eluded Google's grasp. But at least it got a bunch of useful retail data out of it.

The Web world is nothing if not Darwinian: it's survival of the fittest and products need to evolve with the times. Some Web products fly and some don't. Those that don't fly either die out, or evolve into something new. The latter is what happened to Google Base, which in 2011 is a shadow of its former self - and is even about to lose its API. It did however spin off a more successful offering, in the form of the Google Merchant Center for retailers. In this post, we look back on the initial vision for Google Base and then analyze what it actually evolved into.

How Google Base Started Out in 2005

For those of you whose memories don't go back as far as Web 2.0, Google Base is an online database that launched in November 2005. At that time it enabled you to upload any type of information, along with "attributes" that described it further. Early examples of Google Base content included reviews, events, products and jobs. Some - but far from all - of the data was then surfaced in Google properties like search, Froogle and Local.


A snapshot of Google Base homepage, circa January 2006 (source: Wayback Machine)

Despite a clunky user interface and initial mixed reviews, Google Base was a promising product in late 2005 and early 2006 because of its potential to better structure Web content. At the time Google appeared to be aiming to use the product to improve its main search engine, but also to create other Google properties in verticals like e-commerce and classified ads. So there was a lot of talk at the time about it challenging eBay and Craigslist.

This is how I summarized Google Base back in November 2005:

Google Base may not be pretty to look at and it may be a centralized database, but the potential is there to turn it into a hugely valuable directory of structured content. Plus if they add APIs and start aggregating outside RSS feeds, then they could easily extend Google Base and remove the issues around it being a 'walled garden'.

In about April 2006, Google began to expand the number of vertical niches it covered in Google Base. The product was expanding, but how good was the data it was receiving and was there any sign of it being productively used?

Google Base in 2011 & its More Successful Spin-off

Fast forward five years to April 2011 and Google Base still exists (and, incidentally, is still labeled a "beta" product!), but it's been scaled back in two big ways.

Firstly, commercial product data has been separated out into a new Google property called Google Merchant Center, which launched in September 2009. Google Base is now restricted to "non-product data." Data in the Merchant Center is being used in Google Commerce Search, the third generation of which was announced last week. As the name suggests, Google Commerce Search is a search engine for retail - a market which Google obviously sees huge potential in.

The other big difference between the 2011 Google Base and its 2005/2006 version: there's no Web frontend for people to search the data. Whether Google ever even wanted to make Google Base into a long-term frontend product is debatable, although what's not debatable is that it was a user interface mess from the start.

What Happened to the Structured Content Dream?

Looking at how Google Base evolved, one has to ask: was retail data the only valuable thing that came out of Google Base? Did the grand dream of all types of structured content never pan out?

The main problem with structured content has always been that you can't rely on people to upload reliable data, for reasons ranging from motivation to accuracy. As I wisecracked back in 2005, "Google could in fact be building the world's largest database of structured shite."

That's not to say that structured content isn't alive and well. But nowadays the main initiative around this is the W3C's Linked Data, which is getting good uptake from governments and commercial companies alike.

Google is obviously still innovating with new types of data, structured and otherwise, but it no longer views Google Base as a good source of that data. Or at least, not good enough to actively foster - because it's about to deprecate the Google Base API.

Google Base to Lose its API

In December last year, Google announced that it is deprecating the Google Base Data API and replacing it with two new APIs: the Content API for Shopping and the Search API for Shopping. The Google Base Data API will be "fully retired" on June 1, 2011.

What this means for Google Base is unclear, but it's obvious that Google is far more interested in developing the shopping spin-off, Merchant Center. It appears to be giving up on actively developing its database for other types of structured content. In an FAQ about the API deprecation, Google stated:

"The new Shopping APIs won't support use cases that require uploading & searching data for non-shopping applications like Real Estate, Events, Jobs, Activities and other custom structured content applications."

While Google Base lives on, it's a much diluted product. Retail data was clearly the only valuable user-generated content that Google got out of it. Let us know in the comments whether you've used Google Base, in the past or now. What are your thoughts on its evolution and future?

Photo credit: Wikipedia

http://www.readwriteweb.com/archives/whatever_happened_to_google_base.php http://www.readwriteweb.com/archives/whatever_happened_to_google_base.php Google Tue, 05 Apr 2011 22:00:33 -0800 Richard MacManus
Augmented Reality Field Trips & the 150th Anniversary of the U.S. Civil War

April 2011 will mark the 150th anniversary of the first hostilities of the U.S. Civil War, and museums, municipalities, and historic sites are making their preparations for the events and exhibits to commemorate it. And while, no doubt, times are tough for funding cultural heritage projects, there's a lot of excitement and momentum building around the sesquicentennial, making it a great opportunity for those exploring how technology can make history more interactive.

"A more valuable field trip" - that's the argument that Pennsylvania high school social studies teacher Jeff Mummert makes, pointing to the increasing accessibility of both mobile and augmented reality technologies as ways to "offer deeply interactive projects for students and the general public."

To that end, Mummert has created the Civil War Augmented Reality Project (which recently evolved to become HistoriQuest). Aimed at giving both students and the general public a richer experience, the Civil War Augmented Reality Project wants to build apps that will use augmented reality to connect primary documents and photographs to local historic points of interest.

Knocking Down the Museum Walls with Mobile AR

The Civil War's sesquicentennial provides both challenges and opportunities for many local historic sites. It's estimated, for example, that Gettysburg, Pennsylvania will receive some 3 to 4 million visitors in 2013, the 150th anniversary of the battle and of Lincoln's famous address. How can mobile technology and AR provide better, smarter, more active experiences - inside and outside the museum walls? How can building localized apps encourage the public to do more than just walk through a battlefield or a visitors' center?

Mummert walked me through one app under development: a body of an identified Union soldier was found in the town of Gettysburg on one of the first days of the invasion in 1863. At the spot where the body was found, the mobile app triggers a CSI investigation, of sorts, where Gettysburg visitors can follow clues (a photograph of a wife and child found on the body) through various points of interest in the town: to the churches that served as hospitals during the battle, to the David Wills House - now a museum, and the site where President Lincoln stayed the night before he gave the Gettysburg Address - to battlefield sites and the Gettysburg National Cemetery, and eventually to the soldier's grave-site.

The Sesquicentennial: The Opportunities for Mobile, AR, Linked Data

Mummert's Civil War Augmented Reality Project is one of many efforts underway to commemorate the 150th anniversary through technology. The Civil War Data 150 Project is one example - a partnership that aims to support and connect linked data across local, state, and federal institutions so that information can be found and utilized, no matter the collection, the archives, or the library in which it's housed. The Civil War Data 150 Project will help pull together the open data upon which developers can build the sorts of apps that Mummert and others envision.

Although a fundraising effort on Kickstarter last summer was unsuccessful, Mummert is moving forward with his plans for the Civil War Augmented Reality Project. He believes the 150th anniversary of the Civil War will be an important moment for historians, educators, archivists, and technologists. It's a nice round number to build a celebration upon, of course. But just as importantly, Mummert argues, we're at a key moment in the adoption of mobile and augmented reality technologies, which offer a new way to engage the public and students in a more interactive experience with Civil War history.

http://www.readwriteweb.com/archives/augmented_reality_field_trips_the_150th_anniversar.php http://www.readwriteweb.com/archives/augmented_reality_field_trips_the_150th_anniversar.php Augmented Reality Sun, 20 Feb 2011 16:10:00 -0800 Audrey Watters
Top 10 Semantic Web Products of 2010

Every year ReadWriteWeb selects the top 10 products or developments across a range of categories. We kick off the 2010 'Best Of' series with our selection of the top 10 Semantic Web products and implementations of the year.

This year we've chosen 5 products by semantically charged startups and 5 implementations by large organizations. The startups represent the cutting edge of Semantic Web. Each has made an impact on the Internet this year, with user growth and innovation. The organizations we've selected - which include Facebook, Google and the BBC - offered the best examples of large scale deployment of semantic technology.

A note on terminology: we are using 'Semantic Web' and 'Semantic technology' somewhat interchangeably, although many people believe that the term Semantic Web (upper case) should only be applied to W3C-approved technologies such as RDF and SPARQL. The fact is that a good portion of our top 10 use technologies that either are not approved by the W3C (the Web's main standards body, led by Sir Tim Berners-Lee) or have been tweaked in some way - for example, Facebook's use of RDFa. So we've chosen to use the term 'Semantic Web' in its broader, more inclusive, sense. In a nutshell, these are products that add meaning and context to data.

Here then is our list of the top 10 Semantic Web products or implementations of 2010 (in no particular order).

Freebase

In July Google acquired one of the leading Semantic Web companies, Metaweb. Metaweb runs Freebase, an open, semantically marked up database of information. It looks similar to Wikipedia, but Freebase is all about structured data and what you can do with it.

Google already had a relationship with Freebase, pulling in its information to provide intelligent search results within Google News. With the acquisition of Metaweb, Google can now leverage the company's tools and data even further, especially within basic Web search results.

Freebase was one of our top 10 Semantic Web products last year and being acquired by Google validates its potential.

GetGlue

This year was a turning point for GetGlue, the service where users "check in" to watching TV shows, reading books, listening to music and more. Last November, GetGlue changed its branding and launched a new website. It changed almost overnight from a geeky browser add-on called Blue Organizer to a destination website called GetGlue. Mobile applications followed soon after, enabling its users to interact with GetGlue while watching TV or at an entertainment venue.

The changes have been good for GetGlue. It's experienced strong growth this year, reaching over 600,000 users by the end of September.

Disclosure: GetGlue's founder and CEO, Alex Iskold, used to be a regular contributor to RWW.

Flipboard

The launch of the iPad in 2010 triggered a new round of innovation in the startup community. Few startups made better use of the touchscreen UI to create a unique user experience than Flipboard, a magazine reading application built specifically for the iPad.

It turns out that Flipboard isn't just a pretty face; it's also using Semantic technologies.

In July, Flipboard acquired semantic technology startup Ellerdale, whose intelligent data-parsing algorithms had previously been used to create a real-time search engine and trends tracker. Ellerdale's technology was used by Flipboard to design a more personalized real-time experience - determining what social updates are important to you and presenting them in its now familiar magazine-like format.

Hunch

Hunch started out as a Q&A service, but in August it re-positioned as a personalization service. It's a recommendation engine that shows you movies you want to see, books you want to read, vacation destinations you want to go to, and much more. The company is on a mission to "map every person on the Internet to every object on the Internet, be that a product, a service, or a person."

Co-founder Caterina Fake told us in October that Hunch uses a decision tree model, as an alternative to search, to provide more personalized information to users.

Apture

Apture is a semantic contextual search service which continues to iterate strongly (it made our top 10 list last year, too). In August, Apture launched Apture Highlights, a plug-in that allows you to dive deep into any topic you discover on almost any page around the web.

When we first noticed Apture several years ago, it was a service that required publishers to load up linked pop-up widgets with multimedia of their own choosing. The company removed that barrier to entry with its August release. Everything is now automated and it's available almost everywhere. Indeed we liked it so much, we started using Apture on ReadWriteWeb (there is no commercial relationship, we just think the product adds to our site's user experience).

Facebook

Arguably the biggest Semantic Web news of the year came in April, when Facebook announced a large-scale new platform called the Open Graph. The stated goal of the Open Graph protocol was to enable publishers to "integrate [their] Web pages into the social graph." Essentially, each web page can now become an 'object' in Facebook's social graph (which is Facebook's term for how people connect to each other in its network). This means that pages can be referenced and connected across social network user profiles, blog posts, search results, Facebook's News Feed, and more.

The Open Graph is a wide-ranging platform which includes features such as 'Like' buttons and publisher plug-ins. It also includes a simple, RDF-based markup. This requires publishers to include at least 4 metadata properties in each object: title, type, image, URL. There are a few additional properties which may be optionally added, such as site_name and description.
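For publishers wondering what that markup amounts to in practice, here is a small Python sketch that builds the `<meta>` tags for a page and checks that the four required properties are present. The `og:` prefix is the protocol's naming convention; the function name, the validation behavior and the sample values are our own assumptions for illustration, not anything Facebook ships.

```python
from html import escape

# The four properties the article lists as required for every object.
REQUIRED = ("title", "type", "image", "url")


def open_graph_tags(properties):
    """Return Open Graph <meta> tags, insisting on the four required properties."""
    missing = [name for name in REQUIRED if name not in properties]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")
    return [
        f'<meta property="og:{escape(name)}" content="{escape(value)}" />'
        for name, value in properties.items()
    ]


# Hypothetical page values, for illustration only.
tags = open_graph_tags({
    "title": "The Rock",
    "type": "movie",
    "image": "http://example.com/rock.jpg",
    "url": "http://example.com/the-rock",
    "site_name": "Example Movies",
})
```

Printing `tags` line by line gives the block a publisher would paste into the page's `<head>`. Optional properties such as site_name simply ride along with the required four.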

See also: Facebook Open Graph: The Definitive Guide For Publishers, Users and Competitors

Google Squared

The holy grail in web search technology is to be able to ask a simple question, in natural language, and get a simple answer. In May, Google announced that Google Squared was coming to its search results. Google Squared, which launched in 2009, adds additional information to search results.

The functionality was added to Google's traditional search results in two ways. Firstly, simple queries such as Catherine Zeta-Jones' date of birth elicited useful data within the search results:

squared-example-result.png

By clicking "show sources" on the Squared-provided result, a list of sources appears showing you how Google arrived at this answer.

Secondly, Google Squared is being used to provide a new feature in Google's sidebar (another innovation by the search giant in 2010): "Something different". This feature provides a list of related searches that may be of interest, determined by looking at your current search term.

This year Google also reported strong growth in its Rich Snippets feature, which adds extra information to Google search results too - in this case, data like review ratings.

Best Buy

One of the themes of 2010 was the increasing usage of Semantic Web technologies by large commercial companies like Facebook and Google. Leading U.S. retailer Best Buy was another large company to impress in 2010 with its adoption of semantic technologies. Specifically, Best Buy used a Semantic Web markup language called RDFa to add semantics to its webpages.

Jay Myers, Lead Web Development Engineer at BestBuy.com, told ReadWriteWeb in an interview earlier this year that the primary goal of using semantic technologies was to increase the visibility of Best Buy's products and services. With data such as store name, address, store hours and GEO data being marked up using RDFa, search engines are now able to identify each of those data components more easily and put them into context. The use of semantic technology, Myers told us, led to increased traffic and better service to its customers.

Data.gov.uk

In January, Data.gov.uk launched to make non-personal data held by the U.K. government available for software developers. It arrived six months after the U.S. government launched its Data.gov site, but from the start the U.K. site had more than three times as much data. At launch, Data.gov.uk had nearly 3,000 data sets available for developers to build mashups with. By the end of the year, that had increased to over 4,600.

Data.gov.uk was one of the highlights of the year in Linked Data, an approach in which organizations or governments publish data on the Web in a format that enables it to be re-used and built on. Linked Data is a subset of the wider Semantic Web movement.

See also: The State of Linked Data in 2010

BBC World Cup Website

The biggest sporting event of the year was the soccer World Cup, which was widely covered in the media. The BBC World Cup 2010 website used "dynamic semantic publishing" technology to enhance its daily World Cup reporting.

The site featured over 700 webpages and was powered by a semantic publishing framework. It boasted a comprehensive ontology (a map of concepts) that output "automated metadata-driven web pages" created on-the-fly. It was an impressive demonstration of how a large, mainstream website can add meaning and structure.

There you have it, ReadWriteWeb's selection of the top 10 Semantic Web products and implementations of 2010! Let us know in the comments whether you agree or not with our top 10.

http://www.readwriteweb.com/archives/top_10_semantic_web_products_of_2010.php http://www.readwriteweb.com/archives/top_10_semantic_web_products_of_2010.php 2010 in Review Wed, 29 Dec 2010 15:17:00 -0800 Richard MacManus
LookBackMaps - Building a Location-Based Time Machine

Humans' relationship with "place" is a lot richer and more complex than simply "checking in" for mayorships, badges, or free Gap jeans. All the buzz about location-based apps fails to address why we deem some places sacred, why we put up historical markers, and why we seek to preserve certain historical sites. Sure, the growth of location-based social networks points to the importance of "who goes there." But our curiosity about "who was there" extends back farther in time, I'd contend, than simply which people Facebook or Foursquare may list as recent visitors.

And as our scrapbooks and our archives show, the photographs we take do not serve simply to mark who we are with, but where we have been. Photographs serve to capture time and place, and they foster personal memories and provide historical records of how places have changed.

What Was Here Before Us?

LookBackMaps offers a way to link the history and the present of a particular location. By geo-coding historical photographs, the project allows us to easily see what a place used to be like when we visit it today.

LookBackMaps offers both web-based maps and an iPhone app. The latter boasts an augmented reality feature that overlays a historical photo on top of the view from your camera, giving you a glimpse of city streets and buildings as they were in the past. The app will guide you to the site of a particular photo, so through your camera you can see "then" and "now" simultaneously.

Even better, perhaps, it also allows you to snap your own photos with this view, so you can, in effect, place yourself in a photo of turn-of-the-century San Francisco or in a photo of a Civil War regiment.
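Once a photograph has been geo-coded, guiding someone to it is mostly a matter of distance arithmetic. The Python sketch below picks the closest photo to a visitor's position using the standard haversine great-circle formula. The `Photo` record, the photo titles and the rough coordinates are made up for illustration, and nothing here is taken from LookBackMaps's actual code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    # Hypothetical record for a geo-coded historical photograph.
    title: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_photo(photos, lat, lon):
    """Return the photo whose geo-coded position is closest to (lat, lon)."""
    return min(photos, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))


# Made-up photos with approximate San Francisco coordinates.
photos = [
    Photo("Street scene, downtown", 37.7879, -122.4075),
    Photo("Waterfront terminal", 37.7955, -122.3937),
]

closest = nearest_photo(photos, 37.7950, -122.3940)
```

A real app would also need compass heading and camera alignment to overlay the image, which this sketch leaves out entirely.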

LookBackMaps, Linked Data and the Civil War

As a San Francisco resident, LookBackMaps founder Jon Voss has understandably done a lot of work to organize photos from that city. But LookBackMaps's latest project has a much larger scope, in terms of regions and history and in terms of cultural preservation and linked data.

To mark the 150th anniversary of the U.S. Civil War in 2011, LookBackMaps has launched the Civil War Data 150 Project. In partnership with the Archives of Michigan, the Internet Archive, Freebase, and the University of Richmond Digital Scholarship Lab, the project aims to share and connect Civil War data across local, state, and federal institutions.

By utilizing Linked Open Data, the project will help connect the data in such a way that it can be found, no matter the collection, the archives, or the library in which it's housed. And the project also provides an opportunity for citizen-archivists, of sorts, to go through and help identify and tag photos - be they of battlefields, regiments, or soldiers.

If part of the purpose of LookBackMaps is to help find the connections between places and photographs, the Civil War Data 150 Project has a far grander goal about location, history, and our cultural heritage - one that demonstrates a failure of imagination in those that think that the "location" battle has been won by any Internet company.

Photo credits: California Historical Society

http://www.readwriteweb.com/archives/lookbackmaps_-_building_a_location-based_time_mach.php http://www.readwriteweb.com/archives/lookbackmaps_-_building_a_location-based_time_mach.php Location Sat, 06 Nov 2010 12:50:06 -0800 Audrey Watters
SPARQLZ Shines as a Vision for Linked Data Made Easy

SPARQLZ is a stealth technology project that aims to provide a graphical user interface for everyday users to assemble, edit, share and mash-up modular, persistent, real-time searches across the web of Linked Data. It's a side project by an independent team within a large data corporation, with dreams of spinning their work off as a startup.

It's a pretty hot idea: it's like Yahoo Pipes, for Linked Data - but easier to use and already populated with big sets of valuable information to mashup and parse. Linked Data is a growing field of datasets that are categorized with standardized markup, tied together and easily cross-referenced by machines. The US and UK governments, news organizations, music databases, social networks and other organizations are participating in the official W3C Linked Data community. Now SPARQLZ aims to make it easy to construct future-facing search queries across all that data.

SPARQLZ is named after the SPARQL query language for structured data. The service also uses technologies like Yahoo Query Language and real-time push format PubSubHubbub. Search results can be delivered to SMS, Email, Webhooks, a Feed Reader or other SPARQLs.

Linked Data is growing fast and has incredible potential as a development platform. Sir Tim Berners-Lee, widely credited as the key inventor of the World Wide Web, is now focused on the growth of Linked Data as the web's next step.

On top of that technical stuff is an attempt to make things easy. You want to know when a 3 bedroom house goes on sale in Southeast Portland, Oregon for under $250,000? SPARQLZ says it will make setting up sophisticated alerts like that easy. You want to know when a house like that goes on sale anywhere in the country where there's a high concentration of outdoor sports enthusiasts and the temperature is within a range you're comfortable with? Just snap some SPARQLs together and you can set up a search and alert for things like that, thanks to the availability of structured, linked data from government and private sources.

How would I use a service like this? Just imagine stringing if/then statements together through the cloud of Linked Data (see below). As a technology publisher, I'd like to receive notification if and when SPARQLZ finds a photo (from Flickr or CORDIS) of a person from a known tech company (defined as the companies listed in Crunchbase), that's headquartered in a country with a lower than world-median GNP (per the CIA World Factbook). Could SPARQLZ do that? In theory, it could do things like that all day long - and you and I could trade queries like that back and forth like Lego bricks to piece together whatever kind of stream we were looking for. It's a really exciting vision.
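The "snap queries together" idea is easy to picture in code. Below is a minimal Python sketch in which each query is a small reusable test and larger alerts are built by combining them. Every name here (`all_of`, `field_equals`, `field_at_most`, `alerts`) and the sample listings are hypothetical. SPARQLZ has not published an API, so treat this purely as an illustration of composable queries, not as how the service works.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Query = Callable[[Record], bool]


def field_equals(field: str, value: Any) -> Query:
    """Build a query that matches records whose field equals the given value."""
    return lambda record: record.get(field) == value


def field_at_most(field: str, limit: float) -> Query:
    """Build a query that matches records whose numeric field is <= limit."""
    return lambda record: field in record and record[field] <= limit


def all_of(*queries: Query) -> Query:
    """Snap several queries together: a record matches only if all of them match."""
    return lambda record: all(query(record) for query in queries)


def alerts(records: Iterable[Record], query: Query) -> List[Record]:
    """Return the records that should trigger an alert for this query."""
    return [record for record in records if query(record)]


# Made-up listings, loosely modeled on the house-hunting example above.
listings = [
    {"id": 1, "bedrooms": 3, "area": "SE Portland", "price": 239000},
    {"id": 2, "bedrooms": 3, "area": "SE Portland", "price": 310000},
    {"id": 3, "bedrooms": 2, "area": "SE Portland", "price": 199000},
]

house_alert = all_of(
    field_equals("bedrooms", 3),
    field_equals("area", "SE Portland"),
    field_at_most("price", 250000),
)
matches = alerts(listings, house_alert)
```

Because `house_alert` is itself just another query, it could be handed to someone else and snapped into a bigger one with `all_of`, which is the trading-queries-back-and-forth idea in miniature.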

But in order to get beyond the borders of Wonky-land, Linked Data needs a good User Interface. For many people, the ability to set up dynamic queries, mix and match them and have them deliver alerts to various devices could scratch an itch that many of us didn't know we had.

Ok, so let's be honest. This is probably not going to be a mainstream phenomenon. But are there thousands, tens of thousands or maybe a larger number of people who could create value for themselves using a query construction and publishing model like this? People who could not, or could not so easily, create that kind of value on top of Linked Data today? I think there are. Maybe there are even millions of people who could capture some of the latent value in Linked Data thanks to a tool like this.

SPARQLZ is certainly intended as a way to democratize the creation and use of something rich with value but previously too technically inaccessible for many of the people who might like to use it. In that sense it's in the same vein as what Blogger and WordPress were to text publishing, what YouTube is to video publishing and what Twitter and Facebook are to social activity feeds.

Might this be the project that makes Linked Data hacking something that far more of us can engage in? That would be great, but first the team will need to gather support and launch itself as a company. For the sake not just of this small team of data-loving dreamers deep inside a big corporation - but for the sake of all of us data-loving dreamers who would love to use their tool, I hope they can do it.

Below: The Options, A Picture of the Linked Data Cloud

The latest version of the Linking Open Data dataset cloud, as of July 2009, maintained by Richard Cyganiak and Anja Jentzsch.

http://www.readwriteweb.com/archives/sparqlz.php http://www.readwriteweb.com/archives/sparqlz.php Data Services Wed, 11 Aug 2010 14:31:07 -0800 Marshall Kirkpatrick
The Modigliani Test for Linked Data: Results

In a recent post, I outlined a kind of layman's test for the Semantic Web. I wrote that the tipping point for the Semantic Web may be when anyone can query a set of data about a historical figure and get a long list of structured results in return. I called this 'The Modigliani Test,' after my favorite artist Amedeo Modigliani. To pass this test, you must deliver - using Linked Data - a comprehensive list of locations of original Modigliani art works around the world.

A developer named Atanas Kiryakov gave the test a good crack. In doing so, he illustrated the core issues facing the Semantic Web currently.

The challenge of this test is that there isn't currently enough linked data on the Web about Modigliani. Also, the key data in this test is the location of art works, which probably isn't one of the main data fields for art data when it's uploaded to the Web (artist name and art work title would be the two key data fields).

Kiryakov wasn't the only person who attempted to pass the test; and in fact his results mirror what can be found already on the popular open database Freebase. However, Kiryakov, who is the Executive Director of Bulgarian Semantic Technology company Ontotext AD, did a great job of explaining his methodology and noting the issues he faced.

The Current State of Linked Data Queries

The result of Kiryakov's attempt is a relatively short list of locations of Modigliani paintings around the world. He admits that the list isn't long enough, but says that it's the closest he could get - not just because of the limited amount of data in the Linked Data Web, but because it's "hard to query and use today."

Essentially Kiryakov created code to query a few known Linked Data sets, with custom manipulations to output location data. This is what he came up with:

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  # Art works by Modigliani, their owners, and a label for each.
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l .
  ?ow ot:preferredLabel ?owner_l .
  # Three alternative ways of finding the owner's city; each may be absent.
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc. ?loc rdf:type umbel-sc:City ; ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}

That query was executed in a tool called LDSR, a "Linked Data Semantic Repository" created by Kiryakov's company Ontotext. He calls LDSR a "search engine for part of the linked data web." Ontotext's LDSR includes data from existing Linked Data repositories such as DBpedia, Freebase, Geonames, UMBEL and WordNet.
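For readers who haven't sent a SPARQL query to an endpoint before, here is a minimal sketch of the usual mechanics: under the SPARQL Protocol, the query text travels in a URL-encoded `query` parameter. The endpoint URL below is a placeholder and not LDSR's real address, the query is a cut-down version of Kiryakov's, and `build_request_url` is a name made up for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint: an assumption for illustration, not LDSR's real URL.
ENDPOINT = "http://example.org/sparql"

# Cut-down version of the query above: Modigliani paintings and their owners.
QUERY = """
PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT DISTINCT ?painting ?owner
WHERE {
  ?painting fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
            fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?owner ] .
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url[:60] + "...")

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

Fetching that URL from a real endpoint would return a table of variable bindings. The sketch stops short of the network call so that it runs anywhere.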

Here is a screenshot of Atanas Kiryakov's attempt to pass the Modigliani Test. He spent over an hour formulating the code used to generate this result.

As you can see, the resulting list was just 8 items long and most of the locations are in major U.S. cities. This falls well short of a comprehensive list of Modigliani art work locations. For example, there's no data about Modigliani paintings in Europe - where Modigliani lived all his life.

Other Sources of Modiglidata

Kiryakov wrote that most of the data returned in the Modigliani example came from Freebase. Indeed, as RWW commenter Brian Karlak pointed out in our original post, you can get much the same result within Freebase itself. Another commenter, Michael, pointed to a non-technical results page. Kiryakov's result has a little more data, but not much more.

However the point of Kiryakov's attempt and blog post was to point out the difficulty of passing the Modigliani Test right now. He noted that "getting useful information from LOD [Linked Open Data] quite often requires a lot of efforts to analyze and post-process them in order to get reasonable answers to structured queries." In other words, it's much more than just inputting a natural language query (note that the Freebase example was provided by a user there named masouras, so it's not something an average user could do).

I should also mention that in the comments to the previous post, Bruce Wayne pointed to his company Factoetum's effort to pass the test - which had 7 results, including some different ones to Ontotext/Freebase. Like Kiryakov, Wayne noted that it's "nearly impossible" for non-technical people to use the current solutions.

Finally, to address an issue that some commenters raised in the previous post: yes it would be possible to pass the Modigliani Test with some manual human effort to track down location data. But that's cheating - we want to see this done using Linked Data. And not just for Modigliani works, but for any other artist.

Much Work to Be Done

Atanas Kiryakov concluded that "there is still a lot of work to be done, because we cannot expect wide usage and interest in the Semantic Web if writing such a query takes more than an hour and a lot of technical knowledge."

While that's true, I thank Atanas for giving the Modigliani Test a crack. At least now I know to visit the Museum of Modern Art when I next go to New York!

Let us know your thoughts on the Modigliani Test in the comments. Or perhaps you're a developer willing to take on this challenge?

http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php Structured Data Mon, 26 Apr 2010 01:56:34 -0800 Richard MacManus
The Modigliani Test: The Semantic Web's Tipping Point

In our recent posts about Structured Data, we've emphasized that most of the current initiatives have been around uploading new data to the Web - whatever the format. The U.S. and U.K. governments have led the way with their 'open data' websites, but much of that data isn't 'linked' yet. In other words, it's online - but siloed. So how do we get to the next stage of the Semantic Web, linking disparate data sets together so that people can begin to use that data?

The tipping point for the long-awaited Semantic Web may be when you can query a set of data about someone not too famous, and get a long list of structured results in return. I've decided to term this 'The Modigliani Test.'

Amedeo Modigliani is one of my favorite artists. He was moderately famous during the early 20th century and has something of a cult following nowadays. But he's not Da Vinci or Picasso famous. What I'd like to do in a Semantic Web is type the following query into a search engine and get back a large list of results: tell me the locations of all the original paintings of Modigliani.

As of today, there's no place to type that query in and get a list of structured data. The closest I can find to doing that is the Artcyclopedia entry for Modigliani, which has a list of locations for Modigliani artworks. It's great that they have the location data listed on one web page. However, it's not structured data, so we can't query it. There's also not much order to the data; we have no idea if this is a comprehensive list, it's not verified, and so on.

In summary, there's a lot of data on the Web about the location of original art works - but much of it is in traditional 'document' web pages. What we're after is a giant database of art works, which anybody can query and re-use.

Here's an early, overly geeky view of what Linked Data about painting locations would look like (hat-tip @dakoller):

The above is a far from comprehensive list of art works by Hieronymus Bosch (a search for Modigliani, by the way, brought up zero results). Plus of course we need a much more intuitive UI, so that non-geeks can use it too.

What do you think, when will The Modigliani Test be passed on the Web?

http://www.readwriteweb.com/archives/the_modigliani_test_semantic_web_tipping_point.php http://www.readwriteweb.com/archives/the_modigliani_test_semantic_web_tipping_point.php Structured Data Fri, 16 Apr 2010 00:06:00 -0800 Richard MacManus
10 Ideas For Web of Data Apps

At the end of last week, we posted an open thread asking what application you'd build (or would like someone else to build) using linked data or open data. The thread was inspired by Georgi Kobilarov. In this post, we list 10 of the best ideas we received.

A number of the suggested apps were for social good, for example apps for improving sustainability and finding missing persons. Other apps were more lifestyle-oriented, for example for cooking and genealogy. A few were business focused, such as a brand marketing app and a point-of-sale system. Of course a couple were just plain ol' geeky, which we love too! You can find all 10 ideas below.

Firstly, a quick refresher course on the terminology. Linked data is data that has been uploaded to the Web and linked to other sources, but is not necessarily open for other developers to re-use. Often when people use the term "linked data," they mean data that has been uploaded in a structured format, for example RDF. Open data is data that has been uploaded to the Web and is freely available to use, but isn't necessarily linked to other data sources. The term "open data" is often used for unstructured data, for example CSV files (spreadsheets). The ideal, of course, is data that is both linked and open. We should note that these definitions are not universally agreed on, but they're good enough for the purposes of this post.

Missing Persons

Juan Sequeda, co-founder of Semantic Web Austin, has an idea for using linked data "to integrate data from displaced populations, specifically in Colombia." He references a BBC report from September 2009, about using Semantic Web technology to enable people to search currently incompatible databases of missing persons in Colombia.

Sustainability

Bernard Vatant from Guillestre, France, wants to see "the Web of Data enable people anywhere in the world to find out smart, sustainable and low-cost solutions to their local development issues." For example, success stories in farming, water supply, energy, education and health "in environments similar to mine, anywhere in the world."

In short, Bernard wants a linked data equivalent to WiserEarth - an online community for people interested in sustainability.

A Better World

Aldo Bucchi from Chile wants an app to tackle "negligence, corruption and lack of accountability." Specifically he mentioned a recent 8.8 magnitude earthquake in Chile, which resulted in hundreds of deaths. Aldo believes that some of those deaths were avoidable, because of what he claims was "corruption and malpractice in the construction business." He thinks that a Web of data would help identify such things, as well as help "rebuild the country faster and in a more agile manner [with] the 'loose-coupled coordination' that is naturally derived from a shared data substrate and a single world view."

Genealogy App

Sherry Main from Orange County, California would like an app for genealogy. Wrote Sherry:

"It would be amazing to be able to map and locate where your family is from, has been, and what notable events happened. If there was an application on a mobile device that pinged you when you are within a particular radius of say, my great-grandmother's birthplace, as I walked around a town, that would make real-world experiences more meaningful [...] As photos become geo-tagged going forward, imagine being able to get a push notification that showed an important family or historical photo to you as you stood or walked by that location."
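The "ping me when I'm within a particular radius" part of Sherry's idea is a standard geofence check. Below is a rough sketch of how it could work, using the haversine formula for great-circle distance. The place names, coordinates and function names are all made up for illustration, and a real app would pull the locations from linked genealogy data rather than a hard-coded dictionary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_places(position, places, radius_km):
    """Return the names of places within radius_km of position."""
    lat, lon = position
    return [name for name, (plat, plon) in places.items()
            if haversine_km(lat, lon, plat, plon) <= radius_km]


# Hypothetical family-history points; coordinates are illustrative only.
FAMILY_PLACES = {
    "great-grandmother's birthplace": (45.4408, 12.3155),
    "grandfather's first workshop": (41.9028, 12.4964),
}

# A walker roughly 100 metres from the first point, hundreds of km from the second.
print(nearby_places((45.4400, 12.3160), FAMILY_PLACES, radius_km=0.5))
```

A mobile app would run a check like this whenever the device reports a new position, and send the push notification for any place that newly falls inside the radius.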

Cooking App

Bart Stevens wants to be able to "select a (difficult) recipe and submit this to a service." He wants the following information back:

1. Where can I find the ingredients.

2. Place an order/make a reservation (@bakery, butcher or fish shop) for certain ingredients.

3. A route (street) map, per store.

4. Maybe a payment system.

Point-of-Sale & Inventory System

Daniel O'Connor would like to see a point-of-sale and inventory system, for example for a small office supplies store.

He beckons us to imagine this: "I receive a new product, scan the barcode of it. My system queries the web for the supplier name, product data, etc [...] recognizes the supplier and hits their URI for the product. It assimilates all of the recommended price information (ie: good relations); depictions and populates my system." You can read the full scenario in his extended comment.

Brand Marketing

John Davidson suggests that linked data can be used to assist brand marketers, specifically to find out more about their customers. He offers this example:

"A customer becomes a fan of a popular hair care brand on Facebook. She separately opted-in on the brand site to receive email alerts for new products, promotional offers that she can redeem in stores, etc. Are these distinct, separate events or are they somehow connected? By integrating these streams from the "Web of Data" the brand marketer can understand that she is an advocate for their brand. She also has several dozen or more friends she regularly interacts with in social channels. The marketer can engage her with special offers to promote their cool new products with friends in her network. The subsequent buzz and chatter sends friends to their stores to buy the new hair care products and the cycle repeats."

Research Assistant

A comment on Georgi's blog suggests an app to review literature. "Professor Aloha" wrote that he/she would create an application that could "take any research topic and backtrace (through articles, dissertations, presentations, and their accompanying reference lists) all published research articles on that topic, sorting them by year of publication, author, country of origin, journal and major findings."

Enriched People Profiles

Atif Latif from Austria would like to build an aggregator for all of the possible resources related to a person on the Web. The end result, said Atif, "will be [a] highly semantified and enriched profile of a person." Atif is working on this as we speak, with a beta app named CAF-SIAL. Good luck Atif!

In a separate comment, Kingsley Idehen of semantic Web company OpenLink Software mentioned "Verifiable Identity," noting that "all databases (including the Web of Linked Data) need verifiable identity."

Website-less websites

Nathan suggested a number of things, our favorite being "Website-less websites". Nathan wrote that "when all the data is typed and in a single format (let's say rdf) then the need for websites and webpages can completely be disposed off, rather we can view the information in an array of clients side applications each with there own benefits (like we do currently with twitter clients), The entire web can theoretically and quite easily just be one big API."

Bruce Wayne of Factoetum wrote in a separate comment that he is developing "services that will have in impact in bringing about a Website-less web." He gives an example of a list of book titles.

Those are 10 suggestions from the ReadWriteWeb community. Perhaps some enterprising entrepreneurs or developers will pick up a few of these ideas for their next startup!

http://www.readwriteweb.com/archives/10_ideas_for_web_of_data_apps.php http://www.readwriteweb.com/archives/10_ideas_for_web_of_data_apps.php Structured Data Thu, 15 Apr 2010 00:05:02 -0800 Richard MacManus
Open Thread: What Would You Build With a Web of Data?

Recently we looked at the state of Linked Data in 2010, noting developments such as governments putting public data online and Thomson Reuters putting structure around commercial data using OpenCalais. In a follow-up post, we explained the distinction between Linked Data, Open Data and the Semantic Web.

Georgi Kobilarov, who runs a Linked Data startup from Germany called Uberblic Labs, recently issued an interesting challenge on his blog. He asked: if we had a Web of Data, what would you build? Not to steal Georgi's thunder, but we think this is a great question to put to ReadWriteWeb readers too.

Here's Georgi's idea:

"If we had a Web of Data, I would built an application for painless travel planning. It would integrate flight plans, train timetables, bus routes, car rental offers, etc. And the user would be able to just say: I want to go from A to B: Find me the best/cheapest/fastest routes. [...] With a Web of Data, an application could do all that combining for me, the same way flight booking sites do that today for just flights."

Here's my idea for an app that uses the Web of Data. I'd like a web site or app that allows me to discover the locations of original art works by my favorite artists, and then create travel itineraries for me to see some or all of those art works (most famous artists have their art works scattered around the world, in various museums and galleries). It's possible that there is a web directory of artists somewhere that has some or even all of this data already, but if so I haven't found it.

I ask for this because every now and then I search the Web for a painting that I saw in a book. A recent example was a Modigliani painting that I was attempting to create a copy of, for my beginners' acrylic painting class. The original painting was called "Portrait of Madame Hanka Zborowska." One of the results from Google told me that the original painting is located at the National Gallery of Modern Art, Rome, Italy.

I could potentially spend hours hunting down the locations of Modigliani's paintings, using Google - and it's likely that some of the data isn't currently online. So it would be great if I could query one web site or app: tell me where all the originals of Modigliani's paintings are in the world, and draw me an itinerary for visiting all or some of them. Heck, maybe even book my flights and hotels!

That's my example of what I'd build from a Web of Data. Now tell us what site or app you would like built, if the data was available on the Web.

http://www.readwriteweb.com/archives/web_of_data_what_would_you_build.php http://www.readwriteweb.com/archives/web_of_data_what_would_you_build.php Open Thread Thu, 08 Apr 2010 22:15:43 -0800 Richard MacManus
It's All Semantics: Open Data, Linked Data & The Semantic Web

Yesterday we summarized some of the main developments in the Linked Data world over the past year. Linked Data is a W3C-backed movement that is all about connecting data sets across the Web. It can be viewed as a subset of the wider Semantic Web movement, which is about adding meaning to the Web. However, there is some confusion in the Semantic Web community about the crossover. To add to the confusion, there is a term called 'Open Data' that is being bandied around too. This commonly describes data that has been uploaded to the Web and is accessible to all, but isn't necessarily "linked" to other data sets.

So what's the beef with all of these terms? In this post we seek clarity!

The Difference Between Open Data and Linked Data

In the discussion over yesterday's post, a few people tweeted that the U.K. government's public data website Data.gov.uk is mostly populated with 'Open Data' and not 'Linked Data.' But what does that mean? It means that much of the data on the site is available to the public, but it doesn't link to other data sources on the Web. It could be data that has been uploaded in CSV format (i.e. spreadsheet data), which Sir Tim Berners-Lee said in an interview with me last year is a common occurrence with government departments. Or it could be data in another non-Web format.


Screen from a Tim Berners-Lee presentation on Linked Data, circa 2008

Titti Cimmino put it nicely: Open Data is simply 'data on the web,' whereas Linked Data is a 'web of data.'

However, the idea of Open Data is to turn it into Linked Data. As John S. Erickson pointed out, the first priority of Data.gov.uk (and its U.S. counterpart) is to publish lots of Open Data. The next step is to work towards linking it all up. This is already starting to happen. Answering a question I posed on Twitter, Kingsley Idehen confirmed that Data.gov.uk is currently a combination of Open Data and Linked Data.

Linked Data and The Semantic Web

So may we then suggest that the idea of Linked Data is to turn it into a Semantic Web? Or are they the same thing already?

Lorna Campbell from the University of Strathclyde in Scotland tackled those and other questions in an excellent post earlier this month. She started by warning of the potential for another "holy war" about terminology. I won't delve into that in this post, however this excerpt from Campbell's post gives you a flavor of the terminology angst:

"Some argue that RDF is integral to Linked Data, other suggest that while it may be desirable, use of RDF is optional rather than mandatory. Some reserve the capitalized term Linked Data for data that is based on RDF and SPARQL, preferring lower case "linked data", or "linkable data", for data that uses other technologies."


Even Wikipedia can't define Semantic Web...

Campbell quotes from a number of other articles, in trying to come to a conclusion about how Linked Data and the Semantic Web relate. Perhaps the best definition she found was this one by Paul Walk:

  "1. data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked"
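Walk's four points amount to treating "open" and "linked" as two independent properties of a data set. A tiny sketch makes the combinations concrete. The `Dataset` class, the `describe` function and the example data sets are invented for this illustration and don't come from Walk's or Campbell's posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    is_open: bool    # freely available for others to re-use
    is_linked: bool  # connected to other data sets on the Web


def describe(ds: Dataset) -> str:
    """Map the two flags onto the four cases in the list above."""
    if ds.is_open and ds.is_linked:
        return "open and linked"
    if ds.is_open:
        return "open but not linked"
    if ds.is_linked:
        return "linked but not open"
    return "neither open nor linked"


# Hypothetical examples of each kind of data set.
examples = [
    Dataset("government CSV upload", is_open=True, is_linked=False),
    Dataset("internal corporate RDF store", is_open=False, is_linked=True),
    Dataset("public RDF data set with outbound links", is_open=True, is_linked=True),
]

for ds in examples:
    print(f"{ds.name}: {describe(ds)}")
```

Only the third example meets the condition in point 4.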

Why This Matters

So there you have it: Linked Data is NOT the same as the Semantic Web. It's also not necessarily open, in other words accessible to developers.

Whatever the definitions, the key points about all of Open Data, Linked Data and the Semantic Web, are:

  1. data is being uploaded to the Web that wasn't online before (e.g. much of the data on Data.gov.uk).
  2. structure is being added to the data using Linked Data and/or Semantic Web technologies.

The bottom line is that the more data we have on the Web that is linked and has defined meaning, the smarter our web applications will be. This is why these activities are so exciting, despite the terminology confusion!

Image credit: Semantic Web Rubik's Cube, dullhunk

http://www.readwriteweb.com/archives/open_data_linked_data_semantic_web.php http://www.readwriteweb.com/archives/open_data_linked_data_semantic_web.php Structured Data Wed, 31 Mar 2010 23:00:00 -0800 Richard MacManus
The State of Linked Data in 2010

In May last year we wrote about the state of Linked Data, an official W3C project that aims to connect separate data sets on the Web. Linked Data is a subset of the wider Semantic Web movement, in which data on the Web is encoded with meaning using technologies such as RDF and OWL. The ultimate vision is that the Web will become much more structured, which opens up many possibilities for "smarter" Web applications.

At this stage last year, we noted that Linked Data was ramping up fast - evidenced by the increasing number of data sets on the Web as at March 2009. Fast forward a year and the Linked Data "cloud" has continued to expand. In this post we look at some of the developments in Linked Data over the past year.

Governments Get on Board

The most high-profile usage of Linked Data over the past year has come from two governments: the United States and United Kingdom.

The U.S. was first to open up some of its non-personal data for use by developers, with the May 2009 launch of Data.gov. In January 2010, the U.K. government announced Data.gov.uk - with the help of Sir Tim Berners-Lee, the inventor of the World Wide Web. At launch, Data.gov.uk had nearly 3,000 data sets available for developers to build mashups with. At the time that was more than three times as much data as the U.S. site offered.

Following on from the launch of Data.gov.uk, U.K. Prime Minister Gordon Brown announced a new British Institute for Web Science along with $45 million in government backing. The Institute will be led by Berners-Lee and prominent researcher Nigel Shadbolt. This was great news for Linked Data, because according to Prime Minister Brown, the Institute "will help place the U.K. at the cutting edge of research on the Semantic Web and other emerging web and internet technologies."

Commercial Applications

There have been commercial success stories too, such as OpenCalais for media, MusicBrainz for music and GoodRelations for e-commerce. There are also many commercial sites tapping into the general knowledge data store at dbpedia.org.

However it's relatively early days for commercial applications of Linked Data. We're beginning to see smart people explore potential use cases, such as this list for news organizations, but much of the early implementation is being done by publicly funded entities such as the U.K.'s BBC.


The latest version of the Linking Open Data dataset cloud, as at July 2009, maintained by Richard Cyganiak and Anja Jentzsch.

Just Get The Data Up There

To reiterate, Linked Data is data that has been connected to other data sets using Semantic Web technologies such as RDF (Resource Description Framework) or RDFa (a simpler variation). Minus the acronyms, Linked Data is simply structured data.
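To make "connected to other data sets" a bit more concrete, here is a toy sketch of the idea behind RDF: every statement is a subject-predicate-object triple, and two data sets become linked when one uses a URI that the other describes. All URIs, values and function names below are invented for illustration. Real Linked Data would use published vocabularies and a proper RDF library.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# All URIs and values below are made up for illustration.
ART = "http://example.org/art/"
GEO = "http://example.org/geo/"

art_dataset = {
    (ART + "painting/1", ART + "artist", "Amedeo Modigliani"),
    (ART + "painting/1", ART + "heldIn", GEO + "city/rome"),  # the link
}
geo_dataset = {
    (GEO + "city/rome", GEO + "name", "Rome"),
    (GEO + "city/rome", GEO + "country", "Italy"),
}


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def city_of(painting):
    """Follow the link from the art data set into the geo data set."""
    merged = art_dataset | geo_dataset
    names = []
    for city_uri in objects(merged, painting, ART + "heldIn"):
        names.extend(objects(merged, city_uri, GEO + "name"))
    return names


print(city_of(ART + "painting/1"))
```

Neither data set can answer "which city is this painting in?" on its own. The shared city URI is what lets a query hop from one to the other.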

However, one of the reasons the Semantic Web hasn't yet been widely adopted, at least commercially, is that it's often difficult or time consuming to mark up data semantically. RDF in particular has a reputation for being painful to code. With that in mind, the past year has been largely about prompting governments and organizations to put their data up on the Web in whatever form they can.

Indeed, when I interviewed Berners-Lee last July, he told me that he'd be happy if governments "just put data up in whatever form it's available." He mentioned that "Comma separated values (CSV) files are remarkably popular." He'd be much happier if it was semantically marked up data, using the likes of RDF, but conversion can happen after it's been uploaded to the Web.
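As a rough illustration of what "conversion after upload" can mean, the sketch below turns rows of a CSV file into subject-predicate-object triples, the basic shape of RDF data. The CSV content, the `http://example.org/` namespace and the `csv_to_triples` function are all hypothetical. A real conversion would map columns onto agreed vocabularies instead of minting a URI per column name.

```python
import csv
import io

BASE = "http://example.org/dataset/"  # hypothetical namespace

# Hypothetical CSV, of the sort a government department might publish.
CSV_TEXT = """id,name,population
1,Springfield,30000
2,Shelbyville,25000
"""


def csv_to_triples(text, base=BASE, key="id"):
    """Turn each CSV cell into a (subject, predicate, object) triple.

    The key column identifies the row and becomes part of the subject URI.
    Every other column becomes a predicate.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"{base}row/{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, f"{base}column/{column}", value))
    return triples


for triple in csv_to_triples(CSV_TEXT):
    print(triple)
```

This gives the data structure but not yet links. The linking step comes when those subject URIs are connected to URIs in other data sets.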

So overall, Linked Data is still early in its adoption curve. However it's undeniably become a solid on-ramp to the wider Semantic Web and world of structured data.

For a good technical overview of the current state of Linked Data and the Semantic Web, see this presentation by Davide Palmisano.

http://www.readwriteweb.com/archives/the_state_of_linked_data_in_2010.php http://www.readwriteweb.com/archives/the_state_of_linked_data_in_2010.php Structured Data Wed, 31 Mar 2010 01:30:55 -0800 Richard MacManus
Top 5 Web Trends of 2009: Structured Data

This week ReadWriteWeb will run a series of posts detailing what we think are the five biggest, most cutting-edge Web trends to come out of 2009. We'll be posting one trend analysis per day. Then at the end of the week we'll publish a major update to our standard presentation about web technology trends.

The first major Web trend we're looking at is Structured Data. In prior presentations, this has sometimes been referred to under the umbrella term of 'Semantic Web'. However the way 2009 has panned out so far, it's become clear that this trend is much more than the Semantic Web. In this post, we'll analyze the developments in Structured Data this year and provide you with 3 product examples: OpenCalais, Google, Wolfram Alpha.

Editor's note: This story is part of a series we call Redux, where we'll re-publish some of our best posts of 2009. As we look back at the year - and ahead to what next year holds - we think these are the stories that deserve a second glance. It's not just a best-of list, it's also a collection of posts that examine the fundamental issues that continue to shape the Web. We hope you enjoy reading them again and we look forward to bringing you more Web products and trends analysis in 2010. Happy holidays from Team ReadWriteWeb!

Web of Data, Not Documents

Tim Berners-Lee said in February this year that we're now in a Web of Data, rather than a Web of Documents. The organization that Berners-Lee heads, the W3C, has heavily promoted two key initiatives that are helping to build this Web of Data: the Semantic Web and more recently Linked Data.

However over the past few years, we've seen that there are many other ways to structure data and enable others to build off it. The best current example is surely Twitter, whose API has historically been responsible for around 90% of Twitter's activity - via third party apps.

The basic principle of the Web of Data is still the same as what Alex Iskold articulated on ReadWriteWeb back in March 2007: "unstructured information will give way to structured information - paving the road to more intelligent computing."

Example 1: OpenCalais

Our first example product, OpenCalais, is probably the best current example of Linked Data (which is a type of structured data endorsed by W3C). Thomson Reuters, the international business and financial news giant, launched an API called OpenCalais in Feb '08. In a nutshell, OpenCalais turns unstructured HTML into semantically marked up data. It orders data into groups such as 'people,' 'places,' 'companies' and more. This way, third party applications and sites can build interesting new things from that data - one of the defining principles of Linked Data.

For a full explanation of Linked Data, read Alexander Korth's technical introduction The Web of Data: Creating Machine-Accessible Information from April 2009. I also explained the background and benefits of Linked Data in a May '09 post entitled Linked Data is Blooming: Why You Should Care.

Example 2: Google Rich Snippets

In May this year, Google added structured data to its core search, in the form of a feature called 'Rich snippets.' Essentially this feature extracts and shows useful information from web pages, by way of structured data open standards such as microformats and RDFa. On launch in May, Google invited publishers to mark up their HTML. While it will take a while for this markup to become widespread, the fact that a huge company like Google implemented it shows the increasing importance of structured data on the Web.

Other big companies are also heading in this direction - in particular, Yahoo was an early leader.

Example 3: Wolfram Alpha

Ever since Wolfram|Alpha's much hyped launch in May, we've been tracking this innovative product closely. It's a self-described "computational knowledge engine" and while it's not quite the Google killer some predicted, it has many potential uses.

Wolfram|Alpha has a search engine-like interface, allowing you to type natural language statements into it. But the main part of the product is the computations you can do on data. The product is premised on using and computing data. If Web 2.0 was about creating data (a.k.a. user generated content), then the next generation of the Web is all about using that data.

Conclusion

We can see from the above three examples that structured data is rapidly becoming a feature of today's Web. Companies like Thomson Reuters and Google are enabling data to be structured, and new types of products (like Wolfram|Alpha) will make use of structured data in ways we perhaps can't imagine right now.

ReadWriteWeb's Top 5 Web Trends of 2009:

  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data_1.php http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data_1.php 2009 Redux Sat, 26 Dec 2009 14:00:00 -0800 Richard MacManus
Top 5 Web Trends of 2009: Structured Data

This week ReadWriteWeb will run a series of posts detailing what we think are the 5 biggest, most cutting-edge Web trends to come out of 2009. We'll be posting one trend analysis per day. Then at the end of the week we'll publish a major update to our standard presentation about web technology trends.

The first major Web trend we're looking at is Structured Data. In prior presentations, this has sometimes been referred to under the umbrella term of 'Semantic Web'. However the way 2009 has panned out so far, it's become clear that this trend is much more than the Semantic Web. In this post, we'll analyze the developments in Structured Data this year and provide you with 3 product examples: OpenCalais, Google, Wolfram Alpha.

Web of Data, Not Documents

Tim Berners-Lee said in February this year that we're now in a Web of Data, rather than a Web of Documents. The organization that Berners-Lee heads, the W3C, has heavily promoted two key initiatives that are helping to build this Web of Data: the Semantic Web and more recently Linked Data.

However over the past few years, we've seen that there are many other ways to structure data and enable others to build off it. The best current example is surely Twitter, whose API has historically been responsible for around 90% of Twitter's activity - via third party apps.

The basic principle of the Web of Data is still the same as what Alex Iskold articulated on ReadWriteWeb back in March 2007: "unstructured information will give way to structured information - paving the road to more intelligent computing."

Example 1: OpenCalais

Our first example product, OpenCalais, is probably the best current example of Linked Data (which is a type of structured data endorsed by W3C). Thomson Reuters, the international business and financial news giant, launched an API called OpenCalais in Feb '08. In a nutshell, OpenCalais turns unstructured HTML into semantically marked up data. It orders data into groups such as 'people,' 'places,' 'companies' and more. This way, third party applications and sites can build interesting new things from that data - one of the defining principles of Linked Data.

For a full explanation of Linked Data, read Alexander Korth's technical introduction The Web of Data: Creating Machine-Accessible Information from April 2009. I also explained the background and benefits of Linked Data in a May '09 post entitled Linked Data is Blooming: Why You Should Care.

Example 2: Google Rich Snippets

In May this year, Google added structured data to its core search, in the form of a feature called 'Rich snippets.' Essentially this feature extracts and shows useful information from web pages, by way of structured data open standards such as microformats and RDFa. On launch in May, Google invited publishers to mark up their HTML. While it will take a while for this markup to become widespread, the fact that a huge company like Google implemented it shows the increasing importance of structured data on the Web.

Other big companies are also heading in this direction - in particular, Yahoo was an early leader.
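To make the markup idea concrete, here is a minimal sketch of how a program might read RDFa-style annotations out of a page. The HTML fragment, the "v:" property names and the PropertyCollector class are all assumptions made up for illustration; this is not Google's parser, and it ignores void elements such as br that a real page would contain.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the "v:" property names are illustrative only.
SAMPLE_HTML = """
<div typeof="v:Review">
  <span property="v:itemreviewed">Example Pizzeria</span>
  <span property="v:rating">4.5</span>
  <span property="v:reviewer">Jane Doe</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect the text of elements that carry a `property` attribute."""

    def __init__(self):
        super().__init__()
        self._stack = []      # `property` value (or None) for each open element
        self.properties = {}  # property name -> text content

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only keep text whose innermost enclosing element is annotated.
        if text and self._stack and self._stack[-1]:
            self.properties[self._stack[-1]] = text


def extract_properties(html):
    """Return a dict mapping each annotated property to its text."""
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.properties


if __name__ == "__main__":
    for name, value in extract_properties(SAMPLE_HTML).items():
        print(f"{name}: {value}")
```

The point of the sketch is that the publisher, not the crawler, states which piece of text is the rating and which is the reviewer, so nothing has to be guessed from the surrounding prose.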

Example 3: Wolfram Alpha

Ever since Wolfram|Alpha's much-hyped launch in May, we've been tracking this innovative product closely. It's a self-described "computational knowledge engine" and, while it's not quite the Google killer some predicted, it has many potential uses.

Wolfram|Alpha has a search engine-like interface, allowing you to type natural language statements into it. But the main part of the product is the computations you can do on data. The product is premised on using and computing data. If Web 2.0 was about creating data (a.k.a. user generated content), then the next generation of the Web is all about using that data.

Conclusion

We can see from the above three examples that structured data is rapidly becoming a feature of today's Web. Companies like Thomson Reuters and Google are enabling data to be structured, and new types of products (like Wolfram|Alpha) will make use of structured data in ways we perhaps can't imagine right now.

ReadWriteWeb's Top 5 Web Trends of 2009:

  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data.php http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data.php Trends Mon, 07 Sep 2009 05:30:00 -0800 Richard MacManus
ReadWriteWeb Interview With Tim Berners-Lee, Part 2: Search Engines, User Interfaces for Data, Wolfram Alpha, And More... In part 2 of my one-on-one interview with Tim Berners-Lee, we explore a variety of topics relating to Linked Data and the Semantic Web. If you missed it, in Part 1 of the interview we covered the emergence of Linked Data and how it is being used now even by governments.

In Part 2 we discuss: how previously reticent search engines like Google and Yahoo have begun to participate in the Semantic Web in 2009, user interfaces for browsing and using data, what Tim Berners-Lee thinks of new computational engine Wolfram Alpha, how e-commerce vendors are moving into the Linked Data world, and finally how the Internet of Things intersects with the Semantic Web.

Semantic Web and Search Engines Like Google, Yahoo

RWW: You've been talking about the Semantic Web for many years now. Generally the view is that Semantic Web is great in theory, but we're still not seeing a large number of commercial web apps that use RDF (we've seen a number of scientific or academic ones). However we have begun to see some traction with RDFa (embedding RDF metadata into XHTML Web content), for example Google's Rich Snippets and Yahoo's SearchMonkey. Has the takeup of RDFa taken you by surprise?

TBL: Not really, but the takeup by the search engines is interesting. In a way I was happy to see that, it was a milestone for those things to come out of the search engines. The search engines had typically not been keen on the Semantic Web - maybe you could argue that their business is making order out of chaos, and they're actually happy with the chaos. And if you provide them with the order, they don't immediately see the use of it.

"The search engines have not been keen on the Semantic Web [...] their business is making order out of chaos, and they're actually happy with the chaos."

Also I think there was misunderstanding in the search engine industry that the Semantic Web meant metadata, and metadata meant keywords, and keywords don't work because people lie. Because traditionally in information retrieval systems, keywords haven't proven up to the task of finding stuff on the Web. One of the reasons is that people lie, the other is that they can't be bothered to enter keywords. So keywords have gotten a bad reputation, then metadata in general was tarred with this 'keywords don't work' brush. Because a lot of Semantic Web data included metadata, then people thought that with Semantic Web data -- again, that people will lie and won't have the time to produce it.


Google rich snippets example; image credit: Matt Cutts

Now I think there's a realization that when you're putting data online, that people are motivated NOT to lie. For example when your band is going to produce its next album, or when your band is going to play next downtown, you're motivated to put that information up there on the Semantic Web. There's an awful lot of cases when actually data is really important to people; and it's on the web anyway. So I think it's great that some of the search engine companies are starting to read RDFa.

Does this mean that they [search engines] will start to absorb the whole RDF data model? If they do, then they will be able to start pulling all of the linked data cloud in.

"The web of linked data and the web of documents actually connect in both directions, with links."

Will they know what to do with it? Because when it's data in a very organized form, I think some people have been misunderstanding the Semantic Web as being something that tries to make a better search engine - i.e. when you type something into a little box. But of course the great thing about the Semantic Web is that you can query it, you can ask a complicated query of the Semantic Web, like a SQL query (we call it a SPARQL query), and that's such a different thing to be able to do. It really doesn't compare to a search engine.

You've got search for text phrases on one side (which is a useful tool) and querying of the data on the other. I think that those things will connect together a lot.

So I think people will search using a search text engine, and find a webpage. On the front of the webpage they'll find a link to some data, then they'll browse with a data browser, then they'll find a pattern which is really interesting, then they'll make their data system go and find all the things which are like that pattern (which is actually doing a query, but they'll not realize it), then they'll be in data mode with tables and doing statistical analysis, and in that statistical analysis they'll find an interesting object which has a home page, and they'll click on that, and go to a homepage and be back on the Web again.

So the web of linked data and the web of documents actually connect in both directions, with links.
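The difference between searching text and querying data, as described in this answer, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the triples, the "ex:" names and the match function are all made up, and the tiny pattern matcher stands in for a real SPARQL engine rather than implementing SPARQL itself.

```python
# Hypothetical (subject, predicate, object) triples; all names are made up.
TRIPLES = [
    ("ex:band1", "ex:name", "The Examples"),
    ("ex:gig1", "ex:performer", "ex:band1"),
    ("ex:gig1", "ex:city", "Boston"),
    ("ex:gig2", "ex:performer", "ex:band1"),
    ("ex:gig2", "ex:city", "Austin"),
    ("ex:gig3", "ex:performer", "ex:band2"),
    ("ex:gig3", "ex:city", "Boston"),
]


def keyword_search(triples, phrase):
    """Text-style search: return triples whose object contains the phrase."""
    return [t for t in triples if phrase.lower() in t[2].lower()]


def match(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every pattern.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(triples, rest, candidate)


# Roughly the shape of a SPARQL query such as:
#   SELECT ?city WHERE { ?band ex:name "The Examples" .
#                        ?gig ex:performer ?band . ?gig ex:city ?city }
QUERY = [
    ("?band", "ex:name", "The Examples"),
    ("?gig", "ex:performer", "?band"),
    ("?gig", "ex:city", "?city"),
]


def cities_for_band():
    """Cities where the band named in QUERY is playing."""
    return sorted(b["?city"] for b in match(TRIPLES, QUERY))


if __name__ == "__main__":
    print(keyword_search(TRIPLES, "boston"))
    print(cities_for_band())
```

Here cities_for_band() returns ['Austin', 'Boston'], while keyword_search(TRIPLES, "boston") returns two gigs, one of them by a different band. The keyword search has no way to express "gigs by this particular band"; the query does.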

User Interfaces for Semantic Content

RWW: At the recent SemTech conference, Tom Tague of Thomson Reuters' Calais project suggested that user interfaces for semantic content are key in getting more take-up. With that in mind, I wonder if you've seen some great interfaces or designs for semantic applications in recent months - if so which ones and why did they impress you?

TBL: I think that whole area is very exciting at the moment. The only piece of hacking I've done over the past few years has been on a thing called the Tabulator [a data browser and editor], which is addressing exactly that. Partly because I wanted to be able to look at this data. And now there are lots of different ways that people need to be able to look at data. You need to be able to browse through it piece by piece, exploring the world of data. You need to be able to look for patterns of particular things that have happened. Because this is data, we need to be able to use all of the power that traditionally we've used for data. When I've pulled in my chosen data set, using a query, I want to be able to do [things like] maps, graphs, analysis, and statistical stuff.


W3C Tabulator, a data browser/editor; Image credit: wiwiss.fu-berlin.de

So when you talk about user interfaces for this, it's really very very broad. Yes I think it's important. There's also the distinction we can make between the generic interfaces and the specific interfaces.

There will always be specific interfaces; for example if you're looking at calendar data, there's nothing else like a calendar that understands weeks, months and years. If you're looking at a genome, it's good to have a genetics-specific user interface.

"I want to be able to do maps, graphs, analysis, and statistical stuff."

However you also need to be able to connect that data, through generic interfaces. So if my genome data was taken during an experiment which happened over a particular period, I need to be able to look at that in the calendar - so I can connect the genetics to the calendar.

So one of the things I hope to see is domain-specific things for various different domains, and the generic user interfaces. And hopefully the generic interfaces will be able to tie together all of the domains.


Wolfram Alpha and Natural Language Interfaces

RWW: An interesting new product was launched this year called Wolfram|Alpha, described as a 'computational knowledge engine.' It's kind of a mix between Google (search) and Wikipedia (knowledge), and its key attribute is that it enables you to compute something. The founders think that 'computing' things on the fly is something we're going to see a lot of in future. What's your take on Wolfram|Alpha?

TBL: There are two parts to that sort of technology. One of them is a sort of stilted natural language interface. We've seen those sort of natural language queries for years. Boris Katz [from W3C] created a system called START [a software system designed to answer questions that are posed to it in natural language]. I think with the Semantic Web out there, those sorts of interfaces are going to become important, very valuable, because people will be able to ask more complicated things. The search engine has traditionally been limited to just a phrase, but some of the search engines are now starting to realize that if they put data behind them and have computation engines, then you can ask things like 'what's this many pounds in dollars?' and so on. So yes, those interfaces will become important.

"Those sorts of interfaces will become important [...] people will be able to ask more complicated things."

Conversational interfaces have always been a really interesting avenue. We've had voice browser work in W3C, that has been an interesting alternative avenue. It's possible that as compute power goes up, we'll see a proliferation of machines capable of doing voice. It'll move from the mainframe to being able to run on a laptop or your phone. As that happens, we'll get actual voice recognition and pattern natural language at the front end. That will perhaps be an important part of the Semantic Web.

We talked before about what a great challenge the Semantic Web is going to be from a user interface point of view. Conversational interfaces are going to be part of [solving] that. Of course it's also going to be really valuable to have compositional interfaces - for the visually impaired and so on.

Wolfram|Alpha is also a large curated database of data sets. Obviously I'm interested in the big data set which is out there, which is Linked Data. This everybody can connect to. I don't really know a lot about the internals of Wolfram|Alpha's data set. I don't know whether they're likely to put any of it out on the web as Linked Data - that might be an interesting addition. I imagine that quite a lot of it may have come from the web of Linked Data.

e-Commerce and Linked Data

RWW: There have been reports recently that both Google and Yahoo will be supporting the Good Relations ontology and linked data for e-commerce. Companies such as Best Buy are already putting out product information in RDFa. What would be your advice to e-commerce vendors right now, to help them transition to this world of structured data on the Web? The same question could be asked across many verticals, but e-commerce seems like one area which has some momentum right now. Would you advise them just to put out their data as Linked Data?

TBL: Yup! Certainly this year is the year to do it. I've been advising governments to do it and when you look at an enterprise, you find that a lot of the issues are the same. But when you put your data from government or enterprise out there, make sure you don't disturb existing ecosystems. Don't threaten those systems, because you've spent years building them up.

Maybe there's an analogy with when the Web first started and the first bookshops went online. They were more or less a flyer, saying 'hey we have a great bookshop at 23 Main St, come on down!'. Let's say that a person named Joe owned one of these early online bookshops. If somebody had suggested to Joe that he should put his catalog online, Joe would've felt that that was very proprietary data. And he'd be worried that other bookshops would see where he was weak, so they'd be able to advertise themselves as filling that niche he's weak in.

"When you put your data out there, make sure you don't disturb existing ecosystems."

But when his competitors Fred and Albert put their catalogs online, then Joe can check which books people are browsing at Fred and Albert's websites. So Joe would [finally] be persuaded to put his book catalog up online. But he doesn't put up the prices... until Albert and/or Fred does. And even if catalog and pricing is up there, nobody puts their stock levels online. And there was a period of time when nobody [i.e. online booksellers] had their stock levels up. But people got fed up with ordering stuff that wasn't in stock. So the first book shop to actually tell you about stock levels suddenly was then unbelievably attractive to its customers.

So there's this syndrome of progressive competitive disclosure. This happens when people realize that if you're going to do business with somebody, if you're going to have your partners up and down the supply chain, really it's useful to check the data web - and life goes much more quickly and open.

Best Buy may be what starts the ball rolling [among e-commerce vendors]. Now if I want to look out for what [products are] available, I can write a program to see what there is. If somebody wants to compete with Best Buy, to my program they'll be invisible unless they can get their data up in RDF. Doesn't matter whether they use RDFa or RDF XML, as long as it maps in a standard fashion to the RDF model, then they will be visible.


The Internet of Things

RWW: I'm fascinated by how the Internet is becoming more and more integrated into the real world. For example the Internet of Things, where everyday objects become Internet connected via sensors. Have you been following this trend closely too, and if so what impact do you think this will have on the Web in say 5 years time?

TBL: It connects very much with Semantic Web [and] with linked data. With Linked Data you've got the ability to give a thing a URI. So I can give a URI to my phone, and I can say that's my phone in Linked Data. And also the company that made it can give a URI to the model of the phone. They can also put online all the specs of the phone, and then I can make a link to say that my phone is an example of that product. So now any system which is dealing with me and has access to that data will be able to figure out the sorts of things I can do with my phone, which actually is really valuable. Especially if the phone breaks.
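The phone example can be sketched as a small graph in which both the individual phone and its model have their own URI, joined by a link. Everything in this sketch is an assumption for illustration: the URIs, the property names such as "instanceOf" and "batteryType", and the describe helper are hypothetical and do not come from any real vocabulary or manufacturer.

```python
# Hypothetical URIs and property names, for illustration only.
MY_PHONE = "http://example.org/people/alice/phone"
PHONE_MODEL = "http://example.com/products/model-x"

GRAPH = {
    # Data I publish about my own phone, including a link to its model.
    MY_PHONE: {
        "owner": "http://example.org/people/alice",
        "instanceOf": PHONE_MODEL,
    },
    # Data the manufacturer publishes about the model.
    PHONE_MODEL: {
        "manufacturer": "http://example.com/",
        "bluetooth": "yes",
        "batteryType": "BL-5",
    },
}


def describe(uri, graph):
    """Merge a thing's own properties with those of the model it links to."""
    own = dict(graph.get(uri, {}))
    model_uri = own.get("instanceOf")
    inherited = graph.get(model_uri, {}) if model_uri else {}
    # Properties stated about the individual thing win over the model's.
    return {**inherited, **own}


if __name__ == "__main__":
    for key, value in describe(MY_PHONE, GRAPH).items():
        print(f"{key}: {value}")
```

A system that only knows the URI of my phone can follow the link to the model and learn the specs, which is the step Berners-Lee describes as being valuable when the phone breaks.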

"The Semantic Web is a web of things, conceptually. Tying an actual thing down to a part of the web is the last mile."

The Semantic Web has already given URIs to things, and to types of things. When the things themselves have an RFID chip in them, then I think it's a very exciting world. One can take that RFID chip, go to the Internet and find out the data about the thing. Whether we'll be able to do that, whether the manufacturers will be open enough to allow me to turn data about the identifier of the thing into data about the thing, is yet to be seen. But it's a very exciting idea.


Pachube, an example of the Internet of Things (see ReadWriteWeb profile)

Similarly, I'd like to be able to scan a barcode and get back nutritional information about what's in - for example - a can of food. But we don't have that yet. To get that sort of thing, which is very powerful, we need to build look-up systems, which allow you to translate an RFID code or a barcode into an HTTP address.

The Semantic Web is a web of things, conceptually. Tying an actual thing down to a part of the web is the last link - the last mile. Give the thing a notion of its own identity in the web.
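A look-up system of the kind mentioned above, one that turns a barcode into an HTTP address, could start out as simply as the sketch below. It checks the standard EAN-13 check digit and then appends the code to a base address. The LOOKUP_BASE address and both function names are assumptions; no such public resolver is implied.

```python
LOOKUP_BASE = "http://example.org/product/"  # hypothetical resolver address


def ean13_is_valid(code):
    """Check the EAN-13 check digit (weights 1 and 3 alternate over 12 digits)."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def barcode_to_uri(code):
    """Translate a scanned EAN-13 barcode into an HTTP address for the product."""
    if not ean13_is_valid(code):
        raise ValueError(f"not a valid EAN-13 barcode: {code!r}")
    return LOOKUP_BASE + code


if __name__ == "__main__":
    print(barcode_to_uri("4006381333931"))
```

Once the barcode has an address, the nutritional information for the can of food becomes just more data published at that address, in the same way as the specs of a phone model.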

Conclusion

RWW: The overriding message in both Part 1 and Part 2 of our interview with Tim Berners-Lee is for companies and organizations to make their data available online, preferably as Linked Data, which uses a subset of Semantic Web technologies. But Berners-Lee noted in Part 1 that he'd even be happy with the data in CSV (comma-separated values) format.

It's clear that we've seen a lot of progress in linked data already in 2009. In upcoming posts on ReadWriteWeb, we'll continue to track this trend and explain how organizations can contribute their data.
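For organizations starting from CSV, here is a rough sketch of how rows in a spreadsheet-style file could later be lifted into subject-predicate-object statements. It is an illustration only: the sample data, the BASE address and the csv_to_triples function are assumptions, and a real conversion would map column names to an agreed vocabulary rather than reuse them as-is.

```python
import csv
import io

# Hypothetical bookshop data published as plain CSV.
CSV_TEXT = """id,name,city
1,Main Street Books,Springfield
2,Corner Books,Shelbyville
"""

BASE = "http://example.org/shop/"  # hypothetical address prefix


def csv_to_triples(text, base, key="id"):
    """Turn each CSV row into (subject URI, column name, value) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + row[key]  # one URI per row, built from the key column
        for column, value in row.items():
            if column != key and value:
                triples.append((subject, column, value))
    return triples


if __name__ == "__main__":
    for triple in csv_to_triples(CSV_TEXT, BASE):
        print(triple)
```

The CSV file is already useful on its own; giving each row a URI is the extra step that lets other data link to it.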

http://www.readwriteweb.com/archives/readwriteweb_interview_with_tim_berners-lee_part_2.php http://www.readwriteweb.com/archives/readwriteweb_interview_with_tim_berners-lee_part_2.php Interviews Thu, 09 Jul 2009 06:00:00 -0800 Richard MacManus