During my recent trip to Boston, I had the opportunity to visit MIT. At the end of a long day of meetings with various MIT tech masterminds, I made my way to the funny-shaped building where the World Wide Web Consortium (W3C) and its director Tim Berners-Lee work. Berners-Lee is, of course, the man who invented the World Wide Web 20 years ago.
This was my first meeting with the Web's creator, whose work and philosophy were a direct inspiration for me when I launched ReadWriteWeb back in 2003.[1]
After shaking hands, I told Tim Berners-Lee that this blog's name was in part inspired by the first browser, which he developed, called "WorldWideWeb". That was a read/write browser, meaning you could not only browse and read content, but create and edit content too. It was a shame, then, that Mosaic, a read-only browser, became the first mainstream web browser in the mid-90s. It wasn't until the rise of Web 2.0 that the read/write philosophy gained widespread acceptance.[2] On that note, we launched into the interview...
Note: the interview will be published in two parts, with Part 1 today on the topic of Linked Data. Part 2 will explore other topics and will run tomorrow.
UPDATE: Part 2 of this interview is now available.
RWW: Earlier this year you gave an inspiring talk at TED about Linked Data. You described Linked Data as a sea change akin to the invention of the WWW itself - i.e. we've gone from a Web of documents to a Web of data. Can you please explain, though, how Linked Data relates to the Semantic Web? Is it a subset of it?
TBL: They fit in completely, in that the linked data actually uses a small slice of all the various technologies that people have put together and standardized for the Semantic Web.
We started off with the Semantic Web roadmap, which had lots of languages that we wanted to create. [However] the community as a whole got a bit distracted from the idea that actually the most important piece is the interoperability of the data. The fact that things are identified with URIs is the key thing.
The Semantic Web and Linked Data connect because when we've got this web of linked data, there are already lots of technologies which exist to do fancy things with it. But it's time now to concentrate on getting the web of linked data out there.
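To make the point about URIs concrete, here is a minimal sketch of my own (not something from the interview). It assumes two publishers who have never coordinated, and every URI and value in it is invented for illustration. Because both datasets use the same URI for the same department, merging them needs no mapping step.

```python
from collections import defaultdict

# All URIs below are hypothetical, invented for this illustration.
VOCAB = "http://example.org/vocab#"
DEPT = "http://example.org/id/department/health"
JOB = "http://example.org/id/job/42"

# Dataset published by one group: facts about departments.
departments = [(DEPT, VOCAB + "name", "Department of Health")]

# Dataset published by another group: job listings that point at a department URI.
jobs = [
    (JOB, VOCAB + "title", "Policy Analyst"),
    (JOB, VOCAB + "offeredBy", DEPT),
]

def merge(*datasets):
    """Merge (subject, predicate, object) triples into one lookup table.

    Simplification: each property is treated as single-valued.
    """
    graph = defaultdict(dict)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            graph[subject][predicate] = obj
    return graph

graph = merge(departments, jobs)

# Follow the link from the job to its department, across the two datasets.
employer = graph[JOB][VOCAB + "offeredBy"]
print(graph[employer][VOCAB + "name"])
```

The join works only because the job listing names the department by the same URI the department dataset uses, which is the interoperability Berners-Lee is describing.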

Web inventor Tim Berners-Lee and ReadWriteWeb founder Richard MacManus
RWW: Linked Data has had a lot of grassroots support, which you mentioned in your TED speech. This is something Semantic Web technologies, such as RDF, have struggled to get over the years. Has the W3C been pushing the more bottom-up Linked Data world, because of the frustration over lack of take-up of top-down Semantic Web?
TBL: A lot of the initial RDF and OWL projects came out of the academic world; and some of them were projects to show what you could do in a closed world. And the files were zipped up and left on a disc. While they were interesting projects, and while the systems were useful systems, the semantic web community maybe missed the point of the 'web' bit and focused too much on the 'semantic'. However the work that's been done in the Semantic Web, the standards, was really valuable. It's relatively recently for example that SPARQL [an RDF query language] has been developed.
Somebody drew an analogy the other day: can you imagine trying to promote a world of databases without SQL? Even though it's not an interoperable protocol, it's just a query language. So similarly, all that's been put into RDF, RDFS and OWL is very valuable to the linked data community.
The Linked Data community tend to use a subset of that [Semantic Web technologies], of OWL for example. But they certainly use SPARQL. So you could argue that really it wasn't ready to be deployed widely.
Linked Data started as a very informal Design Issues note that I put in; it was a grassroots movement from very early on. So yes, W3C has been emphasizing the importance of Linked Data. It's been the Semantic Web Interest Group of course, and various [other Semantic Web] activities, which have been pushing it. But also Linked Data has been seized on - a group of people for example put together DBpedia.[3] That wasn't commissioned, that was that they just thought it would be a really cool idea.

Graph of Linked Data sets on the Web, as at March 2009
RWW: In a recent Design Issues note, you urge governments to put their data online as Linked Data (although you'd also be happy for governments to just make available the raw data - presumably so that others can then structure it). What do you realistically expect, for example, the U.S. or U.K. governments to do over the next year? And in the near future, do you foresee different governments interconnecting their Linked Data sets?
TBL: One can't generalize, governments are (like most big organizations) fascinatingly diverse inside them. So you'll find that there are places inside governments where you get a champion who gets linked data and who's just written a script and produced some linked data. So in the UK government for example, you'll find there's RDFa [in the code of its website] for civil service jobs. So if somebody wants to make a database of all the jobs, they can do that very easily.
There are other cases where the easiest thing for somebody to do is to just put data up in whatever form it's available. Comma separated values (CSV) files are remarkably popular. They're exported sometimes from spreadsheets. It's remarkable how much information is in spreadsheets. Or sometimes pulled out of a database and then put up on the web. It's not as good, not as useful to the community, as if Linked Data had been put up there and linked. But the first step of actually putting the data out there is the one that nobody else can do.
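As a rough illustration of the gap between a CSV file and Linked Data, here is a minimal sketch of my own (not a method described in the interview). It turns each row of a CSV export into N-Triples lines. The base URIs, the column names and the sample row are all hypothetical, and it assumes a well-formed CSV file with one column that can serve as an identifier.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URIs, invented for this illustration only.
BASE = "http://example.org/jobs/"
VOCAB = "http://example.org/vocab#"

def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def csv_to_ntriples(csv_text, id_column):
    """Emit one triple per non-empty cell: row URI, column URI, cell value."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + quote(row[id_column], safe="") + ">"
        for column, value in row.items():
            # Skip the identifier itself, empty cells, and stray extra fields.
            if column is None or column == id_column or not value:
                continue
            predicate = "<" + VOCAB + quote(column, safe="") + ">"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines

SAMPLE = "id,title,department\n42,Policy Analyst,Treasury\n"
for line in csv_to_ntriples(SAMPLE, "id"):
    print(line)
```

This only gives every row and column a URI. The "linked" part, pointing at URIs that other datasets already use instead of plain strings such as the department name, still has to be done by someone who knows the data.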

Data.gov, a catalog of public data, was launched in May by the U.S. government
The way to go is for government departments to go the extra step and convert [their data] into Linked Data. One of the nice things about Linked Data, when they have a pile of it, is that they could run a SPARQL server on it. SPARQL servers are a commodity product, a solution for all of the people who say 'but actually I wanted to have XML.' A SPARQL server will generate an XML file [and] allow somebody to write out, effectively, a URL for the XML file.
In fact, I don't see why SPARQL servers shouldn't provide CSV files, something which as far as I know isn't in the standards. But I'd recommend it, certainly in government context, because CSV files are what people have and what people want.
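The two ideas above, a URL that stands for a query's XML output and CSV as an output format, can be sketched roughly as follows. This is my own illustration under stated assumptions: the endpoint address, the query and the sample response are hypothetical, and no network request is made. It relies on the SPARQL Protocol convention of passing the query text in a `query` parameter, and on the structure of the SPARQL Query Results XML Format.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint and query, invented for this illustration.
ENDPOINT = "http://example.org/sparql"
QUERY = "SELECT ?job ?title WHERE { ?job <http://example.org/vocab#title> ?title }"

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

def sparql_query_url(endpoint, query):
    """Build a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

def sparql_xml_to_csv(xml_text):
    """Flatten a SPARQL XML result set into CSV text, one row per result."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(variables)
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            term = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = term.text or ""
        # Variables left unbound in a result become empty cells.
        writer.writerow([row.get(name, "") for name in variables])
    return out.getvalue()

# Hand-written sample of what such a server might return.
SAMPLE_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="job"/><variable name="title"/></head>
  <results>
    <result>
      <binding name="job"><uri>http://example.org/jobs/42</uri></binding>
      <binding name="title"><literal>Policy Analyst</literal></binding>
    </result>
  </results>
</sparql>"""

print(sparql_query_url(ENDPOINT, QUERY))
print(sparql_xml_to_csv(SAMPLE_XML), end="")
```

In practice a client would fetch the URL with an `Accept: application/sparql-results+xml` header. The conversion drops datatype and language information, which is part of the trade-off in handing people the CSV they ask for.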
So the message [for government] is to use RDF. Linked Data is the backplane, it's the thing that you connect to in both directions. As a [web] producer your job is to make sure that you produce Linked Data one way or another. And as a consumer, there are lots of ways to consume that data once it's out there as Linked Data.
In Part 2 of this interview we discuss: how previously reticent search engines like Google and Yahoo have begun to participate in the Semantic Web in 2009, user interfaces for browsing and using data, what Tim Berners-Lee thinks of new computational engine Wolfram Alpha, how e-commerce vendors are moving into the Linked Data world, and finally how the Internet of Things intersects with the Semantic Web. Read Part 2 here.
Footnotes:
1. The very first sentence written on this blog, on 20 April, 2003, was: "The World Wide Web in 2003 is beginning to fulfill the hopes that Tim Berners-Lee had for it over 10 years ago when he created it."
2. For more on read/write browsers, you can read another early RWW post entitled What became of the Browser/Editor.
3. DBpedia is a community project to extract structured information from Wikipedia; see ReadWriteWeb's profile of this and similar resources.
Comments
Great scoop and great article Richard
Thanks Richard for continuing to publicise the whole linked data and semantic web movement. Now when I tell people that it's the next big thing, more of them are starting to believe me!
Thanks Richard for an interview with Lee. As you would expect, Lee's answers are fairly complex. As a RWW reader, I hope that you would do a detailed post with examples (they help the most) on explaining what Lee tries to convey here to the uninformed reader. I hope RWW will help us in grasping the linked web.
Great post, Richard. It's terrific that Prime Minister Brown has been able to get Tim Berners-Lee involved with the UK's efforts to publish their data.
I second Tim's pragmatism about getting data online as the first and foremost task. There's so much data that's locked in paper or PDF formats; we have to make it online, accessible and searchable. From there we can convert it or augment it with RDFa. Over time we need to ensure that data is accessible by the widest range of human and machine audiences in a multitude of interactive, downloadable and API-accessible formats.
I'm very supportive, but see
http://www.dh.gov.uk/en/Aboutus/HowDHworks/DHrecruitment/DH_4105999
Big change required. Good luck!
I agree with Rohin. The idea is very interesting but what I'm seeing is that a lot of data is already out there and I'd like to see examples of how linked data is being used.
Great article, thanks!
Semantic Web and Linked Data may be a big step of evolution of the Web. It may be better to take a small step down-to-earth in practice. Legacy contents are enormous fortune which should be formatted into structured data and be mined. I have spent two years to try Semantic Web in practice and code a toolkit, MetaSeeker, to format Web contents and manage them. From the toolkit, MetaStudio is a data schema modeler, which can also generate data extraction and formatting instructions automatically. DataScraper is a data extractor which is fed with the instructions generated by MetaStudio to extract and format data. SliceSearch is an object management system. It can also be viewed as an object search engine. It takes a smart approach to manipulate lifetime information, so it is very suitable for storing, indexing and searching structured contents in instant.
All the programs and their source codes can be checked from http://www.gooseeker.com
It's too bad there's nothing new that hasn't been already re-hashed in this interview. You should have asked him more revealing questions, instead of Semantic Web 101.
Ok, what if I run a travel guide? What's my motivation to put all the information that makes me have a competitive advantage out there for search engines to use?
I think the reason why search engines haven't taken on with the linked data concept was just that they were waiting for wide-spread acceptance for the phenomenon.
With normal data they make order out of chaos.
With linked data they'll be able to make the actual content out of order, thus putting out of business all the smaller companies who follow the hype and put their data readily available. Or am I wrong?
In the history of really bad ideas the semantic web has got to rank as one of the greats. This is an idea which can not possibly work. It seems that the people who cooked this up are unable to do simple math. The cost of managing this would probably quickly take up the GDP for any country which took it on. -Dash
Oh BTW, he's on Twitter now. Let's welcome and thank @timberners_lee for all the awesomeness started 20 years ago, using this hashtag: #thankstimforthewww
Posted by: Ignacio Rodriguez de R, October 23, 2009 10:45 AM
this is true exactly