There is some controversy floating around the blogosphere about the nature of the next web. We got a clear signal from Tim O'Reilly that there is no need to continue the versioning fad and call it "Web 3.0," but still, people disagree about what's coming next. To me, what is coming is not a single thing, but a web that is characterized by several major themes.
Among the evolving aspects of the new web are Semantics, Attention (Implicit Behavior) and Personalization. Regardless of what we decide to call this next web, the information in it is going to be more meaningful, more automatic, and more tailored to each of us.
A critical piece of the next web evolution is the introduction of structured information. This concept is so basic to us as humans that we completely overlook the fact that it is quite foreign to computers. When a person looks at a book on Amazon, she sees a book, with the author, ISBN, publisher and publication date. To a computer, that page on Amazon is nothing more than a bunch of HTML. Increasingly, though, information on the web is becoming more structured. This process is happening via several major movements:
In this post we'll look at how these movements collectively help power a more structured web.
To understand the basic issue with unstructured information consider the following example - a description of a book in HTML and XML. Here is a typical representation that you find if you look at the source of a web page:

Compare this with a representation typically found in XML:

The HTML does not capture the structure of the information, and mixes the information with the representation. XML, on the other hand, is focused on structure only and does not say anything at all about how information should be presented. Billions of web pages today contain unstructured information. To people, this is a non-issue because we are good at semantics and we do not need primitive XML annotation to make us understand. But for computers, lack of structure is a deal-breaker - they can't interpret unstructured, non-standardized information very well.
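To make the contrast concrete, here is a minimal Python sketch. Both snippets, the book details, and the names `CellText` and `book_from_xml` are made up for illustration; they are not taken from Amazon or from any real page. The point is only that a program can ask the XML for the author by name, while the HTML gives it a list of strings and no hint about which one is the author.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical presentation-oriented HTML: the values are there,
# but nothing in the markup says what each value means.
HTML_BOOK = """
<table>
  <tr><td><b>Example Book</b></td></tr>
  <tr><td>Jane Doe</td></tr>
  <tr><td>Example Press, 2007</td></tr>
</table>
"""

# Hypothetical structured XML: every value is labeled by its element name.
XML_BOOK = """
<book>
  <title>Example Book</title>
  <author>Jane Doe</author>
  <publisher>Example Press</publisher>
  <year>2007</year>
</book>
"""


class CellText(HTMLParser):
    """Collects the text found in the page, with no idea what each piece means."""

    def __init__(self):
        super().__init__()
        self.cells = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.cells.append(text)


def book_from_xml(xml_text):
    """Turn the labeled XML into a dictionary keyed by element name."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


parser = CellText()
parser.feed(HTML_BOOK)
parser.close()
html_cells = parser.cells        # just strings; which one is the author?
book = book_from_xml(XML_BOOK)   # book["author"] answers that directly
```

With the HTML, any code that wants the author has to rely on position ("the second cell"), which breaks as soon as the page layout changes.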
Way before people created the web, they created relational databases - the platform on which many corporations and web sites are built today. A great thing about relational databases is that they represent the information in a structured way.
The query language known as Structured Query Language (SQL) supports fetching the information from a single database table. More importantly, SQL allows queries that correlate or select information from multiple database tables. Simply speaking, SQL allows the data to be remixed. The only condition for this is that the data must be structured.
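As a small illustration of that kind of remixing, the sketch below uses Python's built-in `sqlite3` module with two made-up tables, `books` and `prices`. The schema, the rows, and the `cheapest_offers` helper are assumptions for the example only. A single query correlates the two tables to find the cheapest offer for each book.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE prices (isbn TEXT, store TEXT, price REAL);
INSERT INTO books VALUES ('isbn-1', 'Example Book', 'Jane Doe');
INSERT INTO books VALUES ('isbn-2', 'Another Book', 'John Roe');
INSERT INTO prices VALUES ('isbn-1', 'Store A', 12.50);
INSERT INTO prices VALUES ('isbn-1', 'Store B', 11.00);
INSERT INTO prices VALUES ('isbn-2', 'Store A', 20.00);
""")


def cheapest_offers(connection):
    """Join books with prices and keep the lowest price for each book."""
    query = """
        SELECT b.title, p.store, p.price
        FROM books AS b
        JOIN prices AS p ON p.isbn = b.isbn
        WHERE p.price = (SELECT MIN(price) FROM prices WHERE isbn = b.isbn)
        ORDER BY b.title
    """
    return connection.execute(query).fetchall()


offers = cheapest_offers(conn)
for title, store, price in offers:
    print(title, store, price)
```

The join works only because both tables agree on a labeled `isbn` column; that shared, explicit structure is what makes the data remixable.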

On the other hand, if the information is not structured, it is effectively stuck in a proprietary silo - closed and immobile. Its representation is understood only by its creator, and it is not readily consumable by any other application or web service. This is wasteful, because the information cannot be remixed with the rest of the information on the web.
1. The Rise of APIs. APIs are in fashion these days. Since del.icio.us, the web sites that have defined the social web era have offered interfaces to access their proprietary databases. This effectively accomplishes two things. First, APIs make it easy to fetch information. Second, most APIs these days emit the information as XML, so it is automatically structured. For more about the impact of APIs on the web read our "When Web Sites Become Web Services" post.
2. Top-Down Semantic Applications. We've written recently about the proliferation of top-down semantic applications. In addition to creating utility by extracting meaning from content, these applications do another very important thing: they automatically transform unstructured content into structured information. It happens because after extracting the info, the services offer an API or structured RSS feeds, effectively injecting the structured information back into the web.

3. Classic Semantic Technologies and Microformats. The main goal of the Semantic Web is to make information structured. XML-based languages like RDF and OWL are designed to capture information so that not only things, but also their attributes and the relationships between them, are represented clearly. The classic approach, however, is running into many difficulties. People are enthusiastic about the prospects and theory behind it, but lack of consumer focus and business value, as well as technical difficulties, have made the implementation of classic ideas elusive.
In the meantime, microformats, a simpler approach to information annotation, have gained some momentum. The idea behind microformats is simple: embed markup that indicates the structured information within HTML pages. What's good about this approach is that annotations are compact and can be interpreted by web browsers as well as any other program that reads the HTML page. The approach also has issues, though. First, the number of things that can be described by microformats is limited. A popular microformat is hCard, which describes contact information about people, organizations, and places.

The diagram above is from the Microformats web site.
A bigger issue revolves around how microformats are intended to be used. According to the designers, they are not a new language, not infinitely extensible and open-ended, nor are they made for defining the whole world. Rather, microformats are an evolving solution, initially aimed at designers as a "set of simple open data format standards that many are actively developing and implementing for more/better structured blogging and web microcontent publishing in general." Despite their simplicity, microformats are doing a lot of public good by adding structure to unstructured content and pushing the envelope along with other solutions.
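To show what "any other program that reads the HTML page" might look like, here is a rough Python sketch that pulls an hCard out of a page fragment. The class names `vcard`, `fn`, `org` and `url` are hCard property names; the sample markup, the contact details, and the `HCardParser` class are invented for this example, and the parser is deliberately simplified (it does not handle nested cards or the full hCard vocabulary).

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with hCard class names.
HCARD_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>,
  <a class="url" href="http://example.com/">home page</a>
</div>
"""


class HCardParser(HTMLParser):
    """Simplified reader: maps a few hCard class names to values."""

    TEXT_FIELDS = {"fn", "org"}  # properties whose value is the element text

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # text field being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        # For "url", the value lives in the href attribute, not the text.
        if "url" in classes and "href" in attrs:
            self.card["url"] = attrs["href"]
        matched = classes & self.TEXT_FIELDS
        self._current = next(iter(matched)) if matched else None

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()


parser = HCardParser()
parser.feed(HCARD_HTML)
parser.close()
card = parser.card
```

The same page still renders normally in a browser; the class names simply give programs a shared vocabulary for finding the name, organization and URL.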
4. RSS As A Delivery Mechanism. There is a common misconception about RSS - people think that it is a structured language. It is not. Basic RSS is a simple format for delivering news. Each RSS entry contains a title, a link and a description. In addition, RSS allows flexible embeds to deliver things like images, video and podcasts.
However, what is true is that RSS, like the XML language, is extensible. What has been happening is that companies have started using RSS extensions to deliver results from their APIs. For example, as we've written, Wine.com does exactly that - its API calls return RSS.

What does this mean? In addition to the standard RSS attributes, Wine.com outputs proprietary ones like id, sku and price. Any application that would like to interact with this API can leverage the additional attributes. It is likely that companies will continue to use RSS like this in the future. That's because RSS is already well known and the bottom line is that it doesn't matter what XML-based language we use. Technically speaking, any RSS extension is just XML and does not really have much to do with RSS. But if the world wants to think that it is RSS and is willing to agree on a standard - so be it!
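Here is a minimal sketch of how an application might consume such an extended feed, using Python's standard `xml.etree.ElementTree`. The feed below is invented: the `shop` namespace, its URI, and the sample item are assumptions, not Wine.com's actual output. Only the idea of extra `id`, `sku` and `price` fields next to the standard RSS ones comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with extension elements in a made-up namespace.
RSS_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:shop="http://example.com/ns/shop">
  <channel>
    <title>Example wine search</title>
    <item>
      <title>Example Red 2005</title>
      <link>http://example.com/wine/1</link>
      <description>A hypothetical wine listing.</description>
      <shop:id>1</shop:id>
      <shop:sku>SKU-001</shop:sku>
      <shop:price>19.99</shop:price>
    </item>
  </channel>
</rss>
"""

# Prefix-to-URI map used when looking up the extension elements.
NS = {"shop": "http://example.com/ns/shop"}


def parse_items(feed_text):
    """Read the standard RSS fields plus the extension fields of each item."""
    root = ET.fromstring(feed_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "sku": item.findtext("shop:sku", namespaces=NS),
            "price": float(item.findtext("shop:price", namespaces=NS)),
        })
    return items


items = parse_items(RSS_FEED)
```

A plain feed reader would ignore the extension elements and still show the title, link and description, which is part of why RSS makes a convenient carrier.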
So what happens if we take all of this and put it together? Something really profound - a structured web. Possibly a precursor to the Semantic Web, the structured web would be much more readily remixable. It truly will be the web as a database. Yes, a good old relational database, but instead of tables we would be remixing web sites and web services.

Probably the most interesting thing to note about the structured web of the future is that it will still be non-standard. Just because information is represented as RSS or XML does not mean that two different services will have the same representation of a book. However, the problem of mapping one representation onto another is generally not difficult, as long as the information is structured (financial companies have been doing this for decades). So structure promises to bring nearly automatic interoperability.
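As a rough sketch of why that mapping is usually easy once both sides are structured, consider two hypothetical services that describe the same book with different field names. Everything here (the records, the field names, the `to_common` helper) is made up for illustration.

```python
# Two hypothetical services describing the same book with different field names.
STORE_A_BOOK = {"title": "Example Book", "author": "Jane Doe", "isbn": "isbn-1"}
STORE_B_BOOK = {"name": "Example Book", "writer": "Jane Doe", "ISBN": "isbn-1"}

# A person writes this mapping once per source; after that it runs automatically.
STORE_B_TO_COMMON = {"name": "title", "writer": "author", "ISBN": "isbn"}


def to_common(record, field_map):
    """Rename the fields of a record; unmapped fields keep their names."""
    return {field_map.get(key, key): value for key, value in record.items()}


normalized = to_common(STORE_B_BOOK, STORE_B_TO_COMMON)
same_book = normalized == STORE_A_BOOK
```

The mapping table still has to be written by someone who understands both representations, but it is a few lines of configuration rather than a custom scraper.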
Another outcome is that a web where information is structured is much more amenable to being transformed into what is currently envisioned as the Semantic Web. Ontologies and relationships are much more readily overlaid on top of structured information. Likely, RDF and OWL would be used to do just that, as they were originally intended, except on top of the new structured layer. Then the coming next web becomes a direct precursor to the Semantic Web. The leap of faith that we are now being asked to make would disappear, and instead, the jump to semantics becomes obvious.
The next web is not just about one thing, it's about many themes. However, what is fueling the web of the future is structured information. As we discussed in this post, many different technologies, in their own way, are gradually transforming the web from its current HTML chaos into a structured XML heaven. It has already happened in quite a few places and over the next few years we will be seeing more and more structured information online.
The benefits? We hope that some of the promised semantic tools will be able to take advantage of the structured information. We look forward to smarter search, and mashups that bring us exciting remixes that were not possible in yesterday's world of unstructured HTML silos.
Comments
I kind of think we're a lot further down that road than this would imply, and for the most part (in my job, at least) I rarely care whether data is structured or unstructured, for the very reason you mention above: "Probably the most interesting thing to note about the structured web of the future is that it will still be non-standard."
That's true today, it'll be true tomorrow, and it'll be true next year. So there's no need to wait. I deal with XML, JSON, serialized PHP, text formats, and straight messy HTML to get the data I need, and aside from serialized PHP, they are all about the same to deal with. They all require monitoring and updating when they change, and they all require me to impart meaning on the data I grab.
Just a quick example: in the book industry, Barnes & Noble has no API to retrieve book information. That has next to zero effect on how we interact with them, because they have unique book information that Amazon does not, and we want to display their price information. We pull the data we need from their HTML pages, and in fact it has been more historically reliable and static than 90% of the APIs we use. If there is a change, it's 10 minutes to fix, but that's no different than the APIs that change over time.
The bottom line is, unless every data source you want to use uses the same standard or a common standard (which I doubt will happen anytime soon), you will need a human to correlate disparate data sources and their different meanings to a computer.
Posted by: Morgan | October 10, 2007 12:36 PM
This is a great overview and perspective analysis of What's Next. The quality of this blog is outstanding.
This would make a great CommonCraft Paperworks video.
Posted by: Kevin Makice | October 10, 2007 12:58 PM
Great article! I would also highly recommend checking out the following article:
What is the Structured Web?
Posted by: Eric Blue | October 10, 2007 3:33 PM
http://www.mkbergman.com/?p=390
Wow, great article, Alex.
Yeah, big, big data-matching problem. I am excited about the 'wisdom of crowds' effect creating some organic commonality of xml tags much like delicious tags. I could believe that xml from Amazon and B&N and Alibris might all contain an tag. I think this organic effect is liable to be much more far-reaching than deterministic microformats, if not as neat and tidy.
Posted by: mathew johnson | October 10, 2007 3:49 PM
We are thinking about some of these problems (opportunities?) at http://blist.com - I know that you are on our beta list already - but I encourage other readers to check us out as well.
typo - fixed:
"B&N and Alibris might all contain an 'author' tag"
Posted by: mathew johnson | October 10, 2007 3:57 PM
Those interested in structured data, microformats, etc. may want to check out AlchemyPoint, a structured web / mashup platform that integrates with your browser. It is capable of processing hCard and other well-formatted data sources, as well as "top-down" conversion of unstructured data sources (HTML, text, etc.) into standard formats.
Posted by: Elliot | October 10, 2007 4:13 PM
In the "structured information vs. unstructured information" image you call out that structured information is "readable by machines". I think what you really mean is machine understandable. Today all web content is perfectly machine readable. With things like microformats, we move from just machine readable to machine understandable. That is a key difference.
I think with the release of Firefox 3 with native support for microformats, we are going to see microformats thrown into the mainstream. Mark my words.
Posted by: Tyler Keen | October 10, 2007 4:45 PM
If you want to see what microformats can do for you prior to Firefox 3, check out these add-ons: https://addons.mozilla.org/en-US/firefox/search?q=microformats&status=4
Posted by: Tomas Becklin | October 10, 2007 7:12 PM
i think a flow to a web site is also important
Posted by: Shareware Software | October 10, 2007 9:02 PM
Nice article. However, the proliferation of APIs and the extending of RSS, to me seems to have come about at the same time. I would think APIs would be in line with RSS as both, "evolutionary drivers" and part of the "new" web whether you think it's Web 2.0 or Web X.0.
I mean these technologies didn't just appear, they have been in existence for years with pontificating prognosticators (whoa, dunno where that came from) for almost as long. Technologies that are well over a decade old.
What's missing in this article is the adoption of these technologies, and it's in that history that kind of befuddles such linear explanation.
I guess if you're new to this stuff, it can be helpful, and if you're trying to guess what's coming next, maybe that can help a bit. But let's be frank, no one has any idea of what's next.
I'll put a prediction out for you: it's going to have a lot to do with video and wireless technologies. I don't think we've scratched the surface and I'm sure as hell tired of writing/reading.
Ernesto Gluecksmann
Posted by: Web Consulting DC | October 10, 2007 9:08 PM
i have no idea what you are talking about. Will i still be able to look up sports scores? You are scaring me :)
Posted by: howard lindzon | October 10, 2007 9:24 PM
@Howard. You are safe, sports and stocks stay intact. The rest is of course fair game.
Posted by: Alex Iskold | October 10, 2007 9:27 PM
I would be interested in seeing a less western-civilization focus and more use and understanding of cultural interface differences, integrated translations and a meshing of languages etc.
Posted by: Antje Wilsch | October 10, 2007 9:35 PM
Howard - you won't have to, you'll get them automatically along with player profiles, team history, and upcoming matches. oh and your PVR will record them just by knowing who your favourite team is :)
Posted by: mr. semantic | October 10, 2007 9:44 PM
Alex,
Good article. However, seems that (especially when looking at uF) you've missed something: RDFa [1] ;)
Cheers, Michael
[1] http://www.w3.org/2006/07/SWD/RDFa/primer/20070918/
Posted by: Michael Hausenblas | October 10, 2007 11:47 PM
We look forward to smarter search
Yes, we certainly do. One problem is that the ossified GoogleBot can't read an XML page. Seriously.
Put an XML page out there, and it won't even be able to follow the links. (Of course it won't download the XSL and do a transform either.)
Browsers (IE and FF) have been doing transforms for years. It can well be lighter weight and faster to send out structured XML + some styling info.
But even if Googlebot got its act together and learned how to parse XML, Google's internals are set up to weight HTML tags.
In other words, everyone except old search engines are ready to present structured data today. But the search engines are holding us back. Sigh. I wish those PhDs would stop making google recipes and get on with real work.
Posted by: Israel LHeureux | October 11, 2007 12:14 AM
You are missing out on an important point about using structured information on websites: the benefit is already here. Search engines already take into account any kind of structure they can find in a page. Effectively, using structured information is a good search engine optimization strategy that is likely more effective than abusing meta tags or putting lots of links in a page.
Basically if you are publishing reviews, the goal is that people read them. hReview is trivial to implement and allows search engines to recognize your page as a review of a particular product. Given enough use of a particular microformat, search engines will adapt to it.
I wouldn't be surprised if Google and Yahoo already handle some microformats in their search engines. Yahoo has a calendar event search service as part of their local search for example. And both are heavily investing in map related features so obviously a coordinate marked up with the geo format is not something they are likely to ignore.
Posted by: Jilles van Gurp | October 11, 2007 12:35 AM
If you had used the term "semi-structured" instead of structured, I would have agreed with you.
Posted by: Dave | October 11, 2007 2:38 AM
Missing a bit from this interesting blog entry is what the uses and advantages of the semantic web will be.
Personally I think one big step forward could be to take the human out of the loop in places. Humans now are needed to analyze data. But with the semantic web machines will potentially be able to share/compare/analyze their (internet) data directly with other machines.
Posted by: Chris Rijnders | October 11, 2007 3:42 AM
This is a pretty relevant video on Web 2.0 and the semantic web:
http://www.youtube.com/watch?v=6gmP4nk0EOE
It is also pretty inspirational.
Posted by: Matt | October 11, 2007 4:22 AM
Makes me think of a seminal article on the subject, published in early 2003 by Sebastien Paquet : Towards Structured Blogging.
One key element missing in your article (IMNSHO) is the fact that we need user interfaces that make it easy for regular users to create, consume and re-use structured information. That's the challenge we are facing right now on the web, it has been the same in the XML world (often offline) for about 10 years now...
Posted by: Sylvain Carle | October 11, 2007 6:49 AM
The "funny" thing is that you use <img>s to make <table> AND <book> markup in your article. Why?
To make it look "nice" ?
It's textual information, and it's better, IMHO, to represent it in text format.
Posted by: karim | October 11, 2007 7:06 AM
what about machines? blind folks...
You're making a classic mistake. APIs don't have semantics - they mean nothing to the computer. Only humans can write code to take advantage of them, and so people must write different code for every different site and their APIs. That's just plain awful. It's almost as bad as having no structure at all.
Now, forget services and enter _resources_. Take a look at the Ruby on Rails community.
Posted by: John | October 11, 2007 10:43 AM
Alex,
Nice article. One sentence I particularly favor in this article is that "the next web is not just about one thing, its about many themes." You are right. The World Wide Web evolves to be more and more complicated. It cannot be precisely described by any single adjective word such as "semantic" or "data" or even the "structural" you used right now. We are always presenting one aspect of this next web, but not the whole picture.
I agree with you that more and more data on the Web are going to be encapsulated in structural forms. By this augmentation, web data become more machine-processable. This is a certain direction of web evolution.
By the way, semantic annotator is a more formal name for the "Top-Down Vertical Applications" you specified in your figure to convert unstructured information to structured information. Basically what you mean is to annotate unstructured information based on some pre-defined structural definitions. Such a process is indeed semantic annotation. If we really need to emphasize "vertical", probably we can call them domain-specific semantic annotators or vertical semantic annotators.
-- Yihong
Posted by: Yihong Ding | October 11, 2007 10:50 AM
Nice article, Alex.
I like how you've muddled things a bit and looked at a stage in between the full semantic web and where we are today. As APIs become more common, RSS/ATOM is used to pass "richer" structured data and mashup/analysis tools become more friendly, it just seems like the potential for interesting applications is enormous.
It will be wonderful when computers will be able to pass around semantic data freely (or maybe terrible, if they become self-aware like Skynet ;), but it just seems more likely that we'll pass through multiple web 2.x stages, each a "more structured" web than the stage that preceded it. Those stages will likely be difficult to define, unless some new application catches like wildfire, but regardless, I think that we'll be able to look back in a few years and see that we've made some great progress.
Posted by: Ken | October 11, 2007 12:03 PM
If the TABLE was properly constructed with a TH header row, I'd prefer it to the XML.
Posted by: pwb | October 11, 2007 2:18 PM
Excellent article Alex. I like your point about modified RSS, being the way XML gets out there without people realising.
I also agree with comment #17 that search is a key adoption technology. The incentive for SEO might help publishers make efforts to structure their content.
I categorise my friends on facebook because I like analysing what my network is made up of (school friends, industry, etc). I see benefit there. The key is getting people to see the benefit.
And PLEASE, let's drop web3.0 and just say future trends of the web. Let the historians create labels of our times, not the marketers.
Posted by: Elias Bizannes | October 11, 2007 8:08 PM
This is fascinating stuff, and the Semantic Web has been in the works for enough time to generate a lot of Meta-Noise.
XML and its more or less easy-to-use translators, intermediate object formats and APIs all bring up a very volatile competition for platform mindshare.
There are two issues that have reduced my excitement in the past two dozen months.
1. Who will structure the structure? You better hope it's you.
2. What is the market value of organized information? This is a knotty one. Clearly, having a standardized marketing dataset is great for the NSA and the India Phone Marketers School. It may not really matter to a humble human who only really knows a few hundred people.
Ditto to the APIs as they come, the example was Barnes and Noble static HTML for a great reference. You have to wonder what might need be done that cannot be done on Facebook.
If I were to write an XML specification it would be a big recursive operation on a null string, and would take approximately 15 billion years to execute.
Posted by: floating-point-web-0.0 | October 12, 2007 7:25 PM
Alex. Nice post. Just for a bit of fun, you might like to have a read of something I started to draft in 2003 but only posted at the beginning of the year:
Posted by: James Dellow | October 13, 2007 12:49 AM
http://chieftech.blogspot.com/2007/01/hidden-web-microformats-and-the-next.html
I didn't have the "Web 2.0" language or experience then to really deal with the ideas I was trying to describe but I concluded at the time:
"Web services and XML based technologies will create a fundamentally different user experience that will make people more productive without being more information literate. Of course the missing piece is who will tell you what information you need to access. Nothing will change in that respect and a few highly information literate and branded knowledge workers will continue to play a role in helping us to select what data we need. But what will change is how we access information. The future evolution of the Hidden Web we have glimpsed in RSS shows an end to the prominence that Web browsers have so far enjoyed."
Many of the shortcomings of RSS described in this post are addressed by Atom. This post describes it better than I ever could:
http://cfis.savagexi.com/articles/2007/04/26/atom-will-change-the-world
Posted by: Luigi Montanez | October 14, 2007 8:10 AM
A good read, but with numerous mistakes that I hope you don't mind me pointing out.
1) What you describe is "Semi-structured" data not "Structured". A relational database works with the structured variety, in that the facts it records must have a common format (and hence can be viewed in tables). XML allows irregularity in the facts it collects together, that would require a JOIN to reformulate in SQL.
2) Having said that XML is a reinvention of the hierarchical model which utterly died on its ass 40 years ago. If XML databases (that was not its intended function) are the future we are truly heading back to the medieval days of computers.
3) You describe a concept of a web of data. If by this you mean a network of nodes that can be queried, again this would be the reinvention of network databases, which died on their asses 30 years ago and were thankfully wiped out by relational db's.
4) Which brings me to the semantic web. What you describe is not a precursor to the semantic web. Despite the billions invested in it, the SW has yielded zilch - it's always been fundamentally flawed (see point 3) but given it's another case of people ignoring history (the reinvention of a square wheel), it's taken a decade or so for its developers to recognize it as a dead duck. There is a way forward but it is not through these 'navigational databases'.
However I agree with some of your insights into mash-ups, the lack of a single format, and so on. My own view is that eventually a relational/web hybrid will emerge. I just hope it's sooner than later, but I'm not holding my breath - we won't get there any time quick if we all carry on ignoring the lessons of the last 3 decades of data theory.
Nice blogging anyhow.
Posted by: JOG | October 14, 2007 4:53 PM
Nice entry. Only one question - since most of the web pages present unstructured data what is needed is a method of building semantics out of them. This is very hard but not intrusive...
Posted by: KBac | October 15, 2007 4:46 AM
First of all, this is a great article, a good read with quality content.
The article mentions the next web to be composed of RSS, XML and XHTML. What about HTML5? If HTML5 becomes a standard before XHTML, would we be able to achieve the same level of structure we expect from XHTML2? ...
Posted by: Juan I. Ruiz | October 15, 2007 9:25 PM
Interesting article, with one or two flaws:
1) The next web is something we don't know about, what is in this article is things that are already happening, but of course only in a very small way.
2) Grouping proprietary with unstructured. I think there is a slight confusion over terminology. Proprietary information is usually very well structured, it's just that the structure isn't open or freely reusable.
3) HTML tables - whilst I would agree that the example given is typical, there are many examples now where HTML tables do convey structure. When used properly, an HTML data table is structured and accessible. HTML markup for data tables certainly isn't great for human-readability.
Posted by: Richard Morton | October 16, 2007 4:49 AM
Alex, great clarification of the staging of Web 2.0. I run a Top-Down Vertical Application company that structures the web for institutional money managers (think investable information for Wall St), and we are able to create huge value just by determining aboutness and impact.
Posted by: Penny Herscher | October 17, 2007 8:18 AM
Alex,
Great post - perhaps a mention of Atom (and AtomPub) as alternate pub/sub protocol/syndication formats. The recently published final AtomPub spec (RFC5023) will I think be a pretty important step in the path to a more structured web.
John.
Posted by: John O'Shea | October 17, 2007 1:48 PM
Sorry - but I don't see anything here that wasn't clear 18 to 36 months ago.
Am I missing something - or are you writing for a late-adopter audience?
Posted by: Matt | October 18, 2007 1:49 PM