ReadWriteWeb

Semantic Web: Difficulties with the Classic Approach

Written by Alex Iskold / September 19, 2007 8:20 PM / 37 Comments

Summary: The original vision of the semantic web as a layer on top of the current web, annotated in a way that computers can "understand," is certainly grandiose and intriguing. Yet for the past decade it has been a kind of academic exercise rather than a practical technology. This article explores why, and what we can do about it. Update: Part 2 is now available: Top-Down: A New Approach to the Semantic Web

The semantic web is a vision pioneered by Sir Tim Berners-Lee, in which information is expressed in a language understood by computers. In essence, it is a layer on top of the current web that describes concepts and relationships, following strict rules of logic.

The purpose of the semantic web is to enable computers to "understand" semantics the way humans do. Equipped with this "understanding," computers will theoretically be able to solve problems that are out of reach today.

For example, in a New York Times article written earlier this year, John Markoff discussed a scenario where you would be able to ask a computer to find you a low-budget vacation, keeping in mind that you have a 3-year-old child. Primitively speaking, because the computer would have a concept of travel, budget and kids, it would be able to find the ideal solution by crawling the semantic web in much the same way Google crawls the regular web today.

But while the vision of a semantic web is powerful, it has been over a decade in the making. A lot of work has been done at the World Wide Web Consortium (W3C) specifying the pieces needed to put it together. Yet, for reasons ranging from conceptual difficulties to lack of consumer focus, the semantic web as originally envisioned remains elusive. In this post, we take a deeper look at the issues and wonder if the classic bottom-up approach can ever work.

Classic Semantic Web Review

In our post earlier this year, The Road to the Semantic Web, we discussed the elements of the classic semantic web approach. In a nutshell, the idea is to represent information using mathematical graphs and logic in a way that can be processed by computers. To express meaning, the classic semantic web approach also advocates the creation of ontologies, which describe hierarchical relationships between things.

For example, using such ontologies it would be possible to express truths like: dog is a type of animal or Honda Civic is a type of car. It would then also be possible to describe the relationships between things like this: dog is eating food and John is driving a Honda Civic. By combining entities and relationships and expressing all content on the web in such a way, the result would be a giant network, or, the semantic web.
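To make the idea of entities plus relationships concrete, here is a minimal sketch in Python that stores the facts above as (subject, predicate, object) triples and follows "is a type of" links. The predicate names (`is_a`, `eats`, `drives`) and the extra fact that a car is a vehicle are assumptions made for illustration; this is not RDF or OWL syntax, only the underlying graph idea.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
# The predicate names are illustrative, not drawn from any real ontology.
TRIPLES = {
    ("dog", "is_a", "animal"),
    ("Honda Civic", "is_a", "car"),
    ("car", "is_a", "vehicle"),  # extra fact, assumed for the example
    ("dog", "eats", "food"),
    ("John", "drives", "Honda Civic"),
}


def is_a(thing, category, triples=TRIPLES):
    """Return True if `thing` is a `category`, following is_a links transitively."""
    seen = set()
    frontier = [thing]
    while frontier:
        current = frontier.pop()
        if current == category:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Walk one step up the hierarchy from the current node.
        frontier.extend(o for s, p, o in triples if s == current and p == "is_a")
    return False


def drivers_of(category, triples=TRIPLES):
    """Return the sorted subjects that drive something belonging to `category`."""
    return sorted(
        s for s, p, o in triples if p == "drives" and is_a(o, category, triples)
    )
```

With these facts, `drivers_of("vehicle")` finds John even though no triple says so directly: the answer comes from combining "John drives a Honda Civic" with the two hierarchy links. That kind of inference across a graph is what the classic approach hopes to do at web scale.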

The W3C has mapped out a set of tools and standards that are needed to make it happen, two of which are the XML-based languages RDF and OWL, which are designed to be flexible and powerful. To accommodate the distributed nature of the semantic web, documents are made self-describing: the metadata (meaning) is embedded in the document itself. The entire stack, as it was envisioned by Sir Tim Berners-Lee, was presented in 2000 (see image below). The rest of the post will focus on the difficulties with this approach.

The Technical Challenges

1. Representational Complexity: The first problem is that RDF and OWL are complicated. Even for scientists and mathematicians these graph-based languages take time to learn, and for less-technical people they are nearly impossible to understand. Because the designers were shooting for flexibility and completeness, the end result is documents that are confusing, verbose and difficult to analyze.

2. The Natural Language Problem: People argue that RDF and OWL are for machines only, so it does not matter that people might find them hard to look at. (Though as a side note, the advantage of XML representation is precisely that people can look at it, mainly for debugging purposes.) But even assuming that RDF and OWL are for machines only, the question arises: how are these documents to be created?

There are two possible ways. One is automated: an algorithm takes a piece of text and produces RDF. The other is for people to annotate existing documents using visual tools that then generate RDF from those annotations. Both approaches have problems. If there is already an algorithm that can take a piece of text and generate RDF, then this algorithm must be smart and AI-like. Why do we even need the RDF if we already have such an algorithm? The issue with manually annotating documents is exactly that: it is manual. Having people annotate things for computers to process is at the least inefficient and at the most offensive.

3. The Bottom-Up Assumption: Because there are vast amounts of existing information that need to be transformed, the classic semantic web approach is a bottom-up approach. Annotating information at web scale is a daunting task. If it is to be done by a centralized entity, then there will need to be a Google-like semantic web crawler that takes pages and transforms them into RDF. This comes back to the issue we just discussed: having an automatic algorithm that infers meaning from text the way humans do. Creating such an algorithm may not be possible at all (and again raises the question of the need for RDF if the algorithm exists).

An alternative is to have web sites themselves generate and maintain metadata. While this is certainly a much more scalable approach, it raises questions. First, what benefit is there for web sites to do this? Second, what tools are out there to get it done? Assuming that these questions are answered, this would be the more viable alternative.

4. The Standards Issue: A distributed or self-organizing approach to the problem seems the most promising, but it runs into the classic technology issue of standards or the even more ancient human problem of common language. The history of technology is full of Tower of Babel examples - separate distributed systems that do not talk to each other. A common solution is to build an adapter or translator that maps concepts from one system to another.

For example, suppose there are representations of a book defined by Barnes and Noble and Amazon. Each has common fields like ISBN and Author, but there may be subtle differences; e.g., one of them may define edition like this: 1st edition and the other like this: edition 1. This seemingly minor difference, one that people would not even think twice about, would wreak havoc in computers.

The only way to have interoperability is to define a common standard for how to describe a book. So having self-describing documents is not enough, because there still needs to be a literal syntactic agreement in order for computer systems to interoperate. The bottom line is that there needs to be a standard and an API.
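The edition mismatch can be sketched in a few lines of Python. The two records below are invented for illustration (the field names, the placeholder ISBN and the author are assumptions, not either retailer's actual format), and `normalize` stands in for the adapter or agreed standard discussed above.

```python
import re

# Hypothetical records: field names and values are invented for illustration
# and do not reflect either retailer's actual data format.
record_a = {"ISBN": "0000000000", "Author": "A. Writer", "Edition": "1st edition"}
record_b = {"isbn": "0000000000", "author": "A. Writer", "edition": "edition 1"}


def normalize(record):
    """Map one retailer-specific record onto a shared shape."""
    # Lower-case the keys so "ISBN" and "isbn" land in the same field.
    fields = {key.lower(): value for key, value in record.items()}
    # Pull the edition number out of either "1st edition" or "edition 1".
    match = re.search(r"\d+", fields["edition"])
    return {
        "isbn": fields["isbn"],
        "author": fields["author"],
        "edition": int(match.group()) if match else None,
    }
```

Compared directly, `record_a` and `record_b` are unequal even though a person would read them as the same book; after `normalize` they compare equal. The catch is that someone has to write and maintain a function like this for every pair of formats, which is the article's argument for an agreed standard.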

The Scientific Challenges

1. Godel and NP-completeness: The technical issues seem to be steep, but even if these issues are addressed there are much deeper and more fundamental problems. A famous theorem proved by Kurt Godel in 1931 states, roughly, that no consistent logical system powerful enough to express arithmetic can also be complete, which means that there are true statements that cannot be proved within it. That essentially means that not all problems can be solved.

Godel's work was extended by British mathematician Alan Turing and later led to modern computational complexity theory. There is a class of problems, known as NP-complete, for which no efficient algorithm is known, so in practice a modern computer cannot solve large instances exactly. The reason is that the best known exact methods may, in the worst case, have to explore essentially all possible paths.
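To give a feel for what exhaustive exploration costs, here is a small Python sketch of brute-force search for subset sum, a standard NP-complete problem. The choice of problem and the function name are ours for illustration; the article does not name a specific problem.

```python
from itertools import combinations


def subset_sum_brute_force(numbers, target):
    """Try every subset of `numbers`; return (subset or None, subsets checked)."""
    checked = 0
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            checked += 1
            if sum(subset) == target:
                return list(subset), checked
    # No subset matched: every one of the 2**len(numbers) subsets was examined.
    return None, checked
```

When no subset adds up to the target, the function examines all 2**n subsets of n numbers, so each extra number doubles the work. Cleverer exact algorithms exist, but none is known that avoids exponential growth in the worst case.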

2. Dealing with Uncertainty: You may not understand Godel or NP-completeness, but you are familiar with the consequence: living with uncertainty. Uncertainty is something that computers can't deal with but that we can handle very well. In fact, we thrive on it. Every day we make decisions without knowing all the facts. We do this by utilizing iteration.

Here is a simple example of how we get around uncertainty: When someone speaks to us and we don't understand, we say: Excuse me, but what do you mean? After the person explains what they're trying to communicate using slightly different words we typically get it. But this is something that computers built on principles of the classic semantic web would not be able to do. They require infallible logic. They require precise representation of the facts. This is certainly not true in our lives, and it is unlikely to be possible on the web.

3. Replacing Humans With Machines: Going back to John Markoff's example of a computer booking a perfect vacation, one can't help but think of a travel agency. In the good old days, you would go to the same agent over and over again. Why? Because just like your friends, your doctor, your teacher, the travel agent needs to know you personally to be able to serve you better.

The travel agent remembers that you've been to Prague and Paris, which is why he offers you a trip to Rome. The travel agent remembers that you're a vegetarian and orders the pasta meal for you on your flight. Over time people learn and memorize facts about life and each other. Until machines can do the same, knowledge of semantics, limited or full, is not going to be enough to replace humans.

The Business Challenges

Perhaps the worst challenge facing the semantic web is the business challenge. What is the consumer value? How is it to be marketed? What business can be built on top of the semantic web that can not exist today? Clearly the example of instant travel match is not a "wow." It's primitive and, in a way, uninteresting because many of us are already quite adept at being our own travel agent using existing tools. But assuming that there are problems that can be solved faster, there is still a question of specific end user utility.

The way the semantic web is presented today makes it very difficult to market. The "we are a semantic web company" slogan is likely to raise eyebrows and questions. RDF and OWL clearly need to be kept under the hood. So the challenge is to formulate the end user value in ways that will resonate with people. This is particularly important if the bottom-up approach is to work. If people can see the value, then so will businesses, and that might prompt them to start annotating and transforming their information. Yet it's difficult to see how this will happen, given the current, rather academic focus.

Conclusion

The original vision of the semantic web as a layer on top of the current web, annotated in a way that computers can "understand," is certainly grandiose and intriguing. Yet for the past decade it has been a kind of academic exercise rather than a practical technology. The technical, scientific and business difficulties are substantial, and to overcome them, there needs to be more community support, standards and pushing. This is not likely to happen unless there are clearer reasons for it.

We will discuss an alternative approach that we call the Top-Down Semantic Web in our next article. Please tell us what you think about the prospects for the classic semantic web approach in the comments below.


Comments

  1. The Semantic Web vision has already served its purpose, in a way. Many people have read about it and have been influenced by it, or by some aspects of it, just like many people have been influenced by Ted Nelson and his Xanadu work in the past. Just like Ted never shipped a product, the Semantic Web (as it is formally defined) might never ship or be successful, but its ideals will most likely be realized in one form or another. In fact, some aspects of it are already in use. So for me, the Semantic Web is already a success.

    Posted by: Jean-Michel Decombe | September 19, 2007 8:42 PM



  2. Alex,

    good work! I can't wait for your next article about the Top-Down Semantic Web.

    I am glad I finally read an article that clearly explains what I've been thinking about the Semantic Web. It's way too technical and scientific and not really practical for the mere mortal.

    Posted by: Jerome Paradis | September 19, 2007 9:02 PM



  3. What the hell is this shit? This is actually researched and provides useful information. You can't build a readership with crap like this. You should be writing about something like http://jottit.com !

    Posted by: AaronF | September 19, 2007 9:41 PM



  4. Alex,

    "What is the consumer value? How is it to be marketed? What business can be built on top of [it] that can not exist today?"

    I'm sure the guys at DARPA who created the 'Net decades ago didn't have answers to those questions either... ;)

    Posted by: Don Jones | September 19, 2007 10:05 PM



  5. Alex,

    Certainly I agree with you that approaching the Semantic Web is a difficult task. I am also waiting to see what your Top-Down Semantic Web approach would be. But I do not quite agree with your description of what the Bottom-Up Semantic Web is. In fact, the Web is undergoing its own evolution. The realization of the Semantic Web is not a great human project. In contrast, it is an evolutionary event.

    Anyway, I am preparing writing a long response to this post at my own blog. Moreover, I have several discussions about realizing semantic web both at my own blog and the SemanticFocus Blog. Here are some of them that you might be interested to read.
    1. A Simple Picture of Web Evolution
    2. Some Truth about the Semantic Web
    3. Satisfying the Nature of Selfishness: The Key to Initiate the Semantic Web
    4. Weaving the Thread-Driven Semantic Web
    5. What does tagging contribute to the web evolution? | An introduction of web thread
    6. Epistemological extension to ontologies: a key of realizing Semantic Web?

    Realizing Semantic Web is an important and interesting research topic. I am looking forward to hearing more opinions from you.

    -- Yihong

    Posted by: Yihong Ding | September 19, 2007 10:19 PM



  6. Hi, Alex:

    Excellent article - well-researched and very informative, without being overly technical!

    I agree with you on the business challenges, essentially a "killer app" has yet to be found (although as Don Jones points out above, that may come in the future).

    I think the technical challenges, while complex, aren't as unsolvable as you make them appear. Let me address them one at a time:
    1. Representation:
    Using RDF (and OWL) does not have to be as bad as your accompanying picture shows - basically, once an ontology is defined, you have to create triples of data, and classify them. [Yes, not trivial, but tools will help with this in the future.]
    2. NLP:
    I don't understand; can you please explain further - why is this a technical challenge for Semantic Web?
    If only the *annotations* are in RDF, then only the folks who mark up a document need to know RDF, not end users. Certainly, there are those who understand how to use it.
    Besides, RDF mainly involves the use of XML notation within the RDF-namespace; as you point out, people routinely look at XML.
    3. Scaling:
    If Google can scale up to compute PageRank for every web page that they crawl, I don't see why they couldn't do the same to create annotations for those web pages.
    Creating annotations automatically - ok, that's certainly a core issue; but we may be able to come up with approximations, just as PageRank is an approximation for a Wisdom-of-Crowds solution to the "authority" of a web page.
    4. Standards:
    Here, I must disagree with you. There does not need to be one standard across all web sites. The whole point of OWL is to map different ontologies to each other. Who would do it is not clear, but if a heavyweight jumped in to create a de-facto standard for any domain (e.g. Amazon or EBay for E-commerce), then others would probably provide a mapping to that standard.

    I agree that these are all issues, but (except for the automated annotations from #3), I don't think these are technical issues.

    I recently attended a presentation on Semantic Web by Nova Spivack, CEO of Radar Networks (my writeup here) in which he covered most of these issues. The main impression I came away with is that the technical problems were solvable, the main problem is that of finding a compelling business need.

    Posted by: NitinK | September 19, 2007 11:30 PM



  7. I referenced Nova Spivack in my last comment - he's one of the experts on this topic. Nova's blog is here:
    Minding the Planet

    Posted by: NitinK | September 19, 2007 11:38 PM



  8. I see a lot of misconceptions being propagated about the Semantic Web, resulting in a lot of misunderstanding and oversimplification by nontechnical people (including prominent bloggers) about what it really is.

    If I were to try to explain it to a nontechnical person, I would say that the vision of the Semantic Web is to turn the Web into a global brain that can answer any question anyone may have; furthermore, that it is to be achieved by analyzing facts known by the Web (those RDF statements), finding connections between these facts, and then establishing new facts; furthermore, that constantly teaching more facts to the Web will make it possible to answer more questions, as well as more complex ones.

    This is why the magnitude of the scalability issue is way beyond that of Google scalability. It is one thing to construct a very large graph then count the incoming/outgoing degrees of each node (PageRank overly simplified), but it is quite another thing to evaluate an almost infinite number of paths through a graph of similar (and in fact greater) magnitude (there are usually many facts per page).

    The major problem that the Semantic Web will face in the future is not even scalability, anyway, but trustability. Indeed, what makes a fact on the Web true? And if you cannot guarantee that all Web facts that were evaluated to answer a person's question are true, then what is the value of the Semantic Web's answer? And then, the facts that embody the truth for one person do not necessarily embody it for another (e.g. "abortion is amoral", "the Catholic Church is the only true church", "Paris Hilton is cool", etc.). Every Web fact will have to be authenticated and certified by the issuing "authority", and people will be able to choose which authority they wish to rely on.

    Get ready for RDF spam, the worst and most insidious spam form of all.

    Posted by: Jean-Michel Decombe | September 20, 2007 12:40 AM



  9. Great to see discussion on the subject, thanks Alex!

    Ok, so there are dozens of places where I'd personally state things differently, but I'll save you the point-by-point. Overall, my one criticism is that you make things sound more complicated than necessary. A relevant point is summed up in a quote the source of which I can never remember, that goes something like: in the Semantic Web, it's not the "Semantic" that's new, it's the "Web"...

    I'd rephrase the general problem something like this:

    a) We already have lots of information stored in computers in a form that computers can directly use (i.e. data). We know how to collect data - our existing desktop and networked applications are full of it.

    b) We have the Web - a huge, globally distributed information store, but it's primarily human-readable documents, in general computers can do little to interpret the information on Web pages.

    * What's the easiest way of joining a) to b)?

    A good answer, and the essence of Semantic Web technologies is to name things (*any* things) and the relationships between them with URIs. This is the basis of RDF, and is a direct extension of the Web's idea of links (see evolving the link). The rest of the stack just facilitates things on top of this base.

    We do have systems on the Web that deal in data, but interoperability between these systems is the exception, not the rule. Although the rate of increase in folks using Semantic Web technologies in the core of their systems isn't meteoric, that isn't a major problem. Techniques are appearing which enable the bridging of systems to the Semantic Web (a recent example being GRDDL).

    This stuff is certainly not confined to academia - a couple of random examples I saw in a post this morning: Joost and Microsoft Interactive Media Manager both use RDF.

    Posted by: Danny | September 20, 2007 1:26 AM



  10. Great article.

    Jean,

    There don't need to be complete truths; as in life, like you stated, there aren't black-and-white answers. There needs to be the "best possible answer".

    e.g. if someone asked "Is paris hilton cool?" the answer should be calculated according to number of times the Paris Hilton node in the graph/tree was related to "cool", and according to it provide an assumption - if she is or she isn't.

    You can't say if Paris Hilton is cool - no one can, but the semantic web can decide if the majority of definitions believe she is, or isn't. That's the maximum it can get. There can't be an issuing "authority", because there are no truths to many questions; there can only be a general assumption.

    Posted by: Adam Peled | September 20, 2007 1:28 AM



  11. Adam,

    I agree with you regarding the Paris Hilton example. And that is why I am indeed worried about RDF spam. You could imagine that, say, some religious extremists (I am just taking an example here, not a stand against anyone) would spam the Web with statements related to their beliefs so that people asking a question to the Web would get an answer that could be highly influenced by the spammed beliefs, even though the question might not be directly related to the direct expression of these beliefs. Even in the Paris Hilton case, you could imagine (and, here again, I am not making any accusation, just suggesting potential scenarii) that her agent or financial backers could spam the Web with coolness-related facts so that producers want to start new projects with her, believing that there is a market for that.

    For better or worse, people will be increasingly reliant on the Web to get answers to any question they have. Thus, I still maintain that more and more Web facts will be certified in the future, because people will start to request it. You could imagine that you would set your trust preferences to include any organization affiliated with your political and religious views, for instance. Then, when you ask, say, what is a good movie to download for your children, it would give much more weight to facts certified by the authorities you trust, and in general much less to uncertified facts (and maybe even less to facts certified by authorities you distrust). This is why authentication is critical too.

    Just my 2 cents :-).

    Posted by: Jean-Michel Decombe | September 20, 2007 2:06 AM
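The two positions in this exchange, a plain majority count versus weighting statements by how much you trust whoever certified them, can be compared with a short Python sketch. Everything in it is hypothetical: the claim, the authority names and the weights are invented for illustration, and no real certification scheme is implied.

```python
# Hypothetical statements: (claim, verdict, certifying authority or None).
STATEMENTS = [
    ("movie_x_is_good_for_kids", True, "parents_association"),
    ("movie_x_is_good_for_kids", False, "anonymous_studio_promo"),
    ("movie_x_is_good_for_kids", False, None),  # uncertified
    ("movie_x_is_good_for_kids", False, None),  # uncertified
]

# Per-user trust preferences; the weights are arbitrary illustrative values.
TRUST = {"parents_association": 1.0, "anonymous_studio_promo": 0.05}
UNCERTIFIED_WEIGHT = 0.2


def weighted_verdict(claim, statements, trust, uncertified_weight=UNCERTIFIED_WEIGHT):
    """Sum trust-weighted votes for `claim`; a positive score leans 'true'."""
    score = 0.0
    for stated_claim, verdict, authority in statements:
        if stated_claim != claim:
            continue
        # Unknown or missing authorities fall back to the uncertified weight.
        if authority is None:
            weight = uncertified_weight
        else:
            weight = trust.get(authority, uncertified_weight)
        score += weight if verdict else -weight
    return score
```

With the trust table above, the single statement from the trusted authority outweighs three dissenting ones and the score is positive. Passing an empty trust table and an uncertified weight of 1 turns the same function into a simple majority vote, and the score becomes negative. Which behavior is preferable is exactly what the commenters disagree about.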



  12. "... XML-based languages RDF and OWL"

    Nitpick: RDF and OWL aren't XML-based in any sense. RDF has an XML serialization but there are a number of newer, terse serializations which are much better IMHO.

    Posted by: Chris | September 20, 2007 2:08 AM



  13. @NitinK

    The NLP problem is that to do automated conversion / annotation of existing information you need to build NLP.

    Re: Standards and mapping OWL ontologies. There is no difference between having a standard or having a 3rd party that normalizes and maps things.

    Posted by: Alex Iskold | September 20, 2007 5:33 AM



  14. @Jean-Michel: I have heard this argument a lot - but I don't see why the trust argument is any worse for a Semantic Web-based solution than for a PageRank-based solution; either way, people try to game the system, and as Adam Peled pointed out above, nothing in this world is black or white. Wouldn't a WoC approach sort out the wheat from the chaff in a majority of the cases (exactly as Google does)?

    @Alex: Points well-taken.

    Posted by: NitinK | September 20, 2007 6:39 AM



  15. Great post Alex. Time to create a new trend..."Stack Mashups". With all the open APIs there is a great opportunity to integrate multiple 'stacks' to create a compelling web service.

    Posted by: John Furrier | September 20, 2007 6:49 AM



  16. Jean-Michel,

    you've got the point there but still, isn't what you want a bit too idealistic? If you pull your example out of the semantic web and place it in real life you will see that it is actually very realistic. Paris Hilton's managers, governments, etc. can already influence your opinion using television, newspapers etc. In the end it is always up to us, the end users, to separate spam from useful resources.

    I just see it as the way life works. You could argue that we need to progress, entire humanity, but I say isn't faster and generally better information availability progressive enough?

    Posted by: Ivan Plestina | September 20, 2007 6:50 AM



  17. This was an excellent article and elaborated on a lot of the questions I had about the viability of the semantic web.

    One thing that makes me question the possibility of a semantic web is the current state of the comparison shopping industry. This industry is all about defining and categorizing information so that computers can read it and give people the answers they need (what's the best thing for me to buy and where should I buy it?). However, this industry is also a ginormous mess.

    As you know, there are quite a lot of sites both big and small that list products and compare prices (Shopping.com, PriceGrabber, NexTag, Shopzilla, etc.). A group composed of the largest of these sites has been (supposedly) working on some kind of standard for the merchant data feeds that provide all the product information for their comparisons. This has been in the works for several years now with no success despite the fact that a standardized feed structure would mean more business for all of them because currently merchants must generate a unique feed for each site which puts a limit on the number of sites that merchants have time and resources to work with.

    The complications that arise when seeking to label and categorize even the paltry amount of information that exists in the sphere of consumer products are significant. For example, not only do all of these sites use different headers/fields/filetypes for their data feeds, they also classify products differently. While NexTag might have something like 31,000 product categories, Yahoo might have 1,600 for the exact same set of products.

    Furthermore, even the most comprehensive category systems are severely lacking because it is very difficult to assess how specific the categories really need to be in order to make the site accurate/efficient and easy to navigate. In addition, it is impossible to stay on top of all the new products coming out that may require a new category. Take clothing for example. A merchant might have men's, women's, and unisex clothing items. However, most of the shopping sites hadn't considered unisex clothing and don't have categories for it. That means that it goes into 'other' or 'men's' or wherever else the merchant might see fit to put it. This results in mis-categorized items that aren't able to be properly compared or even found by consumers.

    I could go on about poor data quality, inconsistent and capricious product naming conventions, the great defilers of data that are legacy ERP systems, and on and on, but I think you get the point. Because of my job, I have to deal with this mess fairly regularly. Having seen the horror that is shopping comparison, I must conclude that this sort of endeavor is ludicrously easier to say than it is to do.

    Posted by: Jason Carr | September 20, 2007 9:09 AM



  18. @Nitink:

    The way I see it, with the Semantic Web, we are getting closer to an instant and natural language formulation of questions (manually, by the user) and answers (automatically, by the Web), thanks to its foundations (predicates, ontologies, etc.). So it is going to feel much more like the truth, even if it is not, because of the way it is delivered. People will have to be more vigilant is all I am saying (but I can agree that 99% of them may not give a damn). As you know, Google and other familiar search engines are quite primitive in comparison. They are more like search assistants than "knowledge" experts. You type keywords and you get a list of likely relevant sites. Then, you are on your own to formulate the answer to your original question yourself, which is really hard as we all know if the question is nontrivial.

    Well, I guess I am mostly fascinated by the issues of trust and truth when it comes to the Semantic Web, because the Web is going to start "sounding" a lot more human in the future, if not suprahuman.

    Posted by: Jean-Michel Decombe | September 20, 2007 9:53 AM



  19. Excellent article. I would say that the question is not whether we can actually put everything on the net under the Semantic Web, but rather in what areas it can actually make a difference. In that regard I think it is likely to be useful in specific domains.

    Medicaid is an area I am quite familiar with. The Federal Centers for Medicare and Medicaid Services has created what they call the Medicaid Information Technology Architecture which specifies around 80 different business processes broken out into 8 categories. Those business processes can be specified in a formal language or in an ontology and particular implementations of a process can be further refined. If this is done (I hope within the next 10 years) then we could have a situation where a policy analyst could formulate new policy requirements in a way that could be translated into a formal set of specifications (in OWL, let's say). These requirements could be used to query the existing system to see if it can accommodate them or not. If not, a set of requirements that need to be met could in theory be created. Then a search for existing modules or specs for new modules could be created far more easily than is the case now.

    I am sure there are other specific domains that could benefit from semantic technology. In general, I am skeptical of any "universal" technology, but find that many of these grandiose visions can have real value when restricted to appropriate application domains.

    Again, thanks a lot for an excellent and thought provoking article.

    Posted by: Ivan Handler | September 20, 2007 10:23 AM



  20. @Ivan:

    You're right, definitely the idealistic type :-). I was in fact about to say that we need to progress from the current state of things, but you beat me to it. Point well taken anyway.

    Posted by: Jean-Michel Decombe | September 20, 2007 10:26 AM



  21. Alex,

    I have a long response to this post. The key is that the realization of the Semantic Web is an evolutionary event, not just an attempt at a goal.

    -- Yihong

    Posted by: Yihong Ding | September 20, 2007 11:01 AM



  22. For any readers interested, I've rounded-up all the recent posts regarding people's ideas of what the Semantic Web is and is not.

    Posted by: James | September 20, 2007 1:04 PM



  23. I agree with Jean-Michel: it's about trust. But a reader only gets to the problem with understanding the information source's rhetorical objectives if the information source has enough trust in the reader to release it in the first place. The example from TBL's 10-year-old article--you met someone you want to contact, but all you remember is the person's company, last name, and the fact that their son went to XYZ university--assumes that the company and the university are happy letting just anyone prance through their personnel and student records. Not going to happen. The data you can see is the data the sources want you to see. Not a new problem, but a significant limitation on the great brain.

    Posted by: Jay | September 20, 2007 1:51 PM



  24. Excellent post Alex--It definitely clears up a lot of ambiguities.

    I have one question that will probably sound rather childish, but I've never read any clear answer to it, so it can't hurt to ask. Is there any clearly defined vision for how the semantic web will be queried?

    I always hear talk about "imagine being able to ask the semantic web about what is a good university for a student that likes to surf in water above 70 degrees?" (As an example in any case...)

    So my question is, in all this talk about giving meaning to web documents, has anyone tackled the issue of how these "hooks" will actually be queried? Is the goal to be able to type a natural language sentence into a search bar and hit the "get results" button? Because... wouldn't the computer still have no way of parsing the question to provide useful answers (assuming the semantic web exists and all documents have been marked up with meaning.)

    If this has already been covered somewhere, please point me to some useful reading. Thanks in advance!

    Posted by: Benjamin DiGregorio | September 20, 2007 2:52 PM



  25. @22 My understanding is that it would be like querying (traversing) a graph. You can also imagine an SQL-like interface or even some form of natural language (perhaps with a somewhat stricter grammar).
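
    A minimal sketch of what "querying as graph traversal" could look like, written in Python. The triples and every name in them are invented for illustration, and this is not how any particular semantic web store works; it only shows facts stored as edges and a walk over them:

```python
from collections import deque

# Hypothetical toy graph: (subject, predicate, object) triples.
# All names and values here are made up for illustration.
TRIPLES = [
    ("StateU", "type", "University"),
    ("StateU", "near", "PipelineBeach"),
    ("PipelineBeach", "type", "Beach"),
    ("PipelineBeach", "waterTempF", 74),
]


def neighbours(node):
    """Yield (predicate, object) pairs for every edge leaving `node`."""
    for subject, predicate, obj in TRIPLES:
        if subject == node:
            yield predicate, obj


def reachable(start):
    """Breadth-first traversal: return everything reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, obj in neighbours(node):
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen
```

    Starting from "StateU", the walk reaches the beach and its water temperature by following edges, which is the sense in which a query is a traversal.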

    Posted by: Alex Iskold | September 20, 2007 4:31 PM



  26. > Is the goal to be able to type a natural language sentence into a search bar and hit the "get results" button?

    No, it is not. That is a common misrepresentation of the Semantic Web, and would indeed be a fool's errand.

    > I always hear talk about "imagine being able to ask the semantic web about what is a good university for a student that likes to surf in water above 70 degrees?" (As an example in any case...)

    This is actually an excellent example. "Universities" and "good surfing beaches" are two data sets that are likely to be published (by the universities themselves, and by surfing enthusiasts), but aren't likely to be combined unless it's very easy.

    The point of the Semantic Web is to make it very easy. Your query in SPARQL would look something like this:

    SELECT ?university
    WHERE {
    ?university rdf:type edu:University .
    ?university go:within_3_power_10_metres ?beach .
    ?beach rdf:type surf:Beach .
    ?beach surf:temperature ?temperature .
    FILTER ( ?temperature > 70 )
    }

    (I've simplified this, omitting namespace declarations and what units of temperature to use, but you see the idea)

    Take this query, a dataset with a list of universities and their locations, a dataset with a list of surfing beaches, their locations and water temperatures and a SPARQL processor and your answer falls out.

    Nobody's expecting Joe User to type this into a search engine, but it's easy enough that Joe Mashupper could make a site or application to do it.

    The tech is solid, and exists today; all we need is the data.
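
    To make "your answer falls out" concrete, here is a rough Python sketch of what a SPARQL processor does with the four triple patterns and the FILTER above. It is a toy matcher, not a real SPARQL engine, and the two datasets (the `ex:` names and the temperatures) are invented for illustration; only the predicates are taken from the query.

```python
# Toy data standing in for two independently published datasets.
# Every ex: name and every temperature below is invented.
UNIVERSITIES = [
    ("ex:CoastU", "rdf:type", "edu:University"),
    ("ex:CoastU", "go:within_3_power_10_metres", "ex:WarmBeach"),
    ("ex:InlandU", "rdf:type", "edu:University"),
    ("ex:NorthU", "rdf:type", "edu:University"),
    ("ex:NorthU", "go:within_3_power_10_metres", "ex:ColdBeach"),
]
BEACHES = [
    ("ex:WarmBeach", "rdf:type", "surf:Beach"),
    ("ex:WarmBeach", "surf:temperature", 75),
    ("ex:ColdBeach", "rdf:type", "surf:Beach"),
    ("ex:ColdBeach", "surf:temperature", 58),
]

# The WHERE clause of the query, one tuple per triple pattern.
QUERY = [
    ("?university", "rdf:type", "edu:University"),
    ("?university", "go:within_3_power_10_metres", "?beach"),
    ("?beach", "rdf:type", "surf:Beach"),
    ("?beach", "surf:temperature", "?temperature"),
]


def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to something else
        elif p != t:
            return None  # constant term does not match
    return new


def solve(patterns, triples):
    """Join all patterns against the triples, returning variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in triples
            if (extended := match(pattern, triple, b)) is not None
        ]
    return bindings


def warm_surf_universities(min_temp=70):
    """Run QUERY over the merged data, then apply the FILTER."""
    merged = UNIVERSITIES + BEACHES  # combining datasets is concatenation
    return sorted(
        b["?university"]
        for b in solve(QUERY, merged)
        if b["?temperature"] > min_temp
    )
```

    With this made-up data, `warm_surf_universities()` returns `['ex:CoastU']`. The point of the sketch is that the two lists never had to be designed together: they are joined only through shared identifiers.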

    Posted by: Brendan Taylor | September 20, 2007 5:00 PM



  27. I take issue with several things in this article. Firstly, with the statement that RDF is difficult to understand. The image used to demonstrate this is indeed incomprehensible, but it's hardly representative of any RDF that would be used in the real world. In fact, I'd say that RDF is significantly easier to understand than XML or even the relational data model. It's not like people are born knowing how to understand a well-normalised database.

    But my main problem is with the section "The Natural Language Problem". Worry about metadata later; there is a huge amount of machine-readable data in databases that is only being exposed in a human-readable form. Adding a machine-readable version is trivial and would be a huge win. Integrating it and settling on standards will be tricky, but we'll cross that bridge when we come to it.
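
    As a rough illustration of how little is involved, the Python sketch below turns the rows of a database table into N-Triples-style lines. The table, its columns and the `http://example.org/` namespace are placeholders invented for the example, and the literal escaping is deliberately simplified, so treat it as a sketch rather than a complete exporter.

```python
import sqlite3

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def rows_to_ntriples(conn, table, key):
    """Emit one N-Triples-style line per non-key column of each row.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, value in record.items():
            if column == key or value is None:
                continue
            # Simplified escaping: backslashes and double quotes only.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}{table}#{column}> "{escaped}" .')
    return lines


def demo():
    """Build a hypothetical one-row table in memory and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE beach (id INTEGER PRIMARY KEY, name TEXT, temp_f INTEGER)"
    )
    conn.execute("INSERT INTO beach VALUES (1, 'Warm Beach', 75)")
    return rows_to_ntriples(conn, "beach", "id")
```

    Each row becomes a subject and each column a predicate, so the same data that feeds the human-readable page can be published for machines alongside it.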

    Posted by: Brendan Taylor | September 20, 2007 5:24 PM



  28. The Semantic Web is difficult, but real. Like many visions, its hype inflated expectations. Oddly, it seems to have gotten a "second chance." That is to say, it survived the so-called "trough of disillusionment" and reverted back to an *earlier* stage in the [Gartner] hype curve. More to the point, perhaps, there exist many APIs, COTS, OSS and other solutions built around semantic concepts/standards. (See www.semwebcentral.org for OSS projects, and www.siderean.com for a robust COTS semantic repository.)

    BTW, the "Semantic Web Layer Cake" in this (great) article has been updated; see http://www.w3.org/2007/03/layerCake.png.

    Looking forward to the Top-Down article!

    Posted by: Sam | September 20, 2007 6:17 PM



  29. Let me just say...

    http://www.freebase.com

    and I'll just step aside now..

    Posted by: tyler | September 20, 2007 11:22 PM



  30. semantic web is not carrot, so it is bad.

    Posted by: tutu de kuku | September 21, 2007 12:21 AM



  31. I'm not sure the semantic web will ever work out. There has already been so much research, and the results are close to zero.

    Posted by: Peter P | September 21, 2007 5:04 AM



  32. Here are some Semantic Data Web Links that hopefully demonstrate practical use today:

    1. http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
    2. http://dbpedia.org
    3. http://pingthesemanticweb.com
    4. http://sindice.com

    Posted by: Kingsley Idehen | September 21, 2007 5:09 AM



  33. Here are some Semantic Data Web Links that hopefully demonstrate practical use today:

    1. http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
    2. http://dbpedia.org
    3. http://pingthesemanticweb.com
    4. http://sindice.com

    Posted by: Kingsley Idehen | September 21, 2007 5:42 AM



  34. Jean-Michel's goal of "instant and natural language formulation of questions" is available today. Potentially over a cell phone - if a quality voice-text-voice module could be added to the "semantic NLP" site located at Boston Children's Hospital's "Center on Media and Child Health." www.cmch.tv/research/. Domain Ontology-referenced NLP eliminates the need for "query structuring" by a user. Queries excel with lots of "context" in unrestricted conversational style. This site provides fit-to-context's concepts ranking of all results from ten different "social science" professional silos. Try some domain relevant questions like: What is the impact of the media on adolescent sexual attitudes and behaviors? Or, Can parents prevent children from experiencing unwanted effects of violent television?

    Posted by: Michael Belanger | September 21, 2007 10:11 AM



  35. A radically new way to process information :

    http://homepage.mac.com/ricatact/HIPintro/en/ACT.html


    and give it a semantic sense :

    http://homepage.mac.com/ricatact/HIPintro/en/semantique.html

    Posted by: Richard Chappuis | September 22, 2007 8:08 AM



  36. Alex, I enjoyed the post as well as its sequel. That said, the passage about Goedel and incompleteness needs correcting:

    "No logical system can ever be both consistent and complete, which means that there are things that can not be proved by logic. That essentially means that not all problems can be solved."

    This is certainly not true and Goedel never proved any such thing. Case in point, OWL-DL, which is clearly a logical system, is complete (all conclusions are guaranteed to be computed), decidable (all computations will finish in finite time), and possibly consistent.

    It would be more accurate to say that Goedel proved that some logical systems can never be proven to be both consistent and complete. For more details, see:

    http://plato.stanford.edu/entries/goedel/

    Posted by: Lowell Vizenor | October 3, 2007 6:19 AM



  37. @ Lowell, you are right. The result applies to some sufficiently complex systems, where complex means beyond Nth order logic.

    The point is that human languages and human brains are such systems. I have a further example in this post on BlueBlog:

    http://blog.adaptiveblue.com/?p=618

    Posted by: Alex Iskold | October 3, 2007 6:31 AM


