ReadWriteWeb

Everything You Wanted to Know About Semantic Technology, But Were Afraid to Ask (at SemTech 09)

Written by RWW Sponsor / June 26, 2009 5:00 AM / 11 Comments

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products. This one is by Hakia, one of the participants in the recent 2009 Semantic Technology Conference.

Participants in the 2009 Semantic Technology Conference walked away considering fundamental questions about what is and isn't semantic technology. The relevance of this post's title will hopefully become clear by the end to those of you mischievous readers who may have stumbled upon it with other ideas. The conference was a great and well-organized affair in San Jose, California. One of the highlights was the Semantic Search Keynote panel, with all of the major players on stage (Ask, Bing, Google, Hakia, TrueKnowledge, and Yahoo!).

Bear in mind that semantic technology can be as heavy and stifling for any audience as stem-cell research can be to high-school students. But Carla Thompson of Guidewire did a terrific job of coming up with discussion topics and moderating the panel. Everyone survived the ordeal without any sign of dozing.

Despite the positive outcome, some responses from the panelists made me wonder if we should go back to the basic question of, "What is semantic search?" Or, better yet, what isn't semantic search? Here is my list:

Structured Data

Folks, semantic technology is not structured data. A database that can, given the query "social drinking," pull up a list of beer brands, their manufacturers, and their contact information has nothing to do with semantics. Some people seem to have the impression that a search engine somehow uses semantic technology if it retrieves structured data for its results. It is a trick as old as the ancient Egyptians, who used beads to organize harvesting information. Organized information is not semantic information.
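To make the point concrete, here is a minimal sketch of such a lookup. The table contents, the brand and contact details, and the function name are all invented for illustration; no real search engine is being described.

```python
# Hypothetical table: rows filed under a category label that someone typed in by hand.
BEER_TABLE = {
    "social drinking": [
        {
            "brand": "Example Lager",
            "manufacturer": "Example Brewing Co.",
            "contact": "info@example.com",
        },
    ],
}


def lookup(query: str) -> list:
    """Return the rows stored under the exact (case-insensitive) query string."""
    return BEER_TABLE.get(query.strip().lower(), [])


print(lookup("Social Drinking"))      # finds the hand-filed rows
print(lookup("drinks with friends"))  # finds nothing: no stored key matches
```

The second query is a paraphrase of the first, yet it returns nothing, because the lookup only matches keys that were stored in advance.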

Morphology

If a search engine is robust and returns the same results for the query "top ten" as it does for "top 10" (i.e. it recognizes that "ten" means "10"), calling the search engine semantic would be a stretch. Anyone could come up with a substitution list like this without a drop of linguistic knowledge. Similarly, distinguishing the name "Fisher" from the noun "fisher" by detecting the capitalization of the first letter does not go beyond the application of simple linguistic rules. These capabilities are not semantic search capabilities.
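Both tricks fit in a few lines. The sketch below is only an illustration: the word list and the function names are assumptions, not taken from any actual search engine.

```python
# Hypothetical substitution list mapping number words to digits.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "ten": "10"}


def normalize_query(query: str) -> str:
    """Lowercase the query and replace known number words with digits."""
    tokens = query.lower().split()
    return " ".join(NUMBER_WORDS.get(token, token) for token in tokens)


def looks_like_name(token: str) -> bool:
    """Guess 'proper name' purely from an initial capital letter."""
    return token[:1].isupper()


print(normalize_query("top ten") == normalize_query("top 10"))
print(looks_like_name("Fisher"), looks_like_name("fisher"))
```

Note that the capitalization rule would also flag any ordinary word that starts a sentence, which shows how shallow the rule is.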

Syntax

A certain amount of semantic information can be salvaged from syntax. Unfortunately, if syntax were enough for us to detect the meaning of text, then an 8-year-old with perfect reading ability (i.e. who is able to syntactically parse strings of English-language letters) could be expected to understand the meaning of Shakespeare's works. The difference between reading and understanding is the difference between syntax and semantics. The former requires the skill to parse things out, while the latter requires a vast amount of associative knowledge.

Statistics

An infinite number of monkeys typing on an infinite number of keyboards would eventually come up with the complete text of the Declaration of Independence. This is a scientific statement; it is not a joke. However, if a search engine is expected to be semantically relevant using statistical algorithms, one would have to wait until the monkeys finished their job. Statistics have no place in semantic technology. A simple test would reveal that. For example, your brain is able to understand a unique sequence of words that you have never seen before, such as "Polar bears don't eat alligator eggs before dawn." If semantics were built on statistics, computers and algorithms would not understand this and billions of other sentences.

Scalability

Scalability is the narrow bridge between science and technology. What you can carry from science to technology over this bridge determines the level of capabilities in the real world. The science of semantics is huge and stems from the roots of philosophy. But Web search is a very particular problem with stringent constraints (a narrow bridge). Designing semantic algorithms to drive a Web search engine is like walking on egg shells and requires a completely new approach. Thus, a semantic search algorithm could be very sophisticated but still not suitable for the Web.

These five areas cover what isn't semantic search and should help readers understand the questions that emerged from the Semantic Technology Conference. Structured data, morphology, syntax, statistics, and scalability are key areas to discuss moving forward. Of course, contrary to the title of this post, no one was actually afraid of asking these questions. But if you caught the reference in the title, that was your semantic brain in action, one last example of what semantic technology is.


Comments

  1. So how does one optimize (or not optimize) for semantic search? For some terms I have top positions on Google but my website appears nowhere on Hakia (http://hakia.com/). If I want Hakia users to be able to find my website, do I need to re-orient my content? Just curious, because I know semantic search is going to be more relevant.

    Posted by: Amrit Hallan - Writing Services Provided | June 26, 2009 7:03 AM



  2. The statement on statistics seems silly to me. The use of probabilities must be inherent to semantic technology; otherwise, machines wouldn't know that you can bet your fortune on the fact that a million monkeys typing for a million years will NOT come up with the Declaration of Independence!

    Posted by: Eric Hellman | June 26, 2009 8:42 AM



  3. Dear Amrit. Semantic search engines will look for content richness, integrity, and the source's credibility. Similar to the way link referrals are used today, if your page is referenced by credible sites, it will rank higher. Those credible sites (depending on your industry) are well known, and are also defined by librarians. Major news channels are one potential credible source of referrals to you. If Alexa.com makes a referral to you, even though it is a very popular site, it will not increase the credibility of your content the way a referral from CNN.com would.

    Posted by: Riza Berkan | June 26, 2009 2:34 PM



  4. This is an interesting article. For now I would just like to disagree with the sentence "Statistics have no place in semantic technology." You may argue that statistics or lexical analysis or other enabling techniques/technologies by themselves do not make a system semantic, but to say these have no role is misguided or at least too strong. Semantics can be derived by a bottom-up process (e.g. analyzing a corpus) leading to what one may call implicit/embedded semantics (this is what happens when you extract entities and relationships), by a top-down process (defining a model or ontology a priori, with the help of a formal description and reasoning), and so on. Here is a pointer to additional information on this viewpoint: Semantics for The Semantic Web: the Implicit, the Formal and the Powerful.

    Posted by: Amit | June 27, 2009 8:09 AM



  5. Hi Amit. Thanks for the comment.

    "There is no place for statistics in semantics"... let me rephrase it: "There is no place for statistics in meaning."

    The example I gave above shows that natural languages have infinite (for all practical purposes) degrees of freedom in expressing relationships to form meanings. A semantic system can handle them systematically via a concept-based (ontology-based) approach to the extent of its sophistication, whereas a statistical approach cannot handle them because (for all practical purposes) there will be no data, ever.

    If the human brain can understand it (extract meaning) without prior sampling, then statistics should have no place in this process.

    There is more to it than sheer data availability. For example, the sentence "Polar bears don't eat alligator eggs before dawn" is absurd to the human brain because of complex associations referring to a vast amount of a priori knowledge, but it can also be logical or illogical, it can be funny, etc.

    Hope it is clearer now what I meant.

    Posted by: riza | June 27, 2009 12:45 PM



  6. Nice post, thanks a lot. This was exactly what I was looking for after reading your previous post "Why the Web 3.0 Conference Was a Success" LoL.

    However, I'm still afraid to ask this question:

    So, how does one convert their normal website or blog to include semantic capability?


    Raja
    http://HostWisely.com

    Posted by: HostWisely.com - Web Hosting Reviews | July 1, 2009 2:31 AM



  7. All these companies talk about doing brand monitoring; I'd be interested to know what the technology behind the solution is. For example, for brand monitoring and conversation monitoring and analysis to be remotely accurate, the technology used must be able to do several things: (1) understand word meaning (think of a Semantic Map as a giant dictionary), (2) disambiguate word meanings within the context of how they are used, and (3) understand synonymy.

    Posted by: DJ | July 1, 2009 2:38 AM



  8. Nice post Riza!

    Posted by: DJ | July 1, 2009 2:40 AM



  9. Excellent post. For once debunks many of the 'me too' semantic web services.

    And I would take a half-way stance between the comments by Amit and Riza on the relevance of statistics in semantics. Though statistical analysis can't cover the whole gamut of meanings possible in human language, it so happens that a sufficiently large data sample can provide sufficiently relevant semantics.

    Of course statistics would be blind to aspects not covered by the sample data set.

    Consider the case of Google Translate: their claim of using a statistics-enabled translation engine is an interesting example of semantic services. Something as subjective as language translation is being pulled off by rules inferred from raw data. Riza, what do you think about that?

    Posted by: Mahesh CR | July 1, 2009 4:12 AM



  10. Good post! Educating and breaking semantics down into themes that can easily be digested and studied independently is a step in the right direction. While there is much research and experimentation still to be done, let's not forget the community must also evangelize the technology to the public and business community at large.

    Posted by: Fabien Tiburce | July 1, 2009 8:31 AM



  11. There is no clear definition of semantic search, but there is a clear definition of the Semantic Web (which is based on structured data).

    Why not wait with the definitions until someone actually accomplishes any of these promises...

    Posted by: Tim | July 9, 2009 12:20 PM


