<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2009:/1/tag:www.readwriteweb.com,2009://1.15511-</id>
  <updated>2009-11-23T16:51:12Z</updated>
  <title>Comments for Search and Rescue: 6 Approaches to Semantic Data Collection</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.23-en</generator>
  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15511</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=15511" title="Search and Rescue: 6 Approaches to Semantic Data Collection" />
    <published>2009-06-25T22:45:41Z</published>
    <updated>2009-06-26T04:10:11Z</updated>
    <title>Search and Rescue: 6 Approaches to Semantic Data Collection</title>
    <summary>It&apos;s been more than ten years since Tim Berners-Lee first spoke about the semantic web and computers indexing all web-based data. He said, &quot;The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The &apos;intelligent agents&apos; people have touted for ages will finally materialize.&quot; Since then a...</summary>
    <author>
      <name>Dana Oshiro</name>
      
    </author>
    
    <category term="List of Links" />
    
    <category term="Semantic Web" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><img alt="semantic_search_logo_jun09.jpg" src="http://www.readwriteweb.com/semantic_search_logo_jun09.jpg" width="150" height="150">It's been more than ten years since Tim Berners-Lee first spoke about the semantic web and computers indexing all web-based data. He said, "The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The 'intelligent agents' people have touted for ages will finally materialize." Since then a handful of companies have attempted to tackle the issue of machine-based indexing and language interpretation. None of them are perfect. Below are 6 unique approaches to semantic data collection.</p>]]>
      <![CDATA[<h2>1. <a href="http://powerset.com">Powerset</a></h2>
<img alt="semantic_search_bing_jun09.jpg" src="http://www.readwriteweb.com/semantic_search_bing_jun09.jpg" width="610" height="237">
This site was one of the first to publicly apply machine-based natural language processing to a consumer search engine. Because public expectations were so high, however, <a href="http://www.readwriteweb.com/archives/powerset_vs_google.php">reviewers were harsh</a> when Powerset launched its Wikipedia-only beta. The site was acquired by Microsoft shortly after the initial launch and the team has been low-key ever since. While Powerset is one of the definitive semantic engines in existence, Microsoft is currently concentrating on using Powerset's technology to index Wikipedia pages in Bing. Powerset's search result pages actually contain a "Try this on Bing Reference" note in the sidebar.
<br />
<br />
<h2>2. <a href="http://www.cuil.com/">Cuil</a></h2>
<img alt="semantic_search_cuil_jun09.jpg" src="http://www.readwriteweb.com/semantic_search_cuil_jun09.jpg" width="610" height="210">
This team touted its language processing product as indexing pages much faster than Google; however, consumers rarely value speed over quality, and the site <a href="http://www.readwriteweb.com/archives/cuil_publicity.php">was criticized right from the start</a>. Expectations were not met, as Cuil's claim of 120 billion indexed pages did not measure up to <a href="http://www.readwriteweb.com/archives/google_hits_one_trillion_pages.php">Google's reported 1 trillion unique URLs</a>. What Cuil did get right was separating related search results from regular web results. That said, without any human intervention, the related results are often bizarre and irrelevant. For instance, a search for my name produces the rankings of Ultimate Fighting Championship champions.
<br />
<br />
<h2>3. <a href="http://hakia.com">Hakia</a></h2>
<img alt="semantic_search_hakia_jun09.jpg" src="http://www.readwriteweb.com/semantic_search_hakia_jun09.jpg" width="610" height="282">
This is a natural language search engine where sponsored results, regular web results and "credible" web results are broken down visually into separate categories. Similar to Wikipedia, Hakia <a href="http://www.readwriteweb.com/archives/hakia_relaunches_with_credible.php">employs a community monitoring system for credibility</a>, and "credible" results must be peer reviewed and seemingly free of corporate interest. One of the great features of Hakia is that users can switch tabs on the site to show only images or news.
<br />
<br />
<h2>4. <a href="http://worio.com/search/">Worio</a></h2>
<img alt="semantic_search_worio_jun09.jpg" src="http://www.readwriteweb.com/semantic_search_worio_jun09.jpg" width="610" height="282">
Worio is considered a "discovery engine" as it is not technically a search engine destination site. While users are still required to visit the <a href="http://worio.com/search/">Worio destination</a>, search is actually powered by Yahoo, Google or Windows Live search. Regular web results appear in the larger left-side column and natural language-based "discoveries" appear on the right. These discoveries are further refined by personal bookmarks and shared relevancy with Facebook friends. 
<br />
<br />
<h2>5. <a href="http://labs.mozilla.com/2008/08/introducing-ubiquity/">Ubiquity</a></h2>
<object width="400" height="298"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=1561578&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=1561578&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="298"></embed></object><p><a href="http://vimeo.com/1561578">Ubiquity for Firefox</a> from <a href="http://vimeo.com/user532161">Aza Raskin</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
Ubiquity is perhaps the opposite of a semantic web engine, but it serves a similar function for those looking to aggregate useful data. The Firefox plug-in allows users to create command lines that incorporate natural language search with a series of mashups. Users can then combine relevant data from Craigslist, translation tools, maps, reviews and social networks for easy visualization. While the end product is an extremely useful document, users may not be ready for the drastic behavioral change of using command lines for semantic data collection.
<br />
<br />
<h2>6. <a href="http://www.semanti.com/">Semanti</a></h2>
<img alt="semantic_search_semanti_jun09.jpg" src="http://www.readwriteweb.com/semantic_search_semanti_jun09.jpg" width="610" height="229">
From a consumer standpoint, Semanti sits somewhere on the spectrum between Worio and Ubiquity. ReadWriteWeb <a href="http://www.readwriteweb.com/readwritestart/2009/06/semantic-search-engine-gets-he.php">reviewed the product earlier this week</a> and, like Ubiquity, it is a Firefox plug-in rather than a destination site. However, like Worio, it employs leading search engines, bookmarking and Facebook friends to produce results. Semanti's key difference is that it prompts users to choose from multiple definitions prior to completing the search. Decision-making is actually human-powered rather than machine-powered. CEO Bruce Johnson said, "I tried machine-based semantic tagging, but my priority has always been a faster search experience." While this is not the "use of intelligent agents" that Berners-Lee suggested, it is a "semantic" tool in that it helps the user distill meaning and relevancy from language.

<p>If you've got more examples of semantic data collection tools, list them in the comments below. </p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15511-comment:144115</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15511" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php#c144115" />
    <title>Comment from Elliot on 2009-06-25</title>
    <author>
        <name>Elliot</name>
        <uri>http://blog.alchemyapi.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://blog.alchemyapi.com/">
        <![CDATA[<p>Good rundown of available semantic search engines.  I'm interested in seeing how Swingly (a yet-to-be-launched semantic search / question answering engine) matches up against some of these existing players.  We're slowly starting to see Powerset functionality deployed into Bing, but I'm guessing MSFT has some other exciting developments in the wings.</p>]]>
    </content>
    <published>2009-06-25T23:58:33Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15511-comment:144117</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15511" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php#c144117" />
    <title>Comment from Ravikant on 2009-06-25</title>
    <author>
        <name>Ravikant</name>
        <uri>http://cherukuri.wordpress.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://cherukuri.wordpress.com">
        <![CDATA[<p>Nice summary, Dana. I like Powerset and Ubiquity. Haven't tried the rest. Not sure if these use linked data. We need way more semantic annotation on regular web pages to make this useful at scale. Microformats, Common Tag and Linked Data should all get more tool support, both for new content and for tonnes of existing content. </p>]]>
    </content>
    <published>2009-06-26T00:18:39Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15511-comment:144203</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15511" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/search_and_rescue_6_approaches_to_semantic_data_collection.php#c144203" />
    <title>Comment from Govind Kabra on 2009-06-26</title>
    <author>
        <name>Govind Kabra</name>
        <uri>http://www.cazoodle.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.cazoodle.com">
        <![CDATA[<p>Interesting Summary. </p>

<p>The Web truly has huge amounts of data; we just need better tools like the ones you listed to make better use of this content.</p>

<p>One possible approach is to understand the structured data hidden in unstructured web content. At Cazoodle, we saw great interest from users after we organically integrated apartment data from everywhere on the Web.</p>

<p><a href="http://www.cazoodle.com/blog/?p=272" rel="nofollow">http://www.cazoodle.com/blog/?p=272</a> </p>]]>
    </content>
    <published>2009-06-26T19:22:28Z</published>
  </entry>

</feed>