<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2008:/1/tag:72.47.210.69,2006://1.5225-</id>
  <updated>2008-05-09T18:17:32Z</updated>
  <title>Comments for ClearForest: a Top-Down Approach to Semantic Web</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.1</generator>
  <entry>
    <id>tag:72.47.210.69,2006://1.5225</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=5225" title="ClearForest: a Top-Down Approach to Semantic Web" />
    <published>2006-12-22T02:06:58Z</published>
    <updated>2007-12-16T23:16:40Z</updated>
    <title>ClearForest: a Top-Down Approach to Semantic Web</title>
    <summary>By Alex Iskold We&apos;ve been writing recently about the rise of semantic web and how in 2007 we&apos;ll see many interesting semantic technologies. The fundamental problem that all these technologies need to solve is explaining the meaning of things to computers. There are several approaches to this, all of which in principle can work. There...</summary>
    <author>
      <name>Alex Iskold</name>
      <uri>http://www.adaptiveblue.com</uri>
    </author>
    
    <category term="Semantic Web" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><i>By Alex Iskold</i></p>
<p><a href="http://www.clearforest.com"><img
src="http://www.readwriteweb.com/images/clearforest_logo.gif" align="left" vspace="5"
hspace="5" border="0" width="316" height="50" /></a>We've been <a
href="http://www.readwriteweb.com/archives/semantic_web_road.php">writing recently</a>
about the rise of semantic web and how in 2007 we'll see many interesting semantic
technologies. The fundamental problem that all these technologies need to solve is
explaining the meaning of things to computers. There are several approaches to this, all
of which in principle can work.</p>

<p>There are companies and technologies that are doing it bottom up - by embedding
semantic annotations (meta-data) right into the data. The opposite camp is exploring
the top-down approach, which relies on analyzing existing information. The ultimate
top-down solution would be a full-blown natural language processor, which is able to
understand text the way people do.</p>

<p>In this post, we are going to look at <a
href="http://www.clearforest.com">ClearForest</a> - one of the companies in the top-down
camp. At first glance, you might not think much of the company's web site, but a deeper
dive reveals that ClearForest is restructuring to apply its core natural language
processing technology to next generation semantic applications. The fact that
ClearForest has released both a Web Service and a Firefox extension that uses that
service's API to deliver an end-user application shows that the company gets what the next
generation web is all about.</p>]]>
      <![CDATA[<h2>Gnosis - Firefox extension for annotating web pages with semantics</h2>

<p>The first ClearForest product that we looked at was the Firefox extension called <a
href="https://addons.mozilla.org/firefox/3999/">Gnosis</a>. Here is how it is described
on the Mozilla extensions page:</p>

<blockquote><p>"With a single click, Gnosis will identify the people, companies,
organizations, geographies and products on the page you are viewing. Using the built-in
navigation sidebar you can gain immediate understanding of the page&rsquo;s
contents."</p></blockquote>

<p>Downloading and installing Gnosis was as easy as any Firefox add-on. We used the
Read/WriteWeb home page to try the extension. With one click from the menu, the page was
filled with various types of annotations. The current version of Gnosis recognized
Companies, Countries, Industry Terms, Organizations, People, Products and Technologies -
an impressive range of things. Each word that Gnosis recognized was colored according to
its category.</p>

<p><img src="http://www.readwriteweb.com/images/clearforest1.jpg" border="0" /></p>

<p>This was interesting, but overwhelming. A better approach would be to have the
coloring appear on a mouse over or another gesture. But this is a usability nuance that
will get polished in the next iteration of the product. Overall, I was impressed. In an
instant, the page was analyzed and annotated. It was not perfect (it thought that all
the Jasons on the page were Jason Briggs), but it was more accurate than I expected it to
be.</p>

<p>Next I turned my attention to the sidebar. The extension created a categorised tree of
all words and phrases that it found on the page. We could expand and collapse each
category to find the terms. It looked like vertical search for a single page. It was
interesting, and it should be very useful for blogs and lengthy pages.</p>

<p><img border="0" src="http://www.readwriteweb.com/images/clearforest2.jpg" width="500"
height="398" /></p>

<p>Again, the interface needs to evolve - but the idea that key terms and concepts on any
page can be identified and organized in such a way seems compelling. In addition to the
organization, the extension offered to search for any keyword on Google, Wikipedia or
Technorati. If you are interested in a keyword, you are likely to want to find more
related information. So the context search seems like a logical extension of
categorisation, as it makes this data further searchable.</p>

<p>Overall, this seemed unpolished but intriguing. The question is, how does this work?
The Firefox page stated that this extension is based on a web service. So this is what I
want to explore next...</p>

<h2>ClearForest&rsquo;s Semantic Web Service (SWS)</h2>

<p>Behind every great service there is an API. Modern web companies have re-discovered an
old piece of software engineering wisdom - interfaces are a powerful way to build complex
software. Today we are seeing the rise of the most complex software system yet - a
service powered web. ClearForest has also recognized the value of building a product on
top of a service (both can be monetized independently). Gnosis leverages this interface
to deliver a powerful natural language processing service to end users.</p>

<p>The Semantic Web Service (perhaps the name is a bit broad) offers a SOAP interface
for analyzing text, documents and web pages. The service returns categorization and
annotation information, which can be further leveraged by consumer facing applications
(the company recommends building mashups). I am fairly certain that SWS is powered by a
web crawler, because it is able to recognize people like Richard MacManus, Jason Biggs
and Alex Iskold. My guess is that the crawler is used to build a giant index, which is
then used by the document parser to annotate the terms in the document.</p>

<p>The service right now is free to try, but you need to contact ClearForest to use it
commercially. To encourage the usage of the service the company announced a mashup
contest. The contest was advertised on ProgrammableWeb and ended December 11th. It is not
clear to me that it was successful, as there are no announcements of winners and no
showcase - but it certainly seems like a creative way to promote the new API.</p>

<h2>Conclusion</h2>

<p>ClearForest might not have a glamorous/Ajaxy web site and might not have a polished
product yet. But it is a company that has been around and has been <a
href="http://www.clearforest.com/AboutUs/AboutUs.asp">backed</a> by top tier VC firms.
Both the approach and technology are worth attention and consideration. Their natural
language processing technology, first applied to business data mining, is able to clearly
distill useful information. To offer it as a service shows insight into and
understanding of the new market opportunities (think Amazon). And to create a Firefox
extension that showcases the technology demonstrates their desire and readiness to go
mainstream.</p>

<p>All these factors indicate that ClearForest is worth watching. And it is yet another
brick to support the top-down semantic web approaches. Let us know what you think about
this company.</p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41567</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41567" />
    <title>Comment from John Milan on 2006-12-21</title>
    <author>
        <name>John Milan</name>
        <uri>http://intelligantt.blogspot.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://intelligantt.blogspot.com">
        <![CDATA[<p>It's great that so many people are striving to reach a true semantic web, both from the bottom up and the top down.</p>

<p>I do have a question though: how will we know when we get there?</p>

<p>Will we need some sort of 'Semantic Test', much like the Turing Test, to verify a semantic solution? Or do we rely on the largest number of people telling us this is a great semantic solution? Or would the semantic solution that works in the US not work in England or New Zealand -- i.e. each region, or even culture, may have its own semantic solution?</p>

<p>As everyone strives for what is acknowledged to be a holy grail, the question is not only how do we know when we get there, but how do we know when we're even making progress?</p>

<p>Perhaps just like the Turing Test, a Berners-Lee Test is needed to both define a goal and measure progress.</p>]]>
    </content>
    <published>2006-12-22T06:35:01Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41568</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41568" />
    <title>Comment from Michael Molin on 2006-12-21</title>
    <author>
        <name>Michael Molin</name>
        <uri>http://geocities.com/gene_technics</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://geocities.com/gene_technics">
        <![CDATA[<p><i>>>There are companies and technologies that are doing it bottom up - by embedding semantic annotations (meta-data) right into the data. The opposite camp is exploring the top-down approach, which relies on analyzing existing information. The ultimate top-down solution would be a full-blown natural language processor, which is able to understand text the way people do.</i></p>

<p>It's an endless loop. There is no way to explain the meaning to the machine except the case when the machine interprets the meaning between the two languages. It's called Translation Matrix or just the Matrix if all the languages are included in this system.</p>

<p>The implementation is simple. There is a company in Spain - Atril. It has a product - DejaVu 3. The main function is a pretranslation by assembling meanings and phrases. The translation project has an attribute - Subject. You choose the Subject and the system chooses the corresponding meaning during assembling. By default, there is a hierarchical classification system of up to 999 subjects. This approach is completely wrong - the DejaVu system has the subsystem Lexicon that is responsible for specialized terminology (it's a glossary of a translation project) and has priority during assembling - so the Lexicon is used for fine tuning of corresponding meanings for the project.</p>

<p>I did a simple thing. I divided the subjects into only two categories: General - 1 (it's a default setting for a project, it's used during assembling first) and Options - 2. It's a binary coding of meanings. In general terms, binary coding of information. My native language is Russian - so I gathered the Russian translations of English words and phrases together into a general terminology database (.tdb).</p>

<p>I've been working with this system for three years. It pretranslates any English literary texts. Exactly on the level of a three-year-old child. That's enough to say it's an artificial intelligence.</p>]]>
    </content>
    <published>2006-12-22T07:58:44Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41569</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41569" />
    <title>Comment from Alex Iskold on 2006-12-22</title>
    <author>
        <name>Alex Iskold</name>
        <uri>http://www.adaptiveblue.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.adaptiveblue.com">
        <![CDATA[<p>Excellent comments. I personally do not think that Semantic Web needs to be equal to strong AI. We do not have to explain meaning to the machine; instead we encode the meaning so that the machine can take advantage of it and drive productivity. That's what I think.</p>

<p>I spent 8 years of my life studying complexity science and I do not think that intelligence can be designed, it needs to be evolved.</p>

<p>Alex</p>]]>
    </content>
    <published>2006-12-22T14:40:25Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41570</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41570" />
    <title>Comment from Paul Walsh on 2006-12-22</title>
    <author>
        <name>Paul Walsh</name>
        <uri>http://segala.com/searchthresher_wp/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://segala.com/searchthresher_wp/">
        <![CDATA[<p>It's great to see so many people leverage the popularity of the term Semantic Web.</p>

<p>The Semantic Web though, *is* all about the plumbing of the Web. It's data about the data and nothing more. That's not to say that cool applications shouldn't be classed as enabling a more Semantic Web.</p>]]>
    </content>
    <published>2006-12-22T23:33:53Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41571</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41571" />
    <title>Comment from Alex Iskold on 2006-12-22</title>
    <author>
        <name>Alex Iskold</name>
        <uri>http://www.adaptiveblue.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.adaptiveblue.com">
        <![CDATA[<p>Paul,</p>

<p>Certainly, this is it. If you look on Wikipedia, it talks about weak AI. Do you think that data about data has to be embedded in the data or can it be inferred?</p>

<p>Alex</p>]]>
    </content>
    <published>2006-12-23T00:34:07Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41572</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41572" />
    <title>Comment from Thomas Tague on 2006-12-23</title>
    <author>
        <name>Thomas Tague</name>
        <uri>http://sws.clearforest.com/blog</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://sws.clearforest.com/blog">
        <![CDATA[<p>Alex, </p>

<p>Thanks for noticing what we're trying to do! </p>

<p>A quick note - the contest you mentioned wrapped up last week. A page with the winning entries and honorable mentions is located here: <a href="http://sws.clearforest.com/Blog/?p=37" rel="nofollow">http://sws.clearforest.com/Blog/?p=37</a>. Some very interesting work by some smart people.</p>]]>
    </content>
    <published>2006-12-23T11:56:46Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41573</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41573" />
    <title>Comment from Pamela Fox on 2006-12-23</title>
    <author>
        <name>Pamela Fox</name>
        <uri>http://imagine-it.org</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://imagine-it.org">
        <![CDATA[<p>Just wanted to offer a comment as one of the participants/winners in the ClearForest contest.<br />
Generally, I just think it's exciting to see an API offered for this type of service. Most APIs offered have been for accessing pre-existing data in an easy way, or creating data in a system. This API is actually doing a smart, nontrivial service for us. I can easily screen-scrape Google Images for results without outside services, but I can't easily deduce the semantic entities in a Google News article by myself.<br />
I look forward to seeing more APIs released that offer interesting services like this.</p>]]>
    </content>
    <published>2006-12-24T00:39:26Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41574</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41574" />
    <title>Comment from Tim on 2006-12-29</title>
    <author>
        <name>Tim</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>this is great but a fraction of what is required for real information processing and sense making on the web, which is incredibly difficult to do because of the primitiveness of the hypertext and one window browser paradigm that no one seems to want to break out of and is sorta hardcoded into browser technology. Ever tried iframe stacks? Quick way to crash your browser; even a reasonable number of layers of DHTML windows has its limitations. Suck..</p>

<p>I'd much rather a snippet/bookmark linking tool + browser history and have that information editable (ala wiki but with depth and layers of windows that can be connected/chained together) so I can do some spatial and information reorganization. Ie Sense making..</p>

<p>Sucks, do a search on the web using google. Count how many clicks and moves on the mouse you are performing. A good search will take many hours and only then can you say with maybe 80-90% probability that you have covered said domain topic.</p>

<p>Put ajax to use and build a fluid search engine front end to google. Eg plug in search terms and the topics and pages start expanding out. let me tick or drag the pages i'm interested in (goddamn tab browsing is so 1 dimensional) to another area of the window (using larger canvas area than browser window). then let me start drilling down further and building and linking to build up my world view of the domain target of interest.</p>

<p>In my mind, whilst semantic and reasoning APIs are great, the interface is still butt ugly, and being able to handle information load, or complex amounts of information, will rely on a much more complex UI than the 1 window tab browsing paradigm we have suffered with for many years.</p>

<p>Blah, rambles, tired.</p>

<p>TiM</p>]]>
    </content>
    <published>2006-12-29T10:24:41Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41575</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41575" />
    <title>Comment from Tim on 2006-12-29</title>
    <author>
        <name>Tim</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>ok one more rant, </p>

<p>I'm quite sure some clever developer is going to use Microsoft WPF and build a replacement browser tool. Essentially the HTML rendering will be done with IE but the interface will be done in WPF. Imagine a slick 2/3d massively multi windowed app that is a cross between a mindmapping tool and an HTML rendering component :)</p>

<p>Certainly would make it easier to research and track topics as well as share/collaborate. Why keep your notes and other information separate from information you find on the web?</p>

<p>mmm global read/write web, christ just give me my own read/write web that I can 'rip,mix,share' with colleagues. let me pull in information and consume it, ie disseminate, collect, track, etc.</p>

<p>ranty rant, ooh k i shut up now</p>

<p>TiM</p>]]>
    </content>
    <published>2006-12-29T10:45:44Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41576</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41576" />
    <title>Comment from Michael Molin on 2006-12-30</title>
    <author>
        <name>Michael Molin</name>
        <uri>http://geocities.com/gene_</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://geocities.com/gene_">
        <![CDATA[<p>Alex wrote:<br />
<i>I personally do not think that Semantic Web needs to be equal to strong AI. We do not have to explain meaning to the machine; instead we encode the meaning so that the machine can take advantage of it and drive productivity.</i></p>

<p>Well, as I see the AI as a development process is a "man-made" competition between the human intelligence and mathematical logics of the machine. As I see the Semantic Web concept - it's a new level for HTML tags as metadata for better search results and making decisions on them.</p>

<p>As I wrote describing the Translation Matrix, the question is only in the number of the levels. Two is enough and effective - the original information and its tags. With what we have now on the Internet, the more tags there are, the better the search system recognizes a Web site's goals. These particular goals are really not the objective meaning of the information presented on the Web site. It's advertising. And that's the top level of the hierarchy because that competition between the human psychology in a desire for success (paying to place your Web site into the Sponsor Links category) and the machine (as a search robot) is already won by the human being by default. Politics is based on economics or science is based on business.</p>

<p>Political science is a decision-making process. So, the Internet is already semantic by man-made decisions.</p>

<p>Also, there is really a fundamental question in the comment of John Milan. About the tests for AI. I think that Turing's test based on the identities of the results made by the human being and the machine doesn't include the second part of the logics - to err is human, to forgive divine.</p>

<p>I can give an example from the work with the Translation Matrix. There is the main algorithm for assembling phrases - the longest source text phrase between the two overlapping source text phrases is chosen for translation. It's mathematical logics. Usually, the machine (taught gradually by human logics of translation) pretranslates everything right and consistently (words, phrases one by one) but sometimes the second phrase is longer than the first one so there is an inconsistency. The machine is right - math logics. My task is to combine these phrases into one source text phrase and assign a new general translation and options. That's a machine learning process to work with the meanings by human logics. Man-made intelligence - AI.</p>]]>
    </content>
    <published>2006-12-31T06:21:49Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41577</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41577" />
    <title>Comment from Michael Molin on 2006-12-30</title>
    <author>
        <name>Michael Molin</name>
        <uri>http://geocities.com/gene_technics</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://geocities.com/gene_technics">
        <![CDATA[<p>"Well, as I see the AI as a development process, it's is a "man-made" competition between the human intelligence and mathematical logics of the machine."</p>

<p>That's correct.</p>

<p>Happy New Year!</p>

<p>My best wishes.</p>

<p>Michael</p>]]>
    </content>
    <published>2006-12-31T06:34:05Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2006://1.5225-comment:41578</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2006://1.5225" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/clearforest.php#c41578" />
    <title>Comment from Michael Molin on 2006-12-30</title>
    <author>
        <name>Michael Molin</name>
        <uri>http://geocities.com/gene_technics</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://geocities.com/gene_technics">
        <![CDATA[<p>it's a "man-made" competition - you know :)</p>

<p>Cheers!</p>]]>
    </content>
    <published>2006-12-31T06:37:58Z</published>
  </entry>

</feed>