hakia - ReadWriteWeb
http://www.readwriteweb.com/feeds/search/hakia
Copyright 2012 Richard MacManus readwriteweb@gmail.com Tue, 14 Feb 2012 15:30:00 -0800

Hakia Licenses its Semantic Search Technology

Semantic search engine hakia is announcing today, at the Search Engine Strategies conference in New York, that it is licensing its proprietary OntoSem technology to other companies. This will enable third parties to build semantic search applications. The first such customer to be made public is RiverGlass, Inc, a provider of real-time analytics. RiverGlass will integrate hakia's OntoSem technology into its analysis software.

This is an interesting development by hakia - and has some parallels to the young Google, which you'll recall started out by licensing its search technology to the likes of Yahoo. But the parallels end there, because this move by hakia is more about licensing its underlying search technology to power other companies' proprietary applications - whereas Google was a branded search app integrated into Yahoo's front-end.

According to hakia, this is what their OntoSem technology does:

  • information retrieval, analysis, and distribution
  • text summarization
  • information assurance and security
  • machine translation
  • ontology support
  • terminology standardization
  • supply chain automation

Essentially, it will enable third parties to find and use "the meaning of language" in their applications. Hakia's definition of 'semantic search', by the way, differs from the traditional Semantic Web definition: hakia aims to determine meaning automatically from search queries using its algorithms, whereas the Semantic Web is all about adding metadata to information to enable connections between data.

At this early stage there aren't any visuals from RiverGlass showing how they're using hakia technology, but the company told us that "we will see the biggest boon in increased relevancy of results".

Disclosure: hakia is a RWW sponsor

http://www.readwriteweb.com/archives/hakia_licenses_semantic_search.php Search Tue, 18 Mar 2008 08:00:00 -0800 Richard MacManus
Hakia Relaunches With 'Credible Sites'

Semantic search engine Hakia announced a major redesign of its site today, including the addition of 'credible sites' to its search index. In order to create this index of trustworthy sites, Hakia is asking volunteers to submit credible, peer reviewed sources. Credible sites are currently limited to health and environmental topics, but Hakia is planning to expand this quickly. By adding these credible sources, Hakia wants to go beyond '10 blue links' and give its users an alternative to popularity-driven approaches like Google's PageRank. Hakia has also added a 'Galleries' section, which is a structured directory of some of the most popular search topics.

Credible Sources

In order to create this index of credible and trustworthy sites, Hakia is relying on volunteers. Hakia is specifically recruiting librarians, though it seems anybody can sign up, which could potentially leave the site open to spammers. Hakia asks submitters for their professional credentials, but it is not clear if the company will actually check these.


Hakia uses a very strict definition of what makes a site credible. To be included in the index, a site should have gone through a peer review process, have no commercial bias, and carry current information. Hakia's insistence on adding only peer-reviewed sites should greatly enhance the signal-to-noise ratio of the search results.

Great Structured Results

In our tests, we were often impressed by hakia's ability to structure its regular search results. For 'Sarah Palin', for example, Hakia organizes the results by official websites, images, news, biography, awards, and speeches. A search for 'Portland, OR,' on the other hand, first displays general information about the city, images, transportation options, and restaurant guides.

All results now also feature images and user-generated content.

Whenever we tried to ask more general questions ("What is a blog?"), however, Hakia's results were often underwhelming and uneven. Sometimes we got results that were spot-on, while at other times, the results barely had anything to do with our query.

Hakia also introduced 'my hakia,' a personal start page which still looks a bit unfinished, but seems to rely on Hakia's expertise in structuring search results to give users more background information about current events.

Overall, we liked Hakia's updates, and we are looking forward to the expansion of 'credible sources' to other topics, as we were already quite impressed with the results it returns.

http://www.readwriteweb.com/archives/hakia_relaunches_with_credible.php News Mon, 06 Oct 2008 10:24:55 -0800 Frederic Lardinois
Hakia Announces Semantic API

Semantic search engine Hakia today announced a set of APIs that opens up its natural language processing and search platform to developers. Hakia's Syndication Web Services really come in two parts: search queries, which allow developers to add web search functionality leveraging Hakia's five billion page index, and XML feed calls, which give developers access to Hakia's underlying natural language processing technology. The latter of the two is clearly the more compelling of the offerings.

Mobile video firm Berggi has released Berggi Search, a mobile search application that lets users search Hakia's index via the API from mobile phones. Berggi is leveraging the part of Hakia's API that lets developers lean on the company's search platform -- that, however, is not the part that really interests us.

What is more interesting are the XML feed calls that Hakia is offering that give access to their underlying NLP engine. Right now, only the "Summarizer" element is available. Summarizer, which Hakia says can be used to suggest tags or abstracts, analyzes and extracts meaning from large blocks of text or the contents of URLs. Other elements that are not yet available are Categorizer, which identifies "categorical phrases" in text, Characterizer, which "identifies and expands descriptive keywords or tags," and Text Meaning Representation.

Hakia has an XML testing form up on their Club Hakia page, and in our testing it seemed a little rough around the edges. Compared to our testing of Open Calais from Reuters (our coverage), the summaries and tags the XML testing form returned using the Summarizer element weren't very impressive. Mostly, it seemed to just return the headline or first sentence as the summary for articles we threw at it. And for RWW articles, Hakia Summarizer would suggest as tags the tags that we entered by hand in MovableType.

Hakia's Syndication Web Services are free for up to 30,000 requests per day for search services (unlimited free queries for Quotes and Cartoons), and free for up to 1,000 requests per day for XML feed calls. Have you had a chance to play with Hakia's new semantic API? If so, what did you think? How does it compare to Calais or Semantic Hacker? Let us know in the comments below.
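To stay inside those free tiers, a client could keep its own per-day counter. The sketch below is hypothetical: the service labels (`search`, `xml_feed`), the `DailyQuota` class, and the assumption that the quota resets at the local calendar-day boundary are illustrative, not hakia's documented behavior. Only the two limits come from the post.

```python
from datetime import date

# Daily free-tier limits quoted in the post; the service labels are hypothetical.
DAILY_LIMITS = {"search": 30_000, "xml_feed": 1_000}


class DailyQuota:
    """Client-side counter that refuses calls once a daily limit is reached."""

    def __init__(self, limits=DAILY_LIMITS, today=date.today):
        self.limits = dict(limits)
        self.today = today  # injectable clock, handy for testing
        self.day = today()
        self.used = {name: 0 for name in self.limits}

    def try_acquire(self, service):
        """Record one call and return True, or return False if the limit is hit."""
        current = self.today()
        if current != self.day:
            # Assumed reset rule: all counters go back to zero on a new calendar day.
            self.day = current
            self.used = {name: 0 for name in self.limits}
        if self.used[service] >= self.limits[service]:
            return False
        self.used[service] += 1
        return True
```

A caller would check `try_acquire("xml_feed")` before each Summarizer request and back off until the next day when it returns `False`.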

Full Disclosure: Occasional ReadWriteWeb contributor Emre Sokullu is a technology evangelist at Hakia.

http://www.readwriteweb.com/archives/hakia_announces_semantic_api.php Semantic Web Thu, 19 Jun 2008 12:56:42 -0800 Josh Catone
Hakia Adds Social Networking - But Does Search Need Social Networking Features?

Semantic search engine Hakia has just released a new social networking feature, called Meet Others (MO). The basic idea is to "meet others" who asked the same query. This is something I've never seen in a search engine before - and actually I'm not convinced that social networking is a good fit with search. But let's take a look at how this works, using an example provided to us by Hakia:

1. You ask a query and then receive your search results:

2. You will see an icon in the top-right of search results that says "Meet Others who asked the same query". If you click on the button, you enter into a room (if the room exists) of people who have a) asked the same or similar query; and b) decided to post a message to the room.


3. You can either post a message or contact someone who has already posted one. To post, you only need to authenticate your email address - no other personal info or registration is required. You can choose how you'd like to be contacted: via email (which is masked) or IM (Skype or MSN). So if a user has IM contact enabled, you can start chatting with that person with one click.

There is a voting system too, which together with message age determines how long messages stay in the room.
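The post does not describe how votes and age are combined. One simple rule, sketched here purely as a hypothetical, gives each message a base lifetime that every net up-vote extends and every net down-vote shortens. The seven-day base, the two-day step, and the `is_expired` helper are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical parameters: hakia has not said how it weighs votes against age.
BASE_LIFETIME = timedelta(days=7)
EXTRA_PER_NET_VOTE = timedelta(days=2)


def is_expired(posted_at, upvotes, downvotes, now):
    """Return True once a message has outlived its vote-adjusted lifetime."""
    net_votes = upvotes - downvotes
    lifetime = BASE_LIFETIME + net_votes * EXTRA_PER_NET_VOTE
    # A heavily down-voted message gets zero lifetime and drops out at once.
    lifetime = max(lifetime, timedelta(0))
    return now - posted_at > lifetime
```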

Conclusion

Hakia MO kind of resembles Yahoo! Answers, in that you are basically asking a question and then getting feedback from other users. However Hakia points out that MO is not a collaborative search result voting system. They are calling MO a "peer-to-peer transactional platform". Rather than Yahoo! Answers, Hakia says that MO most resembles Craigslist - because "users post content and there are no registration requirements."

In evaluating Hakia MO, I'm in two minds about the usefulness of social networking in a search engine. On one hand, it enables you to join groups of like-minded users on a very specific topic. I'm a big Velvet Underground fan, for example, so if I search for "velvet underground" it might be useful and/or enjoyable for me to join a "room" full of VU fans and begin conversations.

On the other hand, social networking is not something I am usually looking for in a search engine. I use search engines to gather information - in and out. Once I get what I came for, I'm outa there. So, will enough users join topic-focused rooms to make Hakia's MO a compelling feature?

I guess we'll find out; it's an open question, and the answer will be worth watching. Google would probably be very interested in integrating social networking into its search homepage, given its new OpenSocial APIs. However, Hakia says it has a patent application on MO - so maybe Google won't be able to do it anyway!

What do you think: do search and social networking go together? Or should they be kept separate?

Disclosure: Hakia is a sponsor of our network blog AltSearchEngines and recently they signed up as a Read/WriteWeb sponsor for November.

http://www.readwriteweb.com/archives/hakia_adds_social_networking_meet_others.php Search Wed, 31 Oct 2007 20:20:16 -0800 Richard MacManus
AI Favored Search 2.0 Solution

In the current Read/WriteWeb poll (see below), we're asking which 'search 2.0' concepts you think stand the best chance of beating Google. The results so far are interesting, because Artificial Intelligence is currently the top pick - despite AI's history of underachievement in the tech industry and the lack of any real AI search contenders yet. Hakia, which we profiled recently, is an example of an AI (or natural language processing) search engine, but it is still a fair way off being a finished product.

Poll results so far:

1. Artificial Intelligence (e.g. Hakia, Powerset) 23% (123 votes)

2. People Powered Search (e.g. del.icio.us, ChaCha) 21% (115 votes)

3. Vertical Search (e.g. SimplyHired, Technorati) 15% (81 votes)

4. Personalized Search (e.g. Collarity) 12% (63 votes)

5. Clustering (e.g. Clusty, SearchMash) 11% (58 votes)

6. Social Search (e.g. Eurekster, Rollyo) 7% (37 votes)

7. Visualization (e.g. Quintura and Kartoo) 6% (33 votes)

8. Previews (e.g. Snap, Live Image Search) 5% (25 votes)
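The percentages appear to be each option's share of all votes cast. Because readers could pick more than one option, they are shares of votes rather than of voters. A quick check in Python, using only the counts listed above:

```python
# Vote counts and rounded percentages exactly as reported in the poll results.
votes = {
    "Artificial Intelligence": 123,
    "People Powered Search": 115,
    "Vertical Search": 81,
    "Personalized Search": 63,
    "Clustering": 58,
    "Social Search": 37,
    "Visualization": 33,
    "Previews": 25,
}
reported = {
    "Artificial Intelligence": 23,
    "People Powered Search": 21,
    "Vertical Search": 15,
    "Personalized Search": 12,
    "Clustering": 11,
    "Social Search": 7,
    "Visualization": 6,
    "Previews": 5,
}

total = sum(votes.values())
# Each option's share of all votes, rounded to a whole percent.
shares = {option: round(100 * n / total) for option, n in votes.items()}
print(total, shares == reported)
```

Every recomputed share matches the published figure, which supports reading them as percentages of the 535 total votes.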

Alex Iskold, in his R/WW post The Race to Beat Google, was skeptical of AI search:

"Based on what we have seen so far, it is difficult to see how these companies can beat Google. Firstly, being able to enter the query using natural language is already allowed by Google, so this is not a competitive difference. It must then be the actual results that are vastly better. Now that is really difficult to imagine. Somewhat better maybe, but vastly different? Unlikely."

But it seems R/WW readers beg to differ. 23% of you think AI search is the most likely approach to challenge Google.

Disagree? Well, let us know in the poll which search 2.0 approaches you favor. You can choose more than one:

http://www.readwriteweb.com/archives/ai_favored_search20_solution.php Search Fri, 05 Jan 2007 03:55:32 -0800 Richard MacManus
Do Semantic Search Companies Need a Semantic Map? It's All Semantics...

This week we reported that Cognition had announced "the largest commercially available Semantic Map of the English language." In our interview with Cognition CEO Scott Janus, we asked him to compare Cognition's technologies to those of other semantic search companies Hakia and Powerset. Janus pointed to their large Semantic Map as the main differentiator. Indeed he told us that semantic search companies "must include a comprehensive semantic map" to be successful.

Is this true? We sought a response from both Hakia and Microsoft-owned Powerset on this semantically charged question.

Cognition claims that its Semantic Map has over 10 million semantic connections, including "over 4 million semantic contexts (word meanings that create contexts for specific meanings of other related words)".

Hakia CEO Riza C. Berkan responded in the comments to the original article that "hakia is deploying Ontological Semantics (OntoSem)", which he described as "a network of concepts reflecting ontology." He went on to say that hakia covers "over [a] million words in English".

However Berkan noted that the size of a Semantic Map does not necessarily matter: "the sheer size of the collection of words or concepts does not represent, by any means, the capability of the system." Hakia's position is that "there is no silver bullet for a semantic solution that will succeed", as long as the system developed is scalable and imposes "minimum reliance on 'words'".

Semantopoly: Advance token to nearest Semantic Context

At this point we were still confused. Cognition uses the term "semantic map" and said it was necessary to have. One of the commenters on the original post agreed with that assumption. Yet Hakia's Riza Berkan didn't use the term "semantic map". So we asked Hakia in a follow-up email, does it or does it not have a semantic map? Dr. Christian Hempelmann, Hakia's Chief Scientific Officer, responded:

"The term sometimes comes up in the context of data integration, but "Semantic map" is not a term used in linguistics. I can only speculate that it is what is commonly called an ontology. To the degree that they let us on about it in the documentation on their website, Cognition operates with only 2 main relations, much like WordNet: hyperonymy/hyponymy (e.g. cat is-a feline is-a mammal; their "taxonomy") and synonymy (e.g., "buy" means almost the same as "purchase"; their "thesaurus"). Furthermore, this map is not independent of English, cannot grow into other languages. hakia, on the other hand, has an ontology with many more relations, effectively raising our "semantic map" to the size of a higher power, and can and is already growing into other languages."

We also tried to get a comment from Powerset, but as of writing we haven't received it.

So, are we all clearer now on what is a Semantic Map, is it needed, and does size matter? Er, it depends. If you think you know the answers, tell us in the comments please!

http://www.readwriteweb.com/archives/do_semantic_search_companies_need_a_semantic_map.php Analysis Fri, 19 Sep 2008 15:05:28 -0800 Richard MacManus
Announcing Our New Contextual Link Advertising Product - Built by Hakia

This month we are offering some additional value to our long-term sponsors. It's a new type of contextual link advertising and we think it is important to the future of blogging as a business. For our wider audience, some of whom operate websites that are monetized through advertising, the background may be interesting.

How Does It Work?

Here is an example:

The sponsor's name is clickable and points to the sponsor's landing page.

All that our sponsors have to do is provide us with up to three "trigger phrases" that define their business. A phrase can be a single word or two linked words. For example:

  • Hosting (single word)
  • Dedicated server (two linked words)
  • Web hosting (two linked words)

Think of these as search terms. It is search advertising within ReadWriteWeb. More importantly, it is in context, in posts that are relevant.

The idea is (a) to offer value to our readers by providing advertising links in the context of what they are reading and therefore more likely to find of interest, and (b) to offer a higher level of engagement to our advertisers, resulting in both more branding impressions and click-throughs.

Background on the Technology

We experimented with this manually to test whether the theory made sense and whether both readers and advertisers got some value from it. As you can imagine, doing this manually is difficult at any level of scale. So, we hunted for a technology partner who could build what we envisaged. This was not a simple technical challenge. What is easy for a human to do (read an article and quickly determine which phrases are most relevant) is quite hard for a search engine to do.

We needed an engine that would return ranked/scored results. We decided to limit the number of ads to three per post. Any more would detract from the reader's experience. We imagined that such an engine could come up with more than three matches for a sponsored trigger phrase in a single post, so we needed the engine to return the three most relevant sponsored trigger phrases.

That raised the bar considerably.
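The post does not say how hakia's engine scores relevance. As a minimal sketch of the selection step only, assuming a plain case-insensitive match count stands in for hakia's semantic scoring, the snippet below ranks each sponsor's trigger phrases against a post and keeps the three highest-scoring ones. The function names and the sponsor data are hypothetical.

```python
import re

MAX_ADS_PER_POST = 3  # the per-post cap described above


def count_matches(post_text, phrase):
    """Count case-insensitive, whole-word occurrences of a one- or two-word phrase."""
    words = phrase.lower().split()
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, post_text.lower()))


def top_trigger_phrases(post_text, sponsor_phrases, k=MAX_ADS_PER_POST):
    """Return up to k (sponsor, phrase) pairs, highest match count first."""
    scores = {}
    for sponsor, phrases in sponsor_phrases.items():
        for phrase in phrases[:3]:  # sponsors supply at most three phrases
            hits = count_matches(post_text, phrase)
            if hits:
                scores[(sponsor, phrase)] = hits
    # Sort by descending score; break ties alphabetically so output is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:k]]


post = "Web hosting is cheap. A dedicated server beats shared hosting. Hosting matters."
sponsors = {
    "AcmeHost": ["Hosting", "Dedicated server", "Web hosting"],
    "OtherCo": ["server", "cloud"],
}
print(top_trigger_phrases(post, sponsors))
```

In a real system the scoring function is where the hard work lies; the rank-and-cap step around it would stay much the same.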

Our Partner: Hakia

We were delighted to find a partner that could jump this high bar. Even better, the partner is also a ReadWriteWeb sponsor (so it will have its own sponsored trigger phrases matched by the engine). Our partner is Hakia.

For those who don't know Hakia, check out its semantic search engine. It was a meeting of minds from the start. When we outlined our vision of what we wanted, it was clear that Hakia was headed in the same direction. Becoming partners to make it happen was a natural decision. Increasingly, we see our ability to partner effectively as being a core competency. We build our business with partners and couldn't imagine it any other way. Hakia clearly shares this partnering philosophy and competence.

Together with Hakia, we have big plans for what to do with this in future. We see it as something of a native revenue model for blogging. As always, we are keen to hear your feedback in the comments.

Interested in being a ReadWriteWeb sponsor? ReadWriteWeb is one of the most popular blogs in the world and is read by a sophisticated audience of thought leaders and decision-makers. Email our COO Bernard Lunn for all the details.

http://www.readwriteweb.com/archives/announcing_contextual_link_advertising_partnership_hakia.php Sponsors Wed, 20 May 2009 05:00:00 -0800 Bernard Lunn
Hakia Takes On Google With Semantic Technologies

This week I spoke to Hakia founder and CEO Dr. Riza C. Berkan and COO Melek Pulatkonak. Hakia is one of the more promising Alt Search Engines around, with a focus on natural language processing methods to try and deliver 'meaningful' search results. Alex Iskold profiled Hakia for R/WW at the beginning of December and concluded, after a number of search experiments, that Hakia was intriguing - but not yet at a level to compete with Google. It is important to note that Hakia is a relatively early beta product and is still in development. But given the speed of Internet time, 3.5 months is probably a good interval after which to check back and see how Hakia is progressing...

What is Hakia?

Riza and Melek first told me what makes Hakia different from Google. Hakia attempts to analyze the concept behind a search query, in particular by doing sentence analysis. Most other major search engines, including Google, analyze keywords. Riza and Melek told me that the future of search engines will go beyond keyword analysis - search engines will talk back to you and in effect become your search assistant.

One point worth noting here is that, currently, Hakia still has some human post-editing going on - so it isn't 100% computer powered at this point.

Hakia has two main technologies:

1) QDEX Infrastructure (which stands for Query Detection and Extraction) - this does the heavy lifting of analyzing search queries at a sentence level.

2) SemanticRank Algorithm - this is essentially the science they use, made up of ontological semantics that relate concepts to each other.

If you're interested in the tech aspects, also check out hakia-Lab - which features their latest technology R&D.

How is Hakia different from Ask.com?

Hakia most reminds me of Ask.com, which uses more of a natural language approach than the other big search engines ('ask' a question, get an answer) - and, like Hakia, Ask.com also uses human editing. [I interviewed Ask.com back in November]. So I asked Riza and Melek what the difference is between Hakia and Ask.com.

Riza told me that Ask.com is an indexing search engine and it has no semantic analysis. Going one step below, he says to look at the basis of their results. Ask.com bolds keywords (i.e. it works at a keywords level), whereas Riza said that Hakia understands the sentence. He also said that Ask.com categories are not meaning-based - they are "canned or prefixed". Hakia, he said, understands the semantic relationships.

Hakia vs Google

I next referred Riza and Melek to Read/WriteWeb's interview with Matt Cutts of Google, in which Matt told me that Google is essentially already using semantic technologies, because the sheer amount of data that Google has "really does help us understand the meanings of words and synonyms". Riza's view on that is that Google works with popularity algorithms and so it can "never have enough statistical material to handle the Long Tail". He says a search engine has to understand the language, in order to properly serve the Long Tail.

Moreover, Hakia's view is that the vastness of data that Google has doesn't solve the semantic problem - Riza and Melek think there needs to be that semantic connection present.

Their bigger claim though is that the big search companies are still thinking within an indexing framework (personalization etc). Hakia thinks that indexing has plateaued and that semantic technologies will take over for the next generation of search. They say that semantic technologies allow you to analyze content, which they think is 'outside the box' of what the big search companies are doing. Riza admitted that it was possible Google was investigating semantic technologies, behind closed doors. Nevertheless, he was adamant that the future is understanding info, not merely finding it - which he said is a very difficult problem to solve, but it's Hakia's mission.

Semantic web and Tim Berners-Lee

Throughout the interview, I noticed the word "semantic" was being used a lot - but their interpretation seemed to be different to that of Tim Berners-Lee, whose notion of a Semantic Web is generally what Web people think about when uttering the 'S' word. Riza confirmed that their concept of semantic technology is indeed different. He said that Tim Berners-Lee is banking on certain standards being accepted by web authors and writers - which Riza said is "such a big assumption to start this technology". He said that it forces people to be linguists, which is not a common skill.

Furthermore, Riza told me that Berners-Lee's Semantic Web is about "imposing a structure that assumes people will obey [and] follow". He said that the "entire Semantic Web concept relies on utilizing semantic tagging, or labeling, which requires people to know it." Hakia, he said, doesn't depend on such structures. Hakia is all about analyzing the normal language of people - so a web author "doesn't need to mess with that".

Competitors

Apart from Google and the other big 'indexing' search engines, Hakia is competing against other semantic search engines like Powerset and hybrids like Wikia. Perhaps also Freebase - although Riza thinks the latter may be "old semantic web" (but he says there's not enough information about it to say for sure).

Conclusion

Hakia plans to launch its version 1.0 (i.e. get out of beta) by the end of 2007. As of now my assessment is the same as Alex's was in December - it's a very promising, but as yet largely unproven, technology.

I also suspect that Google is much more advanced in search technology than Mountain View is letting on. We know that Google's scale is a huge advantage, but their experiments with things like personalization and structured data (Google Base) show me that Google is also well aware of the need to implement next-generation search technologies. Also, as Riza noted during the interview, who knows what Google is doing behind closed doors.

Will semantic technologies and 'sentence analysis' be the next wave of search? It seems very plausible. So with a bit more development, Hakia could well become compelling to a mass market. Therefore how and when Google responds to Hakia will be something to watch carefully.


http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php Fri, 23 Mar 2007 12:14:43 -0800 Richard MacManus
Hakia Launches Semantic Highlighter and Scoop Button

Today hakia added a hakia highlighter to its “meaning-based” search engine, which produces a highlighted sentence inside each search result. The bigger announcement comes tomorrow, when hakia will launch a scoop button - a browser plug-in that not only highlights text but, when you click on a result, automatically scrolls the page to the highlighted passage, lets you save data to your computer, and offers more customization features that we'll discuss below.

Both of these new tools allow for faster, more relevant result selection and add utility for users.

hakia highlighter

The hakia highlighter upgrade addresses what Melek Pulatkonak, COO of hakia, termed “Click-Thru-Itis” - clicking through a link just to determine its relevance. Traditional search engines often force users to click links excessively in order to determine (based on limited information) which result is relevant. The hakia highlighter reduces these extra clicks by displaying meaning-based, uninterrupted sentences in the search result. The added content and context in these sentences make it easier to judge relevance and choose which URLs to visit; broken keywords and phrases simply do not provide enough information.

Let’s examine what typically happens when we do a search on Google. For the search query “What does it mean to cross the Rubicon?”, the Google results are almost always more difficult to "filter" because of fragmented meaning. The real problem is the disparity between results, and the way people are forced to make decisions based on this broken information. The Google results for this query range from the rule of habeas corpus to a metaphysical discourse, but the bold keywords do not signal this wide disparity.


Note: fragmented keywords and phrases

Invariably, selecting from broken sentences leads to unwanted visits to those URLs. Let’s now look at what is revealed by the same search on hakia. The highlighted sentences provide more information for deciding what is relevant. Results that display no sentences on hakia are ruled out altogether, and the more obvious matches become readily apparent. In this example, extra clicks aren't necessary to significantly "narrow" the subject.


Note: I followed the highlighted sentences as much as I did the links

The hakia highlighter examples demonstrate the engine's ability to "think" semantically and display the process in a way that narrows selection options.
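The presentation idea, showing one whole sentence instead of scattered keyword fragments, can be sketched in a few lines. This is explicitly not hakia's semantic method: it swaps in plain keyword overlap with a tiny, made-up stop-word list, and the function names are hypothetical. It only illustrates returning the single best uninterrupted sentence from a page.

```python
import re

# Tiny illustrative stop-word list; a real system would need far more.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "what", "does", "do"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lower-cased words with stop words removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}


def best_sentence(query, page_text):
    """Return the whole sentence sharing the most content words with the query."""
    terms = content_words(query)
    best, best_score = None, 0
    for sentence in split_sentences(page_text):
        score = len(terms & content_words(sentence))
        if score > best_score:
            best, best_score = sentence, score
    return best  # None when no sentence shares any content word


page = ("Caesar was a Roman general. To cross the Rubicon is to pass a point "
        "of no return. Habeas corpus is a legal rule.")
print(best_sentence("What does it mean to cross the Rubicon?", page))
```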

Scoop Bar

The scoop bar is a browser plugin that lets users apply semantic results in a unique way. Currently it is only available on Windows, but Mac support is coming.

When installed, clicking the scoop browser icon reveals a pull-down with several options. The home option refers back to hakia, where a search for "What is the specific gravity of lead?" renders a highlighted result as before; but clicking on a desired result reveals the page AND scrolls the page to the highlighted passage as below.


Note: The scroll bar is about 1/3 down a very long page on ballistics

The button in front of the highlighted passage has several functions. Clicking the pull-down arrow allows the user to highlight desired text and save the link and text to a custom folder. Alternatively, clicking the "scoop and save" icon saves the link and text to a default folder or file.


Note: Additional highlighted paragraph and saving to a created folder

Additional buttons and functions allow for more customization and navigation. The options button at top center of the tool bar brings up the scoop and save functions, while the "my results" pull-down lists saved results and navigates to them as illustrated below.


My result function and destination folder

Perhaps a real-world example is in order. If I were doing a post on ballistics, Hakia (even in beta) would provide me with faster and more relevant results, a method for saving/customizing results, and a central place to study all the data gleaned. A user could do the same thing manually in any browser, but would clearly lose time and functionality by comparison.

Conclusion

Hakia is progressing to the point of expressing results in true "natural language" terms. Make no mistake, the comparative results of Google, Yahoo and hakia are not conclusively differentiated yet, but the progress of hakia is fairly clear in these examples. The way the "scoop" function directs the user to relevant links and then "scrolls" to the pertinent passage is evidence of the semantic engine at work locating relevant data. The implications of this are powerful and exciting for hakia and the rest of us. There is a long road ahead for hakia, but predicting outcomes is so often a function of watching the little things.

http://www.readwriteweb.com/archives/hakia_introduces_semantic_highlighter_and_scoop_button.php Thu, 26 Jul 2007 21:00:59 -0800 Phil Butler
Hakia - First Meaning-based Search Engine

Written by Alex Iskold and edited by Richard MacManus.

There has been a lot of talk lately about 2007 being the year when we will see companies roll out Semantic Web technologies. The wave started with John Markoff's article in the NY Times and got picked up by Dan Farber of ZDNet and in other media. For background on the Semantic Web in this era, check out our post entitled The Road to the Semantic Web. Also, for a lengthy but very insightful primer on the Semantic Web, see Nova Spivack's recent article.

The media attention is not accidental. Because the Semantic Web promises to help solve information overload problems and deliver major productivity gains, a huge amount of resources, engineering and creativity is being thrown at it.

What is also interesting is that several different problems need to be solved in order for things to fall into place. There needs to be a way to turn data into metadata, either at the time of creation or via natural language processing. Then there needs to be intelligence, particularly inside the browser, to take advantage of the generated metadata. There are many other interesting nuances and sub-problems, so the Semantic Web marketplace is going to have a rich variety of companies going after different pieces of the puzzle. We are planning to cover some of these companies working in the Semantic Web space, so watch out for more coverage here on Read/WriteWeb.

Hakia: how is it different from Google?

The first company we'll cover is Hakia, a "meaning-based" search engine startup that is getting a bit of buzz. It is a venture-backed company with a multinational team, headquartered in New York, and curiously has former US senator Bill Bradley as a board member. It launched its beta in early November this year and already ranks around 33K on Alexa, which is impressive. They are scheduled to go live in 2007.

The user interface is similar to Google, but the engine prompts you to enter not just keywords - but a question, a phrase, or a sentence. My first question was: What is the population of China?

As you can see, the results were spot on. I ran the same query on Google and got very similar results, but sans flag. Looking carefully over the results in Hakia, I noticed the message:

"Your query produced the Hakia gallery for China. What else do you want to know about China?"

At first this seems like a value-add. However, after thinking about it some more, I am not sure. What seems to have happened is that instead of performing the search, Hakia classified my question and pulled the results out of a particular cluster, i.e. China. To verify this hypothesis, I ran another query: What is the capital of China? The results again suggested a gallery for China, but did not produce the right answer. Now, to Hakia's credit, it recovered nicely when I typed in:

Hakia experiments

Next I decided to try out some of the examples that the Hakia team suggests on its homepage, along with some of my own. The first, a Hakia example, was Why did the chicken cross the road? The answers were fine, focusing on the ironic nature of the question. Particularly funny was Hakia's pick:

My next query was more pragmatic: Where is the Apple store in Soho? (another example from Hakia). The answer was perfect. I then performed the same search on Google and got a perfect result there too. 

Then I searched for Why did Enron collapse? Again Hakia did well, but not noticeably better than Google. However, I did see one very impressive thing in Hakia. Its results included this statement: Enron's collapse was not caused by overstated resource reserves, but by another kind of overstatement. This is pretty witty, but I am still not convinced that Hakia is doing semantic analysis. Here is why: that reply was not constructed because Hakia understands the semantics of the question. Instead, Hakia pulled the sentence out of one of the highly ranked documents matching the Why did Enron collapse? query.
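The behavior the author describes, lifting an existing sentence out of a highly ranked document rather than composing an answer, can be sketched as follows. This is an illustration under assumptions, not Hakia's algorithm: the scoring is plain term overlap after removing a tiny hypothetical stopword list, and apart from the quoted Enron sentence the sample documents are made up.

```python
import re

# Tiny illustrative stopword list (an assumption, not Hakia's).
STOPWORDS = {"why", "did", "the", "a", "an", "of", "was", "is", "not", "by", "but", "to"}


def terms(text: str) -> set[str]:
    """Lowercased words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def best_sentence(query: str, ranked_docs: list[str]) -> str:
    """Pick the sentence with the largest term overlap with the query.

    ranked_docs is assumed to be in rank order already; on ties the earlier
    (higher-ranked) sentence is kept because the comparison is strict.
    """
    q = terms(query)
    best, best_score = "", -1
    for doc in ranked_docs:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            score = len(q & terms(sentence))
            if score > best_score:
                best, best_score = sentence, score
    return best


docs = [
    "Enron was an energy company based in Houston. "
    "Enron's collapse was not caused by overstated resource reserves, "
    "but by another kind of overstatement.",
    "The collapse surprised investors.",
]
print(best_sentence("Why did Enron collapse?", docs))
```

Nothing in this sketch understands the question; it only counts shared words, which is the distinction the author is drawing.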

In my final experiment, Hakia beat Google hands down. I asked Why did Martha Stewart go to jail? - which is not one of Hakia's homebrewed examples, but it is fairly similar to their Enron example. Hakia produced perfect results for the Martha question:

Hakia is impressive, but does it really understand meaning?

I have to say that Hakia leaves me intrigued, despite the fact that it could not answer What does Hakia mean? and despite the lack, so far, of sufficient evidence that it really understands meaning.

It's intriguing to think about the old idea of being able to type a question into a computer and always get a meaningful answer (a la the Turing test). But right now I am mainly interested in Hakia's method for picking the top answer. That seems to be Hakia's secret sauce at this point; it is unique and works quite well for them. Whatever heuristic they are using, it returns meaningful results based on analysis of strings, and it is impressive, at least at first.

Hakia and Google

Perhaps the more important question is: Will Hakia beat Google? Hakia itself has no answer, but my answer at this point is no. This current version is not exciting enough and the resulting search set is not obviously better, so it's a long shot that they'll beat Google in search. I think if Hakia presented a single answer for each query, with the ability to drill down, it might catch more attention. But again, this is a long shot.

The final question is: Is semantic search fundamentally better than text search? This is a complex question that requires deep theoretical expertise to answer definitively. Here are a few hints.

Google's string algorithm is very powerful; this is an undeniable fact. A narrowly focused vertical search engine that makes many assumptions about the underlying search domain (e.g. Retrevo) does a great job of finding relevant results. So the difficulty Hakia has to overcome is to quickly determine the domain and then do a great job of searching inside it. This is an old and difficult problem related to natural language understanding and AI. We know it's hard, but we also know that it is possible.
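A minimal sketch of the two-step approach described above: guess the domain first, then search only inside it. Everything here is hypothetical: the domain names, keyword lists and documents are invented, and keyword voting is a deliberately naive stand-in for real domain classification.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical domain vocabularies used to guess what a query is about.
DOMAIN_KEYWORDS = {
    "electronics": {"camera", "laptop", "tv", "megapixel", "battery"},
    "travel": {"flight", "hotel", "airport", "visa"},
}

# Hypothetical vertical indexes: a small document collection per domain.
DOMAIN_DOCS = {
    "electronics": ["compact camera with long battery life", "budget laptop review"],
    "travel": ["cheap flight deals", "hotel near the airport"],
}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def detect_domain(query: str) -> Optional[str]:
    """Stage 1: vote for the domain whose keywords appear most often in the query."""
    votes = Counter()
    for w in words(query):
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if w in keywords:
                votes[domain] += 1
    return votes.most_common(1)[0][0] if votes else None


def search(query: str) -> list[str]:
    """Stage 2: rank only the detected domain's documents by shared words."""
    domain = detect_domain(query)
    if domain is None:
        return []  # a real engine would fall back to general web search here
    q = set(words(query))
    scored = [(len(q & set(words(d))), d) for d in DOMAIN_DOCS[domain]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score > 0]


print(search("camera with good battery"))  # ['compact camera with long battery life']
```

The weak point is stage 1: a query whose words appear in no keyword list gets no domain at all, which is the hard natural-language problem the author refers to.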

While we are waiting for all the answers, please give Hakia a try and let us know what you think.

http://www.readwriteweb.com/archives/hakia_meaning-based_search.php Search Thu, 07 Dec 2006 12:08:31 -0800 Alex Iskold
Evri Beta Launches: Search Less - Understand More

Evri, a Paul Allen-backed semantic search engine, is launching into a limited beta tonight. Evri was first shown publicly at the D6 conference. Evri's CEO Neil Roseman prefers to describe Evri in terms of organizing content rather than calling it a search engine. At its core, however, Evri definitely is a search engine, though it adds a very sophisticated semantic layer on top of its results that emphasizes the relationships between different search terms.

In its early stages, Evri will start out with only a limited set of results and possible search terms, based on what it considers the most popular terms and people. This approach of starting with only the most popular terms is reminiscent of Mahalo. However, unlike Mahalo, which relies on paid editors and volunteers to create its results, Evri relies entirely on its algorithms to create connections between people, products, concepts, and events.

Evri especially prides itself on having developed a system that can distinguish between grammatical roles such as subjects, verbs, and objects to create these connections. In his demo at D6, Roseman described the system as being similar to "an army of 7th grade grammar students graphing the Web."
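To make the idea of "graphing" sentences by grammatical role concrete, here is a toy sketch that pulls subject-verb-object triples out of sentences assumed to be already part-of-speech tagged. The post does not describe how Evri actually works; the tag names, the simple NOUN VERB NOUN pattern and the example sentence are assumptions, and real text needs a full parser.

```python
# Each sentence is assumed to arrive pre-tagged as (word, tag) pairs.
Tagged = list[tuple[str, str]]


def extract_triples(sentence: Tagged) -> list[tuple[str, str, str]]:
    """Return (subject, verb, object) for every NOUN VERB NOUN pattern.

    Tokens tagged DET (articles such as "the") are dropped first, so that
    "Students graph the Web" still matches the three-token pattern.
    """
    content = [(w, t) for w, t in sentence if t != "DET"]
    triples = []
    for i in range(len(content) - 2):
        (s, ts), (v, tv), (o, to) = content[i], content[i + 1], content[i + 2]
        if (ts, tv, to) == ("NOUN", "VERB", "NOUN"):
            triples.append((s, v, o))
    return triples


sentence = [("Students", "NOUN"), ("graph", "VERB"), ("the", "DET"), ("Web", "NOUN")]
print(extract_triples(sentence))  # [('Students', 'graph', 'Web')]
```

Triples that share a word can then be linked into a graph of connections between people, products, concepts and events.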


Evri is entering into direct competition with a number of recent entries to the semantic search market, especially Powerset and Hakia. Powerset, however, only indexes Wikipedia articles, while Hakia tries to index all of the web, but focuses less on the relationships between objects and more on providing highly organized results for a given term.

You can sign up for invites to Evri on its homepage. The first wave of users should be receiving invites tonight.

For a more in-depth look at the state of semantic search, see also Alex Iskold's article on the myth and reality of semantic search.

http://www.readwriteweb.com/archives/evri_beta_launches_search_less.php News Tue, 24 Jun 2008 21:01:00 -0800 Frederic Lardinois
Powerset and hakia - Quest For The Semantic Web

This week I spoke with Barney Pell, CEO of Powerset, and Melek Pulatkonak, COO of hakia. In both (separate) conversations we discussed how the Semantic Web is getting very close. The Semantic Web as defined by Tim Berners-Lee is: "a universal platform for the exchange of data, information and knowledge." I think Barney and Melek would agree that the only thing holding back the Semantic Web so far has been an inefficient use of horsepower, or a lack of it.

Speed, Power and Getting There

Semantics is the meaning expressed in language, code or "other" representations of information. My discussions with Barney and Melek revealed fundamental differences in architecture and philosophy between hakia and Powerset: their index systems differ fundamentally, but their goals and visions are remarkably similar. They also differ in the way they apply what I term horsepower to natural language search. As with Shelby vs. Ferrari, different approaches can achieve a desired result, given enough horsepower.

Hakia has built its search in-house, refining and sculpting the QDex indexing system (like an Enzo Ferrari). Its view is that processing power should be used with maximum efficiency, via fuzzy logic and advanced semantics. Powerset, on the other hand, uses basically the same inverted indexing system as Google, but backed by natural language processing and immense processing power that essentially “overpowers” the long-tail query (like the GT 500). This is a vast oversimplification, but the elements involved reveal the larger story.
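For readers unfamiliar with the term: an inverted index maps each word to the set of documents that contain it, so a keyword query becomes a set intersection instead of a scan over every page. The sketch below is a minimal illustration with made-up documents; production indexes also store word positions, ranking signals and compressed postings.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every lowercased word to the IDs of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def lookup(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: IDs of documents containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result


docs = {1: "Semantic search engines", 2: "Keyword search at scale", 3: "Semantic Web data"}
index = build_index(docs)
print(sorted(lookup(index, "semantic search")))  # [1]
```

The index itself knows nothing about meaning; per the author, Powerset's difference lies in the natural language processing layered on top of such an index.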

Technology (horsepower), communication (language) and people make up the semantic Web. The Web has not lacked "language"; what it has lacked is the adequate application of processing power. As Barney said: "Even five years ago we did not have the processing capability to even attempt this, but five years from now these answers will seem elementary." Google's system, shown below, currently consumes massive horsepower with comparatively limited results, at least according to hakia and Powerset!

Diagram of Google's inverted index and search (courtesy -changturtle)

Unbending Humans

Barney described the relationship between people and computers as people being "bent" around or adapted to technology in order to utilize it. With the advent of services like Facebook, programs and applications are beginning to “understand” each other. Everyone reading this has been “forced” by technology to conform to various “bending events” in order to use it. Barney explained this idea by calling Facebook and the iPhone true innovations approaching total “community engagement.” Barney also said that “Facebook will become one of the primary communications platforms of the future.” Given this new perspective, I could not agree more, because Facebook is one heck of a representation of information for a social network. Essentially, hakia, Powerset, Facebook and others are bending the machines to engage humans. And in a way, Facebook is the semantic Web in microcosm, but in its infancy.

Semantics and Search?

Search is a critical part of our daily lives, but the interface has changed very little over the years. We define search as the act of typing in a query on Google and getting results. This is a type of search, but how many other kinds of “searches” do we perform? In an earlier article, Josh Catone wrote about Yahoo!’s contention that search will not determine the future of the Web. Josh rightly asked if Facebook and MySpace might be better positioned if “personalization” was to be the future of the Web.

Conclusion

I should make it clear that neither Barney nor Melek really considers their company a "Google killer". Powerset and hakia are not racing each other or trying to overtake Google; they are on a quest for better Web communication and engagement. Both efforts emphasize the necessity for “the system” to be able to universally understand and handle data without ambiguity. Viewing Facebook and others as functional repositories of semantic data is essential to seeing the long view. Whether we are talking about object-oriented data, textual semantics or complex algorithms, the semantic Web is about making people “bend” less for technology.

http://www.readwriteweb.com/archives/powerset_and_hakia_quest_for_semantic_web.php Analysis Fri, 20 Jul 2007 00:15:56 -0800 Phil Butler
A New Commercial Ontology from Hakia

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products.

We at Hakia are proud to announce our upcoming commercial ontology, perhaps the world's first. What is a commercial ontology? If you're asking this question you have just touched on an important distinction: fantasy versus reality. In the context of the Web, a commercial ontology is a realistic version of an ontology, as we explain below.

Realities of the Web

Hakia has accomplished two important innovations in building its commercial ontology (CO): first, the development of concepts and lexicons that follow strict guidelines on the realities of Web operations. What are these realities? Most search queries on the Web reflect a single dimension of intent, almost exclusively relevant to commercial topics. "Commercial topics" here must be taken in the broadest sense possible. For example, if you were looking for "the benefits of foot massage" or "the director of the movie Last Emperor," your queries would fall into a commercial pattern. One distinguishing feature of commercial-pattern queries is that they come in short packages, consisting of a name (onomasticon) or referring to something sold, bought, watched, heard, etc.

In contrast, many (if not all) ontologies that have been built to date (or claimed to exist) are focused on the use of language in the general sense, but not in the sense of commercial patterns on the Web. Therefore, their usefulness when tackling Web search queries is greatly compromised, sometimes to the point of absolute failure. If such an ontology could disambiguate a dozen different senses of the word "kill," it would be sad news if the last 100,000 queries in the search logs did not include a single occurrence of the word "kill." Like drowning in two-inch-deep water, such ontologies do not use their disambiguation capacities for nearly 80% of queries because the queries include nothing but onomasticons or are too short (under-articulated).

The Sequence Approach

The second innovation used in the CO is the use of sequences instead of single words. A single word, like "kill," is the most ambiguous state of information and is hardly used in human communication without a strong implied context. As a result, building natural-language processing (NLP) systems by taking individual words as units of computation is an invitation for disaster.

In contrast, word sequences (two or more words) are inherently safe and highly descriptive. Take "road kill," for example. This sequence describes the corpse of an animal killed on the road by a passing vehicle. If a language processing system takes the sequence of words as a unit of computation, 99% of the ambiguity problem vanishes. There is no need to process the words "kill" and "road" separately, trace their senses, and locate convergence to identify the meaning of "road kill" if you can just take the sequence "road kill" itself as your unit of computation for mapping. This is depicted below:

Note the number of traces required in a conventional ontology approach compared to the sequence approach. The sequence approach requires a lot of data storage space (which is dirt cheap), whereas the conventional ontology approach requires a lot of CPU for a simple mapping task (which is expensive). But the bad news does not stop there. The trace routes in conventional ontology require manual work (impossible to automate), whereas sequence-based ontology can be easily built via automation.
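The contrast drawn above can be sketched roughly as follows, under stated assumptions. `segment` does greedy longest-match lookup against a hypothetical sequence lexicon, so "road kill" resolves in a single lookup and is never split into "road" and "kill"; `sense_combinations` counts the sense pairings a word-by-word approach may have to trace. The lexicon entries and the sense counts (3 for "road", 12 for "kill", echoing the earlier dozen-senses hypothetical) are illustrative, not Hakia data.

```python
from math import prod

# Hypothetical sequence lexicon: multi-word units map straight to one meaning,
# while single words stay ambiguous (they need context to resolve).
LEXICON = {
    ("road", "kill"): "animal killed by a passing vehicle",
    ("road",): "AMBIGUOUS",
    ("kill",): "AMBIGUOUS",
}
MAX_LEN = max(len(key) for key in LEXICON)


def segment(query: str) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation: always prefer the longest known sequence."""
    tokens = query.lower().split()
    units, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                units.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            # No lexicon entry at all: keep the word as its own unit.
            units.append((tokens[i], "unknown"))
            i += 1
    return units


def sense_combinations(senses_per_word: list[int]) -> int:
    """Sense pairings a word-by-word approach may have to trace before converging."""
    return prod(senses_per_word)


print(segment("road kill"))         # [('road kill', 'animal killed by a passing vehicle')]
print(sense_combinations([3, 12]))  # 36
```

The trade-off the post describes shows up directly: the lexicon grows with every sequence stored (storage), while the word-by-word count grows multiplicatively with the senses per word (computation).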

Perhaps not everyone will understand the second point above. Nevertheless, the scalability and performance of the end product will speak for themselves when Hakia puts the testing platform online.

Usage of the Commercial Ontology

The immediate use of the CO is for search queries, or document characterizations, that conventional systems cannot tie to any advertising. This unrecognized domain of search queries and characterizations means lost revenue. Hakia's CO is designed to fill this gap. For example, if the search query or page characterization is "beat generation," the CO can map it to "literature" on the fly. As a result, systems using the CO will have a much deeper understanding of the incoming terms, and thus will be able to recognize the underlying intent beyond the face value of the words. The same capability can be used in a number of places other than advertising with the same effect.
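The "beat generation" example amounts to a lookup from word sequences to categories. A minimal sketch, assuming a hypothetical mapping table: only the "beat generation" to "literature" row comes from the text, the other rows are invented, and scanning longer windows first is one simple way to let multi-word sequences win over single words.

```python
from typing import Optional

# Hypothetical sequence-to-category table. Only "beat generation" -> "literature"
# comes from the text; the other rows are invented for illustration.
CATEGORY_MAP = {
    "beat generation": "literature",
    "foot massage": "health",
    "last emperor": "movies",
}


def categorize(text: str) -> Optional[str]:
    """Return the category of the longest known word sequence in the text."""
    tokens = text.lower().split()
    # Scan longer windows first so multi-word sequences win over shorter ones.
    for n in range(len(tokens), 0, -1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CATEGORY_MAP:
                return CATEGORY_MAP[phrase]
    return None


print(categorize("books about the beat generation"))  # literature
```

Neither "beat" nor "generation" alone would suggest literature; the category only appears once the pair is treated as one unit.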

Stay tuned for the release of the first version of Hakia's commercial ontology.

http://www.readwriteweb.com/archives/new_commercial_ontology_from_hakia.php Sponsors Thu, 30 Jul 2009 05:00:17 -0800 RWW Sponsor
Cognition Announces "World's Largest Semantic Map"

Cognition Technologies, a Semantic Web company that specialises in Natural Language Processing (NLP) search, is today announcing the release of what it claims is "the largest commercially available Semantic Map of the English language." We interviewed Cognition CEO Scott Janus to find out what this means.

We also discovered that Cognition, which currently licenses its technology to other organizations, is planning to build a general consumer search engine - which will compete with Google and others.

What is a Semantic Map?

A Semantic Map is kind of like a dictionary, in that it's a representation of Cognition's ability to define things. Cognition claims that its Semantic Map has over 10 million semantic connections; over 4 million semantic contexts (word meanings that create contexts for specific meanings of other related words); over 536,000 word senses (word and phrase meanings); 75,000 concept classes (or synonym classes of word meanings); 7,500 nodes in the technology's ontology or classification scheme; and 506,000 word stems (roots of words) for the English language.

Image from Cognition

The company says that its Semantic Map "is more than double the size of any other computational linguistic dictionary for English".
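One way to picture the entry types Cognition lists (word stems, word senses, concept classes of synonymous senses, ontology nodes, and contexts that select a sense) is as linked records. The field names, class labels and sample entries below are hypothetical; the post does not describe Cognition's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class WordSense:
    word: str           # surface word or phrase
    stem: str           # root form shared by inflections
    gloss: str          # short definition of this meaning
    concept_class: str  # synonym class this sense belongs to
    contexts: set[str] = field(default_factory=set)  # words that cue this sense


@dataclass
class SemanticMap:
    senses: list[WordSense] = field(default_factory=list)
    # Each concept class hangs off one node of a classification scheme.
    ontology: dict[str, str] = field(default_factory=dict)

    def senses_of(self, stem: str) -> list[WordSense]:
        """All recorded meanings of a stem."""
        return [s for s in self.senses if s.stem == stem]

    def synonyms(self, sense: WordSense) -> list[str]:
        """Other words whose senses fall in the same concept class."""
        return [s.word for s in self.senses
                if s.concept_class == sense.concept_class and s.word != sense.word]


m = SemanticMap(
    senses=[
        WordSense("bank", "bank", "financial institution", "FIN_INSTITUTION", {"loan", "deposit"}),
        WordSense("bank", "bank", "edge of a river", "SHORE", {"river", "fishing"}),
        WordSense("lender", "lender", "institution that lends money", "FIN_INSTITUTION", {"loan"}),
    ],
    ontology={"FIN_INSTITUTION": "ORGANIZATION", "SHORE": "LANDFORM"},
)
print(len(m.senses_of("bank")))            # 2
print(m.synonyms(m.senses_of("bank")[0]))  # ['lender']
```

Counting the records of each kind in such a structure would give figures of the sort Cognition quotes (senses, concept classes, ontology nodes, stems).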

Cognition Technologies has been working on its technology for 24 years, with a lot of input from lexicographers and linguists over that time. Because it has used a mix of algorithms and human input, Cognition has been able to discern relevancy, meaning and synonymy. Scott Janus told us that one of Cognition's strengths is that it can disambiguate words and phrases, which Janus says differentiates it from the keyword and pattern matching algorithms of Google, Yahoo and others.
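Word-sense disambiguation is often illustrated with a simplified Lesk-style rule: choose the sense whose cue words overlap most with the surrounding sentence. This textbook technique is swapped in for illustration only; it is not Cognition's method, and the senses and cue words are hypothetical.

```python
import re
from typing import Optional

# Hypothetical senses of "bank", each with cue words that tend to signal it.
SENSES = {
    "bank/finance": {"loan", "deposit", "account", "interest"},
    "bank/river": {"river", "water", "fishing", "shore"},
}


def disambiguate(sentence: str) -> Optional[str]:
    """Simplified Lesk-style choice: the sense whose cues overlap the sentence most."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue words there is no evidence either way.
    return best if scores[best] > 0 else None


print(disambiguate("She opened an account at the bank to get a loan"))  # bank/finance
print(disambiguate("We went fishing on the river bank"))                # bank/river
```

A keyword matcher treats both sentences as containing the same "bank"; the sense choice is what lets a system tell them apart.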

For example, Janus told us that Cognition's technology can find results even if the direct words are not used, which he says Google can't do.
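Finding results "even if direct words are not used" is commonly approached with synonym-based query expansion: each query word is widened to its whole synonym class before matching. A minimal sketch of that general technique, with hypothetical synonym classes; it is not Cognition's implementation.

```python
import re

# Hypothetical synonym classes: each set groups interchangeable words.
SYNONYM_CLASSES = [
    {"car", "automobile", "vehicle"},
    {"buy", "purchase", "acquire"},
]


def expand(word: str) -> set[str]:
    """Return the word plus every synonym in any class that contains it."""
    expanded = {word}
    for cls in SYNONYM_CLASSES:
        if word in cls:
            expanded |= cls
    return expanded


def matches(query: str, document: str) -> bool:
    """True if every query word, or one of its synonyms, appears in the document."""
    doc_words = set(re.findall(r"[a-z]+", document.lower()))
    return all(expand(w) & doc_words for w in re.findall(r"[a-z]+", query.lower()))


print(matches("buy car", "How to purchase an automobile"))  # True
print(matches("buy car", "How to purchase a bicycle"))      # False
```

A plain keyword match would reject the first document, since neither "buy" nor "car" appears in it.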

Cognition Plans General Search Engine

The comparisons to Google led us to ask the obvious question: does Cognition's semantic technology have a more general application? In other words, does Cognition plan to take on Google by creating a search engine for consumers? CEO Scott Janus replied that yes, they do plan to "one day offer search on the general web". However, he said that they need more capital funding to index the entire Web, put infrastructure in place, etc.

As of now Cognition will continue to license its semantic technology to verticals like law and health. Janus told us that Cognition is "good for complex content where lot of synonyms are used", so right now data-intensive industries are where it is aiming.

Cognition's current applications include legal (e.g. LexisNexis Concordance's case management), health (e.g. MEDLINE), and a semantically charged version of Wikipedia.

Image from Cognition

Cognition vs Powerset and Hakia

Two other semantic search engines we've been tracking closely on ReadWriteWeb are Powerset and Hakia. We asked CEO Scott Janus what makes Cognition different from those two products.

In a nutshell, Janus says that its Semantic Map is bigger and better.

Specifically, he said that Powerset is actually "not so similar" to Cognition. According to Janus, Powerset does "parsing" - which it licensed from Xerox PARC. That is 20-25% of the solution, said Janus, but Powerset "doesn't have a good semantic map". Cognition went so far as to write a white paper (pdf) explaining why it thinks Powerset "misses the point".

As for Hakia, Janus said that as far as he can see, Hakia is focused on "ontological classifications" - classifying words and concepts together. But he says Hakia doesn't have as full a semantic map as Cognition, so he thinks Cognition has "a better understanding" compared to Hakia.

In summary, Janus told us that semantic search companies "must include a comprehensive semantic map" to be successful. We're sure that Powerset and Hakia will have different opinions on what makes a successful semantic search company, but it does make for a good differentiator for Cognition.

Open Question

Tell us in the comments what you think of Cognition, and whether you think it can compete with Google in the long run.

http://www.readwriteweb.com/archives/cognition_semantic_map.php Semantic Web Tue, 16 Sep 2008 09:55:00 -0800 Richard MacManus
Thanks Sponsors; Packages Available April-May

Thank you to our sponsors, for supporting ReadWriteWeb's mission to provide in-depth coverage of Web Apps, Web Technology Trends, Social Networking & Social Media.

We currently have a couple of sponsor slots available for April-May on ReadWriteWeb, so if you would like to enquire about those then please email us for a Media Kit. We also have opportunities on our network blogs last100, AltSearchEngines and ReadWriteTalk.

Here are our current sponsors:

Quintura is a visual-based search engine. Currently it is offering to display your brand (via graphical ad) in the search cloud on Quintura.com for free.

Wild Apricot offers Membership Database Management Software for non-profits and associations.

Hakia is a leading semantic search engine and recently released Club Hakia, a place to go to rate hakia, get webmaster tools, find out about technology licensing, and more.

Foretal is an online prediction market. People can submit predictions to Foretal and members vote on them for free or for cash.

Compete Search Analytics is a way to build and optimize search marketing campaigns.

Techinline provides an easy to use, reliable, and affordable remote desktop solution oriented towards small and mid-size businesses.

Wordclay is a DIY self-publishing service that allows you to self-publish as many books as you like, for free. Wordclay also offers a variety of services to help you create, distribute and promote your book.

Central Desktop offers a set of Wiki Tools for business teams (not the IT department). There's no download, it's delivered on-demand. There is a 30 day free trial.

Userplane is a provider of communication software for online communities. As well as instant messaging, Webmessenger 2 has a Presence system that allows sites to display and leverage online user presence anywhere.

EditMe is a hosted wiki and content management service; there is a 14 day free trial for RWW readers.

Eurekster is a search engine that learns from the community's search behavior, so it gets better the more you use it.

MediaTemple provides hosting for RWW and SixApart provides our publishing software MT4.

http://www.readwriteweb.com/archives/thanks_sponsors_23mar08.php Sponsors Sun, 23 Mar 2008 12:53:15 -0800 Richard MacManus