ReadWriteWeb

This week I spoke to Hakia founder and CEO Dr. Riza C. Berkan and COO Melek Pulatkonak. Hakia is one of the more promising Alt Search Engines around, with a focus on natural language processing methods to try to deliver 'meaningful' search results. Alex Iskold profiled Hakia for R/WW at the beginning of December and he concluded, after a number of search experiments, that Hakia was intriguing - but it was not yet at a level to compete with Google. It is important to note that Hakia is a relatively early beta product and is still in development. But given the speed of Internet time, 3.5 months is probably a good time to check back and see how Hakia is progressing...

What is Hakia?

Riza and Melek first told me what makes Hakia different from Google. Hakia attempts to analyze the concept behind a search query, in particular by doing sentence analysis. Most other major search engines, including Google, analyze keywords. Riza and Melek told me that the future of search engines will go beyond keyword analysis - search engines will talk back to you and in effect become your search assistant.
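To make the keyword-versus-sentence distinction concrete, here is a toy Python sketch. It is not Hakia's or Google's actual method, which the interview did not go into; the function names and the two example queries are invented purely for illustration. It only shows that reducing a query to an unordered set of keywords discards word order, which is one of the things sentence-level analysis would need to keep.

```python
from collections import Counter


def bag_of_words(query):
    """Keyword view: an unordered count of lowercase words (hypothetical helper)."""
    return Counter(query.lower().split())


def word_sequence(query):
    """Order-preserving view: the same words, kept in sentence order (hypothetical helper)."""
    return tuple(query.lower().split())


# Two made-up queries with identical keywords but opposite meanings.
q1 = "dog bites man"
q2 = "man bites dog"

# As keyword bags the two queries cannot be told apart...
same_keywords = bag_of_words(q1) == bag_of_words(q2)

# ...but they differ once word order is taken into account.
same_sequence = word_sequence(q1) == word_sequence(q2)

print("same keywords:", same_keywords)
print("same word order:", same_sequence)
```

A real system would of course need far more than word order (parsing, word senses, an ontology), so treat this as the smallest possible illustration of why keywords alone can lose meaning.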

One point worth noting here is that, currently, Hakia still has some human post-editing going on - so it isn't 100% computer powered at this point.

Hakia has two main technologies:

1) QDEX Infrastructure (which stands for Query Detection and Extraction) - this does the heavy lifting of analyzing search queries at a sentence level.

2) SemanticRank Algorithm - this is essentially the science they use, made up of ontological semantics that relate concepts to each other.

If you're interested in the tech aspects, also check out hakia-Lab - which features their latest technology R&D.

How is Hakia different from Ask.com?

Hakia most reminds me of Ask.com, which uses more of a natural language approach than the other big search engines ('ask' a question, get an answer) - and, like Hakia, Ask.com also uses human editing. [I interviewed Ask.com back in November]. So I asked Riza and Melek: what is the difference between Hakia and Ask.com?

Riza told me that Ask.com is an indexing search engine and that it has no semantic analysis. Going one level deeper, he says to look at the basis of their results. Ask.com bolds keywords (i.e. it works at a keyword level), whereas Riza said that Hakia understands the sentence. He also said that Ask.com categories are not meaning-based - they are "canned or prefixed". Hakia, he said, understands the semantic relationships.

Hakia vs Google

I next referred Riza and Melek to Read/WriteWeb's interview with Matt Cutts of Google, in which Matt told me that Google is essentially already using semantic technologies, because the sheer amount of data that Google has "really does help us understand the meanings of words and synonyms". Riza's view on that is that Google works with popularity algorithms and so it can "never have enough statistical material to handle the Long Tail". He says a search engine has to understand the language, in order to properly serve the Long Tail.

Moreover, Hakia's view is that the vastness of data that Google has doesn't solve the semantic problem - Riza and Melek think there needs to be that semantic connection present.

Their bigger claim though is that the big search companies are still thinking within an indexing framework (personalization etc). Hakia thinks that indexing has plateaued and that semantic technologies will take over for the next generation of search. They say that semantic technologies allow you to analyze content, which they think is 'outside the box' of what the big search companies are doing. Riza admitted that it was possible Google was investigating semantic technologies, behind closed doors. Nevertheless, he was adamant that the future is understanding info, not merely finding it - which he said is a very difficult problem to solve, but it's Hakia's mission.

Semantic web and Tim Berners-Lee

Throughout the interview, I noticed the word "semantic" was being used a lot - but their interpretation seemed to be different to that of Tim Berners-Lee, whose notion of a Semantic Web is generally what Web people think about when uttering the 'S' word. Riza confirmed that their concept of semantic technology is indeed different. He said that Tim Berners-Lee is banking on certain standards being accepted by web authors and writers - which Riza said is "such a big assumption to start this technology". He said that it forces people to be linguists, which is not a common skill.

Furthermore, Riza told me that Berners-Lee's Semantic Web is about "imposing a structure that assumes people will obey [and] follow". He said that the "entire Semantic Web concept relies on utilizing semantic tagging, or labeling, which requires people to know it." Hakia, he said, doesn't depend on such structures. Hakia is all about analyzing the normal language of people - so a web author "doesn't need to mess with that".

Competitors

Apart from Google and the other big 'indexing' search engines, Hakia is competing against other semantic search engines like Powerset and hybrids like Wikia. Perhaps also Freebase - although Riza thinks the latter may be "old semantic web" (but he says there's not enough information about it to say for sure).

Conclusion

Hakia plans to launch its version 1.0 (i.e. get out of beta) by the end of 2007. As of now my assessment is the same as Alex's was in December - it's a very promising, but as yet largely unproven, technology.

I also suspect that Google is much more advanced in search technology than Mountain View is letting on. We know that Google's scale is a huge advantage, but their experiments with things like personalization and structured data (Google Base) show me that Google is also well aware of the need to implement next-generation search technologies. Also, as Riza noted during the interview, who knows what Google is doing behind closed doors.

Will semantic technologies and 'sentence analysis' be the next wave of search? It seems very plausible. So with a bit more development, Hakia could well become compelling to a mass market. How and when Google responds to Hakia will therefore be something to watch carefully.




Comments


  1. It's a beautiful technology, very well thought out in terms of linguistics and phrase parsing. Fully realized, it's a nice addition to the semantic web.

    From a business perspective, I agree with the statement "Hakia thinks that indexing has plateaued and that semantic technologies will take over for the next generation of search". Their best bet may be semantic search add-ons to corporate networks, filtering knowledge databases.

    But from a consumer perspective, it will be hard to differentiate itself from Google, where most people just want to find out what Britney is doing.

    Posted by: David | March 23, 2007 1:05 PM



  2. Great report on my favorite search engine!

    I interviewed Dr. Berkan a couple of weeks back and came to a similar conclusion about him and the work they are doing at hakia!

    I am sure Dr. Berkan gave you the guided tour as well, and I am glad to see other people who can imagine the possibilities innovation like this can have.

    As far as differentiation from Google, I think that 2 years time will show a factor resembling watching B&W TV in 1968 to seeing color at your friend's house. Hakia can find Shakira as well as Google, but will likely be able to tell you what she had for dinner.

    Thanks,
    Phil Butler

    Posted by: Phil Butler | March 23, 2007 2:07 PM



  3. Flipping through my "about us" files of other alternative search engines, I found a few that might also be of interest:

    1) AnswerBus: Their pitch: AnswerBus is an open-domain question answering system based on sentence level information retrieval. It accepts users' natural-language questions in English, German, French, Spanish, Italian and Portuguese and extracts possible answers from the Web. From the Web pages, AnswerBus extracts sentences that are determined to contain answers. The current rate of correct answers is 70.5%. AnswerBus demonstrates that practical question answering on the Web is highly feasible.

    2) And if you know Japanese, NTT's Cyber Communications Lab's natural language query technology enables them to, "make an advanced Internet search engine that quickly analyzes questions written in natural spoken language expressions, extracts, from the Internet, words and expressions that are potential answers, and places the Web pages that are likely to contain answers at the top of the list." See http://www.ntt.co.jp/cclab/e/pamph/sp/sp01.html

    3) Lexxe (alpha): has been developing a third generation Internet search engine with advanced Natural Language Processing technologies. "Lexxe" is derived from a linguistic term "Lexical", which means "related to words".
    {That's for our friend, Christopher Johnson, "The Name Inspector."} It emphasizes the processing of language from the level of words and the meanings associated with them.

    Lexxe has been exploring more intelligent ways to find information for users in a more meaningful way. They believe this method will eventually bring far more accurate and relevant search results than the current search technology. Their technology is built upon the foundation of advanced Natural Language Processing technology.

    4) SurfWax: SurfWax's patent-pending design is the first to make searching a "visual process," seamlessly integrating meaning-based search with key knowledge-finding elements for effective association and recall.

    And as for the "Semantic Web," there's 5) Swoogle (http://swoogle.umbc.edu) What does Swoogle do?
    Swoogle is a search engine for the Semantic Web on the Web. Swoogle crawls the World Wide Web for a special class of web documents called Semantic Web documents, which are written in RDF.

    Posted by: Charles Knight | March 23, 2007 2:12 PM



  4. hakia not only understands English. It also answers questions that are written in Turkish. Have a look at this query in Turkish (http://www.hakia.com/search.aspx?q=Sabanc%C4%B1+%C3%9Cniversitesi+rekt%C3%B6r%C3%BC+kimdir%3F) (Who is the president of Sabancı University?). First three answers are correct.
    I think this is not officially announced on "hakia Blog".

    Posted by: Mustafa Ulu | March 23, 2007 2:38 PM



  5. P. S. I almost forgot the most interesting one of all!

    http://www.infactsolutions.com/projects/coghog/demo.htm

    CogHog: CognitionSearch(tm) "The Next Evolution in Search"

    Posted by: Charles Knight | March 23, 2007 2:54 PM



  6. I don't agree when he says Google does not use "Semantic technology".

    He must be joking, right? Semantic web is the future of the web. Google, with its PhDs, is certainly not making the mistake of avoiding Semantic technology.

    Posted by: infonote | March 23, 2007 3:00 PM



  7. I think I agree with Hakia's definition of the semantic web... at least, it's a better view to take if you want to get anything done. Assuming people will conform to standards in publishing seems like kind of a silly assumption to make. At least, I don't think that will happen any time very soon. The web isn't academic anymore... it's an extension of everyday speech and language, so people are going to treat it as such.

    Also, that first screenshot made me giggle. Web 3.0 is all these thingies getting combined? eh? ;)

    Posted by: Josh | March 24, 2007 2:29 AM



  8. For those of you reading these comments, I would HIGHLY recommend this video on YouTube: The Machine is Us/ing Us: http://www.youtube.com/watch?v=NLlGopyXT_g

    Posted by: Charles Knight | March 24, 2007 9:41 AM



  9. Richard:
    There are some not-so-correct-sounding pieces here. The use of "natural language" and semantics are a little inconsistent. For example, it is one thing to analyze the text of a web page and figure out the meaning behind it, not just treat it as a bag of words, and a different thing to search by using a natural language query. Which of the two does Hakia do? Both?

    You also mention sentence analysis of queries, but 99% of queries today are not presented in a form of a sentence, but as a set of 2-3 keywords, or a phrase at best, so statements around that don't quite compute. Furthermore, say that we all suddenly start entering our searches as full sentences - which language do we use? Which language(s) can Hakia handle? With keywords, there are fewer problems around the use of words from different languages. With natural language queries, the engine needs to know the structure of the language before it can analyze the query sentence. Same with analyzing the input text.

    Posted by: Otis Gospodnetic | March 24, 2007 9:15 PM



  10. I agree with Otis. NLP is still far from satisfactory, so we use human-made semantic patterns as a semi-automatic way of annotating the semantics of a natural language question or answer. Try www.buyans.com and ask a question by selecting one of the automatically suggested patterns, which were provided by other human users encouraged by business models.
    Because the semantic annotation helps the machine understand the question better, precise auto-answers (which were actually provided by other users previously) can be returned.
    Enjoy using it.

    Posted by: Liu Wenyin | March 25, 2007 5:49 AM



  11. Google uses semantic technology, it does not use NLP. However Google's usage of semantic technology is used outside of the main search engine - for things like displaying those adsense blocks.

    If Google were to integrate their semantic algorithms into their main search it would produce more relevant results. Their current indexing method however is likely not tailored enough to their semantic algorithm to produce search results at the speed we are now (largely thanks to them) used to.

    Imagine you have a petabyte or more of web pages indexed. The enormous problem, I'd imagine, is keeping all that data whilst trying to convert it into a new index database that is more compatible with the advanced methods used by new semantic engines like Hakia and others. This is surely a headache for Google, and if they're indeed working on it, I suspect it's something they've been doing for a long time so they can rebuild their indexes on a new technology away from the public eye.

    It would explain why they seem to have been so light on making any actual improvements to search over the past few years, seeming to rest on their current laurels in that arena and push out other products like maps which are not related to the main engine.

    This is my pet theory at the moment at least, I also have been doing a lot of work reverse engineering how Google ranks its pages and am now sure that they are still relying very heavily on the pagerank algorithm, with cut off points for high keyword densities and similar factors rather than assigning actual scores to particular densities (as so many SEOs attempt to match).

    The dragon is sleeping, but that doesn't mean it's not dreaming.

    Posted by: Phill Midwinter | March 25, 2007 8:24 AM



  12. Although I trust the technology behind hakia, it won't be easy to catch up with Google. The company has done a lot of research and made many improvements over the years, and they are still continuing. It is getting rather difficult to compete with them.

    Posted by: Can Erten | March 25, 2007 9:06 PM



  13. Google also does sentence analysis, but it cherry picks sentences where semantic analysis is going to be useful. There are plenty of examples of this, but "the time is.." is the most obvious.

    Compare "What is the time in New York" and "new york the time is what in". The first query triggers the "New York, US — Current local time"... answer while the second query doesn't. That's why search engines which claim they will beat Google using Semantic Web technologies are ignoring reality: Google can choose to switch in any bits of technology it finds works at any time, and Google has got more resources to experiment than anyone.

    Posted by: Nick Lothian | March 25, 2007 10:15 PM



  14. I'm always suspicious any time people claim to be better than Google, because this is an empty promise. What I mean is that Google has become a media company and, while their search is nice, there are tons of better search solutions out there. They're not winning because of search. They're winning because they've built a critical mass of participants in a marketplace, advertisers and publishers, which enables them to basically mint money these days. A better search engine does nothing about the fact that most of Google's ads appear on other sites. Try having sites remove the incumbent's code from their sites and see how easy that task is. The Google game is not about search, and until these ankle biters realize that, their claims will continue to be empty.

    Posted by: P-Air | March 26, 2007 8:41 AM



  15. Very interesting. Someone once explained to me that the database people think that they can infer semantics from large sets of data, and the AI people think that they'll be able to discover the right algorithms to extract semantic meaning.

    I wish that you would have delved a bit more into the search engine side. Are they doing their own crawl? How big is their index? How do they feel that this plays into their engine? etc. Leap-frogging or gaining parity with Google must depend, to some extent, on answering these questions.


    Full disclosure: I'm a current employee of PowerSet and a former employee of Kosmix

    Posted by: Mark Johnson | March 26, 2007 11:24 AM



  16. Mustafa, I'm not so sure that hakia understands Turkish. Try this (Where is the Netherlands?): http://www.hakia.com/search.aspx?q=hollanda+nerededir%3F
    No hit on first page!
    Hakia understands a little Turkish maybe...

    Posted by: Atilla | March 26, 2007 12:05 PM



  17. google is not sleeping.
    look at this.
    http://www.parc.xerox.com/cms/get_article.php?id=589

    Posted by: nara | March 29, 2007 6:34 AM


