ReadWriteWeb

Hakia - First Meaning-based Search Engine

Written by Alex Iskold / December 7, 2006 12:08 PM / 43 Comments

Written by Alex Iskold and edited by Richard MacManus.

There has been a lot of talk lately about 2007 being the year when companies will roll out Semantic Web technologies. The wave started with John Markoff's article in the New York Times and was picked up by Dan Farber of ZDNet and other media. For background on the Semantic Web in this era, check out our post entitled The Road to the Semantic Web. For a lengthy, but very insightful, primer on the Semantic Web, see Nova Spivack's recent article.

The media attention is not accidental. Because the Semantic Web promises to help solve information overload problems and deliver major productivity gains, huge amounts of resources, engineering and creativity are being thrown at it.

What is also interesting is that several different problems need to be solved in order for things to fall into place. There needs to be a way to turn data into metadata, either at the time of creation or via natural language processing. Then there needs to be some intelligence, particularly inside the browser, to take advantage of the generated metadata. There are many other interesting nuances and sub-problems, so the Semantic Web marketplace is going to have a rich variety of companies going after different pieces of the puzzle. We are planning to cover some of the companies working in the Semantic Web space, so watch out for more coverage here on Read/WriteWeb.

Hakia: how is it different from Google?

The first company we'll cover is Hakia, a "meaning-based" search engine startup that is getting a bit of buzz. It is a venture-backed company with a multinational team, headquartered in New York, and curiously has former US senator Bill Bradley as a board member. It launched its beta in early November this year, but already has an Alexa rank of around 33K, which is impressive. They are scheduled to go live in 2007.

The user interface is similar to Google's, but the engine prompts you to enter not just keywords, but a question, a phrase, or a sentence. My first question was: What is the population of China?

The results were spot on. I ran the same query on Google and got very similar results, minus the flag. Looking carefully over the results in Hakia, I noticed the message:

"Your query produced the Hakia gallery for China. What else do you want to know about China?"

At first this seems like added value. However, after thinking about it, I am not sure. What seems to have happened is that instead of performing a search, Hakia classified my question and pulled the results out of a particular cluster, i.e. China. To test this hypothesis, I ran another query: What is the capital of China? The results again suggested a gallery for China, but did not produce the right answer. Now, to Hakia's credit, it recovered nicely when I typed in:

Hakia experiments

Next I decided to try out some of the examples that the Hakia team suggests on its homepage, along with some of my own. The first, a Hakia example, was Why did the chicken cross the road? The answers were fine, focusing on the ironic nature of the question. Particularly funny was Hakia's pick:

My next query was more pragmatic: Where is the Apple store in Soho? (another example from Hakia). The answer was perfect. I then performed the same search on Google and got a perfect result there too. 

Then I searched for Why did Enron collapse? Again Hakia did well, but not noticeably better than Google. However, I did see one very impressive thing in Hakia. In its results was this statement: "Enron's collapse was not caused by overstated resource reserves, but by another kind of overstatement." This is pretty witty, but I am still not convinced that Hakia is doing semantic analysis. Here is why: Hakia did not construct that reply from an understanding of the question's semantics. Instead, it pulled the sentence out of a highly ranked document that matches the query Why did Enron collapse?
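To show how a reply like the Enron one could come from string matching alone, here is a toy extractive selector. It splits ranked documents into sentences and returns the sentence that shares the most words with the query. This is a minimal sketch that assumes a naive term-overlap score. It is not Hakia's method, which the company has not disclosed. The function names, the stopword list and the sample documents are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"why", "did", "what", "is", "the", "a", "an", "of", "to", "in", "by", "but", "was", "not"}


def tokenize(text: str) -> set[str]:
    """Lowercase alphabetic tokens with stopwords removed ("Enron's" -> "enron", "s")."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_answer_sentence(query: str, ranked_docs: list[str]) -> str:
    """Return the sentence with the largest query-term overlap.

    Documents are assumed to arrive in rank order. The strict '>' comparison
    means that on a tie the earlier sentence, from the higher-ranked document, wins.
    No meaning is involved, only shared strings.
    """
    q_terms = tokenize(query)
    best, best_score = "", -1
    for doc in ranked_docs:
        # Naive sentence split on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            score = len(q_terms & tokenize(sentence))
            if score > best_score:
                best, best_score = sentence, score
    return best
```

With a first document that contains the Enron sentence quoted above, the selector returns that sentence for Why did Enron collapse? It does so because the sentence shares both "enron" and "collapse" with the query, which is the same kind of string-level evidence the paragraph above argues for.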

In my final experiment, Hakia beat Google hands down. I asked Why did Martha Stewart go to jail? - which is not one of Hakia's homebrewed examples, but it is fairly similar to their Enron example. Hakia produced perfect results for the Martha question:

Hakia is impressive, but does it really understand meaning?

I have to say that Hakia leaves me intrigued, despite the fact that it could not answer What does Hakia mean? and despite the lack of sufficient evidence so far that it really understands meaning.

It's intriguing to think about the old idea of being able to type a question into a computer and always get a meaningful answer (a la the Turing test). But right now I am mainly interested in Hakia's method for picking the top answer. That seems to be Hakia's secret sauce at this point; it is unique and works quite well for them. Whatever heuristic they are using, it returns meaningful results based on analysis of strings, and it is impressive, at least at first.

Hakia and Google

Perhaps the more important question is: Will Hakia beat Google? Hakia itself has no answer, but my answer at this point is no. The current version is not exciting enough and its result set is not obviously better, so it's a long shot that they'll beat Google in search. I think if Hakia presented a single answer for each query, with the ability to drill down, it might catch more attention. But again, this is a long shot.

The final question is: Is semantical search fundamentally better than text search? This is a complex question and requires deep theoretical expertise to answer definitively. Here are a few hints.

Google's string algorithm is very powerful; this is an undeniable fact. A narrowly focused vertical search engine that makes a lot of assumptions about the underlying search domain (e.g. Retrevo) does a great job of finding relevant results. So the difficulty Hakia has to overcome is to quickly determine the domain and then do a great job of searching inside that domain. This is an old and difficult problem related to natural language understanding and AI. We know it's hard, but we also know that it is possible.
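The two-step problem described above can be sketched as a small router placed in front of per-domain indexes. The first step guesses the domain and the second searches only inside it. This is a minimal sketch built on loud assumptions. The domain names, keyword lists and in-memory document lists are hypothetical stand-ins, and a real engine would use a trained classifier and a proper ranking model. Nothing here describes how Hakia, Google or Retrevo actually work.

```python
from collections import Counter
from typing import Optional

# Hypothetical domain vocabularies; a real system would learn these from data.
DOMAIN_KEYWORDS = {
    "geography": {"capital", "population", "country", "city", "china"},
    "electronics": {"camera", "laptop", "drivers", "geforce", "tv"},
    "finance": {"enron", "collapse", "stock", "bankruptcy"},
}

# Hypothetical per-domain document collections (stand-ins for vertical indexes).
DOMAIN_INDEX = {
    "geography": ["Beijing is the capital of China.", "China has over a billion people."],
    "electronics": ["Geforce drivers can be downloaded from the vendor site."],
    "finance": ["Enron filed for bankruptcy in 2001."],
}


def _words(text: str) -> set[str]:
    """Crude normalisation: lowercase, drop '?' and '.', split on whitespace."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def classify_domain(query: str) -> Optional[str]:
    """Step 1: pick the domain whose vocabulary overlaps the query most."""
    words = _words(query)
    scores = Counter({d: len(words & kw) for d, kw in DOMAIN_KEYWORDS.items()})
    domain, score = scores.most_common(1)[0]
    return domain if score > 0 else None


def vertical_search(query: str) -> list[str]:
    """Step 2: rank documents by word overlap, but only inside the chosen domain."""
    domain = classify_domain(query)
    if domain is None:
        return []  # a real engine would fall back to general web search here
    words = _words(query)
    return sorted(DOMAIN_INDEX[domain], key=lambda d: len(words & _words(d)), reverse=True)
```

For example, vertical_search("What is the capital of China?") routes to the hypothetical geography index and ranks "Beijing is the capital of China." first. The design shows why the first step matters. If the router picks the wrong domain, the second step cannot recover, however good its ranking is.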

While we are waiting for all the answers, please give Hakia a try and let us know what you think.


Comments

  1. Good analysis, I wanted to write one but now there's no need (:

    Anyway, I fail to see the difference between a 'semantic' search engine and a regular search engine. All search engines are 'semantic' in a way. If you type something like 'How do you make a hot-dog' in Google, it will give you the right answers. It won't just search for "how", then "do", etc. and compile the results. It also has algorithms which know how to decipher the order of words in a sentence and other patterns that make our writing meaningful.

    So, Hakia should do something really spectacular to beat Google with the semantic approach. It should actually be able to understand complex sentences better than Google, and as such be a search engine for more complex tasks, for example for questions like 'I need drivers for Geforce 8800, but not the latest version'. Currently, compared to Google, it doesn't deliver.

    Posted by: franticindustries | December 7, 2006 12:36 PM



  2. What's interesting is that Ask started out by trying to create just this type of search engine years ago. They abandoned that approach in favor of a more traditional Google competitor. So can we interpret from that that Ask learned that people would rather use a traditional search engine, or was there another reason for the switch?

    This type of semantical search technology seems especially well suited to encyclopedia sites like Wikipedia or Britannica. I.e., being able to type in "What is the capital of China?" at Wikipedia and get not only relevant topic articles about China, but also the specific answer, would be great. I would love to see a semantic search engine built into MediaWiki. But web search engines should, in my opinion, direct you to a variety of relevant sources.

    I don't think I'd feel comfortable asking "What were the causes of the American Civil War?" and have the search engine only spit back one result answer (or, one viewpoint).

    Posted by: Josh | December 7, 2006 12:58 PM



  3. Josh,

    Excellent points. I really like the Wiki idea.
    In terms of a single answer, I think if you are looking for a quick answer, possibly, but otherwise you would definitely want more results.

    The other thought that occurs to me is that we might not need a new way of inputting the question as much as we need new ways of getting the answer. So in a way, I view vertical search engines, like Retrevo, as approaching the same problem from a more pragmatic and better angle.

    Alex

    Posted by: Alex Iskold | December 7, 2006 1:02 PM



  4. Greetings from hakia!


    Thanks for the review and comments. We appreciate feedback:-)


    We are still developing; it will CONTINUE TO IMPROVE as many of the meaning associations form over time, like the neurons connecting inside the human brain during childhood. hakia is like a TWO-year-old child on the cognitive scale. But it grows EXPONENTIALLY, much faster than a human.


    Cheers,

    Melek

    Posted by: melek pulatkonak | December 7, 2006 2:05 PM



  5. Melek,

    That's great! Please make sure it does not become self-aware. I would hate for it to experience the kind of pain we do :)

    Alex

    Posted by: Alex Iskold | December 7, 2006 2:19 PM



  6. Noted:-)

    Melek

    Posted by: melek pulatkonak | December 7, 2006 2:25 PM



  7. Hakia is promising, good to see this early review, but we'll be able to judge them only after the official debut. Bad comments > /dev/null

    Posted by: Emre Sokullu | December 7, 2006 2:55 PM



  8. Hakia sounds quite Finnish - hakea means to fetch for instance.

    Reminds a little of Ms Dewey actually, but not as, errm, Flash. :)

    Posted by: Juha | December 7, 2006 3:58 PM



  9. So, do they intend to read RDF? That is, the data about the data.

    I'd like to talk to them, as it is simple to read Content Labels. They can then provide users with more information about a site *before* they have to enter it... And that is based on Semantic capabilities ;)

    Posted by: Paul Walsh | December 7, 2006 4:31 PM



    @Juha: yes, Hakia's name comes from that Finnish word. See the About Us section of their site.

    Posted by: Emre Sokullu | December 7, 2006 5:03 PM



  11. Paul,

    It seems to me that their claim to fame is that they do not need RDF because they mastered NLP (natural language processing).

    Alex

    Posted by: Alex Iskold | December 7, 2006 5:15 PM



  12. That's a great question you bring up though Paul. Semantic Web is really associated with RDF, thanks largely to Tim Berners-Lee's relentless promotion of RDF as 'HTML 2.0' (to coin a very awkward phrase!). So how many of these new meaning-based search engines coming on the market will utilize RDF?

    Alex is much more of an expert in these things than me, but still NLP seems to me the harder route to take - given all the difficulties AI has had in the past.

    Posted by: Richard MacManus | December 7, 2006 6:34 PM



  13. I think search engines need to focus on the social aspect. Tracking what users search for and allowing them to vote on sites. This allows them to make good decisions - to immediately understand the domain a housewife is referring to when she says soap and when a developer says the same.

    Posted by: David Mackey | December 7, 2006 7:59 PM



  14. Hmmm, doesn't like "Where can I find a good globe?" much (a recent search that hadn't worked too well for me on Google or Froogle). First link is good practice guidelines and legislation reform, which appear to use the word "GLOBE" for some reason (I can't torture it enough to make it an acronym). Granted, the second link was to an eBay auction for a globe. Third was an auction for a Lionel station light "with globe". The first and third results suggest to me that the meaning of the question hadn't been understood. Still, we're talking beta here, and it's a very difficult problem. It'll be interesting to see how they progress.

    Posted by: T.J. Crowder | December 8, 2006 1:06 AM



  15. Hello Melek,
    Hakia rocks, it's a really good search experience!

    Cheers.

    Posted by: Abhishek Sharma | December 8, 2006 2:33 AM



  16. A semantic search is quite different from a text search like Google, which is not primarily based on context and the relationship between words and resources, but on the occurrence and position of words.

    If Hakia really does semantic searches, it could easily distinguish itself from Google by generating new content (e.g. answers) that combines relevant unique snippets of information into a semantic result/answer to a query, as opposed to just a list of resources like the other search engines do and Hakia currently does. In that case you don't have to visit the resources to get the answer.

    The query "What is the capital of Finland?", could show Helsinki as an answer and provide related answers regarding history, population, etymology, other capitals etc.

    For this capability Hakia should be able not only to do semantic searches, but entity extraction as well, since RDF and XML schemas are not that widespread at the moment.

    If they can manage to do this, people won't hesitate to abandon Google, especially because the Google brand is losing its value rapidly because of SEO, spamming and privacy intrusions...

    Posted by: Gert-Jan van Engelen | December 8, 2006 4:04 AM



  17. I think Hakia is bluffing if it claims to be 'semantic'. I find it as semantic as Google :-)

    I tried questions like
    Why did the US attack Iraq?
    and
    Why did Israel attack Lebanon?

    It gave absolutely unrelated results, which confirms that it is only as good as a text search. However, when I tried the Q "Who is Mahatama Gandhi?", it immediately responded with the remark "See below the Mahatma Gandhi resume by hakia. What else do you want to know about Mahatma Gandhi?"

    My hunch is that the Hakia guys have set up a word filter before the search query gets executed on its DB (call it a 'semantic filter' if you'd like). If the query contains words like 'Who' or 'What', it is set to return the 'resumes' and 'galleries' for the rest of the search terms. But that isn't what semantics is about: the engine still does not 'understand' my question. That's just a slightly 'domain restricted' search being performed.

    I could just as well have a dropdown for domain (who, what, etc.) before the search box and restrict the search queries myself!

    While Hakia is not bad, I won't give up my Google for it!

    Posted by: Nikhil Kulkarni | December 8, 2006 8:25 AM



  18. really? no one but me remembers askjeeves? i'm all about semantic web, but i'm also skeptical of the recycling of web 1.0 into web 2.0. gigaom & techcrunch have already covered a few companies who have tried this, and while i'm sure hakia is great, let's not pretend they reinvented the wheel. the concept isn't new.

    Posted by: geektastik | December 8, 2006 9:08 AM



  19. "but already ranks around 33K on Alexa - which is impressive."

    Impressive? Give it a break.

    Posted by: michal frackowiak | December 8, 2006 2:05 PM



  20. As pointed out in #16, a Semantic Web search is radically different from a regular search. I see no reason to believe that Hakia has anything to do with the "Semantic Web" proper, as the underlying technologies - RDF, OWL, and so forth - simply are not in widespread use.

    If the people publishing data on the web are not publishing it in a format which is intended for consumption by the Semantic Web - and most people aren't - then either Hakia has next to nothing to do with the Semantic Web, or they've made an earth-shattering breakthrough in Natural Language Processing.

    Posted by: Phillip Rhodes | December 8, 2006 2:07 PM



  21. michal,

    33K rank is impressive given that the service just launched beta.

    Alex

    Posted by: Alex Iskold | December 8, 2006 2:26 PM



  22. It's my opinion that for a semantic search engine to *really* work properly, it will have to
    a. have demographic-based parsing logic, not just language-based.
    b. know the demographics of the user submitting the query.

    Posted by: Ernesto | December 8, 2006 2:31 PM



  23. Ernesto,

    Add other factors like the stuff you like, etc. That would be more of a personalized search. I think the way to go is:

    Personalize( Semantic Search ) ==> Really cool stuff.

    Alex

    Posted by: Alex Iskold | December 8, 2006 2:36 PM



  24. Remember that Google's growth was spread basically by word of mouth not SUV megalith marketing.
    If Google, an upstart, could do it to Yahoo, it can happen again.

    Posted by: Shinderpal jandu | December 8, 2006 2:49 PM



  25. This concept didn't work with ask.com, it ain't gonna work again now. It simply isn't how people search for information on the web.
    There are many ways to build search engines, but I'm quite surprised we keep seeing the same thing over and over again. What we are missing is real innovation, not a runner-up in the same clothes with a different name.

    Posted by: Sal | December 8, 2006 2:55 PM



  26. Ask both of them (and Ask.com) this question:
    what is 5 plus 5?

    enough said.

    Posted by: Dave | December 8, 2006 3:01 PM



    @Dave - duh. Things like calculating 5 plus 5 are a VERY simple matter of associating words with the relevant mathematical operators. Something which I'm sure Hakia can achieve shortly.

    The more interesting phrases here are - as Melek mentioned above - "connections being formed cognitively" and "intelligent as a 2 year old". Is the engine behind it aware of the data it parses and spits out? What is the level of awareness then - Word associations, lexical analysis, categorization and meaning vs actual causal factors?

    Posted by: Viksit | December 8, 2006 3:53 PM



  28. Nice work, going to check out how this handles.

    Posted by: Tele Man | December 8, 2006 4:25 PM



    Very interesting, and props to the developers. I know it's not a new concept (as pointed out earlier, ASK did try to do it), but then again, neither was a GUI when Apple took over... these things take development -- do you know how long the concept of the Macintosh was alive at Xerox PARC before Jobs discovered it and furthered its development into a now-common operating system? Give Hakia (and semantic search) a chance to develop. Recycled ideas usually have merit. That's why they're recycled. They just didn't get developed 100% the first time around.

    I do, however, see Hakia as far from succeeding at semantics. To get the semantics right and accomplish its goal here, it really has to conquer Bloom's Taxonomy of learning and apply it to each query, especially if it is to return one (or a few) valuable, cross-compiled results from different sources.

    Currently, it wouldn't pass a TRUE Turing Test; it just mimics understanding, like the person carrying on a conversation in a foreign language by copying answers out of a book, in the argument proposed by (insert name here, I forget it at the moment...)


    ^Wow... I just referred to like 5 things I learned last quarter in my freshman computer science classes... that felt good. Hope my thoughts make sense. Keep up the work Hakia, I really would be impressed to see success here, I just think it would have to incorporate some AI which is not looking good (from my eyes, anyway).

    Posted by: Augie | December 8, 2006 9:08 PM



    I think Hakia weights W5 (Who, What, Where, When and Why) heavily in search queries. I think Hakia is decent, but I am still not too sure of the difference between semantic search and text search (if the text search query is specific enough).

    Posted by: andy kong | December 8, 2006 9:34 PM



  31. While there is some growing interest in semantics and meaning, partly due to work in the semantic web and upstarts like Hakia, the first copy of the first semantic search engine was delivered to the Congressional Research Service in 1988. I know because I was there and I installed it for the research staff there.

    In your analysis you asked: Does Hakia really understand meaning? I think the question that has to be answered first is: What does it mean to understand meaning? Long before you come to the Turing test, you have to understand what the term "semantics" means and how it is used and understood by those inside and outside the domain of software and computational technology practice.

    The answer to the last question you offered: Is semantical search fundamentally better than text search? depends greatly upon what you think semantical means in a search and retrieval context.

    In a word though, the answer is a resounding Yes.

    I think, in its most common and general usage (among peoples) semantics refers to the interpretation of the significance of the relationships and interactions of subjects and objects in a situational context.

    For example, the semantics of the state of affairs in modern-day Iraq range from a state of civil war to extreme cases of outside insurgencies intended to deceive and delude. When the semantics are cloudy and unclear, judgments and decisions about what and how to name particular aspects of the state of affairs can also be murky. Interdependent judgments or decisions thereby become delayed or the subject of further debate. Ideally you want to present a situation such that a uniform perception emerges, with semantics (significance) that drives or guides interpretations such that those that are relevant and those with the same validity or authority prevail.

    As the Bush administration has demonstrated, the process, the presentation, the semantics-- can become political and highly charged. When questions of significance persist, that is, questions ranging over the signifier and signified in a given situation, uncertainty, lack of clarity and disarray blur and obscure any significance and generally erode confidence and delay action.

    This is not the kind of semantics the Semantic Web and AI technologies proclaim. In their quest to share and exchange information, they want just enough semantics to normalize data labels between systems so that they are able to exchange information and be sure they are referring to the same items in the data exchange. They want to use named references, with authority of course. In fact, they strive for clear and unambiguous semantics, a foreign concept to the Bush administration.

    But semantics has to do with the significance of interpretation. What is significant in our experience of the search and retrieval application? What is of significance in the results of the search engine? Relevance. The benefit of semantic search is greater relevance. For Hakia to be relevant, it has to offer more relevance than Google. A semantic search engine should also offer more, in my opinion.

    A modern language semantic search engine should offer more than relevance. It should offer insight. Rather than fixing semantics to simple categories for easy exchange, a truly semantical search engine should aid and assist one while exploring topics. It should help to relate language to abstract ideas instead of just connecting the keywords, names and nouns.

    Posted by: Ken Ewell | December 8, 2006 11:32 PM



    No, it is not better than Google. Type the same questions into Google and you will get better answers.

    Posted by: jyotheendra | December 8, 2006 11:37 PM



  33. Gee golly, as far ahead of me Ken Ewell is in every sense of technological knowledge and understanding, I have to say... You went way off topic just to make a point about the Bush administration... I get so sick of that.

    Of course semantic search is better than connecting language parts. People may not think it's better, but I argue that they only feel that way because they are used to searching with boolean operators and combinations of keywords. Everyone knows WHAT SPECIFICALLY they want to find, but some people have trouble putting their question into acceptable and successful search terms... Imagine never having to phrase a question specially for a search engine: just type what you're wondering, and have an instant answer.

    Much easier than combining keywords with booleans to try to simplify natural language to "search engine" language!

    PS -- No offense to you, Mr Ewell -- I really do respect that your technological insights and opinions are worth 10 times my own because of the knowledge gap; I guess I just got really sick of seeing more politically charged comments in non-related areas... I'm just sick of politics altogether right now, I think. Not trying to start a flame-war or anything! :)

    Posted by: Auggie | December 9, 2006 1:36 AM



  34. Great job done by hakia

    I got the perfect answers to my questions in the top 3-5 links and this saved a lot of time.

    I am impressed

    Posted by: priya | December 9, 2006 11:42 AM



    What about ChaCha.com? They actually have guides who help you with your search.

    Posted by: Tori | December 9, 2006 3:26 PM



    Unfortunately, Tori, I was never able to get connected to a guide, but I do remember trying it out a few days ago and thinking it was a pretty cool concept... as long as they never charge you for it! Could you connect to guides?

    Posted by: Auggie | December 10, 2006 1:33 AM



  37. Guides worked for me.

    Alex.

    Posted by: Alex Iskold | December 10, 2006 6:15 AM



    Looks like there's a /very/ long way to go yet. Given that "what is the capital of china" is semantically ambiguous, I tried to be helpful:

    what is the administrative capital of China
    what is the administrative capital of the United States of America
    what is the administrative capital of the USA
    what is the administrative capital of the US

    Unfortunately, Hakia provided irrelevant answers to all four questions. Google got 4/4.

    Given the apparently overwhelming power of Google's indexing algorithm and the extent of their dataset, a semantic-based search facility such as Hakia may have to seek a qualitatively different area of search in which to make a contribution.

    Posted by: Graham Higgins | December 10, 2006 7:33 AM



  39. Ref: # 35

    I tried the so-called ChaCha.com. Forget about getting any good results; it felt like I was in a chat!!! Users around the world have limited attention spans. Getting the best (not precise) results with minimum effort: that's the key. Advanced search and personalized search have been around for a long time with no great impact on users.


    Hakia is doing good work, but it's too early to say anything concrete. In addition, I would not accept that Google doesn't have semantic features in its search algorithm. I'm sure they are working on it or looking out for something good (a startup kid).

    Posted by: Dhruba Baishya | December 16, 2006 7:24 PM



    props to geektastik for doing what the author failed to do: mention askjeeves.

    Posted by: Bog | December 19, 2006 9:41 AM



  41. I mention Ask Jeeves in the second comment. ;)

    Posted by: Josh | December 23, 2006 5:10 PM



    This is a good example of Hakia's success:
    why dont people tell their salaries?

    Posted by: Anonymous | January 3, 2007 2:14 AM



    The main problem for Hakia is that Google is not standing still. G has a secret project which I feel must have to do with semantics.

    BTW - Google does not use any knowledge of semantics for translation. Here is what we get from Google:

    El barco attravesta una cerradua - un vuelo de cerraduras - La estacion de ressorte - jogar de puente

    The last is particularly annoying. My daughter plays for England, and when I try to search for "Bridge" I am overwhelmed with sites on civil engineering.

    I specifically tested these with Hakia:

    The locks on the Grand Union Canal
    Spring flowers (primavera)
    Springs in Gloucestershire (manantial)
    Bridge tournaments

    The results on the whole were satisfactory, much better than Google. "Understand" is a difficult word to define. My definition (in good Spanish) is the difference between primavera, resorte and manantial. In other words, can we use our "understanding" in an operational way? My view is that precise definition + a large enough database = Turing. To some extent Hakia appears to do this. It must be the future. The fly in the ointment is what Google is doing.

    Posted by: Ian Parker | January 6, 2007 5:27 AM


