By Dr. Riza C. Berkan, Founder & CEO, hakia.com
Editor's Note: This is a guest post by the CEO of Hakia, Dr Riza C. Berkan. I want to stress that this post is NOT an advertorial - in fact I made it a condition of publication that the post should focus on the theory of semantic search and it should mention Hakia's competition, both of which Dr Berkan has done. I should also mention that while Hakia ads sometimes appear on this site, they are managed separately by FM Publishing. In any case the reason for this post by Dr Berkan is purely to explore the topic of semantic search and try to get a conversation going. Semantic search is seen as one of the next-generation search methods that may challenge Google, so the idea with this post is to understand it better - and perhaps debate its future in the comments.
How satisfied search engine users are today is an ongoing debate. However, there is wide consensus, from a scientific viewpoint, on the competency of current search engines: they are halfway to the target, and there is huge room for improvement. Semantic search is now under the magnifying glass, and the question is: "Can semantic search be an antidote for poor relevancy?"
Let's start with "what is semantic search?" Academically speaking, semantic search ought to be a system that understands both the user's query and the Web text using cognitive algorithms similar to those of the human brain, then brings back results that are dead on target (in the right context) at first glance (without requiring the user to open the Web page for further investigation). There are several ideas on how to build such a system.
But before looking into these variations, let's clarify one thing. A semantic system cannot be called "semantic" if it does not encapsulate the knowledge of languages. Given this basic requirement, we have to exclude all those fancy algorithms that rely on collecting statistics of links, symbols, words, clicking behaviors, and so forth. Statistics is a tool, not a model of a solution. To go the distance, we need a deterministic model of a language processing solution. We need algorithms that match the meaning of concepts (rather than mere words) and emulate "understanding."
For example, for the query "what is Palladium useful for?", statistical methods may bring back search results related to the London Palladium Theatre (a popular subject) rather than results that match the actual meaning of the query, which is not very popular. A semantic algorithm can easily identify that "useful for" implies the element Palladium.
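To make the Palladium example concrete, here is a minimal sketch of cue-based sense selection. It is a toy illustration only, not hakia's algorithm: the sense inventory `SENSES`, its cue phrases, and the function `disambiguate` are all hypothetical names invented for this example.

```python
# Hypothetical sense inventory: each sense of a term lists context cues
# that suggest it. A real system would draw these from an ontology.
SENSES = {
    "palladium": {
        "chemical element": {"useful for", "used for", "alloy", "catalyst"},
        "London Palladium theatre": {"tickets", "seating", "london", "theatre"},
    }
}


def disambiguate(query, term):
    """Return the senses of `term` whose cue phrases occur in `query`.

    If no cue matches, every known sense is returned, since the query
    alone does not settle the question.
    """
    text = query.lower()
    senses = SENSES.get(term.lower(), {})
    matched = [
        sense
        for sense, cues in senses.items()
        if any(cue in text for cue in cues)
    ]
    return matched or list(senses)


print(disambiguate("what is Palladium useful for?", "palladium"))
```

The point of the sketch is that the decision comes from the wording of the query itself ("useful for"), not from how often each sense is searched for.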
The two basic views of semantic search are distinguished by where the semantic resources are located. The first view is to embed the semantic resources in the Web pages themselves. This is called the "Semantic Web". Why not compose Web pages in a structure that is semantics-friendly? The second approach is to locate the semantic resources in search engines, which deploy algorithms that use them. This is called a "Semantic Search Engine", and it works on any text.
The "Semantic Web" approach has been around for a long time now. Unfortunately, it is based on an unrealistic assumption that every Web author will abide by the complex rules of semantics - not to mention the education it requires - and place content in the correct buckets of mysteriously unified standards. Another form of this approach may be to design Web factories that crank out refined Web pages once fed by ordinary Web pages. Of course if there is more than one factory, you have the standards issue again. In this day and age of fast content production, the Semantic Web seems to be more idealism than realism.
The "Semantic Search Engine" option has yet to be tested. My company hakia, along with others like Powerset, Cognition Search, and Lexxe, is taking steps in this new direction. There are challenges with this approach as well. First and foremost, the knowledge of languages must be built in a structure that allows a scalable and speedy search process. Building such resources is an expensive, tedious, and time-consuming endeavor. Then, all the Web pages must be analyzed using this system to prepare for a retrieval platform, another time-consuming process. But when all of this is done properly, users will start to experience something totally new. Let me emphasize the word "properly" here, which is an entirely new discussion point.
One of the first impacts of semantic search engines will be on the handling of long-tail queries. Without relying on statistics, semantic algorithms can analyze long-tail queries on the fly and bring back search results with the accurate context. With such a capability, we are talking about finding answers to longer than usual, complex, and unpopular queries.
Let's make no mistake about it. The long tail is the bottom part of the iceberg, under the water. Philosophically, the number of long-tail queries is infinite, whereas the tip-of-the-iceberg queries can fit on one large hard drive. Popularity algorithms fail at long-tail queries (by definition) because there is never enough statistical sampling.
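The sampling problem can be illustrated with a small sketch. The query log below is invented for illustration, and `singleton_share` is a hypothetical helper; the only claim is that a query seen once gives a popularity method nothing to count.

```python
from collections import Counter

# Toy query log (invented for illustration): two popular queries repeated
# several times, plus three long-tail queries that each occur once.
QUERY_LOG = (
    ["madonna"] * 5
    + ["weather"] * 3
    + [
        "what is palladium useful for",
        "how to shoot an arrow",
        "ontological parsers for legal text",
    ]
)


def singleton_share(query_log):
    """Fraction of distinct queries that appear exactly once in the log."""
    counts = Counter(q.strip().lower() for q in query_log)
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


print(singleton_share(QUERY_LOG))  # 3 of the 5 distinct queries occur once
```

In this toy log, most distinct queries have a sample size of one, which is exactly the situation where click or link statistics have nothing to work with.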
Many people do not realize that long-tail queries are partly personal queries (uniquely unpopular and complex, reflecting individual personalities). Thus, the idea of "personalized search" actually requires semantic capabilities, without the need for tracking the user's behavior, unless it is being tracked for psychological profiling.
By a similar argument, queries against dynamic content are also long-tail queries. Dynamic content, like news, loses its value so fast that there is no time to collect statistics. By the time the link referrals are made, or click statistics are collected, the content is no longer in demand. Therefore, a semantic approach is very effective in handling dynamic content and can unleash its full power the second the content is born.
Semantic search is definitely an antidote for poor relevancy; but only time will tell how well this can be done.
I will close this with a few commonly asked questions.
Q. How can a semantic search engine recognize a popular Web page compared to an unpopular one for a given query term(s)?
A. A semantic search engine recognizes the correct context for a given query term(s). Once the context is correct, popularity becomes irrelevant, and the question that remains is credibility. Detecting the credibility of a Web page is a relatively easy task. As a result, if you have the correct context from a credible source, the job is done. You can test this logic for any query today. The popularity method is a crude approximation that stands in for these capabilities.
Q. If the user types "madonna", how would a semantic search engine understand the intent of the user? (i.e., is it the artist, or the religious figure?)
A. A semantic search engine is not a psychic. Thus, attempting to guess the intent is futile for an under-represented query. But the solution is easy: just give the user search results for all possible senses of the word. Even better, categorize them neatly. This is within the design envelope of semantic search engines.
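As a rough sketch of what "categorize them neatly" could look like, the example below groups results by sense. The result titles and sense labels in `RESULTS` are hand-made for illustration, and `group_by_sense` is a hypothetical helper; a real engine would have to derive the sense of each page from its text.

```python
from collections import defaultdict

# Hypothetical results for the query "madonna", each already labelled
# with the sense its page is about (labels assigned by hand here).
RESULTS = [
    ("Madonna announces world tour", "artist"),
    ("Madonna and Child in Renaissance painting", "religious figure"),
    ("Madonna discography", "artist"),
]


def group_by_sense(results):
    """Group (title, sense) pairs into a dict mapping sense -> titles."""
    grouped = defaultdict(list)
    for title, sense in results:
        grouped[sense].append(title)
    return dict(grouped)


for sense, titles in group_by_sense(RESULTS).items():
    print(sense)
    for title in titles:
        print("  -", title)
```

No sense is ranked above another here; the user picks the category, which avoids guessing intent from a one-word query.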
Q. How can a semantic search engine be manipulated by spam pages?
A. If done properly, a semantic search engine cannot be manipulated by text. Because it specializes in detecting the right context, spammers would have to supply the right context for the right query, which by definition is no longer spam. Abuses related to images and video are possible, but these kinds of abuses are common today and can be detected in other ways.
Q. Will semantic search take over today's search engines?
A. In the long run, they most likely will. Again, this depends on how well they are done. Once the long-tail searches start to show the difference, then it will probably have a domino effect. If people are satisfied in the complex query domain, they are more likely to switch for simple queries as well. Let's remember that there is no cost to switch.
Q. There were previous failed attempts of natural language search engines. Why would this work now?
A. Natural Language is a wide term that includes all sorts of things. Previous attempts have failed mostly because they were not done properly, and methods used were not based on proper semantic principles. Some of them were merely statistical methods very similar to the conventional search engines. Others were behavior tracking AI applications. And some relied on human labor to keep up with question answering. There are so many ways of doing it improperly, and only one way of doing it right.
Q. How would the advertising systems be affected by semantic search?
A. The impact will be very big, perhaps more than the search itself. A semantic advertising system, which can detect the right context most of the time in a consistent manner, means a huge jump in ROI.
Q. What is the single most drastic problem in front of semantic search today?
A. Misconceptions and hype. Business continuity must rely on honest declarations of what is to be expected in accordance with the pace of development. Semantic search is a difficult technological endeavor; it takes time and patience. Investments with short-term agendas will hurt this newly emerging technology sector.
Comments
Dr Berkan,
Thanks for this highly informative post. Richard said in his intro that the purpose of your article is to get a conversation going, and I'd like to engage with you about the methods for imbuing a search engine with understanding. As you rightly point out, attempting to guess the intent is futile for an under-represented query.
Do you have a take on the frequency with which a query is 'represented' enough to generate a semantic meaning? And do current efforts towards semantic search engines rely on sufficient language in the query to be able to interpret a context?
I work for VortexDNA, whose MyWebDNA technology measures relevance according to the user's core purpose and values, and I used your article as the base of a post this morning. I would be highly interested in your opinion as to whether these sorts of technologies might work together to create adaptive and intelligent search systems.
Thank you and best regards,
Kaila Colbin
VortexDNA Blogger
I would like to have Quintura included in the list of so-called semantic search engines, since we are very familiar with the problems outlined above and use active semantic neural net techniques to solve them.
A nice, concise introduction to semantic NLP.
Well done.
According to this post, there is a need for understanding language.
The problem is that most of the world, specifically the long tail, is not native English speaking. If you need to understand the language (probably, but I assume not only, with part-of-speech (POS) tagging), then misspelled, ungrammatical, or syntactically incorrect text, which will be common on non-native English-speaking web sites, will pose a big problem.
That's the content side of the problem. On the user side (the one searching), the text the user types might suffer from the same problems. And since search terms are relatively short (currently most of them are composed of one or two words, at least according to the last statistics I saw), it would be even harder to figure out what the user wanted, not to mention the exact concepts the user has in mind.
How is this problem with incorrect non-native English handled on both the content side and the search side, and to what extent might it interfere with search for non-native English-speaking users?
Thank you for this interesting post. Cross-Language Search goes a step further.
What is cross-language search?
A1. The question is being translated: Examples of companies that do a simple one-on-one translation: Eurospider, Convera, Google Cross-Language, Temis.
A2. Content is indexed across several languages ergo the question is being asked across several languages: InfoCodex.
Shortcomings of A1.:
**********************
a) The translation is never sharp. What meaning does "Automobile" have, what are the possible translations into German:
- Automobil
- Auto
- Kraftfahrzeug
- Motorfahrzeug
- Personenkraftwagen
- PKW
etc.
b) What meaning does "Insurance" have when translated into German:
- Versicherung
- Haftpflichtversicherung
- Assekuranz
etc.
c) What meaning does "IT" have when translated to German:
- IT
- Informatik
- EDV
- Elektronische Datenverarbeitung
etc.
The A1 solution will only ever give you one translation of a word, even though a common word always has several translations. With a linguistical database you do not have this problem, i.e. your search result set grows.
Advantages of A2:
*******************
a) A linguistical database with synonym groups will help you put a document or a search term into context (Verschlagwortung).
b) A linguistical database with a taxonomy will help you do more than just a one-on-one translation.
c) With a linguistical database you can search in more than just one language. You can search in 5 languages at the same time.
See: http://www.ywesee.com/pmwiki.php/Ywesee/InfoCodexProcedure
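The synonym-group idea above can be sketched roughly as follows. This is only an illustration built from the German examples listed earlier in this comment; the `SYNONYM_GROUPS` table and the `expand_query` function are hypothetical and do not describe how InfoCodex or any other product is actually implemented.

```python
# Hypothetical synonym groups keyed by concept, then by language code.
# The German entries are the examples given in this comment.
SYNONYM_GROUPS = {
    "automobile": {
        "en": ["automobile"],
        "de": ["Automobil", "Auto", "Kraftfahrzeug", "Motorfahrzeug",
               "Personenkraftwagen", "PKW"],
    },
    "insurance": {
        "en": ["insurance"],
        "de": ["Versicherung", "Haftpflichtversicherung", "Assekuranz"],
    },
}


def expand_query(term, languages):
    """Expand `term` into every synonym of its concept in `languages`.

    An unknown term is returned unchanged as a one-element list.
    """
    wanted = term.lower()
    for group in SYNONYM_GROUPS.values():
        known = {w.lower() for words in group.values() for w in words}
        if wanted in known:
            return [w for lang in languages for w in group.get(lang, [])]
    return [term]


print(expand_query("Automobile", ["de"]))
```

Unlike a one-on-one translation, the expanded query carries all six German terms, so a document that only says "PKW" can still be matched.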
Natural language search on the web is another decade-old Holy Grail. I'll check back in when Hakia can tell me how to shoot an arrow on the first page of hits. Amusingly, the only sponsored link was for Ask.com which sent me straight to the Beginner's Guide to Archery.
This is very interesting and I hope that this technology is able to move forward despite the rather large obstacles in its way.
I want to make a few comments to see how others see the usefulness of this technology.
1. Let's say that someone perfects some algorithms that make semantic search viable (in most users' minds). Given the incredible growth of the web, which does not appear to be letting up, a semantic search can still end up providing way too many responses as time goes on. While narrowing context should help, I expect that at a certain point, the difficulty of adding the necessary context will outweigh the benefit for most users. That is, I see the enormous growth of the web as the primary difficulty that search engines need to confront. (Do the following experiment: pick a search, do it on Google, and watch it month by month for a year.) As a result, I expect that semantic search (if it comes about, and I hope it does) will eventually lose its initial utility and that some new technology will be needed to help us negotiate the web (if that is ever going to be possible).
2. I think that a better way to approach semantic search is domain specific rather than general. Specific domains (clothing, furniture, consumer electronics, healthcare, algebraic geometry, etc.) should be far easier to model because the structure of the contexts that need to be constructed in order to provide good search outcomes is already well understood and can probably be formalized more easily than the general problem. In that sense, the vertical search engines that are mentioned here so often would appear to be better candidates to be "semanticized" than the more general engines. Another way of saying this is that approaching things in too much generality has rarely, if ever, turned out to be fruitful. I think the key is to find domains that are narrow enough to be able to provide a robust formal definition.
Dr. Berkan,
It is always a pleasure to see your hard work in not only building such a complex and interesting work of technological art, but in attempting to guide the concept and worth into the "sometimes" unwilling minds of others.
I am endlessly distraught at the number of people who just seem to not want "anything" to work. We are not talking about "Linear A" here (Ancient Minoan text yet to be adequately deciphered). Linear B (Mycenaean) was also thought impossible to decipher until Michael Ventris cracked it in 1952. Hakia has a language(s) fully operable with unlimited contextual references to (simply) filter thru the (right) system.
Dr. Berkan's most important statement here is: "But when all of this is done properly, the users will start to experience something totally new. Let me emphasize the word "properly" here, which is an entirely new discussion point."
I have no doubt that Dr. Berkan understands what the term "right" means, but I wonder at 80 percent of the rest of the world sometimes. A search at Hakia right now will render no greater efficiency than many of the other engines. The point is that it is not built yet! The Hakia site is there as a reminder, a symbol and as an outlet for showing developments.
I know there is no remedy for cynicism and negative analysis, but I often wonder how much more amazing and fast developments would be if people looked for ways that innovation "can" make a difference rather than the abyss of "can't" do attitudes.
Of course there are the agendas to be considered. For those that cling to the Adsense and SEO apple cart, consider some out of the box thinking. Take RWW for example; I think we can agree it is one of the best sites on the web. A web like the one Dr. Berkan describes would render RWW (or any site with the juice) even more valuable than it currently is.
How so? Because content, value and excellence will stand for something again. Adsense may go the way of the DODO, but it might be replaced with much more lucrative, well done ads and revenue models based on what content is really worth. There is no need to panic; searching for MySpace can still be accomplished, the semantics will just have to be altered slightly. Instead of online community one might type in "Dog and Pony Show".
Sorry guys, I just see it working. I expect the language issue is a simple download of some codec into our friendly language center on the constructed AI brain. Like teaching a human via a probe into the right hemisphere of the brain.
In a similar way Dr. Berkan is essentially constructing the corpus callosum (middleware) connecting the right hemisphere (which he is also building to some degree) with the left hemisphere (or what Google and others use). As Sperry said back in 81:
"The great pleasure and feeling in my right brain is more than my left brain can find the words to tell you."
Roger Sperry
http://nobelprize.org/educational_games/medicine/split-brain/background.html
Best regards and positive visions.
Phil Butler
Kaila Colbin from VortexDNA - The frequency measurements of the query do not shed light on the meaning of the query; rather, they tell us how common the query is. Commonality does not help in analyzing long-tail queries. If I ask you a question that you have never heard before, you can still process it. This is how semantic search ought to work, independent of historical occurrences. Yes, these capabilities are the basic prerequisite for intelligent adaptive systems.
Yakov - Yes, Quintura must be added to the list. Sorry for this mistake.
Eran Sandler - Detecting context is more involved than POS tags and syntactic methods. It requires ontological treatment of concepts, and text-meaning-representation (TMR) models. Now, the incorrect use of English is a different matter, and some level of correction to a single word can be made by looking at other words in the sentence to detect the right context. I agree that correction is a big issue, but I would not go so far as to claim that most long-tail queries are non-native uses of English.
Zeno Davatz - Cross language search is an interesting problem. Machine translation is more complex than semantic search. Search is one-to-many mapping whereas machine translation is one-to-one mapping. The tolerance for failure is less.
Micheal - If you look at the results, the detection of the concept (shooting an arrow, and how to do it) is correct, but what is missing is the coverage of the Web sources to encounter more relevant context. It takes time to cover all the pages on the Web. hakia is not perfect, but for this particular case the system will correct itself as it encounters the pages you are looking for during crawling.
Ivan Handler - Semantic search is already domain specific. The ontology it utilizes has all domain specific attributes, properties, concepts, taxonomies, etc. The fact that the system is deployed on a general search engine platform has nothing to do with what is built inside and how Web data is organized. Your observation about the Web content propagation is correct; however, calling for another technology beyond semantic search is not quite right if you assume "domain specificity" is not in the envelope of semantic search technology. Actually, semantic search is all about specificity.
Phil Butler - I appreciate your encouraging words and your vision of the future. I think the IT world is facing an unavoidable turning point where "short-term" success ideas are increasingly depleted, and now it is time to enter the next phase of more scientific advances. Naturally, people in control by means of short-term success are having difficulty in this adaptation period. Semantic search may be the first to exemplify this transition. Fortunately, we are not alone in this journey.
Dr. Berkan,
thanks for your reply. I think I was not as clear as I hoped to be. I cannot understand how "semantic search is already domain specific." This implies a lot of work. You seem to indicate that there is a single ontology that parameterizes all specific domains. My understanding is that this is where Minsky and other AI researchers got stuck in the 70s. I, for example, do not see how one can put, say, consumer electronics and algebraic geometry together in a practical manner. The domains are structurally quite different.
If you mean that once a domain has an ontology that "works", then the search engine can take advantage of it, then I am with you. But then there is a whole lot of work to do for each domain. In that case (the realistic one if I am following along correctly), semantic search will proceed as domains are formalized. This will be a massive undertaking, though certainly worthwhile in my view.
I am not seriously calling for another technology beyond semantic search, I am just pointing out that semantic search will also eventually be overwhelmed by the growth of the web. I have no idea when that will happen and what we will do in that case ("necessity is the mother of invention" and we aren't there yet).
Ivan Handler - Yes, we are pretty much on the same line of thinking. I did not deny it is a massive undertaking to build such ontologies and more importantly ontological parsers that distill the semantic associations embedded within. Another comment, AI is mainly focused on solving a given problem without dealing with its underlying mechanism, rather creating a black-box model using the input-output data, or the rules of the most common behaviors of a process. Being a former AI person myself, I know its shortcomings pretty well. An AI system, for example, can guide an airplane in landing based on well studied data. Therefore, landing an airplane is not a long-tail problem. Understanding language across the board is a long-tail problem. It cannot be modelled purely by data (or statistics.) Computational linguistics and AI, unfortunately, have very little in common, historically speaking. That's where the dots are not connected well. AI will get stuck in dealing with languages unless it absorbs underlying semantic principles. That has been my opinion so far.
Dr. Berkan,
Thanks for mentioning CognitionSearch (http://www.cognitionsearch.com) in this article. We agree that search can be drastically improved with meaning-based semantic search and, along with Hakia, feel we have a foundation that supports the case you are making!
Brian Maser
Cognition Technologies.
Hi Riza,
You say, "The frequency measurements of the query do not shed light to the meaning of the query." While that may be arguable, I'm really curious as to how you do hakia's "Spelling Suggestion." Is it a simple look-up or frequency-based a la Google? Are you at all willing to inject frequency-based insight into the hakia process, without which the search would not even begin to get the approximate context for a semantic search in this case? Is it forever to be frequency | semantics, or some combo?
Also what's 'hakia'?
I agree that the semantic search needs to be done in the right way. Can any one give some examples of academic papers or books describing the "right way" algorithms for this job? I am mostly interested in "understanding" the web text.
Just wanted to simply say that I enjoyed this post immensely. Kudos to you Dr. Berkan.
A semantic system cannot be called "semantic" if it does not encapsulate the knowledge of languages.
Statistics is a tool, not a model of a solution. To go the distance, we need a deterministic model of a language processing solution. We need algorithms that match the meaning of concepts (rather than mere words) and emulate "understanding."
------------
But how does the human brain understand the meaning of words? It's all statistics at the end of the day: we learn by association, and words said together most often start to build our understanding, surely? The other option is that we are told what a word means, which is also statistics, because we then compare it to what we previously know. So by saying that we need algorithms that match the meanings of concepts, what you're actually doing is completely contradicting your initial argument.
As for the 'encapsulating the meanings of languages', the semantic technology you build shouldn't be focused on just one language, because that means you're not building semantic data. It means that somewhere you've hardcoded grammar and syntax; semantic analysis should retrieve that without aid.
I saw an article trying also to demonstrate the stupidness of current search engines:
http://www.otherworldvision.com/technorati-authority-google-pagerank-v001/
The following quote is from the article/blog above:
"To go the distance, we need a deterministic model of a language processing solution. We need algorithms that match the meaning of concepts (rather than mere words) and emulate 'understanding.' … There are so many ways of doing it improperly, and only one way of doing it right."
Are you saying that you have discovered the "one way" of doing it right? Given the challenges of natural language processing and artificial intelligence, this seems unlikely.
Doug Lenat and his team at Cycorp (http://www.cyc.com/) have been working on these problems for more than twenty years. Doug’s presentation at a Google TechTalk from May of 2006 provides a very good overview of the current state of natural language processing: Computers versus Common Sense
http://video.google.com/videoplay?docid=-7704388615049492068
It is quite possible that hakia has created some very good algorithms that evaluate semantic relationships within blocks of text and it is also quite possible that these algorithms will enable hakia to create a superior search engine. I understand the business requirements for proprietary knowledge. I hope in the near future hakia will be able to reveal more about its technologies.
Another quote from the article/blog above:
"The 'Semantic Web' approach has been around for a long time now. Unfortunately, it is based on an unrealistic assumption that every Web author will abide by the complex rules of semantics - not to mention the education it requires - and place content in the correct buckets of mysteriously unified standards."
The criticism of complexity was certainly true of the semantic web standards released in 2004. That complexity combined with a lack of tools is the main reason the semantic web fizzled.
In January of this year the W3C released a working draft for a simplified standard: RDFa. This new standard will enable semantic attributes to be embedded directly into XHTML. A clear introduction to RDFa can be found at http://www.w3.org/TR/xhtml-rdfa-primer/
We believe RDFa, along with a supporting tool set, can form a bridge to the Semantic Web envisioned by Tim Berners-Lee.
Semantic Bridge Technologies http://www.semanticbridgetechnologies.com is a startup company located in Austin, TX. We are creating a tool set and the supporting infrastructure for the implementation of the Semantic Web. We are taking a very pragmatic approach. Our target audience consists of web designers and software engineers who build Internet applications, not theorists who study semantic structures. We are building a bridge, not an ivory tower.
One of the key aspects of the Semantic Bridge Project is the creation of a dynamic and interactive ontology management system, "The Semantic Knowledge Repository". This system along with the tools that will allow web designers and software engineers to easily interact with the repository will have a profound impact on the rapid deployment of the Semantic Web.
While our approach (the creation of tools and infrastructure for the semantic web) is different from hakia's (the creation of a semantic search engine), we most likely share a common belief: Google will soon become the "Commodore 64" of Internet search (cool technology for about ten years that was supplanted by something much better).
Sincerely,
Mike Duffy
CEO / CTO
Semantic Bridge Technologies
mduffy [at] austin.rr.com
I have been following the field of natural language understanding, which is what semantic search requires, since 1960, when I was in HS. I worked for one of the pioneers at MIT, did my MS thesis on the subject at Penn, and was the doctoral dissertation advisor to a student who worked on the topic at Columbia.
I would love to see it happen, but haven't seen any successful approach presented in 47 years. I also have some pretty good ideas as to what's so hard about succeeding.
Dave
PS Google's translations aren't anything to brag about yet.
I have to disagree with the idea that there are two basic views of semantic search.
It is surely true, and Dr. Berkan is right, that there is a dichotomy between search engines that include the semantic reasoning in the search or query engine and the semantic web type that use tags and other structures to add some sorts of semantics to the source page structure. It divides the field into the semantic web and semantic search in a way, but it does little to illuminate the views or the debates about semantics.
These divisions run deep and span the fields of metaphysics, philosophy, human psychology, logic, language and mathematics. There are not just two but multiple views and one problem is that technology is only built on one of those views. This is not the place to expound upon them though I will offer that there are few technologists or developers pursuing models built on frameworks outside of NLP or AI constraints.
I also disagree that the semantic web is based "on an unrealistic assumption that every Web author will abide by the complex rules of semantics". Not that I am any fan of the semantic web hype, the fact is, ontologies like the Dublin Core are not complex and can be applied without the author agreeing or abiding to much more semantics for his work than those demanded by everyday convention and best practices. These are not real human semantics though.
By that I mean these semantics are about standardizing document elements and thereby the semantical relationships between documents using those elements, e.g., title, author, summary, etc. Let us call these "quasi-semantics" here so I can draw a distinction between quasi-semantics and real or human semantics-- the kind that distinguish significance and relevance from noise. Quasi-semantic markup tools of the former sort will become ubiquitous like HTML markup tools have become. Yes Riza, I predict that there will be WYSIWYG quasi-semantic editors in the next few years.
And I have to say I did not get the parts about the long tail query. The long tail is a statistical distribution of event-items; I fail to see the connection to semantic analysis, but it could be that Dr. Berkin has a special kind of semantics for long tail comparisons or something. I was really lost when I read: "With such a capability, we are talking about finding answers to longer than usual, complex, and unpopular queries." That implies something entirely different in my view.
After reading the next few paragraphs, though, I guess the argument is that the capability of getting relevant answers to longer queries, coupled with the fact that semantic analysis is a useful tool for the first-hand analysis needed for today's more dynamically changing content, somehow supports semantic search as an antidote for poor relevancy.
As I started reading I was hoping the article would add information to the social consciousness about the details of semantic reasoning and how a semantic search can be conducted. Instead, I read a weak argument that better relevancy is the reason for semantic search. Who would a thunk it? I was expecting an article that illuminated how we might distinguish significance from an otherwise noisy environment with many signals impinging on our senses. How is that sort of human semantics captured in the algorithm for query and text analysis?
My conception of semantics for query and text analysis has to do with filtering, comparing and contrasting the principal concepts and topical categories often implicit in queries and texts. There is a common way, a human way, of doing this and it shows up in conversations between regular people. The semantics are not personal-- they are inter-personal.
This kind of semantics can be seen when someone makes a statement like "let's not argue over the semantics". In such a situation, it usually means one of the parties objects to the way some term or language is being used as a characterization or representation for a particular object or state of affairs. This can happen when you are reading a text as well. You may not agree with the author's choice of words and, more likely, you do not agree that a particular word is the proper sign in a certain case. When examining hits from a search engine, you do not agree they are relevant.
While processing text, a semantic search engine must deal with an interconnected system of meanings, including: ideation; interpersonal semantics, having to do with control and exchange; and also the textual semantics concerning how a text may be constructed, e.g., title, author, publication date, etc. Coincidentally, I recently posted an article about this kind of semantic search for anyone who wants to know more (http://commonsensical.wordpress.com/).
The question and answer part did contain some useful information, and while I do not see it as the single most drastic problem, I otherwise agree with that last question and its initial answer so much that it is worth repeating:
Q. What is the single most drastic problem in front of semantic search today?
A. Misconceptions and hype.
The single most drastic problem, the hard problem though, is the lack of a unified semantic theory.
-Ken Ewell
See there Riza, I am not the only one who spells your name wrong. :)
I agree, however, that misconceptions and hype (and downright hardheadedness) threaten most of what we attempt. Perhaps it has always been this way, as I remember people in our neighborhood saying things like: "That sucker will never get off the ground" when the Apollo missions started.
I bet $100 that 30 years from now hakia's and other developers' innovations in AI and language/knowledge processing will seem elemental to the next wave of creators.
Thanks for continuing this wonderful dialogue..
Ziya Oz - There is statistical medicine. But, at the same time, there is knowledge of medicine that is fundamental, deterministic, experimental, etc. You can use statistical algorithms (like popularity tracking) with certain success, just like the prognosis of a medical condition. But you cannot operate on a patient with purely statistical knowledge. That is the difference between statistical search and semantic search. If you want to go the distance (long-tail), the semantic approach cannot rely on any statistics where statistics fail by definition.
Shahar Peleg - Our sources may help http://www.ontologicalsemantics.com. We are actually updating this site very soon with much better information.
Niall Larkin - Thanks for your support.
Phil Midwinter - The human brain learns by repetition, which reinforces synaptic junctions. With about 10 billion neurons, each having 10,000 connections, we are talking about a massive associative memory. The artificial intelligence topic "neural networks" has modeled this process with success. It is not a statistical method. It is a nonlinear mapping technique. I recommend you review this topic to realize that not every repetition implies statistics. Ontological semantics works in a similar but more deterministic manner. The hierarchy of concepts, word senses, and their connections altogether is an x-ray image of the human cognitive process. Not all x-rays are good. Thus, it is a matter of art and science to put it together correctly. Last comment: Ontological Semantics is language independent.
theothereye - I disagree. Current search engines, especially Google, are a remarkable achievement of using so little knowledge to solve a cumbersome problem successfully. But they have a limit because of the limit of foundation they are built on.
Mike Duffy - I congratulate you on your achievements with Semantic Bridge technologies. When I said there is one way of doing it right, I did not say there is one person/team doing it right. But at the end, all different teams will eventually converge onto a very similar solution. That did not happen with Semantic Web so far, according to my personal view. I heard a good comment lately: new technology is not supposed to build barriers, it is supposed to tear them down (I cannot remember who that was, please identify yourself.) Every standard issued from some centralized authority is a barrier. Otherwise, we would all be speaking the same language, programming with the same language, downloading the same media format, using the same brand cell phone, etc., etc. Nothing like that has ever happened on the face of the earth, and I cannot see how it will happen now: billions of people will abide by the rules of the new Semantic Web publishing standards? I am sorry, I am stuck in this passage and my mind refuses to bend. But if the claim is that 5% of it will do it for the most popular subjects, then I can start to agree with the vision while remembering the current success of Google in that envelope.
David Klappholz - Wonderful comment. We think we are doing it now, but it will be experts like you to decide at the end.
Ken Ewell - I respect your disagreements. However, if we start the "Quasi" world of things, then the discussion platform will be waxed and greased, and no one will be able to stand on it. Let me bring to your attention that a tag like TITLE, AUTHOR, etc., does not even begin to dig into the first layer of semantics. We should be talking about multi-layer connectionism where concepts and relationships are accurately presented, and about TMR models that disambiguate between the different senses of the words based on context. Actually, you have identified this yourself later in your comment. But being so forgiving to a surface-scratching form of semantics is a bit different from my style. Last remark: there is a unified theory of what semantics is, which can be found even in Webster's dictionary. What is not unified is how to deliver it to the technology world.
Phil Butler - I guess my name was Quasi spelled right. It does not matter at all, I enjoy these comments, favorable or not.
I think perhaps of more benefit than us arguing about the problem, perhaps we should come together to set up a standard. I'm a firm believer in not retagging the entire internet, but if it must be done then we should be working to do so in a fashion that both parties can use successfully.
I am so very sorry for spelling your name wrong, Dr. Berkan. Perhaps some semantic search engine will connect the two spellings given the context. I cannot predict whether that will be meaningful or not. :)
I am also sorry, and I must get my eyes checked but I could not find any description or statement about a unified theory of semantics in any of Webster's on-line products. I checked my bookshelf and also came up empty. I must be completely in the dark.
Would you be so kind, Dr. Berkan, to offer me and this forum a statement of a concrete theory of meaning; not a definition for semantics- as we all know those definitions.
You do not have to say how you implement it, just tell us, if you please, about this universal semantic theory. Because if it exists, it should have the power to unify the community of practitioners, and I do not see that happening.
-Ken Ewell
Riza - I agree: "...new technology is not supposed to build barriers, it is supposed to tear them down."
I disagree: "I cannot see how it will happen now: billions of people will abide by the rules of the new Semantic Web publishing standards? I am sorry, I am stuck in this passage and my mind refuses to bend."
New tools and a supporting infrastructure will reduce the barriers to the semantic web.
Ken Ewell - Maybe I do not understand your quest to find a "unified theory", or how some theory can be labeled "unified". There are some resources at ontologicalsemantics.com, which I had proposed earlier. This is academic information. As far as I am concerned, "meaning" or text meaning representation (TMR) is the process of mapping concepts and sense relationships to identify the right context for a given sentence. I believe 99% of computational linguists will have the same basic view. How to do it differs. You can do it deterministically or via a black-box approach (like statistics). I am sorry I cannot be of more help to you in your investigation. Perhaps we will meet one day and understand each other better.
Mike Duffy - I hope you are right. I agree that the fact that something has never happened before should not set our mindset when looking forward. Again, time will tell.
Thank you Dr. Berkan; that turned the light on and maybe I can be more clear. I do look forward to meeting you one day when we can compare experiences.
I found a very good, objective public article about text meaning representation (TMR) that explains TMR in a machine translation context. In doing so, it nicely illuminates the way text meaning representation is done by computers. It was written by two students at New Mexico State University for a conference and it is available on line at: http://clr.nmsu.edu/Research/Projects/mikro/htmls/flairs96-htmls/flairs96.html. Applying TMR at the search engine is a brilliant idea.
I have been investigating meaning since working on a translation project a few decades ago. The problem we had working on language pairs like English-Arabic and Arabic-Russian was that there was no theory of meaning. There were also no semantic primitives of any sort (other than Roger Schank's primitives in his conceptual dependency theory). That is going back to work in the 1960s and 70s.
The problem is that, though there has been more research, the community does not have anything much better today. Today there are many more choices for these semantic primitives, but there are no universal primitives and no universally accepted semantic principles for text and inquiry analysis and comprehension.
Still, I believe quite strongly that the tripartite approach, like that of TMR, is a correct approach. Briefly, this is a rather deterministic approach where the words and other items in a text (and query) are contrasted with definitions and word senses from a lexicon and with rules about how the world works from an ontology with semantic primitives. Those who are unfamiliar with this kind of operation can think of it as a mash-up of sorts, where words from the text are mashed up with word senses from a lexical resource (e.g., WordNet) and an ontology with some rules about a possible world. The key components to the system, I think you will agree Riza, are a) the semantic primitives from which the relations between meaning representations emerge and b) the theory and axioms or rules to access them.
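As a rough illustration of the mash-up described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny `LEXICON`, the `ONTOLOGY` that assigns each concept to a domain, and the `disambiguate` scoring rule are hypothetical and do not reproduce hakia's system or any published TMR implementation. It only shows the general shape of the idea: words are looked up in a lexicon of senses, and an ontology decides which sense fits the surrounding context.

```python
# Toy sketch only: the lexicon, ontology, and scoring rule are hypothetical.

# Lexicon: each word maps to its candidate senses (named as ontology concepts).
LEXICON = {
    "bank": ["FINANCIAL-INSTITUTION", "RIVER-EDGE"],
    "deposit": ["TRANSFER-MONEY", "SEDIMENT"],
    "money": ["CURRENCY"],
    "river": ["WATERCOURSE"],
}

# Ontology: each concept belongs to one broad domain (a stand-in for real rules).
ONTOLOGY = {
    "FINANCIAL-INSTITUTION": "FINANCE",
    "TRANSFER-MONEY": "FINANCE",
    "CURRENCY": "FINANCE",
    "RIVER-EDGE": "GEOGRAPHY",
    "WATERCOURSE": "GEOGRAPHY",
    "SEDIMENT": "GEOGRAPHY",
}


def disambiguate(words):
    """Pick, for each known word, the sense best supported by the other words."""
    known = [w for w in words if w in LEXICON]
    chosen = {}
    for word in known:

        def support(sense, word=word):
            # Count the other words that have at least one sense in the same domain.
            domain = ONTOLOGY[sense]
            return sum(
                1
                for other in known
                if other != word
                and any(ONTOLOGY[s] == domain for s in LEXICON[other])
            )

        # Ties go to the first sense listed in the lexicon.
        chosen[word] = max(LEXICON[word], key=support)
    return chosen


if __name__ == "__main__":
    print(disambiguate("deposit money in the bank".split()))
    print(disambiguate("the river bank".split()))
```

In this toy setup "bank" resolves to the financial sense next to "deposit" and "money", and to the river sense next to "river". A real system would need far richer primitives and rules, which is exactly the open question raised in this thread.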
Can you tell this forum what sorts of semantic primitives you rely on, perhaps with some examples and cardinalities? Maybe you can tell us if there is any taxonomic organization of primitives or any form of inheritance, or if it is possible to compose properties with your primitives in the core meaning representation as we can with natural language. It would be interesting also to know what kind of equality tests can be done, for example, is it simple identity or can your algorithm calculate links between the semantic primitives? And lastly, can it go beyond search, I mean could one make arbitrary first or higher order assertions over the meaning representations?
I think that would illuminate the semantic scene here.
-Ken Ewell
Never mind Riza, after thinking about it, I don't think anyone (besides me of course) wants to know. Don't bother responding to #29.
After following links and reading up, I have the picture. I think the proof is in the pudding and Hakia is on line and anyone can check the results out for themselves.
I still believe that the semantic primitives are key and matter very much. I was a little disappointed to learn that Dr. Raskin gave up his belief in parsimony, and the fact that he followed Patrick Hayes is a little dismaying; particularly given your own stand against RDF semantic standards authored by Hayes.
Nonetheless; it is nice we have the Hakia beta up for comparison. Thank you for that.
Ken, what made you say that I had given up on parsimony and that I had followed Pat Hayes on that? I actually like Pat a lot but I have never quoted him, I don't think. Have he and I said something similar on parsimony? I am just intrigued. On the substance of the matter, I am very committed to parsimony. I preach and practice it. But it is mature parsimony. The negative experience of the "componential" analysis of the 1930-50s has taught us that we cannot hope to reduce the million or so meanings to 20 or so semantic features just because it takes only 20 binary features to differentiate two to the power of 20, i.e., greater than a million classes. Nor does the parsimony of an axiomatic theory, my favorite type of theory, work out very nicely for a huge multiplicity of meanings. In my forthcoming post on the nature of ontology in Ontological Semantics, I actually celebrate the parsimonious nature of it: it uses a number of concepts an order of magnitude or two smaller than the number of meanings it describes in its terms.
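The arithmetic behind the "20 binary features" remark can be checked with a few lines of Python. The function names below are made up for this illustration; the sketch only confirms the counting argument and says nothing about whether such a reduction is linguistically workable, which is the point being disputed.

```python
def distinguishable_classes(num_binary_features):
    """Number of distinct on/off combinations of the given binary features."""
    return 2 ** num_binary_features


def min_binary_features(num_classes):
    """Smallest number of binary features that can tell num_classes items apart."""
    return (num_classes - 1).bit_length()


if __name__ == "__main__":
    print(distinguishable_classes(20))     # 1048576, i.e. more than a million
    print(min_binary_features(1_000_000))  # 20
```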
Hello Dr. Raskin, it is certainly an honor. I read this passage from the on-line draft of your forthcoming book; Part 1, section 3: The study of meaning.
"Over the years of work in linguistic and then computational semantics, the early aspirations for parsimony of primitive elements for describing lexical meaning have gradually given way to a more realistic position, first stated by Hayes (1979), that in computational semantics (and, for that matter, in all of artificial intelligence) a much more realistic hope is to keep the ratio of description primitives, a′, to entities under description, a, as small as possible: a′/a
Hey, you are right, Ken: I did write that! Thanks for reminding me. I was indeed wondering if you had the dream of sound symbolism in mind: every sound corresponds to some elementary meaning, and every word has a meaning which is a combination of the meaning of the sounds that make it up. The stuff of Plato's dialogue "Kratylos." The naturalists vs. the conventionalists. In every generation, scholars turn to some form of this dream. I don't know Mr. Pospelov's work: I was lucky to emigrate from Russia 34 years ago, but his appears to be the modern, logical version of the dream. A beautiful dream. And then there is reality, and we have to handle it. Thanks for the kind words about my work! The next book may well be about hakia.com--after it proves itself as a true and effective semantic search. In the meantime, all the best to you--and keep posting!--Victor