ReadWriteWeb

Is Google a Semantic Search Engine?

Written by Guest Author / March 26, 2007 1:00 PM / 35 Comments

Written by Phill Midwinter, a search engineer from the UK. This is a great follow-up to our article last Friday, Hakia Takes On Google With Semantic Technologies.

What is a Semantic Engine?

Semantics are said to be ‘the next big thing’ in search engine technology. We technology bloggers routinely drum up articles about it and sell it to you, the adoring masses, as a product that will change your web experience forever. Problem is, we often forget to tell you exactly what semantics are - we just get so excited. So let's explore this...

Wikipedia says:

“Semantics (Greek semantikos, giving signs, significant, symptomatic, from sema, sign) refers to the aspects of meaning that are expressed in a language, code, or other form of representation. Semantics is contrasted with two other aspects of meaningful expression, namely, syntax, the construction of complex signs from simpler signs, and pragmatics, the practical use of signs by agents or communities of interpretation in particular circumstances and contexts. By the usual convention that calls a study or a theory by the name of its subject matter, semantics may also denote the theoretical study of meaning in systems of signs.”

...which is absolutely no help.

Semantics as it relates to our topic, search engines, actually covers a few closely related fields. In this instance what we are looking at deciphering (as a basic example) is whether a computer can discern if there is a link between two words, such as cat and dog. You and I both know that cats and dogs are common household pets, and can be categorized as such. The human brain seems to comprehend this easily, but for a computer it is a much more complex task and one I won’t go into here - because it would most likely bore you.

If we take as read then, that the search engine now has semantic functionality, how does that enable it to refine its search capability?

  • It can automatically place pages into dynamic categories, or tag them without human intervention. Knowing what topic a page relates to is invaluable for returning relevant results.
  • It can offer related topics and keywords to help you narrow your search successfully. With a keyword like sport the engine would offer you a list of sports perhaps as well as sports related news and blogs.
  • Instead of offering you the related keywords, the engine can directly incorporate them back into the search with less weight than the ones the user entered. It’s still contested as to whether this will produce better results or just more varied ones.
  • If the engine uses statistical analysis to retrieve its semantic matches to a keyword (as Google is likely to do) then it’s likely that keywords currently associated with hot news topics will bring those in as well. For example, using my engine to search for the keyword police brought up peerages (relating to the UK’s recent cash for honors scandal).
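
The third point, feeding related keywords back into the search at a lower weight, can be sketched roughly as follows. This is a minimal illustration and not how Google or any particular engine does it: the function names, the 0.5 weight and the tiny RELATED table are all assumptions made up for the example.

```python
from collections import Counter

def expand_query(user_terms, related, related_weight=0.5):
    """Map each term to a weight: 1.0 for user terms, less for related ones."""
    weights = {term: 1.0 for term in user_terms}
    for term in user_terms:
        for extra in related.get(term, []):
            # setdefault never lowers the weight of a term the user typed
            weights.setdefault(extra, related_weight)
    return weights

def score(text, weights):
    """Weighted count of query terms appearing in a page's text."""
    counts = Counter(text.lower().split())
    return sum(weight * counts[term] for term, weight in weights.items())

# Hypothetical related-keyword table; a real engine would derive this itself.
RELATED = {"sport": ["football", "tennis"]}

weights = expand_query(["sport"], RELATED)
pages = ["sport news today", "football results", "cooking recipes"]
ranked = sorted(pages, key=lambda page: score(page, weights), reverse=True)
print(ranked)
```

A page that only mentions football still gets a score here, but a lower one than a page that uses the word the user actually typed, which is the "more varied results" effect described above.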

So, according to me:

“A semantic search engine is a search engine that takes the sense of a word as a factor in its ranking algorithm or offers the user a choice as to the sense of a word or phrase.”

This is not in line with the purists of what is known as ‘The Semantic Web’, who believe that for some reason we should spend all our time tagging documents, pages and images to make them acceptable for a computer to read. Well, I’m sorry but I’m not going to waste my time tagging when a computer is able to derive context and do it for me. I may have offended Tim Berners-Lee by saying this, but as the creator of the Web he should know better.

How does Google match up?

Until extremely recently, Google’s semantic technology (which they’ve had now for quite a while) was limited to matching those AdSense blocks to your website’s content. This is neat, and a good practical example of the technology - but not relevant to their core search product. However, if you make a single keyword search today, chances are you may spot a block like this at the bottom of your results page:

This is more or less exactly what I was just writing about. They’re offering you alternatives based upon your initial search, which in this case was obviously for citizen. Citizen is a bank, a watchmaker and (if I’m not mistaken) it means you’re a member of a country or something. This is the first clear example of Google employing a semantic engine that works by analyzing the context of words in their index and returning likely matches for sense.

Some of you may be wondering why they aren’t doing this for multiple keyword phrases, which I can take a guess at from some of my own work. Analyzing the context of a word statistically is intensive and slow; and if you try and analyze two, you slow the process further and so on. It is likely they have problems doing so for more than one keyword currently, and Google as ever is cautious about changing their interface too radically too quickly. This implementation of semantics gives hope that they haven’t adopted the purist view of ‘The Semantic Web’ where everything is tagged and filed neatly into nice little packages.

Google is all too aware of the following very large problems with that idea:

  • Users are stupid.
  • Users are lazy.
  • Redefining the way they’ve indexed what is assumed to be petabytes of data would require them to effectively start again.
  • It’s not as powerful or dynamic.

How Google can utilize Semantic technologies

It’s my belief that Google will increasingly tie this technology into their core search experience as it improves in speed and reliability. It has some phenomenally powerful uses and I’ve taken the liberty of laying out a few of my suggestions on where they can go with this:

Self aware pages

  • Tagging pages with keywords has always been used on the internet to let search engines know what kind of content the page contains.
  • Using a Google API we can generate the necessary keywords on the fly as the page loads. This cuts out a large amount of work for SEO.
  • A Google API enabled engine wouldn’t even need to look at these keywords; it could generate them itself.
  • It’s not only pages that can be self aware these days. People tag everything, including links. The Google API could conceivably be used to tag every single word on a page, creating a page that covers every single keyword possibility. This is overkill - but a demonstration of the power available.

Narrow Search

  • When you begin a search, you enter just one or two keywords in the topic you’re interested in.
  • Related keywords appear, which you can then select from to target your search and remove any doubts about, for example, dual meanings of a word.
  • This step repeats every time you search. Opinionated search is also possible.

Opinionated Search

  • Because of the way Google statistically finds the senses of keywords from the mass of pages in its index, what in fact it finds is the majority opinion from those pages of what the sense of a word is.
  • At the base level, you can select from the average opinion of related keywords and subjects from its entire index.
  • You can find the opinion at other levels as well though, and this is where the power lies in terms of really targeting what the user is looking for quickly and efficiently. All of the following mean that this is the first true example of social search:
    • Find the opinion over a range of dates, good for current events, modern history, changes in trends.
    • Find the opinion over areas of geography, or by domain extension (.co.uk, .com).
    • Find the opinion over a certain group of websites, or just one website in particular - compare that with another site.
    • Find the opinion not only over the above things but also subjects, topics, social and religious groups.
    • At the most ridiculous example level, you could even find what topics 18 year olds on MySpace living in Leeds most talk about - but that I could probably guess. The point is that this is targeting demographics on a really unprecedented level.
  • Add the sites or web pages to your personal profile that you think most closely reflect your opinions; this data can then be taken into account in all future searches, returning greater personal relevancy.
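
As a rough sketch of what "finding the opinion" over a subset of the index could look like, the example below groups pages by a metadata field (here a domain extension) and lists the words that most often share a page with the keyword in each group. The page records, field names and function name are hypothetical, and counting words per page is only one simple stand-in for whatever statistics a real engine would use.

```python
from collections import Counter, defaultdict

def opinion_by_group(pages, keyword, group_key, top_n=1):
    """For each group, list the words that most often share a page with keyword."""
    per_group = defaultdict(Counter)
    for page in pages:
        words = page["text"].lower().split()
        if keyword in words:
            # count every other word on the page towards this page's group
            per_group[page[group_key]].update(w for w in words if w != keyword)
    return {
        group: [term for term, _ in counts.most_common(top_n)]
        for group, counts in per_group.items()
    }

# Hypothetical mini index; the texts and the "tld" field are invented.
PAGES = [
    {"tld": ".co.uk", "text": "police peerages inquiry"},
    {"tld": ".co.uk", "text": "police peerages"},
    {"tld": ".com", "text": "police chase"},
    {"tld": ".com", "text": "police chase video"},
]

print(opinion_by_group(PAGES, "police", "tld"))
```

Swapping the "tld" field for a date range, a site name or a user group gives the other slices listed above; the counting stays the same and only the grouping changes.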

Conclusion

Google is using semantic technology, but is not yet a fully fledged semantic search engine. It does not use NLP (Natural Language Processing), but this is not a barrier to producing some truly web changing technology with a bit of thought and originality. NLP may well be (I hate myself for writing this) web 4.0 and semantics is web 3.0 - they are in fact different enough to be classified as such in my eyes and the technology Hakia is developing is certainly markedly distinct from Google’s semantic efforts.

There are barriers that Google needs to overcome... is it capable of becoming fully semantic without modifying its index too drastically; can Google continue to keep the results simple and navigable for its varied user base? Most importantly, does Google intend to become a fully semantic search engine and to do so within a timescale that won’t damage their position and reputation? I like to think that although the dragon is sleeping, that doesn’t mean it’s not dreaming!


3 TrackBacks

Listed below are links to blogs that reference this entry: Is Google a Semantic Search Engine?.

TrackBack URL for this entry: http://www.readwriteweb.com/cgi-bin/mt/mt-tb.cgi/2079

It happens faster than you think: my colleague has just written about the coming versions of the web and makes the following bold claim: by the time people start talking about Web 4.0, at the latest, everyone will know what Web 2.0 is/was. Well then, we would have to... Read More

» SearchCap: The Day In Search, March 27, 2007 from Search Engine Land: News About Search Engines & Search Marketing

Below is what happened in search today, as reported on Search Engine Land and from other places across the web:... Read More

Below is what happened in search today, as reported on Search Engine Land and from other places across Read More

Comments

  • "Some of you may be wondering why they aren’t doing this for multiple keyword phrases, which I can take a guess at from some of my own work."

    I've been seeing this for plenty of multi-word phrases. Search 'hip hop music' in Google. It gives you these refinements at the bottom:

    music
    hip hop music history
    hip hop music online
    hip hop songs
    hip hop artists
    bet
    hip hop radio

    Posted by: Hashim | March 26, 2007 2:06 PM


  • You're right. I've been playing with this a little more today - not only does it work for one word searches but also for pre categorised phrases it seems... such as car insurance, or indeed hip hop music. I'll find out as much as I can but on first impression it's not an entirely automated process yet.

    Posted by: Phill Midwinter | March 26, 2007 2:18 PM


  • The real question here I was driving at is why it's not available on almost all searches as opposed to a select few.

    Posted by: Phill Midwinter | March 26, 2007 2:20 PM


  • Is this the same as AI search, perhaps?

    Posted by: Ali | March 26, 2007 2:34 PM


  • When I relate "cat" and "dog", I don't go looking through all of the instances of "cat" and "dog" in my life and see how often they co-occur. From your examples, that's all that Google is doing: they're looking for keywords that show up in documents frequently and then allow you to refine your query based on that. That seems like a very weak form of semantics. For example, it would be difficult to classify dogs and cats as animals using that method -- and it doesn't seem like Google can do that. Also, you reference document classification (e.g. auto-tagging) and it's not clear that Google can do that either.

    So how is Google a semantic engine?

    Posted by: Mark Johnson | March 26, 2007 2:50 PM


  • bad:

    "the purist view of ‘The Semantic Web’ where everything is tagged and filed neatly into nice little packages."

    good:

    "The Google API could conceivably be used to tag every single word on a page, creating a page that covers every single keyword possibility."

    So, I take it you have problems with the nice little packages? Doesn't every categorization system including Google's ultimately have similar problems?

    In any case, I have trouble seeing ways that having a coherent, consistent system for tagging would be a bad thing since that does not block other systems for doing things.

    On a related matter, you're right to say bad things about yourself for using the term Web 4.0. I have yet to encounter a serious usage of Web 3.0 or any other such variation that spoke to a shift of the depth and complexity of Web 2.0.

    Not to sound like a religious nut or anything!

    Posted by: Clyde Smith | March 26, 2007 2:56 PM


  • I am a smarter person now that I've read this post.

    Posted by: Ryan Fujiu | March 26, 2007 3:02 PM


  • That was a great piece Phil.

    I agree with your insight on Google and your generalization of semantics as sense information. I am not very impressed with Hakia as there is too much empty jargon in their explanations of what they are doing to make any kind of evaluation. I also like your definition of semantics as well.

    As you have implied, word sense can be explicated. Yet if you are someone who asserts that meaning is personal and takes place in the mind, that essentially means that meaning is ignored for you when using today's technology.

    Proclaim that the semantic web is about making sense of what words mean to individual browsers of resources on the Internet, and almost every computer professional involved in the semantic web would say that is not what the semantic web is about. This is because one of the tenets of the symbol grounding problem in computing suggests a fundamental decision be taken to neglect meaning and concepts of the mind altogether. It will take another fundamental decision to change the status quo.

    You probably would agree, Phil, that mind does not register or record every word imagined and do statistics on them. It works with much smaller sets of cognitivistic concepts that such large numbers of words attempt to represent in some ways. For this reason, I do not think Google, or any other search engine using keyword based indexes and some NLP or AI, could achieve an opinion search as you have outlined. Readware software can.

    Readware is a mature semantic framework that delivers and utilizes sense information from words. It can also 'discern' concepts and topics and categories from the discourse on web pages read by the analyst server. While Readware was not listed among the alternate search engines listed by Read/Write articles, it may be the only search engine that realizes how to seek, identify and compute semantic relevance from the words used in discourse and context.

    While many researchers and developers may cringe at the prospect that the mind is where meaning is found, we can show that it is possible to use cognitive objects and a conceptual model to identify sense information and precisely weigh and measure relevance among variable word forms and grammars in various forms of text and discourse.

    Posted by: Ken Ewell | March 26, 2007 3:07 PM


  • I completely agree with Mark Johnson. The only thing Google is doing at the moment is using statistical information to help people specify or narrow down their search queries. In my opinion, this has little to do with semantics.

    To use your "citizen" example: if Google would ask me if I am looking for the financial institution "Citizen bank" or for the crystal manufacturer "Citizen crystals", then I would call this (the first step of) semantic search.

    By the way, I don't think that manual tagging / annotation is necessary for the semantic web to become reality. A lot of work has already been done in order to perform automatic annotation on webpages. Performance is getting better and better and I think it will be a matter of years before the big search engines start annotating webpages using (semantic) ontologies.

    Posted by: Jochem Prins | March 26, 2007 3:09 PM


  • Aren't we already tagging the web?

    From my site:

    I'm reading about the semantic web and ontologies in last week's Economist (Tim Berners-Lee) and a question arose in my head. Aren't we already creating this? Aren't del.icio.us and technorati examples of tools that allow for the emergence of an organized structure of information the same way a mass brain would? Doesn't the tagging going on today by millions of users ultimately allow for and supply the underpinnings of the pattern recognition algorithms our brain uses every day? The principles described in Jeff Hawkins's book "On Intelligence" have direct implications to this and may very well redefine how we get information from the web both personally and on a machine to machine basis.

    see: www.theatomicweb.com

    Posted by: Mark | March 26, 2007 3:54 PM


  • Check out Google Sets, a product under Labs. You'll be surprised by some of the results you get from it.

    Posted by: Paul Jensen | March 26, 2007 3:54 PM


  • Even if Google had jumped ahead and implemented semantic search technology...I am not sure how effective this would be. As far as I have read...effective semantic search will need websites that have utilised semantic technology.

    I don't see a semantic search a reality until the basic infrastructure of the web is changed.

    Posted by: Adrian Keys | March 26, 2007 4:06 PM


  • Phil,

    I think that another way of describing the semantic engine is a tool that allows you to search at the topic level. Once you have the topic of interest, then drill down to the specific document.

    Just how you categorize documents by topic is the big unknown, but that's the basic concept as I see it.

    Make sense?

    Posted by: Terry Steichen | March 26, 2007 7:06 PM


  • Well I like drumming up articles about worthless search engines designed by rocket scientists so that narrow minded Google worshipers have something to negate :)

    Some of these engines are quite excellent, and compared to LORD GRAND POOPA of GOOGLE, possibly even more effective.
    I shudder to think how much naysaying we would have if we were talking about inventing something that could stream text and pictures through little wires over thousands of miles at the speed of light!

    Is there anyone out there that was not made to feel superior or cool because they had the distinction of being the first in their neighborhood to type Google?

    OMFG. I am so amazed that millions of people can talk, inspect, research and finger manipulate a new cell phone ring tone or widget so insignificant it will be worthless in 3 months, yet people attempt to create something astounding and some geeks just want their little cubicle to stay safe.

    Take some time, email the people involved, get them to explain what they are doing and perhaps understand that anything is possible still. Go to hakia or Powerset and talk with the people, perhaps they can delineate themselves from delicio lmao.

    BTW the basic infrastructure of the web is still the page, which typically has words on it. So, if one can make several data sets from those combinations of words and use semantics to return less than the "billions" of permutations that Google's algorithms produce, then it is possible to do a much more effective and even faster search.

    Always, Phil

    Posted by: Phil Butler | March 26, 2007 8:36 PM


  • Are they trying to be one? Yes. Are they? Not at all.

    Posted by: Acronyms | March 26, 2007 9:58 PM


  • "As far as I have read...effective semantic search will need websites that have utilised semantic technology.

    I don't see a semantic search a reality until the basic infrastructure of the web is changed."

    Yes, this is exactly what the evangelists tell us. However, it's not necessary! The basic structure of the web is not going to change and that's a good thing.

    "Aren't we already tagging the web?"

    Yes we are, del.icio.us is a good example - but ultimately, everything delicious does could be done by a machine. I've successfully programmed applications that will automatically tag in the same way Google suggests alternatives.

    "You probably would agree, Phil, that mind does not register or record every word imagined and do statistics on them. It works with much smaller sets of cognitivistic concepts that such large numbers of words attempt to represent in some ways. For this reason, I do not think Google, or any other search engine using keyword based indexes and some NLP or AI, could achieve an opinion search as you have outlined. Readware software can."

    I very much like this comment, and a few of you have said similar things. In fact, this is a psychological argument and when I was programming my first semantic tools it was something I grappled with a lot. I came to the conclusion that the human brain, say that of a young child just learning to talk, begins to derive senses from words in 2 basic ways.

    - Association
    - Questioning

    Questioning is of course... asking what a word means or what an object is called.

    Association is the statistical part: when a child frequently hears the words cat and kitten together, combined with questioning, they build their own semantic ontology that changes and is redefined with age. This is a very important reason for programmers NOT to use restrictive defined ontologies that need them to 'restructure the web' to 'solve' the problem. Instead they need to rely more heavily on association, with the limited questioning (through search queries) they receive.

    Posted by: Phill Midwinter | March 27, 2007 1:01 AM


  • they can not be "doing this for multiple keyword phrases", because multiple words ARE semantics.
    If two words stand together and make some sense, that means that they have proper relationships, and relationships between words are semantics; this is one of two core principles of semantics.
    So, my idea is that they simply haven't overcome this step yet. Because it's like "semantics inside semantics", which, for now at least, is probably too much for them. ))

    Posted by: Alex | March 27, 2007 1:02 AM


  • Yes there's a snag there you're right. There are a number of ways to do it but which is best?

    You could run the same semantic search on each keyword.

    You could run the semantic search over the keywords as a phrase.

    You could do both and correlate results from each.

    I know how mine works, but I couldn't tell you if it was 'right' because I don't exactly know how we as human beings do it yet.

    Posted by: Phill Midwinter | March 27, 2007 1:24 AM


  • Well, linguistics as a science explains that as well.
    I believe the approach they're "confessing" is wrong somewhere deep, much deeper than just selecting from those three options.
    It's like when you're drowning and you think about what to choose to save yourself - to shout for help, to learn how to swim or to drown.. when the right answer is "you shouldn't have drunk that much")))) Sorry, that may be a very strange example, but it was the first to occur to me)) It is meant to explain that maybe they have missed something somewhere at an earlier stage.
    Anyway, i don't think they'll become totally semantic. Just because today its semantic, tomorro...???
    The maximum is to make a whole new "googled" search engine, but does it have any sense?

    Posted by: Alex | March 27, 2007 1:37 AM


  • You lost me at:

    "...tagging documents, pages and images to make them acceptable for a computer to read. Well, I’m sorry but I’m not going to waste my time tagging when a computer is able to derive context and do it for me."

    I'm sorry, but documents must be marked up appropriately so that they can be consumed by a spider in a manner in which the spider is able to garner some information about the content of the document. The Semantic Web has absolutely nothing to do with tagging in the sense of the word that you have used.

    Posted by: Rob Scherer | March 27, 2007 4:10 AM


  • Documents don't need to be marked up appropriately, though it does help if they're written with good English, varied vocabulary etc.

    I program web spiders and search engines for a living and, at the risk of offending the purists such as you, it's been done by many others besides myself without the need to tag everything in sight.

    I think I made the point quite clearly that 'The Semantic Web' has nothing to do with tagging in the sense of the word that I have used. I don't think it should have anything to do with it because it's a useless idea.

    Posted by: Phill Midwinter | March 27, 2007 4:18 AM


  • Phill, i think it IS quite possible to make it without tags (though i think that tags are not the most senseless option))), but i really can not understand - is there any place in your article for Latent Semantic Indexing? or is it meant "by default"? Is this used at all or not? It seems that some points in your article are quite deeply intertwined with LSI?
    It would be a pleasure to hear your opinion on that, as i seem to be a bit lost.
    Thanks, Alexandra.

    Posted by: Alex | March 27, 2007 8:36 AM


  • What I'm saying is that this is latent semantic indexing (some argue that Google already incorporate it into their algorithm) or that it is the most likely method Google are using to do this. It's also the method I specialise in and advocate strongly as opposed to NLP or 'The Semantic Web' of course :)

    Taking what we know from early Google white papers and other research, when Google spiders the web it breaks a page down into 'barrels', which are essentially just collections of all the words on the page.

    If you then take these barrels and see how frequently words occur together on a page over, say, 1000 pages, you build up a statistical picture of words that are likely to be related. Of course, the more pages you do this across, the higher the statistical probability is that they actually are related.

    If you also store the position in the page at which the word occurs, you can factor in the distance between the words.

    You can use this to speed up your algorithm substantially, because effectively your select from database statement only has to pull in, say, the four words before and after the one you're looking for semantic links to, i.e. the words occurring within sentence range of your keyword on a page.

    I think I'm the first to relate that this in fact is generating a mass average opinion of a keyword's context and that this is itself something that could be used to show different context over subgroups of society, or even through dates.

    Posted by: Phill Midwinter | March 27, 2007 9:02 AM
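
The co-occurrence counting described in the comment above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's method or the commenter's actual engine: each page is treated as a plain list of words, the window of four words on either side follows the comment, and the function names and sample pages are made up for the example.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(pages, window=4):
    """Count how often each pair of words appears within `window` positions."""
    counts = defaultdict(Counter)
    for page in pages:
        words = page.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the keyword's own position
                    counts[word][words[j]] += 1
    return counts

def related_terms(counts, keyword, top_n=3):
    """Return the words seen most often near the keyword."""
    return [term for term, _ in counts[keyword].most_common(top_n)]

# Hypothetical sample pages standing in for a crawled index.
PAGES = ["the cat chased the dog", "my cat and my dog sleep"]
counts = cooccurrence_counts(PAGES)
print(related_terms(counts, "cat"))
```

With only two pages, common words such as "the" and "my" rank as highly as "dog", so a real system would at least filter stop words; the comment's point is that across a very large number of pages the genuinely related words should come to dominate.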


  • "If you then take these barrels and see how frequently words occur together on a page over say a 1000 pages, you build up a statistical picture of words that are likely to be related."

    I don't think you were very clear in that last post Phil. Word statistics across these barrels may uncover co-occurrences (i.e. this is where a semantic relation exists) but little else. They do not tell you how those co-occurrences fit into the rest of the semiotics of the individual and his or her world: culture, society or clique, i.e. their 'worldview'.

    Say, for example, you want to use your search engine to explicate your query for 'a good car' well-enough that it can include documents about 'high performance automobiles' and those that say 'clean, one-owner vehicles', 'a perfect ford for...' , 'a hot chevy with low mileage', and documents with relative idioms I don't care to apprehend-- any methods depending on co-occurrence, noun or verb phrases, etc., LSI-wise or otherwise will fail and need to be re-trained for every case. Am I right or wrong?

    Posted by: Ken Ewell | March 27, 2007 4:21 PM


  • "If you then take these barrels and see how frequently words occur together on a page over say a 1000 pages, you build up a statistical picture of words that are likely to be related."

    I don't think you were very clear in that last post Phil. Word statistics across these barrels may uncover co-occurrences (i.e. this is where a semantic relation exists) but little else. They do not tell you how those co-occurrences fit into the rest of the semiotics of the individual and his or her world: culture, society or clique, i.e. their 'worldview'.

    Say, for example, you want to use your search engine to explicate your query for 'a good car' well-enough that it can retrieve pages/resources about 'high performance automobiles' and those that say 'clean, one-owner vehicles', 'a perfect ford for...' , 'a hot Chevy with low mileage', and pages with many of the relative idioms I don't care to apprehend-- any methods depending on co-occurrence, noun or verb phrases, etc., LSI-wise or otherwise, will fail and need to be re-trained or re-indexed for every case.

    Developers should also remember that meaning is built up from smaller elements. There are document semantics and sentence semantics and there must also be semantics at all levels of composition. Consider how the prefix /re/ changes the meaning of words above.

    Now there are those that will argue that linguists know all about that -- and of course they do. What they do not do is consider or model how the addition of the prefix links to human perception. Neither do search engines.

    While search engines do a great amount of indexing, they do not know what it means to index let alone what the significance or consequence (cause or effect) may be of having to reindex. I'll bet Google engineers do.

    Posted by: Ken Ewell | March 27, 2007 4:37 PM


  • I think you're over analysing. Whereas in a single case of one web page you are completely correct, by taking the mean of such huge data sets as are available to a search engine - these things can be virtually ignored.

    Posted by: Phill Midwinter | March 28, 2007 1:33 AM


  • Isn't that the problem--sensibility is ignored? Isn't that what people are expecting from a semantic search engine: sensibility in the results?

    It is not too much analysis; we do it in milliseconds while indexing. The problem with non-semantic search engines is that they do not register or index or link the semantics in phrases such as "a reliable car" and "this Chevy is dependable transportation" for example. Not in one or many pages...

    In this example, reliable and dependable refer to the same concept. A semantic web product using WordNet should be able to discern that much. In addition, there is a pragmatic relation between car and Chevy that any search engine claiming to be semantic should be capable of capturing.

    The difficulty is that there are millions, zillions, of words and phrases. They are not distilled or filtered by finding their mean in actuality. They are distilled into topics and filtered into categories using a keen sense of semantics to recognize the essence of the message from the cultural code of the language.

    Posted by: Ken Ewell | March 28, 2007 8:02 AM


  • Isn't that the problem--sensibility is ignored? Isn't that what people are expecting from a semantic search engine: sensibility in the results?

    It is not too much analysis; we do it in milliseconds while indexing. The problem with non-semantic search engines is that they do not register or index or link the semantics in phrases such as "a reliable car" and "this Volvo is dependable transportation" for example. Not in one or many pages...

    In this example, reliable and dependable refer to the same concept. A semantic web product using WordNet should be able to discern that much. In addition, there is a pragmatic relation between car and Chevy that any search engine claiming to be semantic should be capable of capturing. Tagging, btw, has nothing whatsoever to do with any of this. Tagging is purely a data-processing crutch (that helps a little).

    Think of the name Chevy as a tag, a socio-cultural tag, for a car. Names are tags. Names and tags are arbitrary. Individually indexing each and every occurrence, or averaging all of them for their mean, does what? Is it useful for something? Finding phrases like those above by discerning the nature and import of the relations between specific names and other specific tags (in context) is a semantic process useful for discovery. Discovery is on the weak side of search engines.

    The difficulty is that there are millions, zillions, of words and phrases. They are not distilled or filtered by finding their mean, most frequent or least frequent, in actuality. To my mind, all that is read is distilled into topics and filtered into categories using a keen sense of semantics to recognize the essence of the message from the cultural code of the language.

    Without going too far here, let me just say that it must be true that individuals may have different sensibilities, different modes of expression, different views, affinities, biases and beliefs. Each individual can have their own special way of interpreting what is going on while thoughts and perceptions race through their minds. Yet every individual has the same perceptual apparatus.

    It is also true that shared (interpersonal) perceptions do not present themselves differently to each culture or society, let alone each individual. I mean that every individual in the world is subject to the same existential affairs. Unlike the (imperfect, changing) language, the semantics of such affairs must be universal, so that each individual may interpret them as they will.

    Posted by: Ken Ewell | March 28, 2007 9:03 AM
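The "reliable car" versus "dependable Chevy" example above can be sketched in a few lines of Python. This is only a toy illustration, not how any engine mentioned in the thread works: the two dictionaries are hand-made, hypothetical stand-ins for a lexical resource such as WordNet, and the function names are invented for this sketch. It shows two phrases that share no keywords yet resolve to the same concepts.

```python
# Hypothetical hand-made lexicon standing in for a resource such as WordNet.
SYNONYM_CONCEPTS = {
    "reliable": "dependability",
    "dependable": "dependability",
    "trustworthy": "dependability",
}

# Hypothetical "is-a" links (name -> broader category).
IS_A = {
    "chevy": "car",
    "volvo": "car",
    "car": "transportation",
}


def broader_terms(word):
    """Return the word followed by every category reachable through IS_A."""
    chain = [word]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def concepts(phrase):
    """Map a phrase to a set of concept labels instead of raw keywords."""
    found = set()
    for token in phrase.lower().split():
        token = token.strip('.,!?"')
        if token in SYNONYM_CONCEPTS:
            found.add(SYNONYM_CONCEPTS[token])
        if token in IS_A or token in IS_A.values():
            found.update(broader_terms(token))
    return found


def shared_concepts(a, b):
    """Concept labels that two phrases have in common."""
    return concepts(a) & concepts(b)


query = "a reliable car"
page = "this Chevy is dependable transportation"

# Plain keyword matching finds nothing in common between the two phrases.
print(set(query.lower().split()) & set(page.lower().split()))

# Concept matching links them through the synonym and is-a tables.
print(sorted(shared_concepts(query, page)))
```

The hard part, as the comment says, is that a real system needs such relations for millions of words and phrases; the tables here cover five.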


  • Ooops, sorry for the double post. I thought I had canceled the post #27 from sending, sorry.

    Posted by: Ken Ewell | March 28, 2007 9:07 AM


  • Nothing needs to be filtered.

    This is statistics: the more times you perform an operation, the higher the probability that the answer is correct.

    It doesn't matter that people have imperfect language, or individuals have different semantics.

    Posted by: Phill Midwinter | March 28, 2007 9:22 AM


  • Let's compare the query "iPhone" on Google and Quintura (a semantic SE) and see who shows a more comprehensive context so far. Only when Google understands that the iPhone logo belongs not only to Cisco or Apple, will it be a semantic search engine.

    Posted by: CoStas | March 29, 2007 3:47 AM


  • I did not notice if anyone had been to the hakia labs site, so in case I missed it please forgive me. Here is the link for those of you interested:

    http://labs.hakia.com/

    I spoke with Dr. Berkan about some more graphic representations and he told me he would get me some video and other information to supplement my pea brain lol. This is exciting stuff :)

    Phil

    Posted by: Phil Butler | March 30, 2007 10:13 PM



  • If you want to check how NLP can improve current search engines when analyzing users' queries, check out our demo of NaturalFinder integrated with MSN Search for English:

    http://demos.bitext.com/MSNen
    user: readwriteweb
    pw: bitext

    Bitext's technology can be easily integrated with any search engine like Google Search Appliance, Autonomy, dtSearch, Lucene, etc.

    For more info, check out www.bitext.com

    Posted by: Enrique Torrejon | April 4, 2007 4:21 AM


  • I think that many people here are misunderstanding the definition of semantics, or more essentially of a language:

    Mark Johnson -
    "It would be difficult, to classify dogs and cats as animals using that method - and it doesn't seem like Google can do that."
    No, Google may not be able to explicitly decide that they belong to "animals" as such; however, Google can realise that they belong to a group that relates to what we would call "animals" (see Google Sets).

    "Also, you reference document classification (e.g. auto-tagging) and it's not clear that Google can do that either."
    Google wouldn't be great at auto-tagging documents with tags that were English words, as the information about the page does not measure the meaning of the page using English words as semantics. It does "auto-tag" the document with semantics, and it is these semantics that are used in search.

    "That seems like a very weak form of semantics."
    Actually, compared to the raw semantics of language, this does a remarkably good job. This is because the layout of the text is taken into consideration, which effectively creates new semantics.

    Phill Midwinter:
    I think that the main problem you have seen is a problem with linguistics, not with the semantic technology behind it.
    Assuming it uses an SVD, it can be proved analytically that the concept space is as good an approximation as possible. Likewise, we are only searching for the overall concept of a document, so the concept vector must be a very good representation of the document. The problem is in the search query, and in the wide distribution of how people use a language.

    This is what Google's personalised search is clearly going to try to do: determine how you use linguistics to build a map into the concept space that is more characteristic of your use of the language than the average map. Eventually they will probably be able to do this for the map from a website into the concept space too.

    It's been good to come across you all, Phill - phillmidwinter.wordpress.com has gone in my RSS feeds, and I am sure I will be commenting on your blog as I am working on my own semantic search algorithms.

    If you want to read my blog, click the link on my name.

    Posted by: Tim Wintle | April 5, 2007 11:14 AM
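The claim in the comment above that an SVD-based concept space is "as good an approximation as possible" can be read as the Eckart-Young theorem. The statement below assumes, as the comment itself does, that the concept space comes from a truncated SVD of a term-document matrix. The symbols (A for that matrix, r for its rank, k for the number of concepts kept) are notation introduced here, not taken from the thread.

```latex
A = U \Sigma V^{\mathsf T} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\mathsf T},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\mathsf T}, \qquad k < r

\| A - A_k \|_F
  = \min_{\operatorname{rank}(B) \le k} \| A - B \|_F
  = \sqrt{\sum_{i=k+1}^{r} \sigma_i^{2}}
```

In words: among all matrices of rank at most k, the truncated SVD is closest to the original term-document matrix in the Frobenius norm. That guarantee covers the approximation of the matrix only; it says nothing about how well a user's query maps into the space, which is the problem the comment goes on to describe.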



