Written by Phill Midwinter, a search engineer from the UK. This is a great follow-up to our article last Friday, Hakia Takes On Google With Semantic Technologies.
Semantics are said to be 'the next big thing' in search engine technology. We technology bloggers routinely drum up articles about it and sell it to you, the adoring masses, as a product that will change your web experience forever. Problem is, we often forget to tell you exactly what semantics are - we just get so excited. So let's explore this...
Wikipedia says:
"Semantics (Greek semantikos, giving signs, significant, symptomatic, from sema, sign) refers to the aspects of meaning that are expressed in a language, code, or other form of representation. Semantics is contrasted with two other aspects of meaningful expression, namely, syntax, the construction of complex signs from simpler signs, and pragmatics, the practical use of signs by agents or communities of interpretation in particular circumstances and contexts. By the usual convention that calls a study or a theory by the name of its subject matter, semantics may also denote the theoretical study of meaning in systems of signs."
...which is absolutely no help.
Semantics as it relates to our topic, search engines, actually covers a few closely related fields. In this instance what we are trying to work out (as a basic example) is whether a computer can discern a link between two words, such as cat and dog. You and I both know that cats and dogs are common household pets, and can be categorized as such. The human brain seems to comprehend this easily, but for a computer it is a much more complex task and one I won't go into here - because it would most likely bore you.
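For readers who do want a small taste, here is a minimal sketch of one common statistical approach: treat two words as related if they tend to appear alongside the same other words. The toy corpus, the stopword list and the function names are all invented for illustration; this is not a description of how any real search engine does it.

```python
from collections import Counter
import math

# Toy corpus, invented purely for illustration.
DOCS = [
    "the cat is a popular household pet",
    "the dog is a loyal household pet",
    "my cat and my dog share a pet bed",
    "the bank approved the loan",
    "the bank raised the loan interest rate",
]

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "is", "a", "my", "and"}


def context_vector(word, docs):
    """Count the words that share a document with `word`."""
    counts = Counter()
    for doc in docs:
        tokens = [t for t in doc.split() if t not in STOPWORDS]
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relatedness(w1, w2, docs=DOCS):
    """Score how similar the contexts of two words are (0.0 to 1.0)."""
    return cosine(context_vector(w1, docs), context_vector(w2, docs))


if __name__ == "__main__":
    print("cat/dog :", round(relatedness("cat", "dog"), 3))
    print("cat/bank:", round(relatedness("cat", "bank"), 3))
```

On this toy data cat and dog score far higher than cat and bank, simply because they keep the same company (household, pet). Doing the same thing across a whole web index is where the complexity comes in.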
If we take as read then, that the search engine now has semantic functionality, how does that enable it to refine its search capability?
So, according to me:
"A semantic search engine is a search engine that takes the sense of a word as a factor in its ranking algorithm or offers the user a choice as to the sense of a word or phrase."
This is not in line with the purists of what is known as 'The Semantic Web', who believe that for some reason we should spend all our time tagging documents, pages and images to make them acceptable for a computer to read. Well, I'm sorry but I'm not going to waste my time tagging when a computer is able to derive context and do it for me. I may have offended Tim Berners-Lee by saying this, but as the creator of the Web he should know better.
Until extremely recently, Google's semantic technology (which they've had now for quite a while) was limited to matching those AdSense blocks to your website's content. This is neat, and a good practical example of the technology - but not relevant to their core search product. However, if you make a single keyword search today, chances are you may spot a block like this at the bottom of your results page:
This is more or less exactly what I was just writing about. They're offering you alternatives based upon your initial search, which in this case was obviously for citizen. Citizen is a bank, a watchmaker and (if I'm not mistaken) it means you're a member of a country or something. This is the first clear example of Google employing a semantic engine that works by analyzing the context of words in their index and returning likely matches for sense.
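To make the idea concrete, here is a rough sketch of how a list of sense alternatives for citizen could be derived from context alone. It is an assumption-laden toy, not Google's method: the documents are made up, and the greedy grouping rule and the names `group_senses` and `suggest_alternatives` are hypothetical.

```python
from collections import Counter

# Made-up index snippets that all mention the keyword "citizen".
DOCS = [
    "citizen bank savings account rates",
    "citizen bank online login",
    "citizen watch steel strap",
    "citizen watch repair service",
    "citizen rights passport country law",
    "citizen country visa application",
]


def group_senses(keyword, docs, min_overlap=1):
    """Greedily group documents that share context words.

    A document joins the first group whose vocabulary overlaps its own
    context by at least `min_overlap` words; otherwise it starts a new
    group. Each group is a Counter of context-word frequencies.
    """
    groups = []
    for doc in docs:
        tokens = doc.split()
        if keyword not in tokens:
            continue
        ctx = [t for t in tokens if t != keyword]
        for group in groups:
            if len(set(ctx) & set(group)) >= min_overlap:
                group.update(ctx)
                break
        else:
            # No existing group shares any context: treat it as a new sense.
            groups.append(Counter(ctx))
    return groups


def suggest_alternatives(keyword, docs):
    """Label each sense group with its most frequent context word."""
    return [
        f"{keyword} {group.most_common(1)[0][0]}"
        for group in group_senses(keyword, docs)
    ]


if __name__ == "__main__":
    for suggestion in suggest_alternatives("citizen", DOCS):
        print(suggestion)
```

With these six snippets the sketch separates the bank, the watch and the country senses. Note that every document containing the keyword has to be compared against every group found so far, which hints at why this kind of analysis gets expensive on a real index.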
Some of you may be wondering why they aren't doing this for multiple keyword phrases, which I can take a guess at from some of my own work. Analyzing the context of a word statistically is intensive and slow; if you try to analyze two words, you slow the process further, and so on. It is likely they have problems doing so for more than one keyword currently, and Google as ever is cautious about changing their interface too radically too quickly. This implementation of semantics gives hope that they haven't adopted the purist view of 'The Semantic Web' where everything is tagged and filed neatly into nice little packages.
Google is all too aware of the following very large problems with that idea:
It's my belief that Google will increasingly tie this technology into their core search experience as it improves in speed and reliability. It has some phenomenally powerful uses and I've taken the liberty of laying out a few of my suggestions on where they can go with this:
Self aware pages
Narrow Search
Opinionated Search
Google is using semantic technology, but is not yet a fully fledged semantic search engine. It does not use NLP (Natural Language Processing), but this is not a barrier to producing some truly web-changing technology with a bit of thought and originality. NLP may well be (I hate myself for writing this) web 4.0 and semantics web 3.0 - they are in fact different enough to be classified as such in my eyes, and the technology Hakia is developing is certainly markedly distinct from Google's semantic efforts.
There are barriers that Google needs to overcome... Is it capable of becoming fully semantic without modifying its index too drastically? Can Google continue to keep the results simple and navigable for its varied user base? Most importantly, does Google intend to become a fully semantic search engine, and to do so within a timescale that won't damage their position and reputation? I like to think that although the dragon is sleeping, that doesn't mean it's not dreaming!