ReadWriteWeb

Did Google Just Expose Semantic Data in Search Results?

Written by Marshall Kirkpatrick / January 6, 2009 6:36 PM / 81 Comments

In what appears to us to be a new addition to many Google search results pages, queries about birth dates, family connections and other facts are now answered with explicitly semantic, structured information. Who is Bill Clinton's wife? What's the capital city of Oregon? What is Britney Spears' mother's name? The answers to these and other factual questions are now displayed above natural search results in Google, and the information is structured in the traditional subject-predicate-object format, or "triples," of semantic web parlance.
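
For readers new to the term, here is a minimal sketch of what a subject-predicate-object triple looks like as data. The `Triple` type, the `FACTS` list and the `lookup` helper are names we made up for illustration; they say nothing about how Google actually stores or retrieves these answers.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One fact: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Illustrative facts only; this is not Google's data or data model.
FACTS = [
    Triple("Bill Clinton", "spouse", "Hillary Clinton"),
    Triple("Oregon", "capital", "Salem"),
    Triple("Britney Spears", "mother", "Lynne Spears"),
]


def lookup(subject: str, predicate: str) -> Optional[str]:
    """Return the object of the first matching triple, or None if no fact matches."""
    for triple in FACTS:
        if triple.subject == subject and triple.predicate == predicate:
            return triple.obj
    return None
```

Calling `lookup("Oregon", "capital")` returns `"Salem"`. The hard part, of course, is not the lookup but filling a list like `FACTS` from pages that were never written as structured data.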

The answers aren't found structured that way on the web pages they come from - Google appears to be parsing the semantic structure from semi-structured or unstructured data. That's something Microsoft paid over $100 million to try to do last summer when it acquired Powerset. Check out these screen shots below.

semgoog2.jpg
semgoog4.jpg
semgoog6.jpg

We're sure that Google's been doing this analysis for some time behind the scenes, but for the company to expose the data in this structured way and to include a link to view other sources appears new to everyone we've asked about it so far. We've got inquiries in with some people who specialize in search but our semantic web contacts say they've not seen it before. (Update: Some readers have said in comments that they've seen variations of this for some time, including a three-year-old Google program called "Direct Answers." None of the coverage we've seen of that program offers the kind of examples we're seeing here - but we're not sure what to think! We'll see how feedback goes.)

It appears that the feature isn't being bucket tested, either; it seems to be globally available. Could third parties make use of the data now that it's available in a structured format? Possibly. The search results pages aren't being marked up with RDF in the HTML, which is a shame.

Is Google Creating Structured Data Where There Was None Before?

Bruno Haid of Austrian enterprise semantic startup System One pointed all this out to us and offers the following:

What's interesting is that while Justin Timberlake's mother is being parsed, amongst others, from http://www.celebritywonder.com/html/justintimberlake.html , there is no structured source visible that holds "Lynne" as string for Britney Spears mother. So either Google utilizes a trusted source that is not listed in "more sources" or they really extract that information from the unstructured text at http://ububu.com/BritneySpears.html . Which would make this whole thing quite huge.

This is really the crux of the question. To conclude that there is semantic analysis going on just because some of the info displayed appears in subject-predicate-object format would be a mistake (an after the fact, therefore because of the fact fallacy) but if those connections were being discovered by Google automatically when they were not displayed in a structured or straightforward way before - then we could conclude there's some semantic analysis going on. That appears to be the case, but we may be wrong! (Update: For what it's worth, Google's Matt Cutts, often the company's public face when it comes to search algorithm changes, gave this very blog post a thumbs up on FriendFeed. On the other hand, ex-Googler Jonathan Betz says in comments that he led Direct Answers when he was at the company and believes we're just seeing an expansion of that program.)

Yahoo, Ask.com and Live.com are all unable to answer these same questions so clearly.

Many of the data points are being pulled in from the structured part of Wikipedia entries, which is interesting. Other sources are wide ranging, from a license plate website to Jason Calacanis's Mahalo.
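
To make the Wikipedia case concrete, here is a rough sketch of how `|key=value` lines in an infobox template could be turned into triples. It assumes wikitext shaped like the HD DVD infobox a commenter quotes below. The function names and the `SAMPLE` text are ours, the link handling is deliberately simplified, and we have no idea whether Google's pipeline looks anything like this.

```python
import re


def strip_wiki_links(value: str) -> str:
    """Turn [[Target|label]] into label and [[Target]] into Target (simplified)."""
    return re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", value)


def parse_infobox(wikitext: str) -> dict:
    """Collect the |key=value lines of a single infobox into a dict."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = strip_wiki_links(value.strip())
    return fields


def to_triples(fields: dict) -> list:
    """Use the 'name' field as the subject and every other non-empty field as a fact."""
    subject = fields.get("name", "")
    return [(subject, key, value)
            for key, value in fields.items()
            if key != "name" and value]


# Hypothetical, shortened sample in the style of a Wikipedia infobox.
SAMPLE = """{{Infobox media
|name=HD DVD
|type=High-density [[optical disc]]
|owner=[[DVD Forum]]
|write=
}}"""
```

Running `to_triples(parse_infobox(SAMPLE))` yields one triple per filled-in field, with "HD DVD" as the subject. Data like this is already half structured, which is why pulling answers from it is a much smaller feat than deriving them from free text.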

We're not sure what to make of this - have readers seen it before? We think it's new and we think it's pretty interesting.

Why is This Important?

As we've said about the semantic web before: Once our software is capable of deriving meaning from web pages it looks at for us, there's a whole lot of work that will already be done, allowing our human, creative minds to reach new heights. Structured data is a layer of standardized abstraction upon which new innovation can be created.

That's why we're interested to see what Google is doing.

The answers aren't always accurate - try searching the birth date of Jesus Christ, for example. Yahoo! has far more clearly articulated what they intend to do with semantic data. Nonetheless, Google now appears to be doing something that no one else is doing. Maybe readers here search for "Britney Spears' mother" all the time, though, and have already seen this. We believe this may be different from the kinds of info-tips that have been shown above search results in the past, however.

If this is speculation based on limited observation, and Google is not in fact exposing semantic data in search results, then a logical question would be: why not? Creating structured data where there previously was none is much harder than you might think. We hope that's what Google is doing!


Comments


  1. Google's had these sort of results for a while, actually -- years, maybe? They've always seemed to me like they're going against Google's principal rule of getting people off of the search result page as quickly as possible and onto a different website, but, hey, if that means you stare at the right-hand ads for a few more seconds, I imagine they're not going to complain.

    Also, I'm sure these have had nothing but positive effects in their testing. It'll make people Google more as opposed to just going to Wikipedia or some other site directly.

    Posted by: Ben | January 6, 2009 7:59 PM



  2. Ben - you've seen them for years exposed like this? You're the first person we've been able to find who says that - though I'm totally open to the possibility. I'm going to reserve final judgment until more feedback comes in.

     Posted by: Marshall Kirkpatrick | January 6, 2009 8:07 PM



  3. This looks to be a new addition to me. From playing around with doing some searches it appears that the semantic dataset is a bit basic at the moment, but it's a start and it is great to see Semantic Search becoming mainstream.

    Posted by: Simon Gianoutsos | January 6, 2009 8:12 PM



  4. Besides not having noticed them ever before (although the Wikipedia-extracted ones seem to have been around for quite some time, years+), there are two queries that puzzle me:

    The Britney Spears one, because I can't find just "Lynn", neither in Wikipedia nor in one of the other structured sources.

    On the other hand the result for "Justin Timberlakes mother" has been extracted from http://www.celebritywonder.com/html/justintimberlake.html (See the "aka. Lynn Harless, manager of the all-girl group Innosense; born on March 19, 1961" 'flaw'), whereas Britney's mother is also being mentioned with full name at Celebritywonder.

    So either they expanded their parsing to more 'trusted sites' and obfuscate a little bit by pretending to have it from unstructured sources or they really started deriving this from free text.

    Posted by: Bruno Haid | January 6, 2009 8:13 PM



  5. The thing that bugs me the most about Google is that it's so hard to find someone to officially answer questions like these.

    If they are the 'open' and 'transparent' company they claim to be, then they should be pro-active about answering the community when questions like these arise.

    Imagine if the question was not about a potential neat new feature but rather about something more serious.

    Posted by: Chris Saad | January 6, 2009 8:17 PM



  6. Yikes. I remember reading, a long time ago, that a Google team in Manhattan was working on revolutionizing structured information.

    Anyway, not much to say. I can only hope that they fall asleep and end up behind us all ;)

    Too much to ask? maybe...

    Marshall, one small correction: tripple --> triple

    Posted by: Aldo Bucchi Posted on FriendFeed   | January 6, 2009 8:17 PM



  7. They could be parsing infoboxes in wikipedia or Freebase data. Would be good to check if that's the case (which is pretty much what Powerset was doing too)

    Posted by: Deepak Posted on FriendFeed   | January 6, 2009 8:19 PM



  8. mndoci we did check and though in some cases they are doing that, it appears that's not in every case. see the source, for example, on the oregon capital search above

    Posted by: Marshall Kirkpatrick Posted on FriendFeed   | January 6, 2009 8:21 PM



  9. I've definitely seen Google do this before, it was a handy tool for finding what day a holiday falls on or, like the example above, capitals.

    It always seemed in line with their calculator/converter tool but it looks like they just enabled it for a broader range of queries.

    Posted by: Mike Woods | January 6, 2009 8:22 PM



  10. Oh cool. Missed that part. Hmmm ... Interesting

    Posted by: Deepak Posted on FriendFeed   | January 6, 2009 8:23 PM



  11. This has been around for a while. Well, I'm not sure about the "Who is X's mother" ones, but the city ones and such have been around for some time now.
    On a second note it is nice to see someone from Oregon blogging on such a popular blog. Weird to be looking through RSS and bam stuff about Oregon :).

    Posted by: TG2345 | January 6, 2009 8:25 PM



  12. I have not seen this before and it does look like a trial semantic structure. Odd that it isn't leveraging Freebase data.

    Posted by: AJ Kohn Posted on FriendFeed   | January 6, 2009 8:37 PM



  13. I don't know enough detail about semantic data to comment about Google's use of it. But I do know that every time I have a conversation about the semantic Web for long enough, I hear someone bring up Wikipedia as the world's largest disambiguation project, a potential tool for teaching computers that there's a difference between White the color and the White album. There are currently hundreds of thousands of pages just for disambiguation, not to mention the semantic data in real encyclopedic articles.

    Posted by: Steven Walling | January 6, 2009 8:39 PM



  14. @Chris Saad Completely Agree!

    Posted by: Alex Iskold | January 6, 2009 8:47 PM



  15. Umm, Marshall? They added this over three years ago:

    http://www.pcworld.com/article/120362/google_intros_qanda_service.html

    Posted by: Aaron Swartz | January 6, 2009 8:50 PM



  16. Marshall,

    This was long in coming/making and I am not surprised. We've seen bits of it bubble up before and it seems like more and more is coming to the surface.

    Certainly Google is in a great position to deliver this 'light' semantic technology to the masses.

    Alex

    Posted by: Alex Iskold | January 6, 2009 8:51 PM



  17. I'd agree with #1 that this has been implemented in some form for a while now. I remember falling on it when I was searching for the population of various countries. That said, they may have expanded their dataset recently so more people are noticing on their searches.

    Some of the more popular ones I remember are for sporting events like the World Cup and Olympic schedules. I have to imagine that they used the same technology to parse the proper information from the respective sites for these events.

    Posted by: Amit | January 6, 2009 8:51 PM



  18. This looks different to me. The presentation in the 'one box' above the normal search results is the same as it might be for ...

    Flight Tracking

    http://www.google.com/search?q=united+airlines+flight+89

    Local Weather

    http://www.google.com/search?q=walnut+creek+ca+weather

    Conversion Counters

    http://www.google.com/search?q=feet+to+meters

    But these results look different. They're answering a question, not delivering information on your search. By that I mean, the result set doesn't have the keywords for your query - unlike the results above which all do.

    Here, they're taking a set of data based on a query string and providing structured results meant to answer that query. Not just deliver relevance but to provide resolution. Big difference in my opinion.

    The Jesus reference actually supports the idea of a semantic structure since Chris Ferguson is known as Jesus in poker circles. So the semantics actually work here, they've just delivered a different type of Jesus.

    I'm a big proponent of microformats (it was one of my 2009 Internet Predictions) but Google has been slow (at best) to embrace them.

    These instances might show the reason - they've figured out a way to get semantic data from sites without the additional structure. That would increase the ability to deliver and leverage semantic data since there would be no requisite coding.

    Great find and interesting stuff. Please update us on what you learn.

    Posted by: AJ Kohn | January 6, 2009 8:54 PM



  19. That is pretty huge. And typically the way they do it (silently releasing new features).

    Posted by: Laura Norvig Posted on FriendFeed   | January 6, 2009 8:55 PM



  20. By the way, the Jesus example supports the semantic structure. Chris Ferguson is well known as Jesus in poker circles.

    Posted by: AJ Kohn Posted on FriendFeed   | January 6, 2009 8:58 PM



  21. It's new to me.
    I have just searched for queries I search often and "google's answers" appear on the top of the search page.

    Posted by: ben | January 6, 2009 8:58 PM



  22. "what is the weather in Brooklyn" vs "brooklyn weather" - guess which query will give you structured results. It's the latter one. I'm not quite sure Google has gone semantic but it has gone structured from time to time.

    Posted by: Allan Benamer Posted on FriendFeed   | January 6, 2009 9:00 PM



  23. Marshall,

    Good find, but I'd be hesitant to hype it up as "semantic data" yet. I've seen this feature before but Google has definitely expanded it. Ask.com and 1

    It seems like this is a simple expansion of this. Additionally, A9 - Amazon's now gutted search engine - used to have functionality similar to this and what Ask.com is trying to do today.

    Heck, go on Ask.com and search for "Capital Oregon" and you'd get this:
    http://www.ask.com/web?qsrc=2417&o=0&l=dir&q=capital+oregon

    I wouldn't proclaim Ask.com a semantic search engine all of a sudden.

    For awhile now, you could get quick answers for typing simple questions into Google like "Weather 94109", "define: email marketing" to "(100/.123)+123".

    Daniel

    Posted by: Daniel Riveong | January 6, 2009 9:00 PM



  24. Not sure how accurate it is:

    ===
    Laura Bush — Spouse: Greatest President
    According to http://www.wikiality.com/Laura_Bush

    http://www.google.fr/search?hl=en&q=laura+bush+husband&btnG=Search
    ===

    Posted by: ben | January 6, 2009 9:05 PM



  25. @Aaron Swartz - that does look like a possible explanation - but the examples given in that article don't correspond with the type of results the same queries get today. "Who is Jane Fonda?" for example, doesn't return the result that PC World post discussed:

    "The query "who is Jane Fonda?" triggers the answer "...is an Academy Award winning American actress, model, writer, producer, activist and philanthropist" and results in a link to the Wikipedia online encyclopedia's entry for the actress. "

    That's not the case any more. Search now for "who is Jane Fonda's husband?" and you'll see the kind of results we discuss in this post.

     Posted by: Marshall Kirkpatrick | January 6, 2009 9:12 PM



  26. I can also confirm that these have been running for years now. The Search Marketing Community has been calling them "Direct Answers" and you can search for that term and find blogs discussing this years ago.

    Perhaps the parser is just a bit more aggressive and non search folk are beginning to see it more often.

    Posted by: Robert Gentel | January 6, 2009 9:17 PM



  27. I just tweeted about this but it seems on doing a vanity search, Google is now using vcard information in lieu of a meta description tag.

    http://bit.ly/1ftKbq

    There are a couple interesting things about this:

    1) I just put this information up yesterday and I am surprised at the indexing speed
    2) I was using an email obfuscator http://bit.ly/147vo and Google decoded this with ease.
    3) The vcard info was hidden with a display:none but this was ignored. Which kinda makes sense.

    Posted by: Jauder Ho | January 6, 2009 9:35 PM



  29. Yes! I have just found Elvis!!

    "Where is Elvis?"

    "Elvis" — Location: Us
    According to http://www.chapters.indigo.ca/books/Elvis-N-A/9781848171053-item.html - More sources »

    Now, if I could only get an exact map pin...

    Posted by: Erik Olson | January 6, 2009 9:57 PM



  30. Been seeing these for a long time too.. several years at least. It's really common for factual info like state capitals, phone numbers, or values of measurement conversions - looks like they are adding vital stats from various sources. Seems like it would be easy for them to identify recurring queries or keyword conjugations like "britney spears mother" or "britney spears mom" and then provide basic info scrubbed from sources like wikipedia.

    Posted by: Jos | January 6, 2009 10:25 PM



  31. I've always felt that semantic web search is a lousy business to be in exactly for this reason.
    If there's any real value for consumers there (and I guess there is), then why do people think anybody other than Google will own that market just as they own traditional search?

    Google will be the semantic search engine of the future for 3 reasons:
    1) They can: they have the money to build the best semantic search engine in the world.
    2) they dominate search: meaning they can test like hell and can hand out semantic results only when needed.
    3) this is their core business (and pretty much the only business where they make real money).

    To beat Google here you don't have to be first mover, you have to have a significant and ownable advantage (algorithm anyone?), and you'd better have one that Google can't match in a number of years, since people won't "get" semantic search in a day, and I won't change my default search engine for a bunch of niche searches.
    I just don't see any strategic advantage for a new comp

    Posted by: Simone Posted on FriendFeed   | January 6, 2009 10:36 PM



  32. The search results pages aren't being marked up in HTML, which is a shame.

    Um…don't you mean are being marked up in HTML? When I tried this, I got results whose HTML source was b elements.

    Now, what would be really interesting is if Google returned such results inside a containing element with a well-known ID and marked up the contents with RDFa! That would turn Google's SERPs into even more of an API than they are now.

    Posted by: Meitar Moscovitz | January 6, 2009 10:43 PM



  33. who is Justin Timberlake's girlfriend?

    This query retrieves a list/array (of 2 names) and their corresponding details...

    Who is Gandhi's wife? or Who is Mohandas Karamchand Gandhi's Wife?

    This doesn't give a semantic result despite lots and lots of sources having this info explicitly.

    Hmm.

    Posted by: Gubbi | January 6, 2009 10:57 PM



  34. Now when will Google tell me where I left my car keys?

    Posted by: Mert | January 6, 2009 11:09 PM



  35. capital of australia

    Australia — Capital: +1hr, begins last Sunday in October; ends last Sunday in March

    capital of bangladesh

    Bangladesh — Capital: 23 43 N, 90 24 E

    Is it really that hard to test all the world capitals before going live?

    Posted by: Sam | January 6, 2009 11:41 PM



  36. A list with the keywords resulting in semantic results would be more than great.

    Posted by: George Tziralis Posted on FriendFeed   | January 6, 2009 11:59 PM



  37. Seems similar to what http://www.trueknowledge.com/ does.

    They have the community add/verify facts

    Posted by: Matthias | January 7, 2009 12:31 AM



  38. Google's been doing this kind of stuff for years, from what I've seen, but their scope has been dramatically increasing.

    For example, a few years ago I was looking up some science questions and it displayed them there, but I haven't seen anything like familial relations until recently. Whatever they're doing, they've been doing it for awhile, but it's growing.

    Before you know it, you'll type in, "What should I do with my life" and Google will analyze your typing patterns and IP address to determine who you are, match you with an optimum life path that seems in line with your apparent goals, and put anyone who asks google of it onto the fast-track to glory, fame, or what-have-you.
    Or it may just end up saying everyone should be an exotic dancer... only time will tell...

    Posted by: Me a moi | January 7, 2009 12:35 AM



  39. It sure looks like Google is stepping up their semantic results. On the flip side our latest release not only spices up your Googling with semantically relevant content, but also connects it to your social circle.

    Check us out at http://headup.com - No registration required.

    Feedback would be nice though...
    : )

    Cheers all,
    Mike
    "I tweet @headup"

    Posted by: Mike Darnell | January 7, 2009 12:49 AM



  40. Marshall (and Chris) you may be seeing multiple different results-enhancing features - remember, Google runs 50 to 200 experiments at once, and lots of those make it through, if they enhance certain kinds of queries.
    That's also why it can be hard to answer questions about this 'officially' as Chris Saad asks, as there are a lot of separate heuristics running that overall showed up as better results. "Every day in every way we get a little better" should be the search experiments team's motto.

    Posted by: Kevin Marks | January 7, 2009 12:57 AM



  41. Speaking of structured data, you're missing some in your own blog post -- malformed HTML in the 6th paragraph has eaten almost half a paragraph. :)

    Posted by: AW | January 7, 2009 1:26 AM



  42. That's pretty old, although it has expanded in scope. See http://glinden.blogspot.com/2005/04/google-and-question-answering.html for example

    Posted by: Nick Lothian Posted on FriendFeed   | January 7, 2009 1:34 AM



  43. @Marshall: "who is Jane Fonda's husband?" triggers the Q & A stuff discussed in http://www.infoworld.com/article/05/04/07/HNgoogleqanda_1.html (from 2005). I think they might have started using a few more sites as answer sources, but there is no RDF at work here, just plain old (lower case) semantic web (ie http://nicklothian.com/blog/2008/04/30/the-ssemantic-wweb/ )

    Posted by: Nick Lothian | January 7, 2009 2:02 AM



  44. (an after the fact, therefor because of the fact fallacy)

    very poorly written

    Posted by: Ben | January 7, 2009 2:17 AM



  45. This feature is quite old. The source for this is (among other sources) Wikipedia's template feature.

    http://en.wikipedia.org/wiki/HD_DVD has a template called infobox media:

    {{Infobox media
    |name=HD DVD
    |logo=[[Image:HD-DVD.svg|200px|HD DVD logo]]
    |image=
    |type=High-density [[optical disc]]
    |encoding=[[VC-1]], [[H.264]], and [[MPEG-2]]
    |capacity=15 [[Gigabyte|GB]] (single layer)
    30 [[Gigabyte|GB]] (dual layer)

    |read=1× @ 36 [[Megabit per second|Mbit/s]] & 2× @ 72 Mbit/s
    |write=
    |standard=
    |owner=[[DVD Forum]]
    |use=Data storage, including [[high-definition video]]
    |extended from=
    |extended to=
    }}

    If you google any parameter from this list, you will get the output accordingly:

    http://www.google.com/search?hl=en&q=capacity+HD-DVD&btnG=Google+Search&aq=f&oq=

    HD DVD — Capacity: 15 GB (single layer) 30 GB (dual layer)


    It is about as "semantic" as shooting with a shotgun is applied brain surgery.

    Posted by: Mathias Schindler | January 7, 2009 2:36 AM



  46. I've been trying to find out what semantic technology (if any) Google have been developing with for some time. I recently managed to get a hold of some of their engineers in an open letter:

    http://hibbins.wordpress.com/2008/11/24/i-know-theres-an-answer/

    This is clearly part of something bigger. Glad that they've begun to expose smaller parts of their (no doubt grand) scheme, to improve visibility on a topic that's had relatively zero attention so far.

    Posted by: Marc Hibbins | January 7, 2009 3:02 AM



  47. Given that Picasa originally began as Windows PC software, you might be surprised at how many Macs you'll find floating around our Santa Monica office (which is where Google's photos-related work mostly takes place). Of course, Picasa Web Albums, our online photo-sharing site, is browser-based, and used by millions of Mac folks every day, so much of what we do is platform-independent.

    Posted by: New From Google Blogs | January 7, 2009 4:36 AM



  48. Just tried it;

    http://www.google.com/search?q=who+is+Stanley+Kubrick%27s+wife%3F

    Results suck.

    But this is new since I remember running this same query a couple years ago and I did not get that "Stanley Kubrick: Wife" in the results.

    A colon does not mean it's semantic data or a proper triple!

    Posted by: Todd | January 7, 2009 5:49 AM



  49. Where exactly is the 'exposing' happening here? Parsing a query using NLP is a different kettle of fish from exposing semantic data -- a crucial distinction that most buzzword chasers seem to not understand.

    This, I would guess, is a bit of pseudo NLP. You can form query format templates using the query data they already have. Using that it is easy to do something like this.

    Link analysis and linking volume already would allow them to do this without any semantic data involved in it. Just that it won't work perfectly and it won't work all the time.

    Posted by: shyam | January 7, 2009 6:02 AM



  50. Hi Marshall,

    I led this project at Google for a long time - we originally launched in April 2005 (http://googleblog.blogspot.com/2005/04/just-facts-fast.html).

    The efforts to add new data sources, especially unstructured data sources, have been going on for a long time. It looks like what you're seeing is a refresh of the data sources used.

    One of the great things about extracting structured data from the entire web corpus is that each new extracted fact can form the basis of extracting thousands of more facts - similar to Sergey's Snowball system way back in 2000 (http://portal.acm.org/citation.cfm?id=336644).

    Cheers,
    Jonathan Betz

    Posted by: Jonathan Betz | January 7, 2009 6:08 AM


