searchmonkey - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/searchmonkey
Copyright 2009 Richard MacManus (readwriteweb@gmail.com), Sun, 22 Nov 2009 12:00:55 -0800

Yahoo Search To Offer Abstracts of Search Results, Determine Intent

Next year, Yahoo will introduce new technology to augment its Yahoo Search results: abstracts of key information alongside URLs. Instead of just offering a list of links, Yahoo's search results will include machine-extracted information relevant to each URL returned. Sound familiar? The technology is very much like SearchMonkey, except for one thing: this time it is being built in-house rather than by independent third-party developers.

SearchMonkey Goes In-House

"When you type in a search today, you get a list of URLs, and they are not very informative," said Rajeev Rastogi, vice president of Yahoo Labs Bangalore, where the technology is being developed. The lab is working on automated information extraction, which goes into the pages behind URLs and pulls out relevant information. For a hotel, the additional information returned may include an address, phone number, map to the hotel, and its rating. For products, you may get an image of the product, the name of the manufacturer, and the price.

At first, that sounds a lot like Google's Universal Search, which returns results from across the Google Search verticals (images, news, etc.) when you search at google.com. The difference is that the abstracted information will appear under the listed URLs, in a fashion very similar to Yahoo's SearchMonkey experiment, a technology that allows independent developers to enhance their sites' appearance in Yahoo search results. This is done using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction.
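SearchMonkey-style enhancement starts from structured markup on the page. As a rough illustration of what reading such markup involves, here is a minimal sketch in Python (standard library only) that pulls a hotel's name, address and phone number out of hCard microformat markup. The class names `vcard`, `fn`, `adr` and `tel` come from the hCard microformat; the `extract_hcard` helper and the sample markup are hypothetical, and this is not how SearchMonkey or Yahoo's in-house extractor actually parses pages.

```python
from html.parser import HTMLParser

# A few hCard property class names (from the hCard microformat).
HCARD_FIELDS = {"fn", "adr", "tel", "url"}


class HCardExtractor(HTMLParser):
    """Collects text for hCard properties that appear inside a class="vcard" element.

    Minimal sketch: nested vcards, the "value" class pattern, and attribute-based
    values (e.g. the href of a url) are not handled.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, hcard_property_or_None, is_vcard)
        self.fields = {}  # hCard property -> extracted text

    def _inside_vcard(self):
        return any(is_vcard for _, _, is_vcard in self.stack)

    def _current_property(self):
        # The innermost open element that carries an hCard property wins.
        for _, prop, _ in reversed(self.stack):
            if prop:
                return prop
        return None

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        prop = next((c for c in sorted(classes) if c in HCARD_FIELDS), None)
        self.stack.append((tag, prop, "vcard" in classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates void tags like <br>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        prop = self._current_property()
        if text and prop and self._inside_vcard():
            # Join the text fragments of one property with single spaces.
            self.fields[prop] = (self.fields.get(prop, "") + " " + text).strip()


def extract_hcard(html):
    """Return {hCard property: text} for the vcard markup found in `html`."""
    parser = HCardExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Calling `extract_hcard` on `<div class="vcard"><span class="fn">Hotel Example</span><span class="tel">+1-555-0100</span></div>` yields a dictionary with `fn` and `tel` keys. Page extraction without any markup, which is what the in-house effort targets, is much harder: there are no class names to anchor on.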

As great as SearchMonkey is (we included it in our top 10 semantic web products for 2008), not everyone is using it. Says Rastogi, "Clearly we don't expect that everybody will adopt SearchMonkey, so this 'rich results' piece is our in-house effort to automate the information extraction for large classes of web sites." In other words, Yahoo will "SearchMonkey up" the web sites for you.

New Technology Will Also Recognize Intent

In addition to changing the appearance of the results themselves, the new technology will also help users refine their queries. How closely this aspect of the technology will resemble Yahoo Glue has yet to be revealed, but there could be some similarities. What is clear, however, is that the feature will go further than current technologies like Yahoo's Search Assist, which auto-completes search queries as you type. Instead, the new technology will prompt users to narrow down their queries by recognizing their intent. To determine that intent, it will examine previous behavior, such as the user's prior web searches, visits to various Yahoo web properties, and "other information."

Although that mysterious "other information" cited in the PC World article makes us curious about privacy issues and tracking cookies, the idea is intriguing if you can get past the unsettling feeling that your search engine will get to know you a bit too well. We imagine this to be like an even smarter version of Google's Auto-Correct feature, something that currently gives you the option to search on the correct spelling of a word instead of the misspelling you had typed in.

So, unlike Google, which simply asks you "Did you mean: porsche" (when you queried on the word "porsh"), Yahoo's technology may, in theory, ask "Did you mean: porsche 911 turbo in blue?" That's just freaky.
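The jump from spelling correction to intent can be made concrete with a toy model. This sketch assumes a small vocabulary and a per-user query history, both hypothetical inputs. It first picks the closest vocabulary word by Levenshtein edit distance, a standard stand-in for spelling correction, then expands it with the most specific matching query from the user's own history. Neither Google's nor Yahoo's actual method is described in the article, so treat this purely as an illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def did_you_mean(query, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda word: (levenshtein(query, word), word))
    return best if levenshtein(query, best) <= max_distance else None


def personalized_suggestion(query, vocabulary, history):
    """Hypothetical intent step: expand the correction using the user's past queries."""
    corrected = did_you_mean(query, vocabulary)
    if corrected is None:
        return None
    matches = [past for past in history if past.startswith(corrected)]
    # Prefer the most specific (longest) past query that starts with the correction.
    return max(matches, key=len) if matches else corrected
```

With the vocabulary `["porsche", "ferrari", "portland"]` and a history containing `"porsche 911 turbo in blue"`, `personalized_suggestion("porsh", ...)` returns that full past query rather than just `"porsche"`, roughly the kind of suggestion imagined above.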

Posted by Sarah Perez in Products on Fri, 05 Dec 2008 08:08:38 -0800. http://www.readwriteweb.com/archives/yahoo_search_to_offer_abstracts_of_search_results_determine_intent.php

Yahoo Search Integrates Citysearch and Zagat - Slowly Gaining Market Share Again?

Yahoo's SearchMonkey platform allows publishers to easily write applications that integrate structured data from their own sites into Yahoo's search results. Most of these applications still have to be turned on explicitly by the user, but Yahoo has also started to integrate some of them into its regular search results. Today, Yahoo turned on results from the Citysearch and Zagat SearchMonkey applications for all users.

While Google tends to integrate data from its own products like Google Finance, Images, or Maps into its search results, Yahoo is staying true to its promise of 'open search.' Yahoo started to integrate SearchMonkey results in June, but back then, users still had to enable them one by one. Now, Yahoo is starting to surface more results from trusted SearchMonkey apps in its standard results.

Yahoo Search Market Share Back on the Rise?

As Stephen Shankland reported yesterday, some of these initiatives are slowly starting to pay dividends. The latest data from ComScore suggests that Yahoo was able to win back a little piece of the search market from Google in September (20% vs. 19.6%). However, as Shankland also points out, Google's results for September were its second-best, while Yahoo's were its second-worst.

It is hard to say whether initiatives like SearchMonkey are responsible for this (short-term) gain, but at the very least, it is good to see Yahoo integrating more SearchMonkey results into its search. While its new advertising campaign might also help Yahoo gain back some market share, in the end, the only thing that matters to most users is the quality of the search results.

Posted by Frederic Lardinois in News on Thu, 16 Oct 2008 13:46:36 -0800. http://www.readwriteweb.com/archives/yahoo_search_integrates_cityse.php

Yahoo Search Comes to the iPhone

Google has long been offering iPhone-optimized sites for most of its services, as well as a dedicated search application for the iPhone. Yahoo, however, had mostly been lagging behind with respect to dedicated iPhone offerings. Now, Yahoo has unveiled a dedicated iPhone version of its search service, which, among other things, integrates results from SearchMonkey modules and also does a good job of displaying Flickr photos or movie showtimes in the results.

One of the best features of the web application is Yahoo's Search Assist, which suggests completed search terms as you type. Also, if you are already logged into Yahoo and have activated any SearchMonkey extensions, those will also work in the iPhone web application.
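Search Assist's internals aren't described here, but the basic behavior of suggesting completions as you type can be sketched as a prefix match over past queries, ranked by how often each was searched. The `suggest` function and the query log are hypothetical; a production system would typically use a precomputed index such as a trie rather than scanning every query on each keystroke.

```python
from collections import Counter


def suggest(prefix, query_log, limit=3):
    """Return up to `limit` past queries starting with `prefix`, most frequent first."""
    counts = Counter(q.lower() for q in query_log)
    prefix = prefix.lower()
    matches = [q for q in counts if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically so ties are stable.
    matches.sort(key=lambda q: (-counts[q], q))
    return matches[:limit]
```

Ranking purely by global frequency is the simplest possible choice; personalizing the ranking with a user's own history is where suggestion starts to shade into the intent recognition discussed in the first article.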

Nothing Special

Overall, however, the Yahoo web application, while nice, can't compete with the native Google app or Google's mobile sites for Safari. Just like the Yahoo web app, the native Google application suggests search terms, but it can also display results from your contacts and show the actual search results as you type. The Google app also features dedicated searches for images, news, and shopping, as well as a Wikipedia search, something that is missing from Yahoo's offering.

Yahoo's iPhone-optimized search does what it promises to do, but it is far from being an exciting service. If you are a dedicated Yahoo Search user, then this new site is for you, but overall, we don't think this will get any Google users to switch to Yahoo for their search.

Posted by Frederic Lardinois in News on Thu, 21 Aug 2008 10:09:22 -0800. http://www.readwriteweb.com/archives/yahoo_search_comes_to_the_ipho.php

Yahoo Announces Winners of SearchMonkey Developer Challenge

When Yahoo announced SearchMonkey, its developer platform for search, it also announced the SearchMonkey Developer Challenge, which was going to reward the best search extensions built on SearchMonkey. Today, Yahoo announced the winners: StumbleUpon, BooRah, computer scientist Greg Schechter, and David Hickley from GEDview.com. The grand prize of $10,000 went to Marco Vitanza for his Blogspot Infobar.

While it didn't win the grand prize, the most interesting and probably also the most useful of these extensions is StumbleUpon's, which won the prize for "Most Innovative Use of Structured Data." Once you activate StumbleUpon's Infobar, every search result in Yahoo Search will display a set of tags, as well as user reviews of every listed site and a list of users who liked the site on StumbleUpon.

Also interesting is BooRah's Infobar, which automatically displays restaurant reviews from BooRah users together with the search results. BooRah won the prize for "Best Infobar."

Just this week, Yahoo opened up its search APIs even wider, when it announced a new program named BOSS (Build Your Own Search Service). While developers who participated in the SearchMonkey Challenge were restricted to mashing up their own data with results from Yahoo's search, BOSS will allow developers to use all of Yahoo's search index in (hopefully) innovative ways.

While Yahoo is clearly struggling in many ways today, it is great to see that it is still trying to foster innovation in the search space.

Posted by Frederic Lardinois in News on Fri, 11 Jul 2008 18:15:02 -0800. http://www.readwriteweb.com/archives/yahoo_announces_winners_of_sea.php

11 Search Trends That May Disrupt Google

My first post for ReadWriteWeb (nearly a year ago) started with the premise that search was "game over", that Google had won and the only opportunity left was (re)search - i.e. what one does after the basic search. Unfortunately, none of the search start-ups since then has made a dent in Google's relentless march towards search market dominance. In this article, we outline 11 search trends that may change that.

The proposition that launched countless search start-ups was: "If we can get just 1% of the search market, we will have a very valuable business". That may be true, but getting 1% has proved elusive. It has been an all-or-nothing game.

That may be about to change.

It is possible that Google will not be beaten by one big competitor. It is possible that they will be pecked at by thousands of tiny start-ups using a new outsourced infrastructure.

But before getting to that punchline, here is my 11-point recap of the search market:

1. Disambiguation is (still) not enough motivation to switch. All those learned PhDs with backgrounds in natural language search and AI explaining that the words "paris" and "apple" have multiple meanings that Google cannot parse from a single search, massively miss the point. The average user has figured that out and either enters multiple words or refines the search based on the first search. Using natural language search - which is complex to code and expensive to process - is a classic "hammer to crack a nut" solution.

2. Webmaster push-back and basic economics will accelerate the trend towards an outsourced crawler market. Webmasters won't accept a proliferation of crawlers, as some of them may be malicious and all of them impact performance to some degree. Google, Yahoo and Microsoft (GYM) will always be accepted because they drive enough traffic to matter for SEO, but marginal crawlers will struggle. Basic economics means that only a very small number of players will be able to afford the giant server farms needed to index the whole Web. The YM parts of GYM (as well as Amazon) will increasingly offer their infrastructure to anybody who can build value on top.

3. Yahoo SearchMonkey may have arisen from desperation, but we may also be witnessing a "Linus moment". SearchMonkey is the most well-defined entry into the outsourced crawler market. It stems from Yahoo's recognition that it is too late to beat Google in a head-to-head battle, so it could be dismissed as a sign of desperation. However, I prefer to see it as a "Linus moment", that point in time when Linus Torvalds simply said "here is what I have done so far, anybody who can take it to the next step is welcome to try". To be truly disruptive, Yahoo may need to open this up even more than it has to date.

4. There will be many more attempts to monetize Wikipedia. Well-funded search ventures such as Powerset have retreated to the much narrower goal of searching Wikipedia. Freebase also uses Wikipedia as its core data. Walking around the RPI Web Science Research Initiative, I could see many interesting R&D experiments coming out of academia, all of which used Wikipedia as a base. Wikipedia has just enough structure and normalization to be useful. Above all, the History feature makes "data provenance" possible, and that is critical for trust.

5. Core search is still getting funded. This is not what one would expect in what is by any definition a consolidated market with one mighty big gorilla sitting on top. Look at Blekko getting $2m without even a prototype to show the world. Are the investors nuts? Possibly, but they include some pretty smart guys like Marc Andreessen, and founder Rich Skrenta is clearly a smart guy (his blog is a good read). Or look at Cuill, which got $25m as recently as April. Maybe they are idealists tilting at windmills. Maybe they know something that the rest of us don't. Only time will tell. These new entrants will eschew hype, which they know adds no value when it comes to adoption.

6. Image search is another "hammer to crack a nut". Searching images, video and audio is one of those "non-trivial" computer science projects that great engineers love to tackle. However, great investors should steer clear. It is hard to code and incredibly expensive to process. The competition is tagging (see next point), a classic "just good enough and improving all the time at virtually no cost" approach that is impossible to beat.

7. Tagging is quietly but massively disruptive. The fact that thousands of webmasters and bloggers tag their content so that it can be found by Google is Google's secret weapon. But it could get turned against them. A small incentive to be found by other search engines will change tagging behavior. This is likely to play out in lots of vertical niches, where a small change in tagging behavior can make a huge difference in findability, and that can make a big difference to both buyers and sellers. Whether people use RDF or Microformats or some other de facto vertical standard will continue to be the subject of much debate, but the format itself is not the issue. The human drive to tag (to order one's world) is deep and strong and has financial motivations as well.

8. Whitelists are a good way to kill spam. Spam is the big problem for search as well as email, and whitelists work well for both. In search this is done by a site that uses something like Google Custom Search Engine (or SearchMonkey) to define which sites to search within a defined domain. Even if that means defining 1,000 sites and adding new ones every day, that is well within what a single human curator can manage in a single market domain. The human curator deletes any spam sites manually.

9. P2P search could still be a long-term disrupter and Microsoft's route back to relevance. The only way to do search without putting all the Web's pages into one server farm is via P2P. I have written about Faroo's attempt before. It relies on .Net, and this may be Microsoft's card to play, but only if Vista gets real traction. This is a real long shot, but an intriguing one.

10. There is tons of great data inside relational databases that is quite easy to search. It is the HTML layer that is getting in the way. As more sites learn how to expose their structured, relational databases as Web Services APIs, a lot more data will be available that does not rely on word search on HTML pages.

11. It's the AdWords, stupid! All the search wizardry doesn't matter a hoot if the monetization is not done right. There is plenty of motivation out there. Sellers want cheaper search words to buy. Publishers want a bigger piece of the cake. Buyers/searchers may even want cash back (we will see whether Microsoft's crude tactic, lambasted in the blogosphere, makes it in the real world).
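Point 8's curated whitelist is simple enough to sketch directly. Assuming a search backend has already returned a list of result URLs, a curator's whitelist can be applied by keeping only results whose host is a listed domain or one of its subdomains. `filter_to_whitelist` and the `example.*` domains are hypothetical; this is not the Google Custom Search Engine or SearchMonkey configuration format.

```python
from urllib.parse import urlparse


def filter_to_whitelist(result_urls, whitelist):
    """Keep only results whose host is a whitelisted domain or a subdomain of one."""
    allowed = {d.lower().lstrip(".") for d in whitelist}
    kept = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or any subdomain (www.example.com for example.com),
        # but not look-alikes such as notexample.com.
        if any(host == d or host.endswith("." + d) for d in allowed):
            kept.append(url)
    return kept
```

Checking for a leading dot before the domain is the key design choice: a plain suffix match would let `notexample.com` through a whitelist entry for `example.com`.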

Conclusion

Most of these trends point in the direction of search as infrastructure feeding thousands of innovators in niche markets - a long tail approach, in other words. Google will play in this infrastructure game - they already do with Google Custom Search - but it is vendors such as Yahoo, Microsoft and Amazon with equally deep pockets and much more to lose from total Google dominance, who will be the disrupting innovators in this next phase of the search market.

Image credit: davemc500hats

Posted by Bernard Lunn in Search Services on Mon, 16 Jun 2008 14:45:57 -0800. http://www.readwriteweb.com/archives/11_search_trends.php

Weekly Wrapup, 2-6 June 2008

Here are some of the highlights from the week's Web Tech action on ReadWriteWeb. On the product side we analyzed Adobe's new Web Office suite, investigated a worrying exodus of sellers from eBay, looked some more at Yahoo's SearchMonkey, and showed you 6 tools to save links with. On the trends side we explored the latest Web happenings in Asia, provided an overview of I.T. 2.0, analyzed the exploding popularity of online video, and checked out the readiness of banking customers to use Web gadgets.

Web Products

Adobe Launches Online Office Suite and New Flash-Enabled Acrobat 9

Back in March, we said that Adobe was slowly building an online empire. This week, that prediction proved true. Adobe launched its version of an online office suite at Acrobat.com, complete with a word processor (Buzzword), a web conferencing/whiteboard app (ConnectNow), online file sharing (Share), file storage (My Files), and a PDF converter. To complement this launch, Adobe also announced a brand-new version of Adobe Acrobat, Acrobat 9, the biggest release since the one that introduced Acrobat to the world. The most notable change in this new version is that Adobe is now incorporating Flash into the PDF experience.

Trouble at eBay

"I think [fixed prices] will disappear online, simply because it is possible - cheap and easy - to vary prices online." That was MIT Media Lab's Patti Maes in 1999, at a time when eBay's business was booming and auctions were seen as the future of ecommerce. Flash forward 9 years: BusinessWeek this week called online auctions a dying breed, and Nick Carr is wondering whether auctions were a fad. Indeed, the fixed-price ("Buy It Now" only) format is beginning to dominate eBay, and the company has recently taken steps to push fixed prices even harder. But the death knell of the online auction format is not eBay's biggest problem; that would be the small exodus of sellers from the site.

Yahoo! Pushes Search Results Customization to Users

Yahoo!'s SearchMonkey platform got a little more public this week with the unveiling of the Search Gallery -- the platform's official application repository. The gallery has already been open to developers and curious bloggers for a couple of weeks, but Yahoo! is now pushing it to the public at large via a "Customize" drop down menu on all search results. In addition, starting this week developers can share applications via external links even if they haven't yet been approved for inclusion in the official gallery.

6 Great Tools to Save Links for Later

Unfortunately, there just aren't enough hours in the day. This seems to be especially true when you take on a lot of projects. Between blogging, researching, emailing, and real life, reading all of our feeds isn't something we can do all the time. Sometimes we see something that we'd love to save for later without cluttering up our bookmarks. Here are 6 tools to get the job done.

See also: RSS Reset: Dump Your Feeds for a Month

SEE MORE WEB PRODUCTS COVERAGE IN OUR PRODUCTS CATEGORY

Web Trends

OpenWeb Asia: Opening the Asian Web to the World

Everyone working on the web around the world would like to connect with people in Asia, but it's not easy to do. That dynamic and populous region is often focused inward, and it's made inaccessible to outsiders because so little information about what goes on there is available in the web's dominant language, English. OpenWeb Asia is a new project that aims to change that.

See also: C-Shirt: Remixable T-shirts by Mobile Phone and Nico Nico Douga and the Simulation of Real Time (two Japanese web apps that Marshall checked out during his recent trip to Japan)

I.T. 2.0: How Changing Technology is Having Big Impacts on Business

In case you haven't heard yet, the I.T. world is changing. Social computing technologies, generally branded as "Web 2.0" and including things like wikis, blogs, social networking, RSS, and more, are slowly making their way into the business world. This new movement is called Enterprise 2.0, and it's no small shift. There's even a conference about it next week. But the change encompasses more than just the introduction of new, social software into the formerly stodgy business world; it also includes the movement of server software from in-house data centers to the cloud, the rise of a mobile workforce, the rebirth of thin client computing, a self-provisioning user base, and more.

See also: Introducing the Enterprise 2.0 Launch Pad Finalists

The Numbers Are In, Live Video Online Is Blowing Up

Live video broadcasting service Ustream.tv announced this week that live feeds on the company's website and distributed video players got a combined 10 million unique viewers last month. That's a major validation of live streaming video on the web. When YouTube Live launches later this year, this medium is only going to get bigger.

See also: Watch Out TV: YouTube is Taking Over

Survey: 48% of Bank Customers Want Web 2.0 Gadgets

WorkLight, a startup that offers enterprise 2.0 products, recently surveyed Facebook users to find out their willingness to use Web 2.0 tools for secure banking. The survey was conducted among 1,000 Facebook users aged 18 to 34. Because it was conducted among Facebook users, the survey is biased towards tech-savvy people. However, there are some surprising findings.

SEE MORE WEB TRENDS COVERAGE IN OUR TRENDS CATEGORY

That's a wrap for another week! Enjoy your weekend everyone.

Posted by Richard MacManus in Weekly Wrapups on Sat, 07 Jun 2008 05:00:00 -0800. http://www.readwriteweb.com/archives/weekly_wrapup_2-6_june_2008.php

Yahoo! Pushes Search Results Customization to Users

Yahoo!'s SearchMonkey platform got a little more public today with the unveiling of the Search Gallery -- the platform's official application repository. The gallery has already been open to developers and curious bloggers for a couple of weeks, but Yahoo! is now pushing it to the public at large via a "Customize" drop down menu on all search results. In addition, starting today developers can share applications via external links even if they haven't yet been approved for inclusion in the official gallery.

"This is the first phase of a larger plan to provide opportunities for viral distribution of SearchMonkey apps," said Yahoo! Search Product Manager Amit Kumar on the Y! Search Blog. "We're continuing to develop new ways to surface and share useful and high-performance applications in users' search experience and more broadly on the web, so expect more in the near future."

As of launch, the gallery contains 39 approved applications. These range from apps enhancing Yelp! and LinkedIn results to one that provides a code reference for Ruby-related searches.

SearchMonkey has the potential to be very disruptive in the search space. It gives web developers the ability to enhance the display of search results without the ability to influence search rankings. Who better to know how to best display content than those who created it? Unfortunately, the search applications I tried out today mostly didn't seem ready for primetime.

As you can see in the screenshot above, the Yelp! application that I installed didn't really enhance search results beyond slapping a logo next to the URL. In theory, my search for the Petite Deli in San Francisco should have yielded a Yelp! result with the deli's address, phone number, rating, and direct links to user reviews and photos. The IMDB application (which admittedly appears not to have been created by IMDB) gave me connection errors, and the Last.fm app behaved similarly to the Yelp! application for me -- it enhanced nothing.

Did Yahoo! jump the gun by pushing the Search Gallery public? Or have the gallery's approval standards been too low, so that these apps haven't been tested thoroughly enough? Or is this a local problem? Let us know your experiences with the new search apps in the comments below. Yahoo! will win no converts with applications that don't work as advertised.

Edit: I eventually got some of the faulty apps to load after repeating my search a few times. As Greg in the comments below noted, this might be a caching issue. That's still something that Yahoo! should deal with, as first impressions are often lasting impressions.
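SearchMonkey's appeal, that publishers can enhance how their results look without being able to change the ranking, comes down to a simple contract that can be sketched in a few lines. In this hypothetical model (the `Result` type, `apply_app`, and the URL-keyed `publisher_data` mapping are assumptions, not SearchMonkey's real app format), an app may only attach extra display fields to results the engine has already ranked.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    title: str
    extras: dict = field(default_factory=dict)  # publisher-supplied display data


def apply_app(results, publisher_data):
    """Attach publisher data (keyed by URL) to results without changing their order.

    `publisher_data` is a hypothetical {url: {field: value}} mapping; the ranked
    list itself comes from the search engine and is never reordered here.
    """
    return [
        Result(r.url, r.title, {**r.extras, **publisher_data.get(r.url, {})})
        for r in results
    ]
```

Because `apply_app` builds a new list in the same order as its input, a badly behaved app can at worst show wrong or missing extras, like the Yelp! and Last.fm apps described above, but it cannot move a result up or down.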

Posted by Josh Catone in Yahoo on Thu, 05 Jun 2008 07:14:10 -0800. http://www.readwriteweb.com/archives/yahoo_search_gallery.php

Semantic Search: The Myth and Reality

For a few years now people have been talking about semantic search. Any technology that stands a chance to dethrone Google is of great interest to all of us, particularly one that takes advantage of long-awaited and much-hyped semantic technologies. But no matter how much progress has been made, most of us are still underwhelmed by the results. In head-to-head comparisons with Google, the results have not come out much different. What are we doing wrong?

For example, when asked "What is the capital of France?", both approaches come back with the correct answer: Paris. Also, many queries that we are used to typing into Google in abbreviated form come back with similar results if we type them in natural language. Clearly something is off. We all know that semantic technologies are powerful, but how and why? In this post we will show that the problem is that we are asking the wrong questions.

The mistake is that semantic search engines present us with a Google-like search box and allow us to enter free-form queries. So we type the things that we are used to asking: primitive queries. It never occurs to us to type in "What actor starred in both Pulp Fiction and Saturday Night Fever?" or "What two US Senators received donations from a foreign entity?" We type simple questions, but this is not where the power of semantic search lies. Let's look at the spectrum of semantic technologies, from Google to SearchMonkey to Powerset and Freebase, to understand what is going on.

What Problem Are We Trying to Solve?

The first confusion in the space comes from the fact that semantic search is being positioned as the answer to all possible problems, from modern search, currently dominated by Google, to problems that are computationally impossible. The situation is made more difficult by the fact that right now there is only a thin range of problems where semantic search can clearly do better. This range covers complex queries that involve inferencing and reasoning over a complex data set.

As shown in the diagram above, basic queries are easily handled by Google. Sadly, natural language processing gives little advantage for this category of problems. Google correctly answers the question about Leonardo da Vinci's birthday, leaving no opportunity to improve the search by understanding the nouns and verbs that the user typed in.

Before looking at the problems that are perfect for semantic search, let's look at the hardest problems. These are computationally challenging problems that really have nothing to do with understanding semantics. A misconception perpetuated since the early days of the Semantic Web is that because we will annotate the web, we will somehow be able to solve these super-complex problems. This is simply not true. There are fundamental limits to what we can compute, and a class of problems with an exponential number of possible solutions is not going to be magically solved because we represent data as RDF.

The good news is that there is a set of problems that are great for semantic search. These are the problems we have been solving so well with relational databases. Too often we forget that semantic technologies are here to help us represent relational data spread over the entire web, so it should be no surprise that relational queries are what semantic search engines would excel at.
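The example question "What actor starred in both Pulp Fiction and Saturday Night Fever?" shows why relational queries suit semantic data. If facts are stored as subject-predicate-object triples (the shape RDF uses), the question becomes an intersection, in effect a join, over two lookups. This is a minimal in-memory sketch with a small hand-picked sample of cast members, not a complete dataset or a real RDF store; a real system would use a triple store and a query language such as SPARQL.

```python
# Toy triples: (subject, predicate, object). A small hand-picked sample, not full casts.
TRIPLES = {
    ("John Travolta", "starred_in", "Pulp Fiction"),
    ("John Travolta", "starred_in", "Saturday Night Fever"),
    ("Uma Thurman", "starred_in", "Pulp Fiction"),
    ("Samuel L. Jackson", "starred_in", "Pulp Fiction"),
    ("Karen Lynn Gorney", "starred_in", "Saturday Night Fever"),
}


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def starred_in_all(triples, films):
    """Relational AND: actors linked to every film in `films` (an intersection/join)."""
    sets = [subjects(triples, "starred_in", f) for f in films]
    return set.intersection(*sets) if sets else set()
```

A keyword engine would have to find a page that happens to mention both films together; over structured data, the answer falls out of combining two simple lookups.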

The Spectrum of Semantic Search Players

But semantic search is not just about the questions that we are asking. Because the web is just a bunch of unstructured HTML pages, semantic search is also about the underlying data. At its most structured extreme we find Freebase - the semantic database of everything. Freebase is accessible via free text search, but more importantly via MQL (Metaweb Query Language). MQL is essentially JSON with wildcards. Using it you can construct any query against Freebase and the result will be the same query with answers filled in.
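To make "JSON with wildcards" concrete: in MQL, a property set to `null` asks for a single value to be filled in, and a property set to `[]` asks for a list. The sketch below answers such a query against a tiny hand-written list of film records, returning one copy of the query per match with the blanks filled in. It runs locally and does not call Freebase; `run_query`, `FILMS`, and the matching rules (exact match for scalars, membership for list-valued properties) are simplifying assumptions.

```python
import json

# Tiny hand-written stand-in for Freebase film topics (not real Freebase output).
FILMS = [
    {"type": "/film/film", "name": "Pulp Fiction",
     "directed_by": "Quentin Tarantino",
     "starring": ["John Travolta", "Uma Thurman", "Samuel L. Jackson"]},
    {"type": "/film/film", "name": "Saturday Night Fever",
     "directed_by": "John Badham",
     "starring": ["John Travolta", "Karen Lynn Gorney"]},
]


def _is_blank(value):
    # MQL convention: null asks for one value, [] asks for a list of values.
    return value is None or value == []


def _matches(topic_value, wanted):
    # A fixed value must equal a scalar property or appear in a list-valued one.
    if isinstance(topic_value, list):
        return wanted in topic_value
    return topic_value == wanted


def run_query(query, topics):
    """Return one copy of `query` per matching topic, with the blanks filled in."""
    answers = []
    for topic in topics:
        if all(_matches(topic.get(k), v) for k, v in query.items() if not _is_blank(v)):
            answers.append({k: topic.get(k) if _is_blank(v) else v for k, v in query.items()})
    return answers


# The query really is just JSON: null and [] mark what we want answered.
EXAMPLE_QUERY = json.loads(
    '{"type": "/film/film", "name": "Pulp Fiction", "directed_by": null, "starring": []}'
)
```

Querying `{"type": "/film/film", "starring": "John Travolta", "name": null}` returns both films, which is the same join as the actor question, phrased from the other side.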

Powerset, in a way, is just a relational database: it operates over a specific body of structured information. At the other end of the spectrum is Google, which is all about statistical frequencies and very little semantics. The recently launched SearchMonkey from Yahoo! is an interesting twist. It does not add anything to the result set, but instead uses semantic annotations to present a richer, more interactive and more useful user interface.

Companies like Hakia and Powerset are probably working the hardest. These companies are trying to simultaneously build Freebase-like structures on the fly and then do natural language queries on top of them. The difference is that Hakia is using (likely similar) technology to query over the entire web, while Powerset has (probably shrewdly) chosen to restrict the search to Wikipedia.

Are Hakia, Powerset and Freebase All That Different?

This analysis brings up a question: which of these technologies are different and which are essentially the same? Let's get the easy one out of the way first. Yahoo!'s SearchMonkey is no different from Google or any other search, as far as the core search technology is concerned. The difference is simply in the presentation layer. SearchMonkey is smart about creating a better user experience by letting publishers present the search results to users in the best possible way.

But when it comes to Hakia, Powerset and Freebase the situation is much more complicated. On the surface all these products are different - Hakia lets you search the whole web, Powerset is restricted to Wikipedia (and Freebase!) and Freebase itself has two search interfaces - the search box and query language. Here is the problem - the natural language interface has nothing to do with the underlying data representation.

The fact is that all of these semantic search technologies allow people to type in arbitrarily complex questions and then interpret these queries and execute them against their databases. Fundamentally, Hakia, Powerset, and Freebase are databases. Fundamentally, all of them have some kind of Natural Language Processing that translates the question into a canonical query over the database.
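The translation step from question to canonical query can be illustrated with a single hand-written template. This is a deliberately naive sketch: `to_canonical_query` and the `{"find": ..., "where": [...]}` output shape are hypothetical, and real natural language front ends such as Powerset's or Hakia's rely on far richer parsing. Its failure modes are instructive: anything outside the template returns `None`, and a first title that itself contains "and" would be split incorrectly.

```python
import re

# One hand-written template; real NLP front ends cover vastly more phrasings.
BOTH_PATTERN = re.compile(
    r"^what actors? starred in both (?P<a>.+?) and (?P<b>.+?)\??$", re.IGNORECASE
)


def to_canonical_query(question):
    """Translate a supported question into a structured query, or None if unsupported."""
    match = BOTH_PATTERN.match(question.strip())
    if match is None:
        return None
    # Each condition must hold for the same actor, i.e. an AND over two relations.
    return {
        "find": "actor",
        "where": [
            {"starred_in": match.group("a")},
            {"starred_in": match.group("b")},
        ],
    }
```

Once the question is in this canonical form, answering it is an ordinary database operation; all of the ambiguity has been pushed into the translation step.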

To gain insight into all of this, think about Freebase and its query language, MQL. Unlike natural language, which allows all sorts of constructs, MQL is unambiguous. This JSON-like language allows users to construct precise statements against Freebase. The fact that Powerset allows natural language queries does not mean that there is no database inside Powerset. In fact, there is almost certainly the same kind of database as the one beneath the Freebase search box. What really differs between Freebase and Powerset is the approach to gathering data and the user experience.
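To make the MQL point concrete, here is a minimal Python sketch. It assumes a toy in-memory dataset rather than the real Freebase service. The query follows the query-by-example style associated with MQL: concrete values act as constraints, and `null` marks the fields to fill in. The `translate` function is a hypothetical stand-in for a natural language layer, and it turns one fixed question pattern into the same kind of canonical query. The names `FILMS`, `translate` and `run_query` are ours and are not part of any Freebase or Powerset API.

```python
import json
import re

# Toy records standing in for a Freebase-like graph (illustrative data only).
FILMS = [
    {"type": "/film/film", "name": "Jaws", "directed_by": "Steven Spielberg"},
    {"type": "/film/film", "name": "E.T.", "directed_by": "Steven Spielberg"},
    {"type": "/film/film", "name": "Alien", "directed_by": "Ridley Scott"},
]


def translate(question):
    """Hypothetical NLP front end: maps one question pattern to an MQL-style query."""
    match = re.fullmatch(r"films directed by (.+)", question.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"cannot interpret: {question!r}")
    # Concrete values are constraints; None (JSON null) marks a field to fill in.
    return [{"type": "/film/film", "directed_by": match.group(1), "name": None}]


def run_query(query, records):
    """Evaluate a query-by-example template against in-memory records."""
    template = query[0]  # a list around the object asks for all matches
    constraints = {k: v for k, v in template.items() if v is not None}
    results = []
    for record in records:
        if all(record.get(k) == v for k, v in constraints.items()):
            # Return every field named in the template, with blanks filled in.
            results.append({k: record.get(k) for k in template})
    return results


query = translate("films directed by Steven Spielberg")
print(json.dumps(query))
for row in run_query(query, FILMS):
    print(row["name"])
```

Running it prints the JSON query and then `Jaws` and `E.T.`. The split matches the argument above. The natural language front end can be as clever or as simple as you like, but what finally runs against the data is an unambiguous structured query.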

Back to the Future: It's All About UI

Probably the most striking revelation about the semantic search space concerns the user interface. First, to go off on a tangent: Powerset got it right by realizing that semantics needs to be surfaced in the UI. After a user searches Powerset, a contextual gadget, aware of the semantics of the results, helps the user complete the search experience.

Yet the biggest mistake that I think Powerset is making is also in the UI. The search box that everyone is familiar with via traditional web search engines needs to go. Having a simplistic search interface hurts Powerset and Hakia, and to a lesser extent Freebase, which is not positioning itself as generic search.

Think about the recent launch of Powerset. The company released a vastly better way to interact with one of the most important sources of information on the web - Wikipedia. But what did the critics say? "Let's see if this is a Google killer." And the answer to that is "no."

But what if Powerset restricted what can be searched? What if instead of a search box there was another interface or what if they told users not to look up things that they can find easily on Google? Why is it that new companies are expected to improve on the algorithm that has ruled the web for over a decade? Instead, the expectation should really be to solve the problems that cannot be solved by Google today.

Conclusion

Semantic search is an upcoming technology around which expectations have been set way too high. We have all been misled into thinking that these technologies are here to dethrone Google by delivering better search results. Neither of those things is true. What is true, however, is that semantic search is going to be big and it is going to help us answer questions that we simply cannot answer today - complex, inferencing queries asked over the entire web as if it were a database.

In order for these semantic search technologies to make a dent in the market, they need to clean up their messaging and, most importantly, their user interface. Presenting a search box is both misleading and detrimental, as people associate it with the simplistic questions that Google solves without any problems. To really showcase semantic search, these companies need to come up with innovative UIs that will help users understand the power that is being put at their fingertips.

As always, please tell us what you think. What should semantic search companies do to gain their place in the marketplace?

http://www.readwriteweb.com/archives/semantic_search_the_myth_and_reality.php http://www.readwriteweb.com/archives/semantic_search_the_myth_and_reality.php Trends Thu, 29 May 2008 14:15:01 -0800 Alex Iskold
Making the Web Searchable: The Story of SearchMonkey

Last week at the SemTech 2008 Conference that took place in San Jose, Yahoo! Researcher Peter Mika spoke in detail about the company's new SearchMonkey search platform initiative. Mika talked broadly about his work looking at metadata on the web, and how that led to the birth of SearchMonkey. This post is based on notes from that talk.


History of Web Page Annotations

The motivating question for Mika's presentation was: How can we make web search better by leveraging web annotation? There are many kinds of annotations, but Mika focused on simple data and lightweight semantics, and began by reviewing the history and evolution of annotations to explain how we got to where we are today.

One of the first methods of annotating HTML was Simple HTML Ontology Extensions (SHOE). This method allowed for the declaration of ontologies as well as relationships between the entities on HTML pages. The problem with it was that it introduced new tags that were not part of standard HTML and were not recognized by most browsers.

In 2003, Tantek Çelik started work on Microformats - a way to embed light semantics using XHTML. Microformats are now driven by a community of developers that evangelizes existing formats and works on new ones. The major focus of this effort is to leverage standards, but Microformats are limited because they don't share a common syntax. Every microformat looks different, and there are no ontologies and no schemas.
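To show what "light semantics using XHTML" means in practice, the sketch below pulls a few hCard properties (`fn`, `org`, `url`, `tel`) out of ordinary HTML using Python's standard `html.parser`. It is a simplified illustration that assumes well-formed markup and covers only a handful of hCard's properties. `HCardParser` is a hypothetical name, not a reference parser. All of the meaning lives in agreed-upon class names, which is why each microformat ends up with its own ad hoc conventions.

```python
from html.parser import HTMLParser

# A few hCard property class names; the real hCard format defines many more.
HCARD_PROPS = {"fn", "org", "url", "tel"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class HCardParser(HTMLParser):
    """Extracts a handful of hCard properties from class-annotated HTML."""

    def __init__(self):
        super().__init__()
        self.stack = []        # class lists of the currently open elements
        self.cards = []        # one dict per class="vcard" element
        self.open_prop = None  # (property name, stack depth) collecting text

    def _in_vcard(self):
        return any("vcard" in classes for classes in self.stack)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so don't track them
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        self.stack.append(classes)
        if "vcard" in classes:
            self.cards.append({})
            return
        if not self._in_vcard():
            return
        for prop in sorted(HCARD_PROPS.intersection(classes)):
            if prop == "url" and attr_map.get("href"):
                self.cards[-1]["url"] = attr_map["href"]  # value lives in href
            else:
                self.cards[-1][prop] = ""  # value is the element's text
                self.open_prop = (prop, len(self.stack))

    def handle_data(self, data):
        if self.open_prop is not None:
            prop, _ = self.open_prop
            self.cards[-1][prop] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        # Closing the element that opened the property ends its text value.
        if self.open_prop is not None and len(self.stack) < self.open_prop[1]:
            prop, _ = self.open_prop
            self.cards[-1][prop] = self.cards[-1][prop].strip()
            self.open_prop = None


html = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>
  <a class="url" href="http://example.com/">homepage</a>
</div>
"""
parser = HCardParser()
parser.feed(html)
parser.close()
print(parser.cards)
```

For this sample it prints `[{'fn': 'Jane Doe', 'org': 'Example Corp', 'url': 'http://example.com/'}]`. Nothing in the output identifies the person globally. That reflects the lack of unique identity mentioned below.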

Things get particularly complicated when you start combining different Microformats, for example, when you describe that a person wrote a review at a particular event. In addition, Microformats have no concept of unique identity, and for this reason are largely incompatible with other Semantic Web efforts. Yet Microformats took off and have become somewhat widespread. So the takeaway here is that simple things can quickly gain adoption.

Another way of providing metadata that emerged recently is tagging. As an example, Flickr uses tags for photos to enable its users to annotate and describe the content. The problem with tags is that there is no agreement on meaning, so the same tag on Flickr and del.icio.us can mean different things, and there's no way to be sure which tag means what. Tags are a much more personal way of annotating information; they are not objective.

In 2005, Ian Davis, CTO of Semantic Web infrastructure company Talis, proposed eRDF - a form of RDF that can be embedded into HTML (compatible with HTML4). There is a simple mapping from eRDF to RDF, so you can use any RDF/OWL vocabulary. But eRDF is not full RDF -- it has limitations. For example, there are no data types and there are no blank nodes. Also, each page can only "talk" about itself and not about other pages.
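The "simple mapping" from eRDF to RDF can be sketched in a few lines. This simplified Python illustration follows our reading of the eRDF conventions, which we treat as assumptions: a `schema.` link declares a prefix, a leading `-` on a class marks an rdf:type, `prefix-name` marks a property, and the `id` of the enclosing element names the subject. HTML parsing is left out, so element data is passed in directly. The function names are hypothetical. The sketch also shows two of the limitations mentioned above: values are plain strings with no datatypes, and every subject is the page itself or a fragment of it.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def erdf_prefixes(links):
    """Build a prefix table from (rel, href) pairs of <link> elements.

    Assumed convention: <link rel="schema.foaf" href="..."> declares "foaf".
    """
    return {
        rel[len("schema."):]: href
        for rel, href in links
        if rel.startswith("schema.")
    }


def erdf_triples(page_url, element_id, class_attr, text, prefixes):
    """Map the class tokens of one element to (subject, predicate, object) triples.

    Simplified rules: "-prefix-Name" yields an rdf:type and "prefix-name"
    yields a property whose value is the element's text. Tokens without a
    declared prefix are ordinary CSS classes and are ignored.
    """
    # Subjects are always the page or a fragment of it, so a page can only
    # describe itself.
    subject = page_url if element_id is None else f"{page_url}#{element_id}"
    triples = []
    for token in class_attr.split():
        is_type = token.startswith("-")
        prefix, sep, local = token.lstrip("-").partition("-")
        if not sep or not local or prefix not in prefixes:
            continue
        uri = prefixes[prefix] + local
        if is_type:
            triples.append((subject, RDF_TYPE, uri))
        else:
            triples.append((subject, uri, text))  # plain literal: no datatypes
    return triples


prefixes = erdf_prefixes([
    ("schema.foaf", "http://xmlns.com/foaf/0.1/"),
    ("stylesheet", "/style.css"),
])
page = "http://example.com/about"
# Stands in for: <div id="me" class="-foaf-Person">
#                  <span class="foaf-name highlight">Jane Doe</span></div>
triples = (erdf_triples(page, "me", "-foaf-Person", "", prefixes)
           + erdf_triples(page, "me", "foaf-name highlight", "Jane Doe", prefixes))
for triple in triples:
    print(triple)
```

The ordinary `highlight` class passes through untouched. That is the appeal of piggybacking on `class`: existing styling keeps working.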

Finally, the W3C published RDFa, the latest way of embedding RDF in XHTML, which offers full RDF support. RDFa is more complex to implement, but at the moment it is the best way to embed RDF into HTML.
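RDFa puts subjects and predicates into dedicated attributes. The sketch below handles only a tiny subset (`about`, `typeof`, `property`, `content`) with Python's standard `html.parser`. It is an illustration under those assumptions, not a conforming RDFa processor. It leaves CURIEs like `foaf:name` unexpanded and attaches `typeof` to the current subject instead of creating a blank node. `TinyRDFaParser` is a hypothetical name. Even this subset needs a stack of in-scope subjects, which hints at where the extra implementation complexity comes from.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TinyRDFaParser(HTMLParser):
    """Extracts triples from a small RDFa subset: about, typeof, property, content."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.subjects = [base]  # subject in scope; one entry per open element
        self.triples = []
        self.pending = None     # [subject, predicate, text, depth] for text values

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # "about" sets a new subject (resolved against the page URL);
        # otherwise the enclosing element's subject stays in scope.
        subject = urljoin(self.base, a["about"]) if a.get("about") else self.subjects[-1]
        if a.get("typeof"):
            self.triples.append((subject, "rdf:type", a["typeof"]))
        if a.get("property"):
            if a.get("content") is not None:
                # An explicit content attribute overrides the visible text.
                self.triples.append((subject, a["property"], a["content"]))
            elif tag not in VOID_TAGS:
                self.pending = [subject, a["property"], "", len(self.subjects)]
        if tag not in VOID_TAGS:
            self.subjects.append(subject)

    def handle_data(self, data):
        if self.pending is not None:
            self.pending[2] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self.subjects) == 1:
            return
        self.subjects.pop()
        # Closing the property element finishes its text value.
        if self.pending is not None and len(self.subjects) == self.pending[3]:
            subject, predicate, text, _ = self.pending
            self.triples.append((subject, predicate, text.strip()))
            self.pending = None


parser = TinyRDFaParser("http://example.com/about")
parser.feed("""<div about="#me" typeof="foaf:Person">
  <span property="foaf:name">Jane Doe</span>
  <span property="foaf:nick" content="jd">J.</span>
</div>""")
parser.close()
for triple in parser.triples:
    print(triple)
```

For this sample it yields three triples about `http://example.com/about#me`: its type `foaf:Person`, the name `Jane Doe`, and the nick `jd`. The nick is taken from `content` rather than from the visible text.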

How Much Metadata is Out There?

Given the increasing trend towards web annotations, the natural question is: just how much metadata is already out there? Peter Mika set out to answer this question and created a prototype called Microsearch. The idea was to look at web pages and see how much metadata they contained. Beyond that, Mika was also interested in what types of metadata were present, as well as the ratio of annotated to plain HTML pages.
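A survey of this kind boils down to classifying pages by the metadata they carry and then counting. The Python sketch below uses crude regular-expression heuristics. These are our assumption and do not describe how Microsearch actually worked. The detector patterns and the `survey` function are hypothetical. The sketch reports how many pages carry each type of metadata and how many pages are annotated versus plain.

```python
import re
from collections import Counter

# Crude, hypothetical detectors; a real survey would parse the HTML.
DETECTORS = {
    "hCard": re.compile(r'class="[^"]*\bvcard\b'),
    "hCalendar": re.compile(r'class="[^"]*\bvevent\b'),
    "RDFa": re.compile(r'\s(?:about|property|typeof)="'),
    "eRDF": re.compile(r'rel="schema\.'),
}


def metadata_types(html):
    """Return the set of metadata types a page appears to contain."""
    return {name for name, pattern in DETECTORS.items() if pattern.search(html)}


def survey(pages):
    """Count pages per metadata type, plus annotated vs. plain pages."""
    by_type = Counter()
    annotated = 0
    for html in pages:
        found = metadata_types(html)
        by_type.update(found)      # a page can carry several types at once
        annotated += bool(found)
    return by_type, annotated, len(pages) - annotated


PAGES = [
    '<div class="vcard"><span class="fn">Jane Doe</span></div>',
    '<div class="vevent"><span class="summary">Launch party</span></div>',
    '<h1 property="dc:title">A blog post</h1>',
    "<p>No metadata here.</p>",
]
by_type, annotated, plain = survey(PAGES)
print(dict(by_type), f"annotated={annotated}", f"plain={plain}")
```

On the four sample pages it counts one hCard page, one hCalendar page and one RDFa page, with three annotated pages against one plain page. Pattern matching like this cannot distinguish real markup from, say, an article that merely quotes `class="vcard"`. That is one reason messy real-world data makes such surveys harder than they look.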

With the Microsearch exercise, Mika wanted to demonstrate what could be done to enhance search with this information. For each type of metadata, Mika augmented search results with additional links and information. For example, maps, events, information from hCard, etc. are presented in an enhanced way, unlike what we're used to seeing with today's search engines.

Mika discovered a few interesting things. First, about 53% of queries have 1 page with metadata in the top 10 results. However, lots of the data Mika saw was not clean and contained information that was not well formed, and performance was pretty poor due to lack of an index. So the unfortunate conclusion that Mika came to was that RDF templating was difficult and the approach was not easily scalable. Finally, Mika realized that metadata really needs to be on the page for users to see, because otherwise there is a big opportunity for semantic spam.

The Birth of SearchMonkey

The point of any experiment is to draw the right conclusions. Looking at the facts, Mika and the Yahoo! search team realized that they could not count on enhancing search by leveraging metadata on today's web - it simply does not exist to the extent needed. At the same time, it was clear that enhancing search results and cross-linking them to other pieces of information on the web is compelling and potentially disruptive. Yahoo! realized that in order to make this work, it needed to incentivize and enable publishers to control search result presentation. And thus, SearchMonkey was born.

SearchMonkey is a system that motivates publishers to use semantic annotations, and is based on existing semantic standards and industry-standard vocabularies. It provides tools for developers to create compelling applications that enhance search results. The main focus of these applications is on the end-user experience - enhanced results contain what Yahoo! calls an "infobar", a set of overlays that present additional information. For example, with SearchMonkey, LinkedIn is able to surface additional information from the user profile, Netflix can present a blurb about the plot and a rating for a movie, and Barnes & Noble can embed a preview of a book.

SearchMonkey's aim is to make information presentation more intelligent when it comes to search results by enabling the people who know each result best - the publishers - to define what should be presented and how.

A Better Search Experience Ahead

This first version of SearchMonkey is just a small first step towards creating a better search experience. Much more is planned, but even with this first simple version, we can clearly see the power of semantics and annotations in web pages. By creating the right incentive for publishers and putting them in control, Yahoo! is aiming to raise the bar on search results and, who knows, maybe even start attracting converts from Google's plain-looking results.

http://www.readwriteweb.com/archives/semtech_making_the_web_searchable_searchmonkey.php http://www.readwriteweb.com/archives/semtech_making_the_web_searchable_searchmonkey.php Semantic Web Tue, 27 May 2008 20:29:34 -0800 Alex Iskold
Reuters Launches Calais 2.0 - Now With Pop-Culture

Thomson Reuters' Calais, a semantic markup API that we first reviewed in February, has reached its 2.0 release. The latest version aims to fix one of the main issues with Calais -- that it was too focused on business. Because Calais has its roots in ClearForest, the rules it applies while parsing text are biased toward the language of business, which limited its utility. Version 2.0 adds new semantic entity types in an effort to rectify that.


Calais 2.0 has a dozen new semantic entity types, which Reuters says will increase its utility for "pop-culture publishers and bloggers covering media, music, entertainment and sports, as well as those covering pharmaceuticals, medicine and healthcare." In addition to expanded semantic identification capabilities, Calais 2.0 can now output results in the Simple Tags format and as Microformats, as well as in the original RDF.

More than 3,200 developers have signed up to work with Calais since launch, according to product lead Thomas Tague, who said in a press release that Calais and plugins and services built on the API will "make it easy to kick-start metatagging and enter the era of the Semantic Web."

Along with an updated website and a handful of new code samples and libraries, Thomson Reuters is announcing three new plugins that use Calais.

  • Calais Marmoset is a tool that enables developers to automatically create metadata for use with Yahoo!'s open search platform, SearchMonkey (our coverage).
  • Calais is also announcing the official release of Tagaroo, a WordPress plugin that allows bloggers to automatically tag relevant people, places and things in their posts, as well as pull in semantically relevant Flickr photos. We wrote recently about an unofficial WordPress plugin for Calais, and noted that its utility would be limited mainly to business and tech bloggers because those were the API's strengths. Calais 2.0 should, in theory, make both plugins useful to a wider variety of bloggers.
  • Though they've been out since last month, Thomson Reuters is also officially introducing the Calais plugins it developed with Phase2 Technology for Drupal, a popular content management system.

Calais is an awesome top-down semantic API that can help fuel the bottom-up approach by combing unstructured data and spitting out structured tags. We're excited for the second version of Reuters' product and the added utility that new semantic entity types should bring.

http://www.readwriteweb.com/archives/calais_20_launches.php http://www.readwriteweb.com/archives/calais_20_launches.php Products Sun, 18 May 2008 21:01:01 -0800 Josh Catone
The Monkey is Out of the Bag: Yahoo! Opens Search Developer Platform

Paul Miller reports that Yahoo! is today opening up SearchMonkey, its open developer platform for search. SearchMonkey, which we reported on at the Web 2.0 Expo, is a component of a major overhaul at Yahoo! across all of its properties to "rewire" for the social graph and data portability. SearchMonkey allows developers to build applications for Yahoo! search "that enhance the usefulness and relevance of search results," according to Amit Kumar, Product Manager for Yahoo! Search.


"With SearchMonkey, developers have a hand in shaping the next generation of search by building customized search results and mash-ups that users can add to their Yahoo! Search experience," said Kumar.

The SearchMonkey platform has three main components, according to Yahoo!:

  • "Site owners share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction.
  • Third party developers build SearchMonkey applications.
  • Consumers customize their search experience."

SearchMonkey applications come in two flavors: Enhanced Results and Infobars -- though both theoretically enhance search results. Apps are triggered when organic search results include a specific URL. Enhanced results replace a normal search result and must include information only from the site referred to in the actual result. Infobars, which appear directly below results, can include links to other resources or calls for user action.

For example, if you owned a LeBron James fan site, you could create an Enhanced Result that replaced instances of results from your site with a box showing James' latest stats and news articles pulled from structured data on your site.
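The trigger-and-render model described above can be sketched as follows. This is a hypothetical Python illustration, not the SearchMonkey developer API. Each app declares a URL pattern. An Enhanced Result replaces the matching listing only if all of its links stay on the result's own host, which is our rough stand-in for the "information only from the site" rule. Infobars are appended below the result with no such restriction. `App`, `apply_apps` and the example.com URLs are made up for illustration, and no real player statistics are involved.

```python
import re
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class App:
    """A hypothetical SearchMonkey-style app (not Yahoo!'s actual API)."""
    name: str
    kind: str                       # "enhanced" or "infobar"
    trigger: str                    # regex matched against each result URL
    render: Callable[[dict], dict]  # builds the app's output for a result


def same_site_only(result, output):
    """Approximate the Enhanced Result rule: every link stays on the result's host."""
    host = urlparse(result["url"]).netloc
    return all(urlparse(link).netloc == host for link in output.get("links", []))


def apply_apps(results, apps):
    """Return (displayed result, infobars) pairs for a page of organic results."""
    page = []
    for result in results:
        shown, infobars = result, []
        for app in apps:
            if not re.search(app.trigger, result["url"]):
                continue  # apps only fire when their URL pattern matches
            output = app.render(result)
            if app.kind == "enhanced" and same_site_only(result, output):
                shown = output  # replaces the plain listing
            elif app.kind == "infobar":
                infobars.append(output)  # shown below; may link anywhere
        page.append((shown, infobars))
    return page


FAN_SITE = "http://fansite.example.com"


def fan_stats(result):
    # Stand-in for structured data published on the fan site itself.
    return {"title": result["title"], "links": [FAN_SITE + "/stats", FAN_SITE + "/news"]}


def related_news(result):
    return {"title": "Related coverage", "links": ["http://news.example.org/nba"]}


apps = [
    App("Fan stats", "enhanced", r"^http://fansite\.example\.com/", fan_stats),
    App("Related news", "infobar", r"^http://fansite\.example\.com/", related_news),
]
results = [
    {"url": FAN_SITE + "/players/james", "title": "LeBron James fan page"},
    {"url": "http://other.example.net/", "title": "Unrelated result"},
]
for shown, infobars in apply_apps(results, apps):
    print(shown["title"], len(infobars))
```

Running the example prints `LeBron James fan page 1` and `Unrelated result 0`. The fan-site result is replaced by its enhanced version and gets one infobar, while the unrelated result is left alone. An enhanced app that tried to link off-site would simply be ignored, and the plain listing would be shown instead.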

Yahoo! Developer Network today released a quick guide to adding Microformats to your site. Indexing Microformats, and then further sweetening the pot by allowing developers to create applications that use that structured data and enhance the actual search results, should help push the use of semantic markup across the web. Besides potentially creating a better user experience via search results widgets, by incentivizing the use of semantic markup Yahoo!'s new open developer platform will help get us to a world where the bottom-up approach to the Semantic Web is feasible.

Yahoo! is also hosting a SearchMonkey Developer Challenge with $10,000 in prizes going to winners in four categories (Best Enhanced Result, Best Infobar, Most Innovative Use of Structured Data, and Best Data Service), plus a Grand Prize for the best overall entry.

http://www.readwriteweb.com/archives/yahoo_searchmonkey_launches.php http://www.readwriteweb.com/archives/yahoo_searchmonkey_launches.php Yahoo Thu, 15 May 2008 10:30:11 -0800 Josh Catone