ReadWriteWeb

Google Custom Search: Setting The Bar For Vertical Search Engines

Written by Alex Iskold / September 6, 2007 11:27 AM / 25 Comments

Google already dominates the web search market, with somewhere between 55% and 65% of the market depending on who you ask. The company's flagship product has been responsible for its phenomenal growth, and it is common knowledge that Google made its fortune by tying its search algorithm to advertising. It is perhaps less well known, however, that the web giant has opened its search engine for use on any web site, by any service. Dubbed Google Custom Search Engine, or CSE, the product exposes the API behind the world's most powerful search engine. Why is Google offering this API? How can it be used? And what is the connection to vertical search? We explore the what, how, and why of Google CSE in this post.

The Basics of Google Custom Search

You can think of Google Custom Search as a filter over the main Google search engine. This is a bit of a simplification, but it is a good way to initially wrap your head around the concept. By creating a filter, CSE lets its users restrict search results to particular sites that match URL patterns or keywords.
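To make the "filter" idea concrete, here is a minimal Python sketch of restricting a list of results to URLs that match a set of patterns. It is only a mental model, not how Google implements CSE: the pattern syntax, the `example` domains, and the helper names (`normalize`, `matches_engine`, `filter_results`) are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical patterns for a music-review engine (placeholder domains).
INCLUDE_PATTERNS = [
    "musicmag.example.com/reviews/*",  # only the reviews section of a magazine
    "reviewblog.example.org/*",        # an entire review blog
]


def normalize(url):
    """Drop the scheme and a leading 'www.' so patterns read as host/path."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parsed.path


def matches_engine(url, patterns):
    """Return True if the URL matches at least one include pattern."""
    target = normalize(url)
    return any(fnmatchcase(target, pattern) for pattern in patterns)


def filter_results(results, patterns):
    """Keep only the results whose URL falls inside the custom engine."""
    return [r for r in results if matches_engine(r["url"], patterns)]


if __name__ == "__main__":
    general_results = [
        {"url": "http://www.musicmag.example.com/reviews/album-1"},
        {"url": "http://store.example.net/buy/album-1"},
        {"url": "http://reviewblog.example.org/2007/09/post"},
        {"url": "http://musicmag.example.com/news/tour"},
    ]
    for result in filter_results(general_results, INCLUDE_PATTERNS):
        print(result["url"])
```

In this toy model the retail page and the magazine's news page are dropped, while the two review URLs survive. That is the behavior the filter analogy is meant to convey.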

The resulting engines can be searched via API or a search box that users can place on their web sites. The monetization strategy for Google is straightforward, and not surprisingly it is based on ads. Unless used for academic purposes, custom search engine results display contextual ads just like the regular search engine results do. Creators of custom search engines can earn a cut of the ad revenue by linking their engine to an AdSense account.

For example, you can create a custom search engine that only searches one site. Many sites have done that instead of building their own search solution. You can also restrict the search to a specific list of sites, in essence creating a vertical search engine, which we will discuss at length below.

Custom engines can be created and managed using a simple visual interface or, for more advanced users, an XML file. The UI version is essentially a wizard that prompts the user to fill in basic information about the search engine, list the sites for the engine to index, define the look and feel of the search and results pages, and configure other advanced options. You can make the search engine private or have it listed in Google's custom search directory. Interestingly, you can invite other people to collaborate with you on creating your search engine. The process of creating an engine takes just a few minutes, and when you're done you get a page that looks a lot like Google itself with just a search box.

Custom Search Engine In Action

For this example we created a search engine for music reviews by telling Google Custom Search to index only sites that feature music reviews. In a way, this is like teaching Google semantics, because the sites that we hand pick contain mostly content for music reviews. There are two major types of sites that we picked - music magazines and music review blogs.

We then searched for a recent album by Josh Ritter - "The Historical Conquests of Josh Ritter." The results from CSE contained only links to album review pages.

If we were to search Google directly with the exact same phrase, we would not get just reviews. The matches there would lead to Wikipedia, the artist's home page, and album links at various retail sites, all mixed with the review pages. Interestingly, when we added the word 'review' to the search, the results from Google were similar to the ones returned by our custom search engine.

Still, the results returned by the specialized engine were more precise and targeted. The key to good results is a good selection of sites. The more high-quality music review sites we add to this engine, the better it will perform. It does not need to be a large number of sites, however. Even our initial set of 20 high-quality sites returned good results for a lot of recent music albums.

Powering Up Vertical Search

Google Custom Search Engine is a platform for building vertical search engines. What if the engine contained links to electronics sites? Would it come close to Retrevo? Imagine keying every active blog on the Internet into a custom search engine (there is an API, so the process does not need to be manual). Could that yield a search engine that compares to Technorati or Google's own Blog Search? The answer is: very likely. Consider an example of a startup that is doing just that.

Colorado-based Lijit allows people to search the web through the experiences of other people. One of Lijit's core ideas is that each of us is an expert in a particular area. For example, Brad Feld is an expert in venture capital and investment. When you are looking for quality information about venture capital, it makes sense to ask Brad. Lijit's search engine does exactly that by searching through the various pieces of Brad Feld's online existence, including his blog, del.icio.us bookmarks, and Facebook profile.

Behind the scenes, Lijit actually creates an instance of Google Custom Search Engine to do the search. This engine is configured with links to blogs, social network profiles, photos, videos and everything else that defines a person as a vertical. By leveraging Google's infrastructure, Lijit has given itself a huge jump start. If the team had to build its own crawler, that work would likely consume all of its technical effort. Instead, the team built on top of Google's offering and focused on presenting the best way to search through online personal experiences.

Vertical Search Is Reduced To UI

Lijit's example naturally leads us to this question: What is the impact that Google CSE has on the vertical search space? Does it make it a commodity? Not entirely, but it does commoditize the infrastructure. There is no longer any need to build a custom crawler. Crawling and indexing web sites and other online information is a huge problem that requires a lot of resources, and even if you have them, there is a very real chance of not getting it right. Look at Microsoft -- they still can't crack it.

So if the infrastructure problem is solved, the innovation is pushed up to the UI level. How the results are presented is what can make a difference. For example, Retrevo further clusters results on its vertical search engine into different categories, and distinguishes reviews, product manuals, etc. It adds semantic understanding not only to the filtering of the underlying sites, but also to the presentation of the results. Given that filtering can be done using Google CSE, the innovation is basically in the presentation of the results.
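As a rough illustration of what "innovation in presentation" could look like in code, the Python sketch below groups a flat list of search results into categories before display. This is not Retrevo's method, which we have not seen; the keyword rules, the result fields (`title`, `snippet`), and the function names are hypothetical, and a real product would presumably use something far more sophisticated than substring matching.

```python
from collections import defaultdict

# Hypothetical keyword rules; the first matching category wins.
CATEGORY_KEYWORDS = {
    "reviews": ("review", "hands-on"),
    "manuals": ("manual", "user guide"),
}


def categorize(result):
    """Assign a result to the first category whose keyword appears in its text."""
    text = (result["title"] + " " + result["snippet"]).lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def cluster_results(results):
    """Group results by category, preserving their original order within each group."""
    clusters = defaultdict(list)
    for result in results:
        clusters[categorize(result)].append(result)
    return dict(clusters)


if __name__ == "__main__":
    results = [
        {"title": "Acme X100 camera review", "snippet": "Our verdict on the X100."},
        {"title": "Acme X100 user manual (PDF)", "snippet": "Download the guide."},
        {"title": "Acme announces the X100", "snippet": "Press release."},
    ]
    for category, items in cluster_results(results).items():
        print(category, [item["title"] for item in items])
```

The point of the sketch is the division of labor: the list of results could come from any backend, including a custom search engine, while the grouping logic is where a vertical player would apply its own domain knowledge.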

Conclusion

Google CSE is an interesting piece of web infrastructure. On one hand, it simply opens up a different use for Google's core technology. On the other hand, though, it commoditizes the backend of any vertical search engine. However, we think that it's more of a blessing than a problem for the vertical search players, as they can now focus on their core specialty - presentation of the results in the given domain.

Please share with us interesting examples of Google CSE that you've seen online and tell us your thoughts about what Google CSE means for the vertical search space.



Comments


  • Great post! Couldn't agree more. Duzzio is an example of a Google CSE dedicated to Web Technology. As in your music reviews example, restricting the context to which a query is applied and hand picking the sources returns really good, focused results. We also added an index to make it easy to look up companies and products and see what has been written about them over time. Comments and suggestions are welcome.

    Posted by: Daniel | September 6, 2007 1:00 PM


  • There are a few of my favorites over at AltSearchEngines:

    http://altsearchengines.com/2007/09/04/googles-custom-search-engines-run-amuck/

    Posted by: Charles Knight | September 6, 2007 1:07 PM


  • When custom search came out, I immediately created a bunch of custom search engines for my needs based on people whose information I trusted. Lijit essentially made those redundant via the way it is set up and the ability to broaden your network. It's become my de facto search engine for anything related to my areas of interest.

    Posted by: Deepak | September 6, 2007 2:23 PM


  • Very interesting ideas here.

    "...this is like teaching Google semantics..."

    imagine what could happen if users could start tagging different custom searches with their own labels? and then Google leveraged that layer of categorization to enhance their regular search?

    Google's getting smarter...

    Posted by: Mike Arauz | September 6, 2007 3:33 PM


  • It seems like a great thing for vertical search.

    Posted by: Coleman Foley | September 6, 2007 4:18 PM


  • Hi Alex, my first CSE experience was about 1 year ago; I created a decent Turkish blog search engine in under 30 mins - just for learning purposes and fun. It was a straightforward process. One thing worth mentioning is that your Google CSE efforts can actually feed back into Google and make it more contextually aware of the sites that you add. A great example of collaborative filtering. x is the number of people who make a Google CSE. Say 10 is the average number of contributors in each CSE - considering that some of these contributors will get to know this tool and create their own CSE as well, this gives Google the opportunity to boost its results at an incredible exponential rate.

    Posted by: Emre Sokullu | September 6, 2007 5:14 PM


  • We used CSE to create a greener google http://google-black.blogspot.com

    Posted by: google-black | September 6, 2007 10:16 PM


  • I use Google CSE every day - love it. Nice post on the deeper issues, Alex.

    Posted by: Marshall Kirkpatrick | September 6, 2007 10:47 PM


  • One problem with Google CSE is that you can't use it as a search engine for your own site. Google doesn't crawl small sites as it does large sites, and that means that you need to wait for your updated site to be included in the search...I'll be glad to get suggestions of search engine code that will give me updated results with my site's look and feel...

    Posted by: Shay | September 7, 2007 12:59 AM


  • I think one of the HUGELY important things about CSE is the fact that there is a programmable API that allows you to create CSEs on-the-fly: Dynamic Google Custom Search Engines

    tony

    Posted by: Tony Hirst | September 7, 2007 2:22 AM


  • I've used both a Google Custom search and a Lijit Search for my site, and I've been very impressed with both products.
    Visitors to my site that click on my search page can use the Lijit box to search my blog and other web accounts of mine, while the Google Custom search harvests results from my blog and its linked pages.
    Both have a niche and both seem to evolve over time, tweaking their results as they build up the data on each custom engine.
    Overall I've been very impressed with both products, which make it possible to offer very high-powered custom search results to any site regardless of number of users.

    Posted by: Dan | September 7, 2007 2:50 AM


  • impressed and interested, need to check it out.

    Posted by: omad | September 7, 2007 5:20 AM


  • Here are two good search engines:
    Readle - Quality controlled search engine:
    http://www.readle.net/

    Clipoid - The world's largest video search:
    http://www.clipoid.com/

    Posted by: stefan svartling | September 7, 2007 8:04 AM


  • I've been playing with tagZar for a bit, which kinda does this, but in a "community" way.

    Posted by: Tom | September 7, 2007 8:13 AM


  • Wow - this was a great, detailed, and well-written post! Thanks for the info. I will be sure to bookmark and revisit often.

    P.S. Found via Digg.com

    Posted by: Mike Gilmore | September 7, 2007 8:42 AM


  • I wrote a script to search all top-level-domains for a specific URI... That's kinda like custom search.

    Posted by: sean | September 7, 2007 9:28 AM


  • I also heard that with the google search engine, they're also adding a free text messaging app like what peekamo does. Just another great app in google.

    Posted by: Dean | September 7, 2007 9:36 AM


  • Good post, thank you. I've been playing with CSE, and might just be in love.

    Want to contribute?
    http://www.google.com/coop/cse?cx=004418454459962176525%3Ayijs9wpbl84

    Social elements need more google love though.

    Also see http://blogoscoped.com/archive/2006-11-15-n50.html for Creating [More] Advanced Custom Search Engines.

    :)
    http://uxdesign.com

    Posted by: uxdesign.com | September 7, 2007 1:03 PM


  • "it's more of a blessing than a problem for the vertical search players, as they can now focus on their core specialty - presentation of the results in the given domain."

    Not sure I agree. Vertical search also needs to figure out monetization - and on one hand Google makes that possible - a lot of the margin remains in their hands - so direct relationships with advertisers may be of more value in the long run.

    Vertical search players have an advantage here - because they have specialized knowledge of the market - and should be able to use that to identify advertisers directly.

    Posted by: Peter Childs | September 7, 2007 4:38 PM


  • I looked at Google CSE as an option to build vertical search.
    I agree with all the points in this article but Google index has huge holes based on topic and websites in question.

    I conducted extensive search results tests. Google did not have 70% of the URLs I needed in its index, i.e. out of 100 URLs only 30 were indexed. That is not acceptable for a vertical search engine. I must say that the quality of Google's index depends on the topic and websites in question. For my topic and websites, the Google index did not work.

    CSE is a great product but beware that you do not have control over index, make sure to test quality of Google index for your vertical search before committing to it.

    My opinion: in the long run it is better to build your own index than to rely on Google's (the scope of vertical search is very selective and focused, and domain knowledge can be applied to indexing)

    Posted by: Armen | September 7, 2007 7:32 PM


  • I did a couple of video tutorials on the subject:

    Setup:
    http://www.conversationmarketing.com/2007/07/video_tutorial_google_custom_s.htm

    Analyzing search data:
    http://www.conversationmarketing.com/2007/07/analyze_internal_search_data_w.htm

    Posted by: ian | September 7, 2007 11:27 PM


  • Hey, really interesting,

    we're getting started with something similar, but counting
    more on people to pick the right results.

    take a look at searchons.com

    it's google's plain adsense search integrated but with people's choice on top.

    it's just getting started, so maybe you will have to add your best pick for your search term in order to see the power behind it.

    nice post btw.

    thanks,

    oh and let us know what you think about searchons,

    we're adding the user / group module as we speak.

    Posted by: searchons admin | September 8, 2007 4:29 PM


  • One major problem with Google CSE is that it does not return results from Google's supplemental index when you're including more than 3 sites in your CSE. This is a real problem, since many smaller sites are in the supplemental index, and when you're creating vertical search engine you want to be able to reach such content, which is otherwise hard to find.

    Other than this really painful issue it's a great tool - I've created a site based almost solely on Google CSE and I see nice traffic and profits.

    Posted by: Amit | September 8, 2007 11:48 PM


  • Where this is very interesting is for brands. For example, Whole Foods could create a branded CSE for recipes, raising brand awareness and retention by offering a free service.

    This is an area we're calling 'appvertising'. It sounds like a cross between an ape and a french green rinse, but it's all about brands offering a service. As you mention, the value is in the UI: let Google do the heavy lifting and focus on the user experience.

    Posted by: Adam Martin | September 12, 2007 8:35 AM


  • Here are the reasons why Google CSE will not work for vertical search:

    1.) Verticals need complete control over the crawler. It's not just about adding or removing whole sites that CSE allows you to do but crawling relevant content from sections of larger sites or smaller websites with a few pages. Control is also required for the revisit interval.

    See the forum thread...
    http://groups.google.com/group/google-custom-search-results/browse_thread/thread/f37972cdb1ffadca/3ab7e79574e2e1a4?lnk=gst&q=crawler&rnum=1#3ab7e79574e2e1a4

    2.) Deep indexing required for Verticals. Google does not always index all pages from a given website since it was built to be that way. Google is focused on horizontal search rather than being deep in every area.

    See the forum thread...
    http://groups.google.com/group/google-custom-search-creating-and-editing/msg/768e0a6b7b93ee15

    3.) In horizontal search, you always get something back relevant or not. But in Vertical, what you bring back matters. Search techniques/syntax and relevance make a true difference. This is the reason why Google has not made much headway into the Enterprise search market where products like Autonomy, FAST, Endeca, SearchBlox etc will work better.

    See the forum thread...
    http://groups.google.com/group/google-custom-search-results/browse_thread/thread/4e9614f885bacf5e/b9f97ed9d35927ec?lnk=gst&q=search+&rnum=3#b9f97ed9d35927ec

    TSS

    Posted by: TSS | September 15, 2007 4:51 AM



