Last week I had the pleasure of interviewing the head of Google's Webspam team, Matt Cutts. The topic of our conversation was Next-Generation Search. In my pitch to get an interview with someone at Google, I explained how Read/WriteWeb has been covering Next-Gen search a lot and so it would (obviously) be great to get Google's views on this topic!
Matt Cutts is a well-known Google identity, who apparently gets mobbed by fans at SEO conferences. His Wikipedia page states that he co-invented one of the most well-known patent filings from Google, involving search engines and web spam. One note about the following interview: Google has a policy of not discussing competitors, so a few of my original questions had to be dropped or re-phrased.
Richard: When we write about 'next-generation search' on Read/WriteWeb, a lot of times we position it as: how can a startup become the next Google? But obviously Google is also hard at work with next-generation search technologies. Can you give us an overview of what Google is working on in regards to next-gen search - e.g. personalized search, AI, etc.?
Matt: I think personalization has a very high chance of being able to improve search for the average user. One of the great things about it is, you don't really have to do a lot of work. Once you decide this is something you're interested in, Google can take care of a lot of the details. I recently saw a post online where somebody was complaining about metadata and having to make metadata - and the nice thing about personalization is that it's free for the user. So as far as the next generation of search, I think that is something that is very exciting.
Richard: Can you give us a couple of examples of how Google is implementing personalization?
Matt: I think of localization as a type of personalization. If you type in a query like "football", that will give you different results in the US versus the UK. And a query like "bank" done on Google - in New Zealand it will get New Zealand banks, in Australia it will get Australian banks. And it makes a big difference to know those sorts of things. So that's just personalization at a country level, but it already shows the sort of potential that you can reach.

Matt Cutts flanked by two fans; Photo by Chris Pirillo, inspired by Jim Boykin.
Richard: Also recently Google implemented personalization with Google Accounts, so I believe personalization can happen out of that, via the main Google search?
Matt: Absolutely, yes. It's nice because the mental model that users have to keep has been simplified. So now if you're signed into Google search, we will be able to help personalize your search results. And that's a really nice win, because it's much easier for people to know. If I don't want personalized search results, I can just click in the top right and sign out. But if I am signed in, I can check that by just looking at my email address in the top right - then I know that I'm benefiting from personalization automatically.
Richard: What do you think about semantic technologies (like for example Hakia)? How important is natural language understanding for search and is Google doing anything in this direction?
Matt: We do pay a lot of attention to a lot of different technologies, so I would define Google's approach as very pragmatic. And we keep an eye on the entire space and we try to say, ‘ok what are the areas that are most promising for users?’ Historically it's always interesting to view the progress of semantic technologies. For example if you do a search like: 'how many states are in America?'. Some search engines that claim to be semantic won't do a good job in delivering the right results, whereas Google can do a very good job - even if you think, ‘ok how can they handle natural language, or how can they handle the semantics of that search.’ And I think what Google benefits from is the sheer size of the Web and the sheer amount of data, and it really does help us understand the meanings of words and synonyms. So we do have a pragmatic approach and we don't necessarily place all our bets on one particular way of doing things. We are exploring a lot of different things all at once.
Richard: So you would say that Google is already doing that kind of semantic technology, that it's just integrated into the current service you provide?
Matt: Yeah, I would say there's a lot of semantic technology already built in, under the hood of Google.
Richard: One of the most popular posts this year on R/WW was one called The Top 100 Alternative Search Engines. What are some of the "alternative" search engines that have most impressed you lately? Or if you can't mention names, what are some of the technologies that impress you? The February list had 32 changes and so it perhaps indicates the sheer speed of innovation in search.
Matt: You also did a really good job in another post, where you had a poll that asked what would be next [in search]. It was interesting that 209 votes were for personalized search, and after that Artificial Intelligence. I think a lot of those trends are very interesting. Having a lot of data, we are able to try things as different as visualization, all the way up to things like clustering, or query refinement. Sometimes at the bottom of our search results, if we think it's relevant, we'll take the user's query and suggest other related queries. And that's something that Google didn't launch for a while, but we wanted to test it and get the best possible result. It didn't make sense to launch it until we found a combination that we thought was very good for the user. But I do think that we watch a lot of those different technologies and try to stay aware of what people are doing in the industry and what people are trying.
Richard: SearchMash is an experimental site from Google [introduced around Oct/Nov 2006], with some new Ajax-powered UI ideas. Can we expect any of the SearchMash features to be implemented into the main google.com UI any time soon?
Matt: There is a possibility, but not a guarantee that the features you see on SearchMash will be seen on Google search. It's always a trade-off and we have to consider things like how well something might be supported by different browsers, how much users like it, and also how much screen real estate or time to ramp up on a feature it might take. For example there was an interesting feature on SearchMash where you could start typing anywhere on the page and it would start filling in the search box for you. But that wouldn't work with every single browser. I think the big value in SearchMash is that it lets us try a lot of very different user interfaces - things that might throw your average user. And we can try out those really unusual interfaces and see how people respond.
Richard: On our Alt Search Engine list, there were some search engines with amazing UIs - e.g. one had a talking avatar. So I guess you could, in future, experiment with that kind of UI on SearchMash...
Matt: Yeah, it's fun because once you step off the Google domain, you've got a lot more freedom to try different things - including bringing in image results, results from news, all sorts of fun things. So it's a fun playground to have, and I'm glad that we introduced it.
Richard: Google Base is essentially a database of structured content and home for many different verticals currently (jobs, vehicles, classified). There's also GData and the Google Base API. Can you explain how all these things fit together and what (if any) impact it will have on search going forward? I presume that structured data will become very useful for Google search over time, so perhaps you could help our readers understand that some more...
Matt: It's certainly the case that structured data is really interesting, because once you have data in different fields, you can imagine doing different types of searches over it. And GData is especially interesting, because it almost provides a way to plug data into Google. Which throws up a lot of interesting possibilities. For example, Google's had a couple of other types of searches - we've had patent search, code search, book search - and those are slightly different verticals, a little more free-form. But you could certainly imagine being able to search over new verticals; and having that fielded search, or the structured content (however you want to refer to it) can definitely be really useful as far as letting people have more flexibility. So I'm pretty excited about it, but it's always hard to say how things will go in the future and the direction things will go.
Richard: Do you have any plans for vertical search beyond blogs, I mean the major verticals... for example Microsoft bought a health search company recently. So is Google going to do anything in those major verticals?
Matt: Well, there are two answers to that. Firstly things like patent search, code search, book search - whether you want to call them vertical search is kind of up for dispute. They search over different types of data. So for example with Google Calendar, being able to search over calendar data or Gmail being able to search over email, is an entirely different and new capability. And really, really interesting. I'll let you decide whether to call that vertical search or not.
My second answer though, is that I think it's really interesting that Google has taken a step back and looked at the general issue of vertical search - and as a result has introduced Google Custom Search Engine (CSE). It's built on the power of Google Co-op, and the wonderful thing about it is that it lets anybody define their own custom search engine. And not just something feeble, we're talking about the ability to add 5,000 URLs very easily - and not just to filter over them, but to be able to boost for some sets of URLs, and detract or downgrade other sets of URLs.
So what's really interesting to me is if you think about a new vertical, for example podcasts, you could certainly have Google say: ‘well ok, how do we search over podcasts?’ But if you go into Google Custom Search Engine, I think there's been dozens of people who've actually made their own podcast search engines - by using the power of CSE. For example, the other day I found a search engine for 'engineering podcasts', so you could search for Google and get all the podcasts about tech talk, etc. I think that's a really interesting approach. I'd certainly say that we want to return the best results to users, so in some cases it might make sense for Google to look at individual areas. But the general issue is often well addressed by giving the power to the people, so to speak, and letting them build their own search engines. So it's really been fun to see just how many people have signed up for it, and how much growth the custom search engine area is getting.
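Editor's sketch: the filter, boost and downgrade behaviour Matt describes for Custom Search Engine can be illustrated roughly as below. This is not the CSE API or Google's ranking code. The `Result` type, the `rerank` function, the base scores and the multiplier values are all hypothetical, chosen only to show the idea of restricting results to a curated site list and then nudging some sites up or down.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    score: float  # hypothetical base relevance score from an underlying engine


def host_of(url: str) -> str:
    """Return the lowercase host part of a URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()


def rerank(results, include, boost=(), demote=(),
           boost_factor=2.0, demote_factor=0.5):
    """Keep only results on `include` hosts, then scale boosted/demoted hosts.

    The factors are illustrative assumptions, not values used by any real engine.
    """
    include, boost, demote = set(include), set(boost), set(demote)
    kept = []
    for r in results:
        host = host_of(r.url)
        if host not in include:
            continue  # filtering: drop anything outside the curated site list
        score = r.score
        if host in boost:
            score *= boost_factor
        elif host in demote:
            score *= demote_factor
        kept.append(Result(r.url, score))
    # Highest adjusted score first
    return sorted(kept, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    hits = [
        Result("http://a.example/talks", 1.0),
        Result("http://b.example/podcasts", 0.8),
        Result("http://c.example/audio", 0.9),
        Result("http://d.example/unrelated", 5.0),
    ]
    for r in rerank(hits,
                    include={"a.example", "b.example", "c.example"},
                    boost={"b.example"},
                    demote={"a.example"}):
        print(r.url, r.score)
```

A podcast search engine of the kind Matt mentions would, in this picture, be little more than a curated `include` list plus a few boosts for the sites its creator trusts most.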
Richard: Your particular area of expertise is fighting spam. Can you tell us the latest on how Google is trying to keep its results pure... what are some of the trends in fighting spam?
Matt: We've done a lot of stuff to return better search results for users over the last year, including on web spam. For example, we've got internal metrics that we keep track of to show that we're doing a much better job than even a couple of years ago, to make sure that a user doesn't randomly come across spam. One of the big trends last year and continuing into this year is internationalization. It's really important for us to be able to offer spam-free search in any language, whether it's French, Italian, German, Chinese or Japanese. So a lot of what my team looks at is trying to make sure that any new approach that we do, we are also able to do in a scalable and robust way across many languages. So that's probably the biggest trend.

Image courtesy of stefan2904
Richard: With the acquisition last year of YouTube, together with Google Video, being able to search and index video is obviously a key thing going forward. Not to mention being able to insert advertising into video. What kind of things is Google doing in the area of video search?
Matt: Video by itself is a lot more interesting and challenging to search, because it's got audio and visual components that are interesting and sometimes more difficult to index than words alone. The nice thing is that by using a lot of different information, Google can often return a very good set of search results. Even more than you would expect sometimes, given how hard a type of content like video is to index.
But it's also fun because in the Web we have this notion of reputation - which is PageRank, it's how many people link to your site and it's also the quality of that incoming set of links. So it's fun to think about things like reputation in video search - whether it be for Google Video or YouTube - because you don't have links necessarily. You might have things that are somewhat similar to links, but you look at the quality of the users, the quality of the ratings. I think in lots of ways it gives Google good practice to think about the power of people, and the power of trust - and how to apply that in a lot of different areas.
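Editor's sketch: Matt's description of PageRank as both how many people link to you and the quality of those incoming links can be illustrated with the simplified textbook form of the algorithm. This is only a toy power iteration on a tiny hand-made link graph, assuming a damping factor of 0.85. It is not what Google runs in production, and the function name and the example graph are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outlinks,
                # so a link from a highly ranked page is worth more
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no outlinks: spread its rank evenly over all pages
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In the example graph, "c" has two incoming links and ends up on top, while "a" has only one incoming link but ranks well above "b" because that one link comes from "c". For video, where links may be missing, the same kind of calculation could in principle be fed with other signals such as ratings, which is the analogy Matt draws.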
Conclusion: Our thanks to Matt for the illuminating interview about Google and Next-Generation search! We would love to get people's comments or questions on this topic, so do leave a comment below.
Update: See also Video PageRank: Google Searches for The Holy Grail
Comments
Very clarifying, great interview!
Much of the interview confirms what I already suspected: that Google is way ahead of all competitors in the search field. I have yet to see a new search engine - vertical, semantic or other - that comes close to Google from any perspective. Google does have at least one problem, though: huge amounts of spam due to its enormous exposure. In certain fields (and with good timing), Google's inability to fight spam (or simply produce relevant results) might open some opportunities to verticals-
Be great if they could achieve an end to web spam... awful stuff.
a good interview
regards
Reading interviews like this, it's no wonder google stock keeps going up.
The personalization of search is great for local companies competing in global markets. It will be interesting to see the evolution of natural language in search. If i was google, i would be doing my best to purchase Wikipedia. They are sending them most of their traffic... so why not?
Matt Cutts says
"Historically it's always interesting to view the progress of semantic technologies. For example if you do a search like: 'how many states are in America?'. Some search engines that claim to be semantic won't do a good job in delivering the right results, whereas Google can do a very good job - even if you think, ‘ok how can they handle natural language, or how can they handle the semantics of that search.’ And I think what Google benefits from is the sheer size of the Web and the sheer amount of data, and it really does help us understand the meanings of words and synonyms"
Should we take some search puppies for a spin?
Let us look at what Goog does...
http://www.google.com/search?hl=en&q=how+many+states+are+in+America%3F
now let us look at Live
http://search.live.com/results.aspx?q=how+many+states+are+in+America%3F&form=QBRE
hmmm...interesting :)
Good Post !!
great interview
The future belongs to local search engines, the ones that will be able to know whatever happens in your community and serve the most relevant information accordingly.
Amir, you're aware of GOOG's big drop that happened just hours before you made your comment, right?
It's a nice interview - but again I'm not seeing anything new here beyond what I've been hearing for the past two years. Google is looking increasingly stagnant from my point of view as a search engine developer.
If they are truly keeping to release often, release early; then they haven't actually done much of importance for quite some time to improve search.
Phill, I'd argue that Google has been doing a lot of different things - e.g. Google Base, CSE, the recent personalization with Accounts. Those are things Matt talks about above. But you may be right that they're not integrating all of those things enough with the core search experience.
Very informative.... thanks to Matt and RWW team!.
Hello, i was inspired so much that i made some little experiment of Search2.0 check please alpha ver at www.linkshouter.com
Interesting interview
It's a pity Matt doesn't mention Spanish among the languages listed within the spam-free search.
In addition, it would be interesting to know Matt's opinion about "Search Wikia".
"¿Quieres dormir con fosforo?" Why is there so little evidence of AI in Google Translate? If Google were going to produce precision search in the near future they would understand the difference between the different types of "match".
BTW - The tea shirt I would wear would say
"Quire dormir con fosforo"
Personalization means better ads which means more money. I never really thought about it, but Google Checkout is HUGE because now they also have access to credit ratings. Think about when they comingle Google Checkout with Personalized Search. They can charge 10 times what they are now, to provide leads to a very specific population that makes a certain amount of money and has bought x,y and z before. I am officially scared of them now.
Richard,
Pretty good interview, congrats, in our understanding the biggest competitive advantage of Google is its reach and massive index to play with… Aside, it becomes clear enough that all new start ups in the field are left far behind and it is obvious that whatever technology or approach those smaller guys develop or take Google will do the same, if not already done, sooner or later, even better.
Conclusion: the site that would potentially undermine Google’s dominance in finding information on web will be anything else but not Google-style search engine.
There are such sites rising upon us, but only the future will tell us if they will work things out or not.
Cheers,
Web2innovations.com
Richard,
Very informative and interesting interview. Hope you get access to more people from the BIG players in the future. Interestingly, an article in the latest BusinessWeek talks about the limitations of Local Search provided by all the major search players (Google, Yahoo, Live, Ask). The results seem to be all over the place depending on the query. I think local search is a huge underdeveloped domain with a lot of monetization potential. The problem is indexing the information properly since it may require a lot of manual data processing rather than just indexing and ranking web pages.
Nice very informative.As much localization will be put into these engines the more they will get relevant.
Interesting interview! Thanks for posting!
Interesting Interview. It reveals a lot..
Thanks Richard, you done a great job in this interview to bring up a lot about the search technology/ approaches..
Congrats..!
Will social networks and vertical search combine to challenge Google?
Publishers and advertising agencies have a very difficult challenge ahead as traditional “horizontal” media like newspapers, TV channels and magazines see their traditional demographics and advertising revenue streams fragmented by the increasing preference of consumers for online access and the huge presence of Google eroding their audiences and potential future revenues.
Perhaps they should remember the words of Sun Tsu, who once said: “When the enemy is too strong to attack directly, then attack something he holds dear. Know that in all things he cannot be superior. Somewhere there is a gap in the armour, a weakness that can be attacked instead.”
Google’s major strength - the clean search box and the ease of use, commoditised ad revenues, perhaps masks its principal weakness. As media content and advertising revenues fragment to serve thousands and thousands of “vertical” online communities based on lifestyle or profession, Google may suddenly seem standardised, commoditised and lacking a sense of unique community. Is Google becoming Wal-Mart, while vertical communities may prefer Harrods?
Whilst “horizontal” media companies are similar to supermarkets, specialist professional “vertical” publishers are very specific in serving niche communities with totally relevant content and requirements. However, the publisher’s principal operating difficulty in becoming adaptive to this asymmetric Web 2.0 opportunity is that most tend to run each of their print, exhibition and online titles/businesses as separate profit and loss items on their balance sheet. As a by-product the vast majority tend not to have a centralised IT infrastructure or the human IT skill sets to manage a large scale data centre or web spidering facility - the prerequisites needed to datamine and aggregate open source, user generated and blog content to create vertical slices of the Web that are relevant for their audiences. Publishers will also need to integrate this content into the online extensions of their print brands and thereby allowing advertisers the opportunity to target high value communities. In addition, the datamining, crawling and hosting to identify relevant open source content will also need to be a continual process due to the continual growth of user generated and open source content.
Convera have two very large data centres, an extensive web spidering capability and a web index. Convera are now partnering with a significant number of specialist B2B publishers to create a range of vertical websites for specific professional communities. The first example of this is Searchmedica.com with UBM.
In building the deep vertical search portals, the key is to reach into the specific professional community in a number of ways. First, you can combine the trade publisher’s knowledge and contacts in the profession with community appeals that engage the specific audience in a way that general search cannot, and also by taking special care to use the taxonomies common to the targeted profession in organizing search results so that the user feels more at home and among peers. Building a good vertical engine can be costly and time consuming, and getting a critical mass of users to de-Google their search habits into more specialized engines is potentially a tough sell. However, in tests with focus groups from different professional communities to test these vertical search properties against Google, the results are hugely encouraging.
In building the beta test sites, the specialist publishers are providing Convera with “white lists” of data sources online and websites that would be most relevant to its readers so that the searches are restricted to reliable and trusted information. Publishers are also securing agreements with owners of key proprietary content not normally crawled by Google by leveraging some of its contacts and resources so that Convera can crawl and deliver some of their proprietary content.
Another key consideration is getting the user community engaged in the process as co-developers. No matter how bad the results at Google or Yahoo may be for a given professional segment, the interface is familiar and the destination is always at hand. Getting users to think of a specialized brand as the go-to place for business information is the challenge.
A number of publishers are actively assessing the potential of adding social networking to the mix in order to get professionals interacting with each other and adding weekly podcasts by industry experts on issues affecting the community - these additional services will create more community loyalty and also additional advertising and sponsorship opportunities.
The publishers can also use their print titles to drive the audience to the new online areas and this will also assist the transition of their high value print ad revenues to online. Publishers also have exhibitions, seminars, events and email newsletters to assist this transition - and recent research suggests that professional communities will actively attend seminars and events to meet peers and other members of their community. The theory goes that once you get some professionals involved then the viral mechanism or behavioural “Hive Mind” also kicks in and professional workers start referring to the vertical portal as a community source. It also allows advertisers and public relations organisations access to a clearly defined, affluent, influential and stable audience.
Google does not allow you to have a beer with a potential business partner - it doesn’t have that sense of community. But Google is fighting back - the recent launch of Google Custom Search and acquisition of teenage social network sites indicates they are aware of their weakness - but specialist publishers see this as a Trojan Horse. Social networks for teenagers are highly transient and target a demographic that is volatile, unpredictable and has a low level of disposable income - whereas a social network alongside a vertical search service for 22,000 bio-chemists, 55,000 UK GP’s, 55,000 insurance risk assessors or 120,000 US psychiatrists is stable, affluent and attractive for advertisers.
Local Search Formula:
On-Demand Listings + Community Reporting + Quality Assurance = Accuracy
This has got to be the most boring interview that I have ever read! Wonder what people learnt from it that they didn't already know!!!