Google today announced that it is now indexing an amazing 1 trillion unique URLs. Google's first index in 1998 had only 26 million pages, and by 2000 that number had jumped to 1 billion. Today, the Google index is growing by several billion pages per day. Not too long ago, Google had a counter on the front page of its search engine that displayed the number of pages in the index, but the company dropped this information from the site around 2005.
When there was still real competition between search engines in the late 90s, the size of a search engine's index was one of the main ways of comparing search providers. Today, the number of pages in any given search engine's index has dropped out of our collective consciousness - and that might be a good thing, as the focus has shifted from returning the most results to returning the most relevant ones.
That, after all, was the real advance that Google brought to the search engine market. Early search engines like AltaVista, Excite, or HotBot simply weren't able to return the most useful results to users.
However, we are also approaching a new crossroads, where even Google's results are often polluted by spam. At the same time, the great promise of semantic search engines is still just a promise for now. Given the latest data about the search engine market and the end of the negotiations over Microsoft acquiring Yahoo's search business, Google is pretty much becoming the de facto standard search engine for most people.
Chances are that anybody who wants to enter this market and compete with Google is simply going to be bought by the search giant, so if anything, Google's strong position is going to get even stronger in the foreseeable future.
For now, though, the real question about Google's search index is when it will reach a size of 1 googol...
Photo by Flickr user Mykl Roventine.
Comments
Wow..that's phenomenal! Google = Headed for world domination.
Posted by: Nick Stamoulis | July 25, 2008 5:16 PM
I wonder how many of these pages are fakes/spam.
Posted by: Yasser | July 25, 2008 6:46 PM
Google's index is in the tens of billions of pages, nowhere near a trillion. They only announced that they've identified a trillion unique URLs; they index only a tiny fraction of that. There are rumors that this announcement was purposely put out ahead of some major news coming next week from one of their competitors...
Posted by: Dan Grossman | July 25, 2008 7:06 PM
I have about 120,000 pages on site A. Yahoo has indexed about 90%. Google has indexed about 50%. MSN Live has indexed just 59 pages!
I'm using Google Sitemaps, Yahoo Site Explorer and Live Webmaster Central, all sites are authenticated and XML sitemaps are in place, but MSN is seriously lacking in discovering all pages.
I really think that they would love to add more pages, but their infrastructure can't handle it. I think MSN could use some Hadoop?
Posted by: Tinus Guichelaar | July 25, 2008 11:19 PM
I agree with Dan - smacks of sensationalism and spoiling tactics. If it turns out to be the case I for one will be disappointed with Google - they are so dominant now as to not have to behave like a bully-corporation. They'll get much more cred' positioning themselves 'for the people' as opposed to 'competing' like the likes of Coke and Pepsi.
Posted by: Website Designer Perth | July 26, 2008 12:56 AM
one's most cherished insights .. mere data
lot of info, and no wisdom ... google will never replace awareness
Posted by: gregorylent | July 26, 2008 1:30 AM
Come on guys!
This was a velvet article.
Arrington was a lot more teasing!
Is the search market changing in one week?
I think not, but everybody can see that a lot of things are running in the background.
Posted by: panos | July 26, 2008 9:43 AM
I wonder how many of these pages are fakes/spam
Posted by: promosyon şapka | July 26, 2008 11:54 AM
Innovation in search is stifled by the huge costs of crawling the web.
I would like to see a company that opens up their crawling infrastructure, allowing developers to write small modules that will run on every page the crawler passes, and add the generated data to the main index.
A developer will be able to define which kinds of pages his module runs on, and at what processing rate, and will be charged according to the module's CPU consumption.
There should also be a choice whether the new data is public or private, and choosing public will give a significant discount to the developer.
This way it will be possible for small companies (and even individuals) to run significant scale experiments in indexing, and to create search engines that integrate all the innovative ideas on how data should be extracted from the web.
Posted by: Amit Aviv | July 27, 2008 5:15 AM
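The shared-crawler idea in the comment above can be sketched in a few lines of Python. This is only an illustration of the proposal, not a description of any real service: the names `Page`, `Module`, `SharedCrawler`, `PRICE_PER_CPU_SECOND`, and `PUBLIC_DISCOUNT` are all hypothetical, the prices are made up, and the "processing rate" is simplified to a fixed page budget per module.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical pricing; the comment gives no actual numbers.
PRICE_PER_CPU_SECOND = 0.01
PUBLIC_DISCOUNT = 0.5  # assumed "significant discount" for public data


@dataclass
class Page:
    url: str
    content_type: str
    body: str


@dataclass
class Module:
    name: str
    matches: Callable[[Page], bool]   # which kinds of pages the module runs on
    extract: Callable[[Page], dict]   # data the module contributes to the index
    public: bool = False              # public data earns the discount
    max_pages: int = 1000             # crude stand-in for a processing rate
    cpu_seconds: float = 0.0          # metered CPU time, used for billing
    pages_seen: int = 0


class SharedCrawler:
    def __init__(self) -> None:
        self.modules: List[Module] = []
        self.public_index: Dict[str, Dict[str, dict]] = {}
        self.private_indexes: Dict[str, Dict[str, dict]] = {}

    def register(self, module: Module) -> None:
        self.modules.append(module)
        if not module.public:
            self.private_indexes[module.name] = {}

    def process(self, page: Page) -> None:
        """Run every registered module that wants this page, metering CPU time."""
        for m in self.modules:
            if m.pages_seen >= m.max_pages or not m.matches(page):
                continue
            start = time.process_time()
            data = m.extract(page)
            m.cpu_seconds += time.process_time() - start
            m.pages_seen += 1
            if m.public:
                self.public_index.setdefault(page.url, {})[m.name] = data
            else:
                self.private_indexes[m.name][page.url] = data

    def bill(self, module: Module) -> float:
        """Charge by CPU consumption, discounted if the data is public."""
        cost = module.cpu_seconds * PRICE_PER_CPU_SECOND
        return cost * (1 - PUBLIC_DISCOUNT) if module.public else cost


# Example: a public module that counts words on HTML pages only.
crawler = SharedCrawler()
word_count = Module(
    name="word_count",
    matches=lambda p: p.content_type == "text/html",
    extract=lambda p: {"words": len(p.body.split())},
    public=True,
)
crawler.register(word_count)
crawler.process(Page("http://example.com/a", "text/html", "hello big web"))
crawler.process(Page("http://example.com/b.pdf", "application/pdf", "skipped"))
print(crawler.public_index, crawler.bill(word_count))
```

The point of the design is that the crawl itself happens once and is shared, while each developer pays only for the CPU time his own module adds on top of it.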
And here's the news they were trying to wash out. Cuil launched today with an index of over 120 billion webpages, which is about 3 times larger than Google's.
Posted by: Dan Grossman | July 27, 2008 10:44 PM
Google is a habit now. It is one of those habits that is hard to give up as well. Can it be done? Let's just Google it and find out.
Posted by: Harish Agrawal | July 28, 2008 3:27 AM
Cool that's phenomenal
Posted by: free movies | July 28, 2008 9:25 AM
If Cuil's advantage over Google is in the size of their index, then they are in big trouble.
Posted by: Luis Pereira | July 28, 2008 4:33 PM