ReadWriteWeb

The Real Time Google Index: Will It Be a Game Changer? (Open Thread)

Written by Marshall Kirkpatrick / March 5, 2010 10:49 AM / 21 Comments

Google is developing a system to ingest real-time content updates from any page on the web automatically, using the open PubSubHubbub Atom protocol, we reported on Wednesday.

Google already indexes a whole lot of content very quickly. Will a real-time indexing system make a big difference? There are differences of opinion on the matter and we'd like to know what you think. Search analyst Danny Sullivan told us on Wednesday that he thought it could be "the next chapter" for Google. John Battelle said this morning: "In short, it's a new way for Google to get (more) real time signals. But honestly, not a huge deal. I don't think. Correct me if I'm wrong..." What do you think, readers?


We explained the specifics of how the Hubbub system might work in our earlier coverage, so let's talk now about the possible impacts (or lack thereof).

As we wrote on Wednesday:

PuSH is much more computationally efficient for Google but [Google's Brett] Slatkin says that even more important is the impact of such a move for small publishers. Right now many small sites get visited by Google maybe once a week. With a PuSH system in place, they would be able to get their content to Google automatically right away.

A richer, faster, more efficient internet would be good for everyone, and the benefits in search wouldn't be limited to Google, either. PubSubHubbub is an open protocol, and the feeds would be as visible to Yahoo and Bing as they would be to Google.
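For readers who want to see what "getting content to Google right away" could look like on the wire, here is a minimal sketch of the publisher side of PuSH. It assumes the publish notification described in the PubSubHubbub 0.3 draft: a form-encoded POST to the hub carrying hub.mode=publish and the URL of the updated feed. The hub and feed addresses are hypothetical placeholders, and nothing here reflects how Google's own system is actually built.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url: str, topic_url: str) -> Request:
    """Build (but do not send) the POST a publisher makes to tell its hub a feed changed."""
    # The draft describes two form fields: the mode and the updated feed ("topic") URL.
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical addresses, used only for illustration.
ping = build_publish_ping(
    "https://hub.example.com/",
    "https://blog.example.com/atom.xml",
)
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would actually send it. The hub then fetches the feed and forwards the new entries to every subscriber, which is why a search engine subscribed to the hub would not need to crawl the site to notice the update.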

Readers Who Think This is Big

Sharon Kavanagh says:

This all sounds fantastic for the small guy as I have just created my first ever website which is for a reunion. The site will only be live for a short period as the date is May 15th 2010 for the event and yet, it will probably take Google till then before my site is indexed and hence the people I am trying to reach will never find it.

Scott Holodak says:

Previously you had to wait for spiders to crawl around the web to find changes on your site. Pages are crawled over again and again just to see if anything has changed. It's a pretty inefficient process. Now the spiders are going to be fat and lazy because you are going to deliver your changes directly to them.


No Big Deal

Reader comments arguing this is not a big deal.

"Scott" says:

A properly designed website already "pushes" to (more accurately: gets "pulled" by) search engines and the frequency of indexing by search engines is determined by the popularity of the website.

This information doesn't seem too new to me.

Bruce Wayne says:

Pushing unstructured content in real time can only mean the non relevant results will make it into the search results faster. To me this is another Google hocus pocus distraction away from the fact that search as it is today has hit a wall....millions of pages of unstructured data created exclusively to game the system....and now these pages of non relevant content can be pushed into the search stream in real time....

What Do You Think?

I think there is something fundamentally different about a web that Google's index subscribes to in real time vs. a web that Google has to plow through with a spider looking for new content. I'm still wrapping my head around it, but there's something about the PuSH method that feels like it would make the Google index a living, breathing phenomenon.

What do you think?


Comments


  1. I think this changes everything -- certainly in the world where search optimization meets social media. Traditionally only sites with frequent updates would get crawled frequently and this was a way to garner better search results, which is why SEO folks are always advocating for blog content (among other things).

    If sites that don't have lots of frequently updated content (I'm thinking small business sites) can now get crawled/indexed quickly, I think that's a radical change. I'm actually stunned that it's even possible. But it is the Google, after all.

    @CarriBugbee

    Posted by: Carri | March 5, 2010 11:03 AM



  2. Any time new content is occupying space on a Google SERP it is a BIG deal.

    Posted by: Jaan | March 5, 2010 11:12 AM



  3. Being in Google's index faster is nice. But being in the index doesn't mean you will rank.

    If you understand that being indexed and ranking well are not synonymous then it's a good idea for those who publish quality content often. But then, chances are that if you publish quality content often, and have an RSS feed, you are already being indexed in a matter of minutes.

    Posted by: Louis Durocher | March 5, 2010 11:13 AM



  4. Huge Deal!

    Realtime is the "context". When I look for earthquake now, I hope I won't see the same results as last week, when the Chilean one happened.

    There is nothing worse for Google than people going to Twitter search to learn about an earthquake that just happened, because for once, Google is completely out of the loop.

    Combine realtime + geolocation and you have a much much smarter search engine that anticipates what I'm really looking for based on what's currently happening where I am.

    Posted by: superfeedr | March 5, 2010 11:17 AM



  5. Clearly it's important and IMO what PubSubHubbub was always intended to be used for.

    With the mass adoption of PuSH, Google and other search providers will now be hand delivered new content while it's still hot from the oven. Think about what that means for Google's bottom line. How much infrastructure must they have devoted to crawling the web? How much of it would go away if the web as a whole adopted PuSH?

    Posted by: Darren Bounds | March 5, 2010 11:28 AM



  6. It's gamechanging in many senses. But in the sense that it levels the playing field for the small, undiscovered publisher, it's especially gamechanging because it begins to unravel the lock and democratize space. Focus will shift to the swarm in real-time from unrelated sources, which is much more difficult to game, if it can be gamed at all.

    Posted by: Karoli | March 5, 2010 11:34 AM



  7. The potential is exciting for small businesses to get indexed quicker. It does pose a few questions. Will real time indexing affect how results are served up? Will time of day factor into when you should upload your content?

    I don't think that this removes the basic fundamentals of SEO. The impact of getting SEO right before publishing will increase. Perhaps more inhouse SEO will be needed. Very interesting.

    Posted by: Derek Hanson | March 5, 2010 11:39 AM



  8. It's a win-win for webmasters -- their most frequently-updated content hits instantly, there'd be less of a need for copious robots.txt rules, and overall server performance improves when Google uses my feed rather than crawling my site repeatedly throughout the day. The impact on Internet traffic will be interesting to watch...

    Posted by: Jared Smith | March 5, 2010 11:48 AM



  9. Going up one level, to the impact of all the major data sets updating in real time ( with PuSH ), beyond just search engines to include personal financial services, government, academia, media, health and of course, wacky social networks - all delivered to mobile devices with one's physical location applied as an attribute...

    Ummm yeah, that's a "game changer" to say the least.

    Didn't you people ever read the sprawl trilogy?

    http://en.wikipedia.org/wiki/Sprawl_trilogy

    We're there (!), or just a few months away, Gibson's "fiction" is now fact.

    "game changer", sheesh, try cultural revolution.

    Posted by: Todd | March 5, 2010 12:46 PM



  10. It's not that big of a deal. It's cool but quality content is what I want not faster crap content. Google has always been fairly good with getting the quality to the top, and if they get it there faster? Then that's just gravy.

    Posted by: Erik Bigelow | March 5, 2010 2:07 PM



  11. Well ... lots of things to think about, but two comments:

    1. There are two sets of "users" for search - people who are looking for stuff, *not* necessarily with intent to purchase, and people who have stuff for sale and want to be found. Right now, *some* of those who have stuff for sale and want to be found are subsidizing search, and some of them are attempting to spam their way to the top, paying as little as possible. As an aside, IMHO a tiny tax/cross-border tariff for sending an email would put the spammers out of business without seriously impacting legitimate digital marketing. Legitimate businesses are paying that tax now, they just aren't getting to choose who they pay it to. ;-)

    2. It's always going to be a numbers game. There are some "laws of nature" at work, probabilities, humans not being either purely rational or purely emotional, bubbles and crashes, black swans, etc. I'd be very surprised if it "democratized" *anything* or did anything to reverse the concentration of power on the web. Right now, the "power" lies in Google. If you're looking for democratization, think Wikipedia and Twitter. And that's Twitter as it is today, not necessarily Twitter as it will be after "Chirp". ;-)

    Posted by: Ed Borasky | March 5, 2010 4:24 PM



  12. I don't expect Google's index to be fundamentally changed by adding real time and social aspects. As I understand, Google perfectly understands the distribution of search intents in the real world and they always prioritize the features of Google search/index to capture the largest market share possible. That's really smart! I agree with John Battelle: no big deal.

    But, I do think search for real-time and social info will bring monitoring, a cousin of search engine, to the mainstream.

    Posted by: AJ Chen | March 5, 2010 7:12 PM



  13. PuSH will help conserve server resources, but it won't change anything but the public's perception of "real-time" search results.

    Few people realize this, but Google crawls websites in real-time already. Your site doesn't need to be large to get crawled fast. It just needs to be updated several times per day. The more updates, the faster it indexes your content.

    I run http://fantasysp.com and Google will crawl and index any story from my site within 3 minutes of it being posted. How do I know this? Because Googlebot crawls my site an astonishing 1,000,000+ times per month.

    Posted by: Brant Tedeschi | March 5, 2010 8:45 PM



  14. With web 2.0 we have already reached a point where information clutter is doubling every 11 hrs. Time has come when we need to move to a paradigm of cutting clutter and not increasing or duplicating it.
    Today, we need to provide new ways of expression and not just organization/indexing of existing content.

    Google to my knowledge must look at ways of making things further simpler and in the context of today's real time world. Getting real time feed is simpler, filtering and context is the new pain.


    Posted by: Sumeet | March 5, 2010 11:19 PM



  15. For me, getting indexed more frequently is not the problem. When I look at my server logs, I can see Google (and a few others) on there several times a day.

    As others have pointed out, that will not get you to the top of search results. SEO helps, but a lot of it is popularity and, also, how much you are willing to pay for clicks. If you are small and new, then you are hosed, as pointed out here: http://www.squidoo.com/google_good_or_evil.

    I think it is a pity, and I wish that Google would acknowledge that there is a problem, and do more to help otherwise good sites get noticed.

    Posted by: brian | March 6, 2010 4:11 PM



  16. I don't know the ins and outs of SEO but surely this will mean that clients can get results on Google search a fair bit quicker?

    Also, as mentioned above, it's good for small sites which get indexed less often.

    Case in point, I took down my website for a day or two and uploaded a new version, now search engine rank has dropped and I'll probably have to wait a week to get back to where I was.

    Posted by: Jamie Englert | March 7, 2010 5:10 AM



  17. Too late! Bing was there much earlier and Bing's real time search is MUCH better!

    Posted by: link building services | March 7, 2010 11:01 PM



  18. Good for small websites, yes.
    Cluttering and facing a new spam problem, yes (spam should be the rule 34-2 of the internet).

    As Sumeet wrote, the main problem is now the noise, and the clutter of information.
    Already, big G has implemented a starring system for search results, but I don't know if it's gonna be effective (see "promote" button).

    I wouldn't be surprised if during this year we saw a return of directories, in one form or another, to counter that.

    Posted by: Danny Gauthier | March 8, 2010 12:32 AM



  19. In other words, if Google thinks something has some real-time component to it, then it will show the section. In particular, if Google sees a spike in information on a certain topic, along with queries on a particular topic, then it assumes there’s a real time situation happening — very simplified!

    Posted by: dsi xl | March 8, 2010 5:39 AM



  20. It will change everything because it will help the small blogs that write correctly, for users, with relevant information on a certain topic. The big news sites, or big websites in general, will take a big bang in the head because they tend to be sloppy when it comes to writing about topics where there is no competition :)

    Posted by: Mihai | March 10, 2010 2:56 AM



  21. It is about how Google filters out the unnecessary information pushed into the real stream. They should however be good enough at that as at any rate they do that in the "not so real-time" search results.

    Posted by: Arkid Mitra | March 14, 2010 10:35 PM


