ReadWriteWeb

Google is developing a system that will enable web publishers of any size to automatically submit new content to Google for indexing within seconds of that content being published. Search industry analyst Danny Sullivan told us today that this could be "the next chapter" for Google.

Last fall, Google's Brett Slatkin, lead developer on the PubSubHubbub (PuSH) real-time syndication protocol, told us he hoped Google would someday use PuSH to index the web instead of crawling links, which is how search engines have indexed the web for years.

Google senior product manager Dylan Casey said yesterday at Sullivan's Search Marketing Expo in Santa Clara, California, that the company plans to soon publish a standard way for site owners to participate in a program much like that.

How The System Might Work

PuSH is a syndication system based on the Atom format in which a publisher tells the world about a Hub that it will notify every time new content is published. Subscribers then tell the Hub, "When this Publisher posts new content, please deliver it to me right away." So instead of checking back with the Publisher all the time to see if there's new content, the Subscriber just sits and waits for the Hub to say that there is. The Publisher publishes something, then tells the Hub that it's available, and the Hub delivers it to all the Subscribers. This can take as little as a few seconds.
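
To make the Publisher, Hub and Subscriber roles concrete, here is a minimal in-memory sketch in Python. It is a toy illustration only, not Google's or the reference Hub implementation: the Hub class, its method names and the example.com feed URL are all assumptions made up for this example, and a real Hub would deliver updates over HTTP rather than by calling a function.

```python
from typing import Callable, Dict, List

# A subscriber callback receives the topic (feed) URL and the new content.
Callback = Callable[[str, str], None]


class Hub:
    """Toy in-memory hub (hypothetical): fans out updates to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, topic: str, callback: Callback) -> None:
        # "When this Publisher posts new content, deliver it to me."
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, content: str) -> int:
        # The Publisher tells the Hub that new content is available; the
        # Hub delivers it to every Subscriber and returns the delivery count.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, content)
        return len(callbacks)


hub = Hub()
inbox: List[str] = []
topic = "http://example.com/feed.atom"  # hypothetical feed URL

# The Subscriber registers once, then simply waits to be told.
hub.subscribe(topic, lambda t, c: inbox.append(c))

# The Publisher posts something and notifies the Hub.
delivered = hub.publish(topic, "Swimming lesson schedule updated")
```

The point of the design is that the Subscriber never polls: the only traffic happens when the Publisher actually has something new.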

If Google implements an indexing-by-PuSH program, it would ask every website to implement the technology and declare, at the top of each document, which Hub it pushes to, just as sites declare where the RSS feeds they publish can be found. Then Google would subscribe to those PuSH feeds to discover new content when it's published.
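
As a rough sketch of what that declaration and subscription could look like, the Python below reads the hub links from the header of an Atom feed and builds the form body a subscriber might POST to a Hub. The sample feed, the example.* URLs and the function names are assumptions for illustration. The rel="hub" link and the hub.mode, hub.topic, hub.callback and hub.verify parameter names follow our reading of the PubSubHubbub draft spec, and nothing here reflects how Google would actually implement it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed that declares its own URL and two hubs in its header.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Community Center</title>
  <link rel="self" href="http://example.com/feed.atom"/>
  <link rel="hub" href="http://hub.example.org/"/>
  <link rel="hub" href="http://hub.example.net/"/>
</feed>"""


def discover(feed_xml):
    """Return (topic_url, [hub_urls]) declared at the top of an Atom feed."""
    root = ET.fromstring(feed_xml)
    topic = None
    hubs = []
    for link in root.findall(ATOM_NS + "link"):
        rel = link.get("rel")
        if rel == "self":
            topic = link.get("href")
        elif rel == "hub":
            hubs.append(link.get("href"))
    return topic, hubs


def subscription_request(topic, callback):
    """Build the form-encoded body a subscriber would POST to a hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,        # the feed the subscriber wants pushed
        "hub.callback": callback,  # where the hub should deliver updates
        "hub.verify": "async",     # verification mode named in the draft spec
    })


topic, hubs = discover(SAMPLE_FEED)
# Hypothetical callback URL owned by the search engine's indexer.
body = subscription_request(topic, "http://crawler.example.com/push-callback")
```

Nothing is sent over the network here; the sketch only shows that a crawler that has fetched the feed once has everything it needs to subscribe to any Hub the publisher lists.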

PuSH wouldn't likely replace crawling. In fact, a crawl would be needed to discover PuSH feeds to subscribe to, but the real-time format would be used to augment Google's existing index.

As Danny Sullivan told us today, Google would have to implement some sort of spam control and not just let content be pushed live to the index unvetted. That was what happened in the earliest days of search and it was a real mess, he told us.

The Advantages of a Real Time Google Index

PuSH is much more computationally efficient for Google, but Slatkin says the impact of such a move on small publishers is even more important. Right now many small sites get visited by Google maybe once a week. With a PuSH system in place, they would be able to get their content to Google automatically, right away.

A richer, faster, more efficient internet would be good for everyone, and the benefits in search wouldn't be limited to Google, either. PubSubHubbub is an open protocol and the feeds would be as visible to Yahoo and Bing as they would be to Google.

"I am being told by my engineering bosses to openly promote this open approach even to our competitors," Slatkin says. That's a very good sign.

We expect this will be a very big deal and we'll be covering it more extensively in the coming days, as well as whenever Google has something to announce more formally.

Don't Forget: ReadWriteWeb recently released a big research report titled The Real-Time Web and Its Future, based on 50+ interviews with key innovators, like PubSubHubbub creator Brett Slatkin. Check it out!

Above: Slatkin's deck for a presentation about Hubbub at Facebook HQ last Fall.


Comments


  1. Nice scoop, Marshall. I talked to a startup trying to do this -as a feeder to Google and other search engines- about 8 years ago. Guess they didn't get a patent on it. And yes it absolutely would be a big win-win.

    Posted by: Roy Rodenstein | March 3, 2010 5:25 PM



  2. Google picks up my content within minutes of it being published without my having to do anything. I'm not sure what this does or how it makes Google's index better.

    Google's going to get a boatload of spam sites being pushed at it in real-time :)

    Posted by: Tom | March 3, 2010 5:25 PM



  3. This is Huge.. But they must take care of spam control

    Posted by: Chethan | March 3, 2010 5:32 PM



  4. Tom, you write one of the top blogs in the world, of course you are indexed quickly. Your local community center swimming lesson schedule on the other hand would be much more quickly searchable.

    Posted by: Marshall Kirkpatrick | March 3, 2010 5:32 PM



  5. Great news. What will this all mean for SEO? Does it level the playing field and demand people start writing good, relevant content that is subscribed to by many people? How will this affect rankings etc?

    Posted by: David Tensen | March 3, 2010 7:12 PM



  6. Google's going to get a boatload of spam sites being pushed at it in real-time :)

    Posted by: NHL Jerseys | March 3, 2010 8:11 PM



  7. I'm not really understanding the "boatload of spam" comments. Google deals with boatloads of spam today. I don't see how this changes their methods of determining relevancy. Even if people feed them even more crap to digest, scaling doesn't ever seem to have been an issue for Google.

    Posted by: Nico Brooks | March 3, 2010 8:38 PM



  8. Actually the spam is the first thing that came to my mind.
    I'm curious to see how Google can deal with a Real Time Link Farm.

    Posted by: Danny Gauthier | March 3, 2010 9:00 PM



  9. it'll be interesting to see what google comes up with..sometimes my content gets indexed within minutes but sometimes it takes days or weeks with no response..hope google works this thing out..

    Posted by: logo design service | March 3, 2010 9:06 PM



  10. Wow, this is indeed good news. I wonder if this will stop the so called "Google Dance" where during indexing you fall right off the listing for a couple days?

    Jimmy
    www.anonymous-web.es.tc

    Posted by: Jimmy Sodaberg | March 3, 2010 9:10 PM



  11. Google's going to get a boatload of spam sites being pushed at it in real-time.

    Posted by: Chrome News | March 3, 2010 11:00 PM



  12. I don't see how this is different from feedburner etc...

    google should first stop the actual spam (or the sites that tread the line between spam and not spam) we get whenever we search for a high-profile keyword before trying anything new...

    it's incredible that stuffing keywords and then getting tons of random links still works for everyone everywhere.

    Posted by: ming | March 3, 2010 11:40 PM



  13. A properly designed website already "pushes" to (more accurately: gets "pulled" by) search engines and the frequency of indexing by search engines is determined by the popularity of the website.

    This information doesn't seem too new to me.

    Posted by: scott | March 3, 2010 11:43 PM



  14. after reading the "boatload of spam" comments for 3-4 times, WHY on earth would someone SAY THE EXACT SAME THING AGAIN... ?????

    Posted by: vetinary | March 4, 2010 12:53 AM



  15. What's the point for doing this? Spam?

    Posted by: Ravi Sheth | March 4, 2010 1:42 AM




  16. Pushing unstructured content in real time can only mean the non relevant results will make it into the search results faster. To me this is another google hocus pocus distraction away from the fact that search as it is today has hit a wall....millions of pages of unstructured data created exclusively to game the system....and now these pages of non relevant content can be pushed into the search stream in real time....

    On another front.....I think that it is hypocritical and smacks of double standards to condemn the past behavior of other tech companies as evil while condoning and applauding the same behavior from Google.....Google's current tactics are nothing short of attempting to become the uber standard of the internet...What if Microsoft had put this idea and technology out.....Would we be so full of glee ?

    Posted by: bruce wayne | March 4, 2010 2:10 AM



  17. Well the problem here is not whether Google can implement PuSH, which is surely trivial for them, it's whether the whole web can implement it. For blogs themselves there are already pinging systems that Google participates in http://en.wikipedia.org/wiki/Ping_(blogging). However, this is not generally available on non-blog type content, so you wonder how long before content management systems start implementing PuSH and if Google will contribute code to help them.

    Posted by: Richard Cunningham | March 4, 2010 2:11 AM



  18. If I push new content to google...can they push a notification back to me to let me know when they have generated revenue off of the content that i have pushed them ?
    Can they let me know how much revenue they generated from my content ?

    This could work for Facebook and Twitter as well and it would lead to transparency....


    http://www.factoetum.com/factoetum/Ted_Nelson_(Technology_Icon)

    Posted by: bruce wayne | March 4, 2010 2:37 AM



  19. Interesting news, Marshall. I'm left scratching my head a bit about the benefits outside of Google. Maybe I'm not getting something. But it sounds like the driving force for this is efficiency for Google, as opposed to benefits for publishers of content/websites.

    Blog systems already publish in this manner, don't they? So, it seems that blogs won't really benefit from this?

    For a business publishing pages to their website, I don't see how it'll move the needle too much in terms of attracting more traffic. Since most companies' PR cycles don't work in hours, I'm not sure I see much benefit to them. On the other hand, if this gets blown out of proportion by unscrupulous SEO "experts" and web designers, I can see them exploiting small businesses by forcing them to pay for new content management system implementations in order to "get indexed faster". Our struggle with SMBs is to convince them that SEO actually isn't that complicated, and that all it takes is a commitment to content creation.

    Another unrelated question... I'm wondering whether their experiments with indexing real time sites like Twitter and Facebook are causing them to rethink how they index the web? I'm wondering whether indexing these sites is giving their index increasing relevance and giving users greater search satisfaction? Maybe they're realizing they want to figure out how to deliver that user search satisfaction w/out relying on Twitter or Facebook?

    Posted by: pc4media | March 4, 2010 4:24 AM



  20. I don't understand how this differs from Google Sitemap. Anyone?

    Posted by: Will Smith | March 4, 2010 4:41 AM



  21. After developing so many hacks to get a new site indexed fast it turns out to be useless knowledge. That's life...

    Posted by: Σχολή Χορού | March 4, 2010 4:58 AM



  22. Tough to keep up with all this stuff going on.

    Posted by: Ma Diga | March 4, 2010 6:41 AM



  23. This is very cool, nice article and looking forward to the real time search push!!

    Posted by: Medisoft | March 4, 2010 6:50 AM



  24. ... it would ask every website to implement the technology and declare which Hub they push to at the top of each document, just like they declare where the RSS feeds they publish can be found.

    Note that each web site will be able to "declare" only one hub which they want to have index their content using this method. This will force web site owners to choose one search engine over all the others. No web site owner will want to have declared a not so popular search engine and so everyone will rush to jump on the same bandwagon. And since Google built the first bandwagon that is the one everyone will jump on. So, by default, everyone will "declare" Google as their hub of choice, leaving the other search engines to crawl the old fashioned way. This will give Google an advantage over the others.

    This is definitely NOT "just like [declaring] where the RSS feeds they publish can be found." A web site's RSS feeds are stored on their own web server (or a server under their control) and any user is able to "subscribe" to that RSS feed simply by copying that RSS URL into their RSS reader (manually or by clicking a link or button). The users have complete choice as to what RSS reader they use and any search engine or other site is free to "subscribe" to that RSS feed simply by downloading the RSS feed any time they want to.

    This PuSH system gives the users and all site owners less choice, not more. If a user wants to get the most recently indexed content then they are forced to use one specific search engine. No other search engines or other sites will have access to this PuSH data for any specific site. Plus, the users will be forced to figure out which sites are PuSH indexed by which search engines and search accordingly. Google knows good and well that no one will be willing to do this. Thus they know good and well that they are creating a system whose sole purpose is to cement their standing as the most popular search engine.

    Google is basically using the same tactics that Microsoft used at the beginning of the DOS wars. It is what I call the "There can be only one ... bandwagon" phenomenon. You create a situation where people are forced to choose only one "system" of any sort. Throw in compatibility issues if the users choose anything other than the most popular "system" and bingo, you have built an instant monopoly out of "user choice."

    Posted by: Grant Robertson | March 4, 2010 7:13 AM



  25. yep,this could work for Facebook and Twitter as well and it would lead to transparency

    Posted by: Christian Louboutin | March 4, 2010 7:40 AM



  26. This makes me think about Technorati's approach to indexing based on pinging. It means more that Google is doing it, but interesting how it compares. I wonder what kind of additional load that will add to Google's servers?

    Posted by: justinkistner | March 4, 2010 8:05 AM



  27. I've got a small blog, maybe 100 uv/day.
    Every post I write is indexed in less than 5 minutes, thanks to my sitemap and rss, so I don't really see why it would make google indexing better.

    I think that real time indexing will quickly lead to spam:
    _ Just find a buzzing subject.
    _ Write a shitload of optimized content about it and put your ads everywhere.
    _ Get indexed quickly.
    _ Profit.

    Posted by: Dofus Astuce | March 4, 2010 8:10 AM



  28. > Google would some day use PuSH for indexing the web instead of the crawling of links that has been the way search engines have indexed the web for years.
    This sounds ridiculous. Will Google fall for all the SPAM that would keep on sending PING updates just to fool the service?
    And what about sites that don't offer RSS/ATOM ? Google was never biased and never will be

    Posted by: taranfx | March 4, 2010 8:18 AM



  29. Exterior Facade Cladding Prices. Actually the spam is the first thing that came to my mind.
    I'm curious to see how Google can deal with a Real Time Link Farm.

    Posted by: Discephegiydirme | March 4, 2010 8:28 AM



  30. These are some exciting developments. Wordpress.com just implemented it with their content.

    At the same time, I have a small time blog and Google crawls and lists it within minutes. Interesting.

    Posted by: jbrickman | March 4, 2010 8:42 AM



  31. @Grant Robertson:
    >Note that each web site will be able to "declare" only one hub which they want to have index their content using this method. This will force web site owners to choose one search engine over all the others.

    Whoa, non sequitur whiplash. A Hub is not a search engine. They are not search-engine specific. Even if a website only gets to pick one hub (which is dubious), all search engines can look at the hub and get the updates.

    Posted by: sep332 | March 4, 2010 9:28 AM



  32. #24 and other people who wonder about provider choice:

    With PubSubHubbub, content publishers may declare multiple hubs they would like to use for syndication. Anyone interested in your site's updates can easily discover these hubs and subscribe. The hubs you push to may be run by yourself or run by independent companies like Superfeedr (http://superfeedr.com). For reference, Superfeedr is the most popular way to add a hub via a third-party (they're used by Tumblr, Posterous, Gawker Media, the Huffington Post, etc).

    The whole point of PubSubHubbub is decentralization. There's absolutely no requirement for publishers to connect with any specific company's hub to take part in the system. For example, Google Reader will discover any hub declared within a feed and subscribe to it, regardless of who the hub provider is.

    And here's a list of open source hub implementations people can use:
    http://code.google.com/p/pubsubhubbub/wiki/Hubs

    Posted by: Brett Slatkin | March 4, 2010 9:39 AM



  33. It is a great news for all the webmasters...

    Thanks for the info.

    Posted by: Nihar | March 4, 2010 11:31 AM



  34. This is very cool article and looking forward to the real time search from Google..The new search function from Masterseek.com allows users already to find all products and companies in the world in real time. For free!!. Imagine having infinite opportunities to find companies across the world.

    Posted by: Masterseek | March 4, 2010 11:53 AM



  35. Google seems to top all other search engines. What's better is that they're using their technology to advance and expand.
    Good for them!

    Posted by: MisplacedComedy | March 4, 2010 11:58 AM



  36. There is definitely potential for spam, but I think google will use factors like domain authority for deciding whether to trust the submissions or not.

    Posted by: techinterview | March 4, 2010 12:26 PM



  37. I want to like this but I confess I'm with those who don't "get" it.

    I mean if you can create your own hub, which you can, why wouldn't everyone do that, in which case you would end up with what you have today - a lot of places Google has to go to sniff out new content.

    Even given a limited number of hubs or assuming that Google/Yahoo/Bing each run their own hubs that you can publish to, won't they still have to sort the wheat from the chaff after crawling their own hubs, just as they do today after crawling the web?

    So, and I admit I don't "get" it, at the cost of adding another layer of complexity, not to mention a new protocol no less, what problem does this actually solve and how does it solve it without creating new problems?

    Posted by: Outtanames999 | March 4, 2010 12:33 PM



  38. I think the spam thing is a non-issue. The fact that content is updated does not imply its ranking, relevance, or popularity have changed. This is just a way for a content publisher to notify the search engines (and any interested third party) that the content on a page has updated and that they should update their index. If you have a website and edit a page 100 times in a day, your page isn't going to be on page 1 of a search result just because you updated it 100 times--it's only going to get a good ranking if users are viewing it. This is going to be a big feature for CMS and blog engines.

    Previously you had to wait for spiders to crawl around the web to find changes on your site. Pages are crawled over again and again just to see if anything has changed. It's a pretty inefficient process. Now the spiders are going to be fat and lazy because you are going to deliver your changes directly to them.

    Posted by: Scott Holodak | March 4, 2010 1:49 PM



  39. There's some talk about how Google would require Atom and ignore RSS if it offered this functionality using PubSubHubbub (PuSH).

    This isn't the case. PuSH is feed format-agnostic and supports both RSS and Atom. I go into more detail on my blog:

    http://workbench.cadenhead.org/news/3592/google-isnt-trying-screw-rss

    Posted by: Rogers | March 4, 2010 1:55 PM



  40. How this develops, and its impact on websites and SEO, is certainly something to follow closely. A real time web without the use of services like twitter and social news sites is a great move imho. I see it more as an SEO benefit for websites responding to current and changing news than as a real service to site readers, who check a site when they're ready for the latest news anyway.

    Posted by: scotchegg | March 4, 2010 3:04 PM



  41. @sep332 & @Brett Slatkin:

    Thank you for enlightening me. I guess I went off half-cocked. Ever since Eric Schmidt poo-poo'ed privacy I have been very concerned about Google turning "evil." Having watched Microsoft's maneuverings ever since they still had the dash in their name, I am also very sensitive to monopolistic tactics. Oh well, I will still remain ever vigilant. ;^)

    Posted by: Grant Robertson | March 4, 2010 5:24 PM



  42. How would google deal with all of the spam? I just wondered..

    Posted by: john | March 5, 2010 6:15 AM



  43. This all sounds fantastic for the small guy as I have just created my first ever website which is for a reunion. The site will only be live for a short period as the date is May 15th 2010 for the event and yet, it will probably take Google till then before my site is indexed and hence the people I am trying to reach will never find it. You all sound like you know what you are talking about and to be honest it's all a bit technical for me, but it was informative and it will make life a little easier for us small guys.

    Posted by: Sharon Kavanagh | March 5, 2010 8:43 AM



  44. @Will Smith > I don't see any difference. It's slow right now but the principle is the same. They just need to ramp the speed up on that sucker.

    Posted by: Ryan Tate | March 6, 2010 8:47 AM



  45. I kinda like the idea, it would make a lot of webmasters happy. But on the other hand, it's going to attract a lot of spammers which is going to reduce the chances of this being implemented by the search engines.

    Regards,

    Posted by: MLM Lead System Pro | March 6, 2010 5:34 PM



  46. I love this idea, but they should show this option more prominently when we search in google, like in a separate column rather than choosing it from options. Still a good feature.

    Posted by: vicky | March 6, 2010 7:12 PM



  47. i hope this feature will bring a good result for my blog, since my new article need a day to get indexed

    Posted by: nayantaka | March 8, 2010 1:08 AM



  48. I’d be very careful with Buzz. If Google also controls social media, navigators, search engines and advertisements, will the Internet be really free?

    Also, Google accepts censorship in some countries, they penalize their rivals in searches, etc. This is not a free WWW.

    http://managersmagazine.com/index.php/2010/03/los-pecados-capitales-de-google/

    Posted by: Alberto122 | March 9, 2010 11:25 AM



  49. Great update and well presented. I made this one of my three featured links on my Design Thought for the Day blog:
    http://designthoughtfortheday.blogspot.com/2010/03/03-10-blog-needs-google-index-tutorial.html

    All the best, Ted

    Posted by: Ted Rex | March 10, 2010 7:24 AM



  50. I can see a very good application for this technology and mode of realtime search.

    My company, Realtime Transcription, Inc, provides streaming realtime transcription of live events, as they are happening. We can stream instant text of any speech to any website. Let's say you have a tech conference and we are streaming the presentations to the association's website. As soon as the text starts hitting the site, it starts showing up in the search results. This will drive traffic to the site as the event is still streaming live.

    A perfect application for this would be product announcements. Like when Apple announced the iPad, for example, I went to the internet right away looking for a transcript of the event. You know what I found? A "transcript" made up of tweets that were put out by people sitting in the audience. They were summarizing what was being said, as it was being said.

    We can stream the verbatim text live, and if someone wants to pick up a section of that stream and tweet it out, isn't that more effective and accurate?

    We then take the entire transcript and sync it to the video or the audio of the presentations, and it's all fully searchable, captioned for accessibility or translation, and can be played from any spot.

    Check out our website and let me know if you have any questions on how to bring this real-time SEO solution to your business or your clients.

    Tanya English, Technology Director
    Realtime Transcription, Inc.
    "Capturing Business at the Speed of Sound"
    tenglish@realtimetranscription.com

    Posted by: Tanya | March 10, 2010 1:47 PM


