Thanks to a link from Robert Scoble (who is always looking out for his readers, bless him), my issue with Technorati not indexing my blog came to the notice of Technorati Chief Dave Sifry. To his credit, Dave immediately jumped onto the case and Kevin Marks solved it today. I noticed others have been having similar issues with Technorati, so I'll briefly summarise what the issue was and how it was fixed.
The problem was that outbound links from my blog were not being indexed by Technorati. Why was this an issue? Because Technorati tracks conversations, and conversations are the lifeblood of blogs: I link to you, you notice my link via Technorati and respond on your blog, I notice your response via Technorati and respond in kind, ad infinitum (or is that ad nauseam?).
According to Kevin Marks, Technorati was indexing my weblog via my homepage and not through my RSS feed. But my homepage has excerpts, not full entries! So Technorati was not picking up the outbound links in my entries. Kevin explained in an email that Technorati gave my homepage priority over my RSS feed "as usually the rss feed is an abbreviated form of the homepage."
Technorati now realise that they cannot assume a blog homepage will have the full text of a blog's entries, so they have corrected their parser. In fact, as I pointed out to them in a follow-up email, many of the Web Design blog community have only excerpts on their homepages (that's where I got the idea from). And a number of those folks also have excerpted RSS feeds, so Technorati may still need to adjust their code to account for people who have excerpted homepages and RSS feeds.
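To make the homepage-versus-feed issue concrete, here is a minimal Python sketch of one way an indexer could choose between the two sources: extract the outbound links from each and keep whichever exposes more. This is only an illustration, not Technorati's actual parser. The names `LinkCollector`, `outbound_links` and `pick_source`, the "more links wins" rule, and the example.com URLs are all assumptions made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def outbound_links(html, own_host):
    """Return the set of absolute links that point away from own_host."""
    collector = LinkCollector()
    collector.feed(html)
    result = set()
    for href in collector.links:
        host = urlparse(href).netloc
        if host and host != own_host:
            result.add(href)
    return result


def pick_source(homepage_html, feed_entries_html, own_host):
    """Hypothetical heuristic: prefer the source exposing more outbound links."""
    home = outbound_links(homepage_html, own_host)
    feed = outbound_links("".join(feed_entries_html), own_host)
    return ("feed", feed) if len(feed) > len(home) else ("homepage", home)


# An excerpted homepage (only a "Continued" link) and a full-text feed entry.
homepage = '<p>An excerpt... <a href="http://blog.example.com/archives/new-post.html">Continued</a></p>'
feed = ['<p>Full entry linking to <a href="http://example.org/post">another blog</a>.</p>']
print(pick_source(homepage, feed, "blog.example.com"))
```

With an excerpted homepage and a full-text feed, the sketch picks the feed. When both are excerpted it finds no outbound links in either, which is exactly the remaining gap described above.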
Anyway, now I am happy, as my outbound links are back to being indexed by Technorati. My blog lives and breathes in the Sphere again! ;-)
TrackBack URL for this entry: http://www.readwriteweb.com/cgi-bin/mt/mt-tb.cgi/2403
Comments
"And a number of those folks also have excerpted RSS feeds, so Technorati may still need to adjust their code to account for people who have excerpted homepages and RSS feeds."
I would add this is asking quite a bit of the service crawling you.
Why? Note that this case would require Technorati to follow all the "Continued" links. So Technorati would somehow have to figure out that this link represents the permalink to the full post. Sure, they can hard-code the "Continued" case, but then someone else will label the link "Read on", and another guy will label the link in Swedish, and so on. So the problem is not easy. OK, you might think, but the post has a link internal to the site, so that's the permalink. Well, no, because sometimes a blogger will link to earlier posts on his site. In fact, in odd cases he might even link to an earlier post with the phrase "Continued..."!
So my advice: put full text on your page or in your feed.
[Rant induced from writing crawlers myself.]
Posted by: Gabe | December 14, 2004 9:29 PM
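Gabe's point about "Continued" links can be sketched in a few lines of Python. The heuristic below treats a same-site link as the permalink when its label matches a hard-coded list of English phrases. Everything here is hypothetical and for illustration only: the label list, the names `AnchorCollector` and `guess_permalinks`, and the example.com URLs are assumptions, not any real crawler's rules.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hard-coded English labels: exactly the fragile approach described above.
CONTINUE_LABELS = re.compile(r"^\s*(continued|read on|read more|more)\b", re.IGNORECASE)


class AnchorCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text)))
            self._href = None


def guess_permalinks(excerpt_html, page_url):
    """Guess full-post links: same-host anchors whose label looks like 'Continued'."""
    collector = AnchorCollector()
    collector.feed(excerpt_html)
    host = urlparse(page_url).netloc
    guesses = []
    for href, text in collector.anchors:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and CONTINUE_LABELS.match(text):
            guesses.append(absolute)
    return guesses


english = ('<p>See <a href="/archives/old-post.html">an earlier post</a>. '
           '<a href="/archives/new-post.html">Continued...</a></p>')
swedish = '<p>Ett utdrag. <a href="/archives/new-post.html">Läs mer</a></p>'

print(guess_permalinks(english, "http://blog.example.com/"))
print(guess_permalinks(swedish, "http://blog.example.com/"))
```

The English excerpt yields the right link, while the Swedish one yields nothing. And a blogger who links to an earlier post with the text "Continued..." would fool the heuristic completely, so the failure modes are the ones listed in the comment.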
Ugh, my first paragraph was supposed to be in italics, since it's quoting you.
Posted by: Gabe | December 14, 2004 9:31 PM
Gabe, thanks for the detailed information. I myself have full-content RSS feeds, so Technorati should index me OK now. But as I mentioned, many web designers in particular don't have full content in either their RSS or homepage. Perhaps that's their problem, but like it or not it *is* also an issue for Technorati and other indexing services who aspire to index all of the blogosphere.
I'm not a technical expert on these matters, but since I auto-ping Technorati with each entry I create, why do they even need to follow links from my homepage or RSS feed? Isn't each ping telling them I have a new entry and so they should index that? Perhaps that is a naive question on my part, but if someone can educate me I'd be grateful.
Posted by: Richard MacManus | December 15, 2004 7:02 AM
I believe they need to start at your home page because this is the URL usually passed in the ping. Consider the URLs at the original ping site: http://www.weblogs.com/ They're all home pages. (Hmmm...I didn't think of the case where the post URL is passed in the ping. Is this common? Do you do that?)
Your point is valid about it being a legitimate issue. Ideally, Technorati et al could tune their code to handle, say, 95% of the excerpted home page/excerpted RSS cases. But it might not be their top priority.
Posted by: Gabe | December 15, 2004 7:31 AM
Ah right, of course - the ping is just telling Technorati that my weblog has updated and so they should index it. The ping isn't giving them the post URL to index. My bad.
Perhaps future generations of blog indexing services will index the post URL itself rather than the homepage or RSS feed. For that to happen, there probably does need to be a simple way for users to send the post URL in each ping.
Posted by: Richard MacManus | December 15, 2004 8:17 AM
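For readers curious what such a ping looks like, here is a rough Python sketch that builds the XML-RPC payload without sending anything. The Weblogs.com-style `weblogUpdates.ping` call carries the weblog name and the weblog URL. The optional third `post_url` parameter below is purely hypothetical, added only to illustrate the idea of sending the post URL with each ping, and the blog name and URLs are made up.

```python
import xmlrpc.client


def build_ping(weblog_name, weblog_url, post_url=None):
    """Build a weblogUpdates.ping XML-RPC request body as a string.

    post_url is a hypothetical extra parameter, not part of the
    standard two-argument ping.
    """
    params = (weblog_name, weblog_url)
    if post_url is not None:
        params += (post_url,)
    return xmlrpc.client.dumps(params, methodname="weblogUpdates.ping")


# Standard ping: only the homepage URL, so the indexer must find the new post itself.
standard = build_ping("Example Blog", "http://blog.example.com/")

# Hypothetical ping that also names the new post.
extended = build_ping(
    "Example Blog",
    "http://blog.example.com/",
    "http://blog.example.com/archives/new-post.html",
)

# Decode the payload again to show what a receiving service would see.
params, method = xmlrpc.client.loads(extended)
print(method, params)
```

In the standard form the only URL on the wire is the homepage, which is why the indexing service has to start there.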