<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2008:/1/tag:72.47.210.69,2004://1.4320-</id>
  <updated>2008-08-22T19:09:49Z</updated>
  <title>Comments for Technorati Issue Solved</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.1</generator>
  <entry>
    <id>tag:72.47.210.69,2004://1.4320</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=4320" title="Technorati Issue Solved" />
    <published>2004-12-15T04:13:02Z</published>
    <updated>2007-12-16T23:15:37Z</updated>
    <title>Technorati Issue Solved</title>
    <summary>I summarise why Technorati was not indexing my outbound links and how they fixed it. This issue may be affecting others, particularly in the Web Design community...</summary>
    <author>
      <name>Richard MacManus</name>
      <uri>http://www.readwriteweb.com</uri>
    </author>
    
    <category term="Blogging" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p>Thanks to a link from <a href="http://radio.weblogs.com/0001011/2004/12/13.html#a8878">Robert Scoble</a> (who is always looking out for his readers, bless him), <a href="http://www.readwriteweb.com/archives/002553.php">my issue with Technorati</a> not indexing my blog came to the notice of <a href="http://www.technorati.com">Technorati</a> chief <a href="http://www.sifry.com/alerts/">Dave Sifry</a>. To his credit, Dave immediately jumped on the case, and Kevin Marks solved it today. I noticed <a href="http://www.kbcafe.com/rss/?guid=20041213140112">others</a> have been having similar issues with Technorati, so I'll briefly summarise what the issue was and how it was fixed.</p>
<p>The problem was that outbound links from my blog were not being indexed by Technorati. Why was this an issue? Because Technorati tracks conversations, and conversations are the lifeblood of blogs: I link to you, you notice my link via Technorati and respond on your blog, I notice your response via Technorati and respond in kind, ad infinitum (or is that <i>ad nauseam</i>!).</p>
<h2>The Nub of the Problem</h2>
<p>According to Kevin Marks, Technorati was indexing my weblog <b>via my homepage</b> and not through my RSS feed. But my homepage has <b>excerpts</b>, not full entries! So Technorati was not picking up the outbound links in my entries. Kevin explained in an email that Technorati gave my homepage priority over my RSS feed "as usually the rss feed is an abbreviated form of the homepage."</p>
<h2>The Solution</h2>
<p>Technorati now realise that they cannot assume a blog homepage will have the full text of a blog's entries, so they have corrected their parser. In fact, as I pointed out to them in a follow-up email, many members of the Web Design blog community have only excerpts on their homepages (that's where I got the idea from). And a number of those folks also have excerpted <i>RSS feeds</i>, so Technorati may still need to adjust their code to account for people who have excerpted homepages <i>and</i> excerpted RSS feeds.</p>
<h2>All's well that ends well</h2>
<p>Anyway, now I'm happy, as my outbound links are being indexed by Technorati once more. My blog lives and breathes in the Sphere again! ;-)</p>]]>
      
    </content>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2004://1.4320-comment:35654</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2004://1.4320" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php#c35654" />
    <title>Comment from Gabe on 2004-12-14</title>
    <author>
        <name>Gabe</name>
        <uri>http://www.memeorandum.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.memeorandum.com/">
        <![CDATA[<p>And a number of those folks also have excerpted RSS feeds, so Technorati may still need to adjust their code to account for people who have excerpted homepages and RSS feeds.</p>

<p>I would add that this is asking quite a bit of the service crawling you.</p>

<p>Why? Note that this case would require Technorati to follow all the "Continued" links, so Technorati would somehow have to figure out that such a link is the permalink to the full post. Sure, they could hard-code the "Continued" case, but then someone else will label the link "Read on", another guy will label it in Swedish, and so on. So the problem is not easy. OK, you might think, but the excerpt has a link internal to the site, so that's the permalink. Well, no, because sometimes a blogger will link to earlier posts on his site. In fact, in odd cases he might even link to an earlier post with the phrase "Continued..."!</p>
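<p>To make the difficulty concrete, here is a minimal, hypothetical Python sketch of the hard-coded approach described above. It is not Technorati's code: the label list, the helper names and the example.com URLs are all assumptions for illustration. It recognises the English labels it was given, misses a Swedish one, and cannot tell a "Continued..." link to an earlier post from the real permalink.</p>

```python
from urllib.parse import urlparse

# Hypothetical hard-coded labels; any label not listed here is missed.
CONTINUATION_LABELS = {"continued", "read on", "read more"}


def looks_like_continuation(link_text: str) -> bool:
    """True if the link text, ignoring case and trailing dots, is a known label."""
    return link_text.strip().lower().rstrip(".").strip() in CONTINUATION_LABELS


def guess_full_post_url(links, site_host):
    """Return the first same-site link whose text looks like a continuation.

    `links` is a list of (link_text, href) pairs from one excerpted entry.
    Returns None when no link matches.
    """
    for text, href in links:
        if urlparse(href).netloc == site_host and looks_like_continuation(text):
            return href
    return None


site = "blog.example.com"

# Matches the hard-coded English label and returns the permalink.
print(guess_full_post_url([("Continued...", "http://blog.example.com/archives/002.html")], site))

# Misses a Swedish label ("Läs mer" means "Read more"), so this prints None.
print(guess_full_post_url([("Läs mer", "http://blog.example.com/archives/002.html")], site))

# A "Continued..." link to an earlier post looks identical to the heuristic,
# so it returns the old post's URL as if it were the full entry.
print(guess_full_post_url([("Continued...", "http://blog.example.com/archives/001.html")], site))
```

<p>Adding more labels only moves the problem around: the last case fails no matter how complete the label list is, because the link text alone says nothing about which post the link points to.</p>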

<p>So my advice: put full text on your page or in your feed.</p>

<p>[Rant induced from writing crawlers myself.]</p>]]>
    </content>
    <published>2004-12-15T05:29:46Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2004://1.4320-comment:35655</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2004://1.4320" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php#c35655" />
    <title>Comment from Gabe on 2004-12-14</title>
    <author>
        <name>Gabe</name>
        <uri>http://www.memeorandum.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.memeorandum.com/">
        <![CDATA[<p>Ugh, my first paragraph was supposed to be in italics, since it's quoting you.</p>]]>
    </content>
    <published>2004-12-15T05:31:34Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2004://1.4320-comment:35656</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2004://1.4320" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php#c35656" />
    <title>Comment from Richard MacManus on 2004-12-15</title>
    <author>
        <name>Richard MacManus</name>
        <uri>http://www.readwriteweb.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com">
        <![CDATA[<p>Gabe, thanks for the detailed information. I myself have full-content RSS feeds, so Technorati should index me OK now. But as I mentioned, many web designers in particular don't have full content in either their RSS feeds or their homepages. Perhaps that's their problem, but like it or not it <i>is</i> also an issue for Technorati and other indexing services that aspire to index all of the blogosphere.</p>

<p>I'm not a technical expert on these matters, but since I auto-ping Technorati with each entry I create, why do they even need to follow links from my homepage or RSS feed? Isn't each ping telling them I have a new entry, so they should index that? Perhaps that is a naive question on my part, but if someone can educate me I'd be grateful.</p>]]>
    </content>
    <published>2004-12-15T15:02:50Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2004://1.4320-comment:35657</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2004://1.4320" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php#c35657" />
    <title>Comment from Gabe on 2004-12-15</title>
    <author>
        <name>Gabe</name>
        <uri>http://www.memeorandum.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.memeorandum.com/">
        <![CDATA[<p>I believe they need to start at your home page because that is the URL usually passed in the ping. Consider the URLs at the original ping site, <a href="http://www.weblogs.com/" rel="nofollow">http://www.weblogs.com/</a>: they're all home pages. (Hmmm... I didn't think of the case where the post URL is passed in the ping. Is this common? Do you do that?)</p>
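<p>For illustration, here is a minimal Python sketch of a weblogs.com-style weblogUpdates.ping XML-RPC call, whose two parameters are the weblog's name and its URL. It only builds and then decodes the request body with the standard library's xmlrpc.client module; nothing is sent over the network. The weblog name is a hypothetical placeholder and build_ping is an illustrative helper, not part of any pinging service's API. Decoding the payload shows what a receiving service actually learns: a method name and a home page URL, with no individual post URL.</p>

```python
import xmlrpc.client


def build_ping(weblog_name: str, home_url: str) -> str:
    """Hypothetical helper: build the XML-RPC request body for a
    weblogUpdates.ping call carrying the weblog name and home page URL."""
    return xmlrpc.client.dumps((weblog_name, home_url), methodname="weblogUpdates.ping")


# "Example Weblog" is a placeholder name; the URL is this blog's home page.
payload = build_ping("Example Weblog", "http://www.readwriteweb.com/")

# Decode the request as a receiving service would see it.
params, method = xmlrpc.client.loads(payload)
print(method)  # weblogUpdates.ping
print(params)  # ('Example Weblog', 'http://www.readwriteweb.com/')
```

<p>To send it for real, a client could call the same method through xmlrpc.client.ServerProxy pointed at a ping service's endpoint; that endpoint is service-specific and is not assumed here.</p>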

<p>Your point that it's a legitimate issue is valid. Ideally, Technorati et al. could tune their code to handle, say, 95% of the excerpted home page/excerpted RSS cases. But it might not be their top priority.</p>]]>
    </content>
    <published>2004-12-15T15:31:22Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2004://1.4320-comment:35658</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2004://1.4320" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/technorati_issu.php#c35658" />
    <title>Comment from Richard MacManus on 2004-12-15</title>
    <author>
        <name>Richard MacManus</name>
        <uri>http://www.readwriteweb.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com">
        <![CDATA[<p>Ah right, of course - the ping is just telling Technorati that my weblog has updated and so they should index it. The ping isn't giving them the post URL to index. My bad.</p>

<p>Perhaps future generations of blog indexing services will index the post URL itself rather than the homepage or RSS feed. For that to happen, there probably needs to be a simple way for users to send the post URL in each ping.</p>]]>
    </content>
    <published>2004-12-15T16:17:36Z</published>
  </entry>

</feed>