<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:www.readwriteweb.com,2011:/1/tag:72.47.210.69,2007://1.2698-</id>
  <updated>2011-04-29T12:22:02Z</updated>
  <title>Comments for Why Aren&apos;t Alt Search Engines Crawling Websites?</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.35-en</generator>
  <entry>
    <id>tag:72.47.210.69,2007://1.2698</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=2698" title="Why Aren't Alt Search Engines Crawling Websites?" />
    <published>2007-08-06T19:44:26Z</published>
    <updated>2007-12-16T23:07:47Z</updated>
    <title>Why Aren&apos;t Alt Search Engines Crawling Websites?</title>
    <summary>Based on log file evidence from a friend who runs a personal website, Rich Skrenta claims that only 11 search startups are actually crawling the web. He wonders where all the alt search engines are. For some reason, Rich doesn&apos;t link to Charles Knight&apos;s Top 100 Alt Search Engine List in asking that question, but...</summary>
    <author>
      <name>Richard MacManus</name>
      <uri>http://www.readwriteweb.com</uri>
    </author>
    
    <category term="Search" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><img src="http://farm1.static.flickr.com/25/102086333_3a2c4a3a61_m.jpg" align="right" hspace="5" vspace="5" />Based on log file evidence from a friend who runs a personal website, <a href="http://www.skrenta.com/2007/08/the_11_startups_actually_crawl.html">Rich Skrenta claims</a> that only 11 search startups are actually crawling the web. He wonders where all the alt search engines are. For some reason, Rich doesn't link to Charles Knight's <a href="http://www.readwriteweb.com/archives/top_100_alternative_search_engines_feb07.php">Top 100 Alt Search Engine List</a> in asking that question, but to <a href="http://dondodge.typepad.com/the_next_big_thing/2007/02/the_top_100_web.html">Don Dodge's post</a> linking to us. Nevertheless, this raises some interesting questions: why are only a few of the hundreds of <a href="http://altsearchengines.com/">alternative search engines</a> crawling? Are many of them using a licensed index? Are many of them using <em>alternative</em> ways to get their data?</p>]]>
      <![CDATA[<p>AltSearchEngines editor Charles Knight has asked his many contacts for more information on this, so we will report back soon on the results. Meanwhile, Yakov from alt search engine <a href="http://www.quintura.com/">Quintura</a> (a sponsor of AltSearchEngines.com) says in a comment on Skrenta's post that &quot;having its own index is a necessity for search startup&quot;. In another comment, <a href="http://www.tailrank.com">Tailrank</a>'s Kevin Burton points out that some alt search engines have a limited scope: &quot;Well with Spinn3r we only crawl blog content so we shouldn't show up on a historical site. I wonder if other crawlers/startups have similar limitations.&quot; Also, Rafael Cosentino says that his service <a href="http://www.congoo.com/">Congoo</a> uses feeds to gather content, so it doesn't need to crawl websites. <a href="http://www.faroo.com/">FAROO</a> uses a special kind of distributed crawler, which crawls &quot;below the radar&quot;.</p>
<p>Rich Skrenta clarifies in a comment that he's talking about &quot;web scale&quot; search engines, not niche ones. Even so, it is indeed strange that only 11 crawlers showed up in his friend's website logs.</p>
<p>Do R/WW readers have any more information about this?</p>
<p>Image: <a href="http://www.flickr.com/photos/changturtle/102086333/">changturtle</a></p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21843</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21843" />
    <title>Comment from Phill Midwinter on 2007-08-07</title>
    <author>
        <name>Phill Midwinter</name>
        <uri>http://www.surrch.eu</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.surrch.eu">
        <![CDATA[<p>I can't speak for them all, but personally... we spend more time in development phases, where comparatively little crawling goes on.</p>

<p>Other search startups/alts may not report a user agent, or may use other people's indexes.</p>]]>
    </content>
    <published>2007-08-07T15:58:20Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21842</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21842" />
    <title>Comment from Adam Jusko on 2007-08-07</title>
    <author>
        <name>Adam Jusko</name>
        <uri>http://www.bessed.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.bessed.com">
        <![CDATA[<p>For those of us in the human-powered space, it might be because we actually visit sites ourselves instead of sending a spider out to do the dirty work.</p>]]>
    </content>
    <published>2007-08-07T15:06:11Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21841</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21841" />
    <title>Comment from Gary Stewart on 2007-08-07</title>
    <author>
        <name>Gary Stewart</name>
        <uri>http://www.garystew.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.garystew.com">
        <![CDATA[<p>Thanks for the great piece. It's inspired my blog post for today (www.garystew.com). Migoa (my company) is a vertical search engine, so I guess technically we're not really covered by the article. But our information suggests that of the European vertical search players, only we and Extate have proprietary crawlers. Of course, this is only based on publicly available data, so it's possible that there are more European vertical search players with proprietary crawlers. In any case, thanks for the great article.</p>]]>
    </content>
    <published>2007-08-07T14:45:07Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21840</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21840" />
    <title>Comment from Larry on 2007-08-06</title>
    <author>
        <name>Larry</name>
        <uri>http://www.411edirectory.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.411edirectory.com">
        <![CDATA[<p>Thanks for the article and the links back to the Top 100. Great list!</p>]]>
    </content>
    <published>2007-08-07T05:35:32Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21839</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21839" />
    <title>Comment from Rich Skrenta on 2007-08-06</title>
    <author>
        <name>Rich Skrenta</name>
        <uri>http://www.skrenta.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.skrenta.com/">
        <![CDATA[<p>John -</p>

<p>Greg's site is about 7k pages; I figured fewer than 1000 hits over 3 months meant that the site basically wasn't being crawled to any significant extent at any reasonable interval.</p>

<p>Agree that it's all about scale...</p>]]>
    </content>
    <published>2007-08-07T01:04:55Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21838</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21838" />
    <title>Comment from valleyblogzine on 2007-08-06</title>
    <author>
        <name>valleyblogzine</name>
        <uri>http://valleyblogzine.blogspot.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://valleyblogzine.blogspot.com">
        <![CDATA[<p>It is very hard for small search engines to have the infrastructure to crawl the entire web corpus, which grows every second. Only a few companies have the "SCALE" to deal with this.</p>]]>
    </content>
    <published>2007-08-07T00:06:06Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21837</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21837" />
    <title>Comment from John Milan on 2007-08-06</title>
    <author>
        <name>John Milan</name>
        <uri>http://intelligantt.blogspot.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://intelligantt.blogspot.com">
        <![CDATA[<p>Perhaps search engines are making a distinction of some kind. For example, in the last 6 days TeamDirection has been crawled by 33 spiders. The total number of unique crawlers in July was 58, and not all of them appear on Rich's list.</p>

<p>What I do notice is a vast disparity in the number of hits each crawler registers. It could be that Rich's threshold of 1000 hits is filtering out the alternatives.</p>]]>
    </content>
    <published>2007-08-06T23:56:13Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21836</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21836" />
    <title>Comment from hombrelobo on 2007-08-06</title>
    <author>
        <name>hombrelobo</name>
        <uri>http://hombrelobo.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://hombrelobo.com">
        <![CDATA[<p>Because they want to attract investors, not to be useful to visitors...</p>]]>
    </content>
    <published>2007-08-06T20:57:18Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21835</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21835" />
    <title>Comment from Rich Skrenta on 2007-08-06</title>
    <author>
        <name>Rich Skrenta</name>
        <uri>http://www.skrenta.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.skrenta.com/">
        <![CDATA[<p>I've corrected the missing attribution to Charles Knight's original article.</p>]]>
    </content>
    <published>2007-08-06T20:02:29Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.2698-comment:21834</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.2698" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/alt_search_engine_crawlers.php#c21834" />
    <title>Comment from Oli on 2007-08-06</title>
    <author>
        <name>Oli</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>(1) Because they only work on a subset of the web until their IT and processing scales well enough to go full monty...</p>

<p>(2) Because they are special-purpose and only look at specific hand-picked sites...</p>

<p>(3) Because they use Alexa's index directly and don't need to crawl, since someone else does it for them (much better for everyone's bandwidth)...</p>

<p>(4) Because they are "meta" and use other SEs' search results...</p>

<p>Lots of possible reasons. (3) makes a lot of sense IMO: if crawling/indexing is a service you can simply pay for instead of crawling yourself and building the whole infrastructure, it saves a lot of resources, and you can direct your efforts toward _search_ instead of _indexing_.</p>]]>
    </content>
    <published>2007-08-06T19:51:30Z</published>
  </entry>

</feed>
