<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:www.readwriteweb.com,2011:/1/tag:72.47.210.69,2007://1.3765-</id>
  <updated>2011-04-29T12:26:36Z</updated>
  <title>Comments for Building An Open Source, Distributed Google Clone</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.35-en</generator>
  <entry>
    <id>tag:72.47.210.69,2007://1.3765</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=3765" title="Building An Open Source, Distributed Google Clone" />
    <published>2007-05-10T22:47:43Z</published>
    <updated>2007-12-16T23:11:28Z</updated>
    <title>Building An Open Source, Distributed Google Clone</title>
    <summary>Disclosure: the writer of this article, Emre Sokullu, joined Hakia as a Search Evangelist in March 2007. The following article in no way represents Hakia&apos;s views - it is Emre&apos;s personal opinions only. Google is like a young mammoth, already very strong but still growing. Healthy...</summary>
    <author>
      <name>Emre Sokullu</name>
      
    </author>
    
    <category term="Analysis" />
    
    <category term="Google" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><em>Disclosure: the writer of this article, Emre Sokullu, joined Hakia as a Search Evangelist in March 2007. The following article in no way represents Hakia's views - it is Emre's personal opinions only.</em></p>
<p>Google is like a young mammoth, already very strong but still growing. Healthy quarterly results and rising expectations in the online advertising space are the biggest factors helping Google keep its pace on NASDAQ. But now let's think outside the square and try to figure out a Google killer scenario. You may know that I am obsessed with open source (e.g. my projects openhuman and simplekde), so my proposition will be open source based - and I'll call it Google@Home.</p>
<p>First let me define what my concept of Google@Home is. Briefly, Google@Home is <strong>an open source, distributed clone of Google</strong>. We already have many open source search engine projects - <a href="http://lucene.apache.org/" target="_blank">Apache Lucene</a> (whose sub-projects include the Nutch crawler and the Hadoop distributed file system) being the most credible one. So this Google@Home concept can be based on one of those open source search engines. Of course it will have a long way to go before matching Google's utility and reach. But more importantly, Google@Home will be a distributed, decentralized system. What this means is that our desktop computers' idle time will become part of this new search engine's computational power. In principle, this could let it compete with Google's beefy data centers. This is not a new concept either: <a href="http://setiathome.berkeley.edu/" target="_blank">SETI@Home</a> and <a href="http://folding.stanford.edu/" target="_blank">Folding@Home</a> are two well-known scientific projects built around the same <a href="http://en.wikipedia.org/wiki/Grid_Computing" target="_blank">grid computing</a> idea. Indeed, Google itself is a prominent supporter of the Stanford University based Folding@Home, dedicating the resources of its toolbar to this project.</p>]]>
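      <![CDATA[<p>The idle-time idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not part of any existing project: it assumes a hypothetical list of volunteer peers and uses rendezvous hashing (a technique chosen here purely for illustration) to decide which peer crawls which URL. The names <code>assign_peer</code> and <code>plan_crawl</code>, and the peer IDs, are all made up for the example.</p>
<pre>
```python
import hashlib
from collections import defaultdict


def assign_peer(url: str, peers: list[str]) -> str:
    """Pick the volunteer peer responsible for crawling `url`.

    Rendezvous (highest-random-weight) hashing: every peer gets a score for
    the URL and the highest score wins, so removing one peer only reassigns
    the URLs that peer owned.
    """
    def score(peer: str) -> int:
        digest = hashlib.sha256(f"{peer}|{url}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(peers, key=score)


def plan_crawl(urls: list[str], peers: list[str]) -> dict[str, list[str]]:
    """Group URLs into one work list per peer."""
    plan = defaultdict(list)
    for url in urls:
        plan[assign_peer(url, peers)].append(url)
    return dict(plan)


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c"]  # hypothetical volunteer machines
    urls = [f"http://example.org/page/{n}" for n in range(12)]
    for peer, work in sorted(plan_crawl(urls, peers).items()):
        print(peer, len(work))
```
</pre>
<p>Because each URL's owner depends only on the URL and the peer list, any node can compute the same plan without a central coordinator. The sketch covers work distribution only; it says nothing about trust, ranking, or serving results quickly.</p>]]>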
      <![CDATA[<h2>Comparison to Wikiasari</h2>
<p>The distributed nature of the engine is what makes it different from Wikipedia co-founder Jimmy Wales' <a href="http://search.wikia.com/wiki/Search_Wikia" target="_blank">Wikiasari</a> project, which is an open source, wiki-inspired search engine. While Wikiasari's power may come from Wikipedia, its weakest link is its heavy dependence on humans. The power of the masses worked well in the open, community-driven encyclopedia project, Wikipedia, although vandalism has still been present - albeit at a manageable level. I'm not sure this can work as well for search engines, though.</p>
<h2>Why an open source search engine?</h2>
<p>Well, the concept is clear, but you may wonder about the motivation behind it - why would anyone, an organization or a loosely formed group of people, unite around such a project; and why would people dedicate their computers' idle time to this? Here are some reasons:</p>
<ol>
<li><strong>A search engine is a platform and should be open</strong>, just like operating systems. Do you remember <a href="http://www.readwriteweb.com/archives/what_does_google_think_you_look_like.php" target="_blank">Alex's post</a> on the image search space? By using himself as an example, he tried to show how lame current image search engines are. The first comment on his entry was from me, and I told him this problem could be solved with open information access and some face recognition algorithms - just like <a href="http://www.riya.com" target="_blank">Riya</a> is trying to do. Well, unfortunately we don't have open access to search engine databases; all we have is the directory <a href="http://www.dmoz.org" target="_blank">dmoz</a> - which is clearly insufficient. Currently, most search engine APIs cap usage at low, predefined limits of daily queries.</li>
<li><strong>Need for a better search engine</strong> - collaborative work can often yield better results. Imagine a system to which researchers from all around the world, and Google's competitors, would contribute. This would create a bigger brains trust than the one in Mountain View. This is again similar to what's happening with Windows today. Microsoft has one of the world's biggest tech talent pools on its campuses all around the world, but it's impossible to compete with the whole world! And that's why Linux is a clear leader in the server space, and keeps leaping forward in the desktop arena too - see <a href="http://seattlepi.nwsource.com/business/314124_dellfolo03.html" target="_blank">Dell's recent Ubuntu Linux deal</a> and the <a href="http://www.youtube.com/watch?v=DUSn-jBA3CE" target="_blank">3D Linux desktops</a>.</li>
<li><strong>Privacy is a big concern</strong> - as the founder of openhuman, this argument surely doesn't apply to me, but it's a fact that many people are scared by the idea of being watched by the big G's eyes. And Google's compromises in the Chinese market have pushed people to think twice before giving their noisy, but still useful, search history data to Google. Google's Matt Cutts recently wrote an <a href="http://www.mattcutts.com/blog/google-and-privacy/" target="_blank">interesting post</a> on his company's approach to privacy - but some questions remain in my mind. Google could be compelled to hand over its huge store of information when presented with subpoenas.</li>
<li><strong>Growing number of competitors</strong> - not everyone is happy with Google's rise on NASDAQ. Case in point: the latest Yahoo - Microsoft - eBay partnership deal. Instead of creating new markets, as Amazon does with its <a href="http://www.mturk.com/mturk/welcome" target="_blank">artificial artificial intelligence projects</a> and <a href="http://aws.amazon.com/s3" target="_blank">S3</a> - <a href="http://aws.amazon.com/ec2" target="_blank">EC2</a>, Google is competing heavily with Yahoo, eBay, Amazon and Microsoft. Many startups are also unhappy with Google disrupting their business and not rewarding their innovation. The best examples are Google Calendar and the broken dreams of 30 Boxes, Kiko and others. Other examples are Google Spreadsheets and, lately, the situation with Google Toolbar and StumbleUpon. The same thing happened with Microsoft in the 80's and 90's, when it disrupted Sun, IBM, HP and others.</li>
</ol>
<h2>Who would create an open source Google clone?</h2>
<p>Perhaps Google itself. Or Google competitors such as Ask or Yahoo. It might also be something that P2P kings Niklas Zennstrom and Janus Friis are up to - besides their Joost project. Everything is possible, but in my opinion the most plausible option would be a joint attack by direct competitors. Indeed, perhaps the best fit would be the classic "closed source" company Microsoft!! This could be a mirror response to Google, which up till now has aimed much of its PR at Microsoft's 'evil' closed source approach (i.e. the subtle 'don't be evil' mantra of Google). Stranger things have happened.</p>
<h2>Revenues</h2>
<p>Another idea: this Google@Home project could make more use of the power of the masses at its core. Google is still reluctant to apply the power of the masses directly to its search. Yahoo, on the other hand, with its new unified Social Search Unit, seems more ambitious in this arena. As a total underdog, Google@Home would be more open to such innovations and could probably profit from these new paradigms.</p>
<p>How could you support this type of search engine with a complementary distributed and open source ad network? Baris Karadogan has <a href="http://baris.typepad.com/venture_capitalist/2007/04/social_networki.html" target="_blank">more</a> about this in his blog. (I met him at a conference last week and it turned out that surprisingly we hatched and blogged about these similar concepts at the same time!)</p>
<h2>Conclusion</h2>
<p>Yes, this is my 'Google killer' scenario. There are many open questions though - some of them are:</p>
<ul>
<li>Is this really feasible (I think yes) - but your technical input is welcome</li>
<li>Are there any projects already doing this?</li>
<li>Would it really be a Google killer, or would the user base stay limited to geeks only?</li>
</ul>
<p>Let us know what you think, and share your own 'Google killer' scenarios too!</p>
<p><em>Disclosure: Emre Sokullu now works for Hakia, as a Search Evangelist. He <a href="http://blog.hakia.com/?p=77">started at Hakia</a> in March 2007.</em></p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:316134</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c316134" />
    <title>Comment from bob6432 on 2011-04-20</title>
    <author>
        <name>bob6432</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Please have a look at YaCy, a free-software, fully decentralized peer-to-peer search engine: <a href="http://yacy.net" rel="nofollow">http://yacy.net</a><br />This is a mature project; the software is rich in features, and if you operate it you can feel like you are an operator in Google's basement.</p>]]>
    </content>
    <published>2011-04-20T09:59:46Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32308</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32308" />
    <title>Comment from Anthony Ettinger on 2007-05-15</title>
    <author>
        <name>Anthony Ettinger</name>
        <uri>http://chovy.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://chovy.com">
        <![CDATA[<p>Nothing against Google really, but I have been talking about an open source distributed search engine among colleagues on several occasions... think of it as a similar implementation to p2p and other non-central data-crunching projects.</p>]]>
    </content>
    <published>2007-05-15T23:26:53Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32307</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32307" />
    <title>Comment from Otis Gospodnetic on 2007-05-14</title>
    <author>
        <name>Otis Gospodnetic</name>
        <uri>http://blog.simpy.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://blog.simpy.com/">
        <![CDATA[<p>Emre:<br />
Re: #30/#31 - heh, didn't I mention this already?  Yes, look at point no. 3 in comment #7.  But that's nothing concrete, just one of the ideas of possible approaches so far, as far as I know.</p>]]>
    </content>
    <published>2007-05-14T15:43:16Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32306</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32306" />
    <title>Comment from erik on 2007-05-14</title>
    <author>
        <name>erik</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>www.open-search.net</p>]]>
    </content>
    <published>2007-05-14T09:17:29Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32305</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32305" />
    <title>Comment from Emre Sokullu on 2007-05-13</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>Oh yes, Wikia may be up to something like that; here are some discussion links:</p>

<p><a href="http://lists.wikia.com/pipermail/search-l/2007-May/000418.html" rel="nofollow">http://lists.wikia.com/pipermail/search-l/2007-May/000418.html</a></p>

<p><a href="http://lists.wikia.com/pipermail/search-l/2007-May/000351.html" rel="nofollow">http://lists.wikia.com/pipermail/search-l/2007-May/000351.html</a></p>]]>
    </content>
    <published>2007-05-14T05:35:00Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32304</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32304" />
    <title>Comment from Emre Sokullu on 2007-05-13</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Nathan - this is good news. I don't follow the Wikia mailing list, but it would be great indeed if they took this track.</p>]]>
    </content>
    <published>2007-05-14T05:30:09Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32303</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32303" />
    <title>Comment from Otis Gospodnetic on 2007-05-13</title>
    <author>
        <name>Otis Gospodnetic</name>
        <uri>http://blog.simpy.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://blog.simpy.com/">
        <![CDATA[<p>Emre:<br />
"you can **cache** the crawled bits extracted from the background processes (which are the heaviest part of it) via P2P; then serve them using inexpensive open source solutions" -- this is *super* vague! :)  I don't even see how this relates to Wikipedia.  Use of open-source MediaWiki?  That'd be a pretty weak comparison, I think.</p>

<p>Of course you "cache" crawled content.  That's essential for creating a searchable index.</p>

<p>I think Craven at #25 put it well.  He mentioned the same thing I mentioned in my original comment above.</p>

<p>You also mention PageRank in #27.  PageRank is but *one* component of the scoring algorithm.  An *old* one, too, predating all those PhDs hired by Google since the PageRank algo was published.  As for spam, I don't think defeating spam is easy.  How many spam emails did you get today?  I got at least 500.  You may not see as much spam in SERPs as in your inbox because of the nature of SERP/queries/ranking and Inbox/time/freshness/sorting, but you know it's there, eating Google's resources.</p>]]>
    </content>
    <published>2007-05-14T03:33:29Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32302</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32302" />
    <title>Comment from Nathan Braun on 2007-05-13</title>
    <author>
        <name>Nathan Braun</name>
        <uri>http://www.litepost.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.litepost.com">
        <![CDATA[<p>Wikia Search _is_ Google@Home.  Check it out for yourself!</p>

<p>I just wanted to make sure that it's perfectly clear that Jimmy (Jimbo) Wales' (yet-to-be-named) Wikia Search project is already being discussed in some detail along these lines (primarily on the search-l discussion list).</p>

<p>The distributed computing thread there has been active for some time.</p>]]>
    </content>
    <published>2007-05-14T01:01:42Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32301</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32301" />
    <title>Comment from Emre Sokullu on 2007-05-12</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Dan - I think this is a lie. We more or less know how PageRank-like algorithms work, and the ways of preventing spam sites are obvious. Spam sites know them well too; they try to replace viagra with v1agra, for example (modifications that stay human-readable), but the algorithms that can detect this are pretty obvious too. I don't believe this argument holds up.</p>]]>
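        <![CDATA[<p>As a rough illustration of the kind of detection mentioned above, here is a minimal sketch that maps look-alike characters back to letters before checking a blocklist. The substitution table, the blocklist and the function names are assumptions made for this example; it is not how any real search engine is known to filter spam, and it would produce false positives on ordinary text containing digits.</p>
<pre>
```python
# Hypothetical table mapping look-alike characters to the letters they imitate.
LOOKALIKES = str.maketrans(
    {"1": "i", "!": "i", "0": "o", "3": "e", "4": "a", "@": "a", "$": "s", "5": "s"}
)

BLOCKLIST = {"viagra"}  # illustrative only

PUNCTUATION = ".,;:!?\"'"


def normalize(token: str) -> str:
    """Lower-case a token and undo common character substitutions."""
    return token.lower().translate(LOOKALIKES)


def flagged_terms(text: str) -> list[str]:
    """Return the words whose normalized form appears in the blocklist."""
    flagged = []
    for word in text.split():
        # Trim surrounding punctuation first so a trailing "!" is not
        # mistaken for a disguised letter.
        if normalize(word.strip(PUNCTUATION)) in BLOCKLIST:
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    print(flagged_terms("Buy v1agra now, cheap V!AGRA!"))
```
</pre>
<p>The harder part, as other comments in this thread point out, is that spammers adapt once the rules are known, so a fixed table like this is only a starting point.</p>]]>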
    </content>
    <published>2007-05-12T23:37:35Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32300</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32300" />
    <title>Comment from Dan on 2007-05-12</title>
    <author>
        <name>Dan</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>You're forgetting one thing: the algorithms are a closely kept secret for a reason, so people cannot manipulate search results. If we had the full algorithm, we could just create a bullshit website about "viagra", make it comply with the algorithm and be set to make a lot of money, but sadly it isn't that simple.</p>]]>
    </content>
    <published>2007-05-12T13:48:43Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32299</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32299" />
    <title>Comment from Craven de Kere on 2007-05-12</title>
    <author>
        <name>Craven de Kere</name>
        <uri>http://www.able2know.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.able2know.com">
        <![CDATA[<p>This is not a new idea. It's been tried before and failed.</p>

<p>Back in 2003 LookSmart (through Grub: <a href="http://searchenginewatch.com/showPage.html?page=2177021" rel="nofollow">http://searchenginewatch.com/showPage.html?page=2177021</a>) used users' unused computer resources in a grid to spider sites, but it did little more than compress the pages for the indexing process. This has been tried and it has failed.</p>

<p>That doesn't mean that it's not possible (I've been dreaming about it myself for a long time and it's been one of my dream projects) but there are inherent problems with it.</p>

<p>Firstly the biggest problem search engines face is "search engine spam". "Social" components to a search engine are weaknesses in this fight for search relevancy. </p>

<p>Furthermore, the "social" components have yet to show benefits that can scale. For example, the rating of sites works well in a limited scope (like a digg, a delicious or a stumbleupon) but does a piss-poor job with web-wide searching. Link-based algorithms still dominate the quest for search relevancy and nothing else has proven to do as well just yet (not saying it won't, just that it's been tried thousands of times without success, indicating a degree of difficulty).</p>

<p>But most importantly, people underestimate the effect that the latency of p2p has on usability. Nobody wants searches to take as long as they do on p2p platforms, where the query is passed to peers in sequenced handshakes and rare results come back minutes (or more) later.</p>

<p>Look, I want to beat google as much as the next guy, both because I dislike the power they have over the web and because of my inherent interest in building leading technologies.</p>

<p>But the p2p platform isn't the solution. Open source is. I've followed nutch and lucene since their inceptions with hopes to build decent search engines out of them but they aren't ready yet. But there's hope:</p>

<p>The biggest reason I haven't put my passion for search and my skill with algorithms to use in the quest for a better search engine is the foundational issues that bore me (distributed filesystems etc) as well as scale issues that impede me (one example is the bandwidth needed to spider the web and keep it fresh).</p>

<p>The hope comes from next-generation services, the likes of which companies like Amazon are currently realizing. For example they offer their "300 terabyte" index as a service to build search engines upon. They also provide cloud computing as a service as well as a storage service.</p>

<p>The evolution of services like those will drive overhead down for problems of the scale of web search and more algo guys like me who think they can tweak their way to better relevancy (and competition for google) will have at it.</p>

<p>I summarize by saying that by standardizing protocols (it'd be nice if the index api's protocol was an open standard so you could switch from alexa to another provider if needed) you'd avoid lock-in that might discourage developers (I'm not ready to invest a boatload of time on someone else's search service just yet), and by making open source foundations for search with easy abilities to tweak the ranking algos you will see more engines out there.</p>

<p>p2p can only help with some of the spidering and indexing; serving the serps needs to be done from centralized servers hosting the indexes, for speed reasons. With future services out there ready to help small developers scale, what's really missing is something like lucene but with an easy ability to customize the indexing and ranking algos.</p>

<p>Give me that and the maturation of cloud-computing as a service and I'll give you a very relevant search engine.</p>

<p>P.S.</p>

<p>Here's a very basic primer on making a search engine:</p>

<p>Why Writing Your Own Search Engine is Hard<br />
<a href="http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=143" rel="nofollow">http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=143</a></p>]]>
    </content>
    <published>2007-05-12T11:10:38Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32298</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32298" />
    <title>Comment from Emre Sokullu on 2007-05-11</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Otis - you can **cache** the crawled bits extracted from the background processes (which are the heaviest part of it) via P2P; then serve them using inexpensive open source solutions - just like Wikipedia does. Serving the content should not be a problem. And that's the first thing that comes to my mind, it's a semi P2P solution though. But if we delve into it, we can find more.</p>]]>
    </content>
    <published>2007-05-12T06:17:20Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32297</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32297" />
    <title>Comment from Otis Gospodnetic on 2007-05-11</title>
    <author>
        <name>Otis Gospodnetic</name>
        <uri>http://blog.simpy.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://blog.simpy.com/">
        <![CDATA[<p>Emre:<br />
cost(crawl)</p>]]>
    </content>
    <published>2007-05-12T03:46:58Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32296</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32296" />
    <title>Comment from Emre Sokullu on 2007-05-11</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Onno, the data manipulation problem could exist in any P2P system like Kazaa, but encryption algorithms can solve it at some cost in speed and latency, as Otis says.</p>

<p>@Otis, crawling is not cheap. Consider the 10MB sitemaps that webmasters include in their web directory: just for one site, parsing 10MB of sitemap, crawling these pages and extracting the data is a tremendous job, and P2P can help here. I agree with your latency point; that's the problem that should be tackled, but at least this can be solved with a semi-P2P approach - a Wikipedia/P2P hybrid.</p>]]>
    </content>
    <published>2007-05-12T01:04:49Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32295</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32295" />
    <title>Comment from 0neway on 2007-05-11</title>
    <author>
        <name>0neway</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Google Killer... dreaming..!</p>]]>
    </content>
    <published>2007-05-12T00:19:31Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32294</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32294" />
    <title>Comment from Onno Zweers on 2007-05-11</title>
    <author>
        <name>Onno Zweers</name>
        <uri>http://www.onnoot.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.onnoot.com">
        <![CDATA[<p>I have thought of a distributed search engine, and I foresee a few serious problems.</p>

<p>Google is constantly defending itself against search engine optimizers who use dubious techniques to get their clients' websites higher in the search results. Google manages to some extent by keeping their PageRank algorithm secret. This is one of the few examples where I think security through obscurity works.</p>

<p>A distributed search engine has two problems: first, someone has to manage some kind of ranking algorithm - but who? And second, by placing part of a web index database on my computer, it places me in a position to manipulate the data.</p>

<p>Without these problems solved, the distributed search engine is bound to fail because of manipulation by search engine optimizers.</p>

<p>I haven't come up with an idea to solve these problems. I'm very curious if they can be solved.</p>]]>
    </content>
    <published>2007-05-11T23:59:33Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32293</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32293" />
    <title>Comment from Otis Gospodnetic on 2007-05-11</title>
    <author>
        <name>Otis Gospodnetic</name>
        <uri>http://blog.simpy.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://blog.simpy.com/">
        <![CDATA[<p>Emre:<br />
My point was - there is a reason why all those past and current attempts at a distributed search engine are not succeeding.  The crawling part is cheap.  Delivering search results is expensive.  Crawling is easily distributed.  Search is not (latency).</p>]]>
    </content>
    <published>2007-05-11T23:46:50Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32292</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32292" />
    <title>Comment from Emre Sokullu on 2007-05-11</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Wolf, thanks for sharing, this Faroo thing looks promising - it has successive o's in the brand name - that's it, it will surely do it :-) Kidding, but IMO it had better be open source, otherwise PR, collaborative development and everything become difficult.</p>

<p>@Sagar, you have to be patient, this is not an easy job and will definitely take time!</p>

<p>@Zeno, the good thing is if this was an open source system, you could create this add-on by yourself and let everyone use it; or perhaps you could fork the project too :-)</p>]]>
    </content>
    <published>2007-05-11T17:30:51Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32291</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32291" />
    <title>Comment from Sagar on 2007-05-11</title>
    <author>
        <name>Sagar</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Am I correct if I say "depending on the masses (distributed) is not always so successful at killing big companies (products)"? Linux itself will take decades to kill Windows. Unless Google makes mistakes, it is on its way to being a major player in the future.</p>]]>
    </content>
    <published>2007-05-11T16:10:03Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32290</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32290" />
    <title>Comment from Wolf on 2007-05-11</title>
    <author>
        <name>Wolf</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Have a look at <a href="http://www.faroo.com" rel="nofollow">FAROO</a>, a peer-to-peer web search engine (although not open source).</p>]]>
    </content>
    <published>2007-05-11T15:02:47Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32289</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32289" />
    <title>Comment from Zeno Davatz on 2007-05-11</title>
    <author>
        <name>Zeno Davatz</name>
        <uri>http://zeno.davaz.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://zeno.davaz.com">
        <![CDATA[<p>I totally agree. But you will also need a linguistic database, as in <a href="http://www.infocodex.com" rel="nofollow">http://www.infocodex.com</a></p>]]>
    </content>
    <published>2007-05-11T13:09:12Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32288</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32288" />
    <title>Comment from Essam Alzamel on 2007-05-11</title>
    <author>
        <name>Essam Alzamel</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>It's also possible to make the project profitable (i.e. text ads) and give the profit to charities. This would be a huge incentive for people to contribute to the project. Imagine all of Google's income being distributed yearly. The money could also be used to push open-source projects and educate people in poor countries.</p>]]>
    </content>
    <published>2007-05-11T10:21:49Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32287</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32287" />
    <title>Comment from Peter Cooper on 2007-05-11</title>
    <author>
        <name>Peter Cooper</name>
        <uri>http://www.petercooper.co.uk/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.petercooper.co.uk/">
        <![CDATA[<p>This topic has been one of my personal obsessions for the past year now. Google's search result quality has reduced significantly in recent times (starting around last summer) and I've left the little used multitasker at the back of my brain to keep thinking about this topic :)</p>

<p><em>We already have many open source search engine projects - Apache Lucene (which is composed of Nutch and Hadoop distributed file system sub-projects) being the most credible one. So this Google@Home concept can be based on one of those open source search engines.</em></p>

<p>I follow Chas's sentiments on this. Lucene has its place, but Google was successful primarily because they developed a great algorithm (which is beginning to get stale, granted) and then developed an architecture designed to use that algorithm at its full potential. Google is an optimized grid running a grid algorithm. Creating a grid and then running a generic algorithm would be ineffective and a poor design.</p>

<p>Building great things takes pain (if only the architects of more modern e-mail systems realized this before SMTP became so prevalent) and Google's founders took the initial pain of developing a reasonably risky, unproven algorithm into a system which then became a business. They weren't looking for full results on day one, month one, or even year one, but started with the principles they thought were right and then built the whole system up from there.</p>

<p>Starting with something like Lucene almost means throwing away ideals and many new ideas and concepts merely to get a time bonus. Torvalds, Brin and Page didn't do it that way with their projects, and I'm not sure an open source "Google killer" could do it that way either.</p>

<p>Chas is right in that we need to redefine the problem, and the potential solution, although I am not entirely convinced that redefining it must mean it would change significantly. There are a lot of poor two-bit search engines out there that seem to live in the smugness of being different.. and that's another mistake to avoid.</p>]]>
    </content>
    <published>2007-05-11T08:55:57Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32286</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32286" />
    <title>Comment from Emre Sokullu on 2007-05-11</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Patrik - The scale problem can be solved with a P2P approach but, as you say, the movement should start somehow; someone should lead it, just like Linus did years ago. Then the trust and backing of big corporations should come... The pieces are out there (Nutch, dmoz, Otis' propositions, there was an open source P2P framework too - I just don't remember the name), but someone should glue these together and make it a product. I would like to do that myself and show you the code :-) but I don't have time unfortunately... that's why I speak out and make a call actually.</p>]]>
    </content>
    <published>2007-05-11T08:41:18Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32285</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32285" />
    <title>Comment from Patrik Wallström on 2007-05-11</title>
    <author>
        <name>Patrik Wallström</name>
        <uri>http://pawal.blipp.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://pawal.blipp.com/">
        <![CDATA[<p>If you really want it to happen, there is the classical open source comment: show me your code.</p>

<p>Yes, it is a good idea, but it will stay a good idea until there is some code out there.</p>

<p>There is a lot of search engine code published with a free license, and some experiments on distributed crawling, but I have not seen anything that scales up to the size of Google. And it is not a novel idea out there, it is just very hard to climb up to the scale of Google.</p>]]>
    </content>
    <published>2007-05-11T08:01:21Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32284</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32284" />
    <title>Comment from Yakov on 2007-05-11</title>
    <author>
        <name>Yakov</name>
        <uri>http://www.quintura.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.quintura.com">
        <![CDATA[<p>Emre, it's great that both you and Charles started this topic of a Google killer. I agree with Chas that the search problem should first be redefined.</p>]]>
    </content>
    <published>2007-05-11T07:43:30Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32283</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32283" />
    <title>Comment from Remi on 2007-05-11</title>
    <author>
        <name>Remi</name>
        <uri>http://www.kokyunage.net</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.kokyunage.net">
        <![CDATA[<p>Brilliant. :)</p>]]>
    </content>
    <published>2007-05-11T07:03:55Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32282</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32282" />
    <title>Comment from Emre Sokullu on 2007-05-10</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Otis: a quote from the article:<br />
"Of course it will have a long way to go before reaching Google's utility and reach"</p>

<p>Thanks for letting us know about these projects, but unfortunately we don't have anything people can really **use** - we obviously lack such a project.</p>]]>
    </content>
    <published>2007-05-11T06:24:58Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32281</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32281" />
    <title>Comment from Otis Gospodnetic on 2007-05-10</title>
    <author>
        <name>Otis Gospodnetic</name>
        <uri>http://blog.simpy.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://blog.simpy.com/">
        <![CDATA[<p>Nothing new here, Emre.</p>

<p>- distributed search engine has been done.... what was the name... Gruby?  Looksmart bought the small team + sw many years ago, obviously did nothing with it.</p>

<p>- I know there is one active search engine like that somewhere in the UK.  Can't remember the name now, obviously nothing spectacular.</p>

<p>- Jimbo's SE project might be distributed after all - check the list archives for May.</p>

<p>- Comparing Lucene to Google makes no sense.  Lucene is a low-level library.  It's up to the application to make clever use of it.</p>]]>
    </content>
    <published>2007-05-11T06:13:37Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32280</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32280" />
    <title>Comment from Emre Sokullu on 2007-05-10</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>Yes Gert-Jan, good to see that, but this doesn't even have a user interface - it seems too geekish, but it can be used as a base; the one I am talking about should have the same accessibility as current search engines. Average user Joe should not even feel that it is P2P.</p>]]>
    </content>
    <published>2007-05-11T05:51:16Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32279</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32279" />
    <title>Comment from Gert-Jan van Engelen on 2007-05-10</title>
    <author>
        <name>Gert-Jan van Engelen</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Very good idea! In fact so good that it has already been implemented: see <a href="http://www.yacy.net/yacy/" rel="nofollow">http://www.yacy.net/yacy/</a> or just do a search on 'p2p search engine' on Google. Not Google quality, but anyway... If you at Hakia and YACY would pool knowledge, resources and user base, it might become something more than an idea...</p>]]>
    </content>
    <published>2007-05-11T05:43:07Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32278</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32278" />
    <title>Comment from Emre Sokullu on 2007-05-10</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Narendra - sorry, it was just a misinterpretation of my thoughts; I just meant Google Calendar disrupted the online calendar industry with the ubiquity of Google, that's all.</p>]]>
    </content>
    <published>2007-05-11T01:21:04Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32277</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32277" />
    <title>Comment from Emre Sokullu on 2007-05-10</title>
    <author>
        <name>Emre Sokullu</name>
        <uri>http://emresokullu.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://emresokullu.com">
        <![CDATA[<p>@Chas - see the link in the revenues section, Baris Karadogan has what you want.</p>]]>
    </content>
    <published>2007-05-11T01:14:21Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32276</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32276" />
    <title>Comment from Narendra on 2007-05-10</title>
    <author>
        <name>Narendra</name>
        <uri>http://www.nosoapradio.org</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.nosoapradio.org">
        <![CDATA[<p>Kindly omit 30 Boxes from the broken dreams caused by Google Calendar. It simply isn't true. We continue to crush their product every day.</p>]]>
    </content>
    <published>2007-05-11T01:12:43Z</published>
  </entry>

  <entry>
    <id>tag:72.47.210.69,2007://1.3765-comment:32275</id>
    <thr:in-reply-to ref="tag:72.47.210.69,2007://1.3765" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php#c32275" />
    <title>Comment from Chas on 2007-05-10</title>
    <author>
        <name>Chas</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Well, not that the idea of a distributed search engine isn't nice (even though it really has no direct relation to open source) - but if the problem in competing with Google was the computing power or the size of their data centers, this market wouldn't have looked the way it does. Google simply implements search better - it brings better results (or at least most users act as if they do) - and that's what counts. It's the algorithms, not the computing resources. </p>

<p>Now, for the algorithms. Needless to say, Lucene is ages away from Google - and there's no reason to believe that making it distributed would change that. MS and Yahoo! both have many very smart people working for them, and I'm sure they can equal Google at some point (maybe they have, at least at some point in time - but I don't search there so I wouldn't know ;-) ) - it's safe to assume the search algorithms used by these giants are evolving pretty dynamically. Would such a (momentary) advantage make users switch? Not likely, because Google is good enough. It would take a really radical advantage to make people switch. Personally, I haven't seen this disruptive search technology or product yet.</p>

<p>A minor point is also that open source development is, well, open. If there's such a great idea of a researcher working on this Mega-Lucene, Google can implement it too - there's nothing stopping them, because the GPL doesn't apply here (the software runs on Google servers), let alone other Open Source licenses. And patents probably wouldn't be used to protect this hypothetical researcher's ideas - they're not really compatible with the open source way of thinking.<br />
(Note the difference in this aspect between this case and the case of Windows vs. Linux).</p>

<p>So IMHO - no, this is really far from being a "Google killer". A "Google [search] killer" needs to at least redefine the search problem as we know it ("I give you some keywords, you give me links") - and it still, of course, wouldn't kill Google :-)</p>

<p>You may have a better chance in conjuring up a "Google ads killer" - that's where the real killer potential is.</p>]]>
    </content>
    <published>2007-05-11T00:24:26Z</published>
  </entry>

</feed>
