<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2009:/1/tag:www.readwriteweb.com,2009://1.15098-</id>
  <updated>2009-11-23T17:02:03Z</updated>
  <title>Comments for Even Social Search Needs an Algorithm: Arguing Against Data Entry As Search Engine</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.23-en</generator>
  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=15098" title="Even Social Search Needs an Algorithm: Arguing Against Data Entry As Search Engine" />
    <published>2009-05-21T13:00:00Z</published>
    <updated>2009-05-21T12:33:54Z</updated>
    <title>Even Social Search Needs an Algorithm: Arguing Against Data Entry As Search Engine</title>
    <summary>With advance apologies to the hard-working PR folks and startup companies who have pitched us their social search engines this week, there is a rising menace in new media: A cluster of sites that call themselves user-powered search engines. Much in the vein of the failed Wikia Search (the abandoned brainchild of Wikipedia founder...</summary>
    <author>
      <name>Jolie O&apos;Dell</name>
      
    </author>
    
    <category term="Crowdsourcing" />
    
    <category term="Search Services" />
    
    <category term="Social Web" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><img src="http://www.readwriteweb.com/search.png"/>With advance apologies to the hard-working PR folks and startup companies who have pitched us their social search engines this week, there is a rising menace in new media: A cluster of sites that call themselves user-powered search engines.</p>

<p>Much in the vein of the <a href="http://en.wikipedia.org/wiki/Wikia_Search">failed Wikia Search</a> (the abandoned brainchild of <a href="http://wikipedia.org">Wikipedia</a> founder Jimmy Wales), these engines purport to "crowdsource" intelligence about URLs and search terms by allowing users to create profiles and submit, submit, submit content. <a href="http://stumpedia.com">Stumpedia</a> and <a href="http://gurutoy.com">Gurutoy</a> are two products in this category. Each offers the excitement of multimedia, semantic, "neue search" capabilities; and each delivers astonishingly dysfunctional results.</p>]]>
      <![CDATA[<h2>Exhibit A: Stumpedia</h2>
<p>Stumpedia calls itself "the human-powered search engine... a personalized social &amp; real-time collaborative search engine that relies on human participation to index, organize, and review the world wide web. Stumpedia does not depend on bots, algorithms, or company insiders to make decisions on the relevance and ranking of search results."</p>

<p>Because god knows those algorithms have done <a href="http://google.com">nothing</a> for search in the past. As for the "company insiders" part, we're drawing a blank on precisely what that means (<a href="http://www.readwriteweb.com/archives/techmeme_becomes_hires_a_human.php">Megan McCarthy</a>, was this aimed at you?) and defer to the wisdom of the all-knowing RWW commenters to fill us in.</p>

<p>Stumpedia currently boasts around 28,000 URLs and 75,000 search terms in its digital lexicon - hardly enough to allow for a good or interesting browsing experience. By way of comparison, Wikia Search had <a href="http://www.techcrunch.com/2008/06/03/jimmy-wales-wikia-search-finally-doesnt-suck/">indexed about 30 million websites</a> before Jimmy Wales could say with a straight face that the product didn't suck. Just because we know he likes the attention, we ran a search on Robert Scoble:</p>

<p><img src="http://www.readwriteweb.com/scoble.png"/></p>

<p>As you can see, the single returned result was entirely irrelevant to the search term; Scoble's name was nowhere to be found on the linked-to page.</p>

<p>And sadly, for all the talk about insiders not gaming the system, the most relevant results in many searches we tried came from the Stumpedia founder/CEO. Here's a look at his profile and submissions:</p>

<p><img src="http://www.readwriteweb.com/irony.png"/></p>

<p>We wanted to run a search for irony, but apparently the CEO hasn't submitted anything ironic lately.</p>

<p><img src="http://www.readwriteweb.com/irony2.png"/></p>

<h2>Exhibit B: Gurutoy</h2>
<p>Gurutoy recently appealed to us for coverage, styling itself "a visual search engine run completely by you." According to its homepage, Gurutoy asks users to "tell us what is cool and interesting in the worldwide web, and it'll be posted up in Gurutoy for others to see. Search Gurutoy using keywords and phrases and you'll see an array of websites uploaded by you and other users."</p>

<p>Assuming that the 99 percent of Internet users who are not tech bloggers use search engines because they need to find accurate, relevant results, the bar of expectations rests rather high.</p>

<p>For example, if a user searches for "orange juice," he might not expect to see this:</p>

<p><img src="http://www.readwriteweb.com/gurutoy1.png"/></p>

<p>As can be seen by mousing over the thumbnails, the two results returned for that search term were both uploaded by a Los Angeles haberdasher. The results were tagged with relevant ("plaid," "headware") as well as damn perplexing ("brad suzuki," Gurutoy's CEO) terms, and we're still not sure how this cap was returned as a result for "orange juice."</p>

<p>Distressingly, a recommended search for "action figures" returned dismally irrelevant results:</p>

<p><img src="http://www.readwriteweb.com/gurutoy2.png"/></p>

<p>Two of the 13 featured results had information on action figures, and none of the images contained action figures.</p>

<h2>The Problem with Reliance on UGC</h2>
<p>When thinking about building a "visual search engine," entrepreneurs must consider the relevance of the images as well as the URLs. They are faced with the reality of competing with Flickr and Google Images, both of which have powerful tech backed up and fed by a critical mass of user-generated information in the form of tags. They also must compete with Google, Yahoo!, and Microsoft Live search engines on the relevance of results' content.</p>

<p>Expecting that users will do the kind of data entry necessary to create a competitive product in this arena is ludicrous. The Internet already has a Wikipedia, so the kind of people with the knowledge and skill sets and the sheer time to invest have likely already picked their hobby and are eyeball-deep in <a href="http://en.wikipedia.org/wiki/Wikipedia:Barnstars">barnstars</a>.</p>

<p>However, Suzuki sees it differently: "The goal of Gurutoy is to become a visual directory of websites (any subject) on the net. But in a cool way, with the pictures." He compares the site to YouTube and has every faith in the power of user-submitted content.</p>

<p>"Gurutoy does not use any spiders to search the web for content. What we're counting on is for the masses to catch on with Gurutoy and to grow the content to make it relevant."</p>

<p>I asked <a href="http://sproutbox.com">SproutBox</a> cofounder and venture tech/capital expert Mike Trotzke what he thought of algorithm-free social search engines.</p>

<p>"Oh, you mean a purely spam search engine with no users? Yeah, they suck.</p>

<p>"If you are going to try to introduce UGC into search engines, you've got to have some indexing first. It has to have some value out of the gate or no one will care. Not even Jimmy Wales could pull that off."</p>

<p>Trotzke went on to say that if any company were able to incorporate valuable user-generated information into search, it would be Google. And he doesn't imagine that the search giant would be interested in buying a smaller company for its data or technology.</p>

<p>"[Google has] the vote-up technology already ready in waiting. They just need to tweak and start giving weight to all the data they have been collecting in <a href="http://googleblog.blogspot.com/2008/11/searchwiki-make-search-your-own.html">SearchWiki</a> notes for months already."</p>

<h2>The Spam Question</h2>
<p>In Social Media 101, we learn that where there is user-generated content (i.e., where anyone is allowed to tag and submit unreviewed content at no charge), there is spam.</p>

<p>Right now, most of the "users" interested in submitting content to these sites are retailers, enterprise sites, or others with a vested financial interest in driving traffic to their URLs. As you can see in this screenshot, MyJewelersPlace.com is spamming the heck out of Stumpedia:</p>

<p><img src="http://www.readwriteweb.com/spam1.png"/></p>

<p>Any site that permits user-submitted links is going to suffer the predictable, lamentable onslaught of black-hat, link-stuffed atrocities, especially in competitive verticals. (I personally dare you to search any of these sites for iPods or Viagra.) When adoption rates are low to begin with, UGC search engines are at especially high risk of being overrun by this kind of spam. This begins a circular process wherein potential users are scared or bored away from the site because search results are irrelevant, desperate pleas for clickthroughs and credit card information.</p>

<p>For generic, noncommercial queries, few or no results will be returned. For more consumer-minded searches, results will be skewed and often uninformative. Allowing the community to police itself by flagging suspicious content is a necessary feature for any UGC site. However, when spam already outnumbers useful content on a relatively new search platform, what users are going to stick around long enough to register an account, let alone slog through the spam, planting flags left and right?</p>

<p>So, with more apologies to the startups named above, social search still needs to amass and index content using traditional search algorithms if results are to be useful to the end user. Then again, you could just let Google have this one and wait for your next big idea.</p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139004</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139004" />
    <title>Comment from Luis Pereira on 2009-05-21</title>
    <author>
        <name>Luis Pereira</name>
        <uri>http://www.stumpedia.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.stumpedia.com">
        <![CDATA[<p>One thing to note is that Stumpedia is first and foremost a bookmarking site that allows you to save and rank your favorite links using keywords and phrases versus tags. In its present state Stumpedia is not a search engine meant to compete or be compared with the search giants.</p>]]>
    </content>
    <published>2009-05-21T13:37:42Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139005</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139005" />
    <title>Comment from Adam Green on 2009-05-21</title>
    <author>
        <name>Adam Green</name>
        <uri>http://www.alertrank.com/mrgooglealerts</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.alertrank.com/mrgooglealerts">
        <![CDATA[<p>The reality of the Web is that Google defines the reality of the Web: <br />
<a href="http://www.alertrank.com/mrgooglealerts/2009/03/21/living-in-the-mind-of-google/" rel="nofollow">http://www.alertrank.com/mrgooglealerts/2009/03/21/living-in-the-mind-of-google/</a></p>

<p>Market leaders don't get beaten by better products, even if those products really are better, which these don't appear to be. They first have to completely mess up and miss a huge market turn, and then they can get beaten, if the competitor is really better. 1-2-3 beat VisiCalc when the IBM PCs appeared and VisiCalc didn't take advantage of that. Google beat Yahoo when the centralized content portal lost out to search as a navigation tool. </p>

<p>When some huge shift in Internet usage occurs, and Google fails to exploit it, then there is a chance for a new search leader to emerge. That time has not arrived. </p>]]>
    </content>
    <published>2009-05-21T14:13:56Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139021</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139021" />
    <title>Comment from Benjamin on 2009-05-21</title>
    <author>
        <name>Benjamin</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>I think you analyze the problem right here (gaming is a huge issue in social search, as it is in traditional search).  The answer though is not more traditional indexing, but a posting structure that is more gaming resistant. </p>

<p>There is a great opportunity here, and I'll believe Google can do it when I see it. <br />
</p>]]>
    </content>
    <published>2009-05-21T17:01:01Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139024</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139024" />
    <title>Comment from Matthew on 2009-05-21</title>
    <author>
        <name>Matthew</name>
        <uri>http://bug.gd</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://bug.gd">
        <![CDATA[<p>I think it can really be a mixture and works very well in some  verticals.</p>

<p>Take <a href="http://bug.gd" rel="nofollow">http://bug.gd</a> for example. It is an error search engine that works simply by asking you to come back and solve your errors after you're done searching (assuming the solution wasn't found). We've been running for over a year and have over 100,000 solutions to errors catalogued from the community and people seem to like the idea of "errors should be solved once for everyone without repeating everybody's research".</p>

<p>We do use crawlers, but only to the extent that they can absolutely identify error messages; our best content is the info generated by clever users.</p>

<p>It's working awesomely for us, but we could just be in a lucky vertical that doesn't have the kind of problems a generic crowd-sourced engine might have.</p>]]>
    </content>
    <published>2009-05-21T17:33:57Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139038</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139038" />
    <title>Comment from Luis Pereira on 2009-05-21</title>
    <author>
        <name>Luis Pereira</name>
        <uri>http://www.stumpedia.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.stumpedia.com">
        <![CDATA[<p>Benjamin: Great point!</p>

<p>Defending Google's dominance in the search space is always the easy way out for these so-called technology pundits and investors.</p>]]>
    </content>
    <published>2009-05-21T19:29:21Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139083</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139083" />
    <title>Comment from Mike on 2009-05-21</title>
    <author>
        <name>Mike</name>
        <uri>http://sproutbox.com/trotzke</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://sproutbox.com/trotzke">
        <![CDATA[<p>I don't blindly defend Google's dominance at all. They will fall; a startup will topple them. I just don't think the startup's secret sauce will be user-generated content alone. I'm thinking it will be a superior algorithm.</p>

<p>Crawling is kinda like UGC. It's a source of data. That's great, but do you think crawling is what put Google ahead early? It wasn't crawling. It was understanding that links weighted reciprocally were pretty decent at guessing authority and they were fairly difficult to game. That was the early Google breakthrough. They had dozens of competitors (crawlers and user-generated directories) but they did a better job than most at solving the authority problem.</p>

<p>I'm not even saying that Google's demise won't be related to UGC. I just can't see it being a step backwards in terms of an authority algorithm. </p>]]>
    </content>
    <published>2009-05-22T04:22:18Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139277</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139277" />
    <title>Comment from Hayden Frost on 2009-05-23</title>
    <author>
        <name>Hayden Frost</name>
        <uri>http://tapthehive.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://tapthehive.com/">
        <![CDATA[<p>Having a "social" search engine that requires data to be input is exactly the status of search engines about 15 years ago. Companies would submit their info to the search engine just like the phone books, and everything was a simple directory. Second-generation search engines made crawlers that would go out and cache things, but their relevancy algos still weren't too great; most results were offset by paid positions (again, just like the phone books). It wasn't until Google computerized bibliometrics (a common concept in authorship) that search engines actually became really useful. Relevancy was no longer based on how much you paid or who spammed the system the most with their own pages, but instead on who everyone else was talking about the most.</p>

<p>Long story short, these guys are 15 years too late.</p>]]>
    </content>
    <published>2009-05-23T17:39:40Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15098-comment:139568</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15098" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php#c139568" />
    <title>Comment from Luis Pereira on 2009-05-25</title>
    <author>
        <name>Luis Pereira</name>
        <uri>http://www.stumpedia.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.stumpedia.com">
        <![CDATA[<p>The real-time web is no longer about using link authority and PageRank to determine relevancy. It's about linking to content that the majority of users are talking about at the present time and find worthwhile. It's about search engines looking for a new authority algorithm to index and rank this real-time data entry accordingly. "Google will always be the web’s library: archival, organized and oriented around research." It's this new kind of web that Google has little control over.<br />
</p>]]>
    </content>
    <published>2009-05-25T12:46:34Z</published>
  </entry>

</feed>