user contributed content - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/user contributed content
Copyright 2012 Richard MacManus, readwriteweb@gmail.com
Wed, 15 Feb 2012 12:30:00 -0800

Even Social Search Needs an Algorithm: Arguing Against Data Entry As Search Engine

With advance apologies to the hard-working PR folks and startup companies who have pitched us their social search engines this week, there is a rising menace in new media: A cluster of sites that call themselves user-powered search engines.

Much in the vein of the failed Wikia Search (the abandoned brainchild of Wikipedia founder Jimmy Wales), these engines purport to "crowdsource" intelligence about URLs and search terms by allowing users to create profiles and submit, submit, submit content. Stumpedia and Gurutoy are two products in this category. Each offers the excitement of multimedia, semantic, "neue search" capabilities, and each delivers astonishingly dysfunctional results.

Exhibit A: Stumpedia

Stumpedia calls itself "the human-powered search engine... a personalized social & real-time collaborative search engine that relies on human participation to index, organize, and review the world wide web. Stumpedia does not depend on bots, algorithms, or company insiders to make decisions on the relevance and ranking of search results."

Because god knows those algorithms have done nothing for search in the past. As for the "company insiders" part, we're drawing a blank on precisely what that means (Megan McCarthy, was this aimed at you?) and defer to the wisdom of the all-knowing RWW commenters to fill us in.

Stumpedia currently boasts around 28,000 URLs and 75,000 search terms in its digital lexicon, hardly enough to allow for a good or interesting browsing experience. By way of comparison, Wikia Search had indexed about 30 million websites before Jimmy Wales could say with a straight face that the product didn't suck. Just because we know he likes the attention, we ran a search on Robert Scoble:

As you can see, the single returned result was entirely irrelevant to the search term; Scoble's name was nowhere to be found on the linked-to page.

And sadly, for all the talk about insiders not gaming the system, the most relevant results in many searches we tried came from the Stumpedia founder/CEO. Here's a look at his profile and submissions:

We wanted to run a search for irony, but apparently the CEO hasn't submitted anything ironic lately.

Exhibit B: Gurutoy

Gurutoy recently appealed to us for coverage, styling itself "a visual search engine run completely by you." According to its homepage, Gurutoy asks users to "tell us what is cool and interesting in the worldwide web, and it'll be posted up in Gurutoy for others to see. Search Gurutoy using keywords and phrases and you'll see an array of websites uploaded by you and other users."

Assuming that the 99 percent of Internet users who are not tech bloggers use search engines because they need accurate, relevant results, the bar is set rather high.

For example, if a user searches for "orange juice," he might not expect to see this:

As can be seen by mousing over the thumbnails, the two results returned for that search term were both uploaded by a Los Angeles haberdasher. The results were tagged with relevant ("plaid," "headware") as well as damn perplexing ("brad suzuki," Gurutoy's CEO) terms, and we're still not sure how this cap was returned as a result for "orange juice."

Distressingly, a recommended search for "action figures" returned dismally irrelevant results:

Two of the 13 featured results had information on action figures, and none of the images contained action figures.

The Problem with Reliance on UGC

When thinking about building a "visual search engine," entrepreneurs must consider the relevance of the images as well as the URLs. They face competition from Flickr and Google Images, both of which combine powerful technology with a critical mass of user-generated information in the form of tags. They must also compete with Google, Yahoo!, and Microsoft's Live Search on the relevance of the content their results point to.

Expecting that users will do the kind of data entry necessary to create a competitive product in this arena is ludicrous. The Internet already has a Wikipedia, so the kind of people with the knowledge and skill sets and the sheer time to invest have likely already picked their hobby and are eyeball-deep in barnstars.

However, Gurutoy CEO Brad Suzuki sees it differently: "The goal of Gurutoy is to become a visual directory of websites (any subject) on the net. But in a cool way, with the pictures." He compares the site to YouTube and has full faith in the power of user-submitted content.

"Gurutoy does not use any spiders to search the web for content. What we're counting on is for the masses to catch on with Gurutoy and to grow the content to make it relevant."

I asked SproutBox cofounder and tech and venture capital expert Mike Trotzke what he thought of algorithm-free social search engines.

"Oh, you mean a purely spam search engine with no users? Yeah, they suck.

"If you are going to try to introduce UGC into search engines, you've got to have some indexing first. It has to have some value out of the gate or no one will care. Not even Jimmy Wales could pull that off."

Trotzke went on to say that if any company could incorporate valuable user-generated information into search, it would be Google, and he doesn't imagine the search giant would be interested in buying a smaller company for its data or technology.

"[Google has] the vote-up technology already ready in waiting. They just need to tweak and start giving weight to all the data they have been collecting in SearchWiki notes for months already."

The Spam Question

In Social Media 101, we learn that where there is user-generated content (i.e., where anyone is allowed to tag and submit unreviewed content at no charge), there is spam.

Right now, most of the "users" interested in submitting content to these sites are retailers, enterprise sites, or others with a vested financial interest in driving traffic to their URLs. As you can see in this screenshot, MyJewelersPlace.com is spamming the heck out of Stumpedia:

Any site that permits user-submitted links is going to suffer the predictable, lamentable onslaught of black-hat, link-stuffed atrocities, especially in competitive verticals (I personally dare you to search any of these sites for iPods or Viagra). When adoption rates are low to begin with, UGC search engines are at high risk of being overrun by this kind of spam. That starts a vicious circle: potential users are scared or bored away from the site when search results turn out to be irrelevant, desperate pleas for click-throughs and credit card information, and every legitimate user who leaves makes the spam a larger share of what remains.

For generic, noncommercial queries, few or no results will be returned. For more consumer-minded searches, results will be skewed and often uninformative. Allowing the community to police itself by flagging suspicious content is a necessary feature for any UGC site. However, when spam already outnumbers useful content on a relatively new search platform, which users are going to stick around long enough to register an account, let alone slog through the spam, planting flags left and right?

So, with more apologies to the startups named above, social search still needs to amass and index content using traditional search algorithms if results are to be useful to the end user. Then again, you could just let Google have this one and wait for your next big idea.
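To make "index first, then weigh the crowd" a little more concrete, here is a minimal sketch in Python. It is not how Google, Stumpedia, or Gurutoy actually rank anything. It assumes a hypothetical three-page corpus (`PAGES`), a plain TF-IDF score standing in for "traditional search algorithms," and a hypothetical `votes` dictionary standing in for user submissions; the function names and the `vote_weight` parameter are illustrative only. The key design choice: votes only boost pages the index has already matched, so a heavily voted cap can't show up for "orange juice."

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (URL -> page text). A real engine would build this
# with a crawler; it is hard-coded here purely for illustration.
PAGES = {
    "example.com/oj": "fresh orange juice delivered in los angeles",
    "example.com/caps": "plaid caps and headwear from los angeles",
    "example.com/toys": "vintage action figures for collectors",
}


def build_index(pages):
    """Build an inverted index: term -> {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][url] = tf
    return index


def text_score(index, n_docs, query):
    """Algorithmic relevance: a plain TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings)) + 1.0
        for url, tf in postings.items():
            scores[url] += tf * idf
    return scores


def rank(index, n_docs, query, votes, vote_weight=0.5):
    """Blend TF-IDF relevance with (hypothetical) user votes.

    Votes multiply the score of pages the index already matched, so a page
    with many votes but no matching terms is never returned.
    """
    base = text_score(index, n_docs, query)
    blended = {
        url: score * (1.0 + vote_weight * math.log1p(votes.get(url, 0)))
        for url, score in base.items()
    }
    return sorted(blended, key=blended.get, reverse=True)


index = build_index(PAGES)
votes = {"example.com/caps": 50, "example.com/oj": 2}  # hypothetical user up-votes
print(rank(index, len(PAGES), "orange juice", votes))
print(rank(index, len(PAGES), "los angeles", votes))
```

Here "orange juice" returns only the juice page despite the cap's 50 votes, while "los angeles," which both pages match equally, lets the votes break the tie in the cap's favor. The logarithm is another assumption: each additional vote counts for less than the last, which blunts (but does not eliminate) the spam problem described above.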

http://www.readwriteweb.com/archives/social-search-needs-an-algorithm.php
Search | Thu, 21 May 2009 06:00:00 -0800 | Jolie O'Dell
Lessons from the Ant Colony: Overcoming the Biases of Web 2.0

Operating as a collective, an ant colony can achieve remarkable things, complete tasks, and solve problems that would be unimaginable for a single ant. Colonies are responsible for building elaborate nests, waging battles, and creating efficient highway systems to food sources. The collective intelligence of an ant colony can serve as inspiration to help us solve complex human problems. Businesses in particular are finding innovative ways to apply these lessons from nature, from routing trucks to managing plane congestion on the tarmac... to making Internet search more accurate.

The theory of swarm intelligence (or collective intelligence) relates to how the simple actions of individuals can come together to produce the sophisticated behavior of the collective. Deborah Gordon, a biologist at Stanford University who has spent decades studying harvester ants in the Arizona desert, summed up the concept this way: "Ants aren't smart. Ant colonies are."

Take foraging as an example. Whenever an ant finds food and carries it back to the nest, that ant leaves a chemical trail (pheromones) along the way. Other ants sniff the chemical trail and follow it toward that same food source. As more ants find the food and carry it back to the nest, the path gets a stronger chemical dose and, in turn, becomes more attractive to fellow foragers. Individually, these ants are following a simple set of rules and acting on local information: follow the pheromone clues and bring food back to the nest. However, the colony as a collective is behaving in quite a complex way: creating a sophisticated highway system that leads to the best food sources.
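The feedback loop in that paragraph can be sketched in a few lines of Python. This is a deliberately simplified, deterministic model that tracks the average share of ants on each trail rather than simulating individual insects. The evaporation rate, the number of ants per round, and the idea that a richer food source earns a stronger pheromone deposit are assumptions borrowed from textbook ant-colony models, not details from Gordon's research; `simulate_foraging` is a hypothetical name.

```python
def simulate_foraging(quality, ants_per_step=10, evaporation=0.1, steps=200):
    """Averaged (mean-field) toy model of pheromone trail reinforcement.

    quality maps each trail to the amount of pheromone a returning ant lays
    on it (a richer food source earns a stronger deposit). Returns the final
    share of ants expected to choose each trail.
    """
    pheromone = {trail: 1.0 for trail in quality}  # all trails start equal
    for _ in range(steps):
        total = sum(pheromone.values())
        # Ants pick a trail in proportion to how strongly it smells.
        share = {trail: level / total for trail, level in pheromone.items()}
        for trail in pheromone:
            # Old scent evaporates; returning ants lay fresh scent.
            pheromone[trail] = ((1 - evaporation) * pheromone[trail]
                                + ants_per_step * share[trail] * quality[trail])
    total = sum(pheromone.values())
    return {trail: level / total for trail, level in pheromone.items()}


print(simulate_foraging({"rich source": 1.0, "poor source": 0.5}))
```

Both trails start out equally attractive, yet after 200 rounds more than 99 percent of the traffic is on the trail to the richer source. No single ant ever compares the two sources; the reinforcement loop does the comparing.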

Collective Intelligence and the Web

So, what do ants and chemical trails have to do with the web? For starters, lessons from colony behavior can be applied to enhance the way we search for information, products, and solutions.

The Internet puts an unprecedented range of goods and information right at our fingertips. And while we now have the ability to find what we want, when we want it, many are finding that more isn't necessarily better. As Barry Schwartz explains in "The Paradox of Choice: Why More Is Less," too many options can actually do more psychological harm than good.

Think about it. How many times have you abandoned a search after hitting page 7 (or maybe 2) because you couldn't find what you were looking for, only to end up running several more searches with different terms? The sheer volume of results often makes searching exhausting, tedious, and overwhelming; we're forced to wade through pages of results before finding the product or link we want.

Of course, the answer is not to limit the available products or retrieved results. After all, access to this rich "long tail" of goods has been a key driver behind the success of many online retailers. Instead, websites try to minimize this search exhaustion by predicting what a person really wants and putting these products up front. It's a great idea in theory, but not so easy to implement in the real world.

The Limits of Web 2.0, and the "Squeaky Wheel" Syndrome

With a strong foundation in collaboration, community, and user participation, the Web 2.0 movement seemed to solve this dilemma by factoring in the contributions of users to narrow down choices. Eager community participants can make their voices and opinions known through user reviews, recommendations, ratings, tagging, and more. Websites have tapped into these crowd-sourcing techniques to determine the relevance of search results. While these methods may help tame the tangle of options, they suffer from one major problem: bias.

Traditional crowd-sourcing demands active participation from its members. The problem here is that not everyone contributes. Only certain types of individuals are likely to make an effort, and they are driven by various motives, from a mere hope to be noticed in a crowd to an altruistic desire to help others to a need to rant about a negative experience. In short, only a subset of the population (the squeaky wheels) will participate, significantly limiting the sample pool and possibly skewing the results with personal bias and inaccuracy.
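A back-of-the-envelope calculation shows how this self-selection skews the picture. Every number below is invented purely for illustration: a hypothetical 1,000 customers with mostly middling experiences, and hypothetical review rates under which unhappy customers are the most likely to speak up and delighted ones are somewhat likely to. The code computes the expected mix of reviews rather than sampling, so there is no randomness involved.

```python
# Hypothetical population: how many customers had each true experience (1-5 stars).
population = {1: 50, 2: 100, 3: 350, 4: 350, 5: 150}
# Hypothetical chance that a customer with each experience bothers to write a review.
review_rate = {1: 0.50, 2: 0.20, 3: 0.02, 4: 0.03, 5: 0.10}


def summarize(counts):
    """Return (mean star rating, share of 1-star and 5-star ratings)."""
    total = sum(counts.values())
    mean = sum(stars * n for stars, n in counts.items()) / total
    extreme = (counts.get(1, 0) + counts.get(5, 0)) / total
    return mean, extreme


# Expected number of reviews at each star level: only the squeaky wheels show up.
expected_reviews = {stars: n * review_rate[stars] for stars, n in population.items()}

for label, counts in [("all customers", population), ("reviewers only", expected_reviews)]:
    mean, extreme = summarize(counts)
    print(f"{label}: mean {mean:.2f} stars, {extreme:.0%} extreme ratings")
```

With these made-up rates, the full customer base averages 3.45 stars while the expected pool of reviewers averages about 2.62, and 1- and 5-star ratings jump from 20 percent of customers to roughly half of all reviews, even though fewer than 8 percent of customers write anything at all.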

But what if there was a way to sidestep these biases and gather a perfect representation of consumer attitudes by tapping into the opinion of every single person who visited a site or conducted a search?

Back to the ant colony...

The Next Phase of Social Search: the Super-Community

When we watch a trail of ants march toward crumbs of food, it's hard to imagine that the ants aren't aware of their actions. But according to studies on swarm intelligence, what appears to be intelligent behavior actually results from nothing more than the complex interaction of simple actions.

Likewise, websites can tap into the implicit wisdom of the community to more accurately predict the most popular and relevant results of any given search. There's a wealth of information in the everyday online activity and behavior of website visitors: every successful or failed search, every page visited, every purchase or abandoned cart represents valuable information. These natural behaviors and actions reflect the true and unbiased opinion of the community as a whole.

By listening to these implicit actions, website owners can gain new insight into the preferences of the silent majority; by leveraging the data, they can optimize results for future searchers. Just like ants that leave a chemical trail each time they bring food back to the nest, we leave real-time feedback each time we visit a page or select (or ignore) a result.
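The ant analogy maps onto a simple ranking loop. The sketch below is hypothetical and is not Baynote's implementation, which the article doesn't describe: the `ClickTrail` class, its `evaporation` and `deposit` parameters, and the example URLs are all illustrative assumptions. Each time a query's results are shown, the old scores for that query fade slightly; the result the searcher selects gets a fresh deposit, while ignored results simply fade.

```python
from collections import defaultdict


class ClickTrail:
    """Pheromone-style ranking from implicit feedback (a toy sketch)."""

    def __init__(self, evaporation=0.05, deposit=1.0):
        self.evaporation = evaporation
        self.deposit = deposit
        self.scores = defaultdict(lambda: defaultdict(float))  # query -> url -> score

    def record(self, query, clicked_url):
        """Log one search: fade every score for the query, then reward the click.

        Pass clicked_url=None when the searcher ignored every result.
        """
        trail = self.scores[query]
        for url in trail:
            trail[url] *= 1.0 - self.evaporation
        if clicked_url is not None:
            trail[clicked_url] += self.deposit

    def rerank(self, query, candidates):
        """Order candidates by accumulated score.

        Python's sort is stable, so ties keep the incoming (algorithmic) order.
        """
        trail = self.scores.get(query, {})
        return sorted(candidates, key=lambda url: trail.get(url, 0.0), reverse=True)


trail = ClickTrail()
results = ["a.example/old-faq", "b.example/guide", "c.example/forum"]  # hypothetical base order
for _ in range(3):
    trail.record("reset password", "b.example/guide")
trail.record("reset password", "c.example/forum")
print(trail.rerank("reset password", results))
```

Evaporation is the design choice that matters most here: without it, early clicks would lock in a ranking indefinitely; with it, a result that searchers stop choosing gradually loses ground, much as an unused pheromone trail fades.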

With each search, we unknowingly participate in a cooperative design that improves the search experience for all searchers to follow. Simple, self-guided actions -- entering keywords and selecting results -- drive the greater common good. And as more people participate, both the chemical trail and the overall system grow stronger.

This new participatory strategy gives greater power to the super-community, in which the collective intelligence of all site visitors is harnessed to create a better search and shopping experience for everyone. With each search, the community carves out a faster, more efficient pathway to desired information and products, no different from the trail of pheromones leading to food sources. And like the ants, web searchers act as a collective team (whether they know it or not), yet another example of the whole being greater than the sum of its parts.

Scott Brave is a founder and CTO of Baynote, Inc. Prior to Baynote, he was a postdoctoral scholar at Stanford University and served as lab manager for the CHIMe (Communication between Humans and Interactive Media) Lab. Scott is an inventor on six patents and co-author of over 25 publications in the areas of human-computer interaction and artificial intelligence. Scott is also an Editor of the "International Journal of Human-Computer Studies" (Amsterdam: Elsevier) and co-author of "Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship" (Cambridge, MA: MIT Press). Scott received his Ph.D. in Human-Computer Interaction and his B.S. in Computer Systems Engineering from Stanford University, and his master's degree from the MIT Media Lab.

(Photo by Il conte di Luna.)

http://www.readwriteweb.com/archives/lessons_from_ant_colony_overcoming_biases_web_20.php
Web 2.0 | Wed, 15 Apr 2009 02:00:00 -0800 | Guest Author