blekko - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/blekko en Copyright 2012 Richard MacManus readwriteweb@gmail.com Tue, 14 Feb 2012 12:45:00 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss
A Duck & a Wiki Team Up Against the Content Farms

Innovative web search engine DuckDuckGo has partnered with massive collaboratively built how-to site Wikihow to offer permanent top-level results for how-to searches on the site. DuckDuckGo is aimed straight at Google, going as far as buying a prominent billboard in San Francisco condemning Google's data tracking practices. Wikihow is aimed right at eHow, content farm Demand Media's massive how-to site of questionable quality.

The twist to the story is that the founder of Wikihow, Jack Herrick, actually sold eHow to Demand Media in 2006. Herrick chose the wiki method of collaborative editing over the bulk freelance model of eHow. "It's like eating a McDonald's burger vs. a wonderful, home cooked meal," he told ReadWriteWeb in 2009.

Wikihow, which has raised no money at all, now sees 30 million visitors each month. The company says that no money changed hands between it and DuckDuckGo as part of this deal.

DuckDuckGo has been mentioned in the same breath lately as search challenger Blekko, which banished 20 alleged content farms from its search results this week. We questioned Blekko's decision to do that, on the grounds that one person's content farm might be another person's more easily-read, accessible content. One site on Blekko's list, shopping review site Buzzillions, contacted us to complain that it was put on that list unfairly. A brief visit to the site makes me wonder if Blekko's users, who called the site spam by clicking on a link on Blekko, ought not speak for everyone. I like Blekko, but at first glance I like Buzzillions too.

Some people don't like Wikihow and think it smells like a content farm! (It looks pretty good to me.) Blekko CEO Rich Skrenta says simply that his users voted Buzzillions into the Top 20 spammiest domains and they didn't vote that way for Wikihow. He just does what the users tell him on this, he says.

DuckDuckGo's approach of pinning high-quality, collaboratively edited content to the top of its page of search results is more appealing to me than Blekko's strategy of banishment. Blekko's slashtag feature, the creation of topically limited collections of quality sources, is very useful but feels different.

What do you think of Wikihow, DuckDuckGo and the prospect of a favored source like this?

http://www.readwriteweb.com/archives/a_duck_a_wiki_team_up_against_the_content_farms.php http://www.readwriteweb.com/archives/a_duck_a_wiki_team_up_against_the_content_farms.php Search Fri, 04 Feb 2011 14:05:14 -0800 Marshall Kirkpatrick
Search Startup Bans Content Farms, But is That What People Really Want?

Large quantities of low quality content, of marginal relevance, intended to draw visitors through search, but drive them to click through ads to other sites - that's what's called a content farm. The voices of critics of Google are getting louder with allegations that the world's leading search engine has been thoroughly gamed and is now drowning in content farmed links. Content farm is a very subjective designation, though.

Search startup Blekko is betting that web users want to search without seeing results from companies that are pumping out low-quality content just for the ad revenue. But is one person's low quality content another person's more-accessible reading material? Today Blekko released a list of the top 20 domains that its users have clicked the "SPAM" button on in their search results. Content from those sites will never show up in a Blekko search again, the company says. What do you think of this list?

"These sites are the worst spam publishers on the Web according to our users," said Rich Skrenta, CEO of Blekko. "They are literally responsible for millions of pages on the Web that our users say are just not helpful and they'd prefer they were banned permanently. So we're going to do that for them."

The list is:
ehow.com
experts-exchange.com
naymz.com
activehotels.com
robtex.com
encyclopedia.com
fixya.com
chacha.com
123people.com
download3k.com
petitionspot.com
thefreedictionary.com
networkedblogs.com
buzzillions.com
shopwiki.com
wowxos.com
answerbag.com
allexperts.com
freewebs.com
copygator.com

Want to see what search results from those domains look like? You can't do it using Blekko anymore, but here's a Google Custom Search Engine that searches inside just those domains alone.
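For readers who would rather build that kind of restricted search by hand, here is a minimal sketch. It assumes the search engine honors the common `site:` operator and the `OR` keyword, and it splits the domain list into batches on the further assumption that very long queries may get truncated. The function names, the batch size and the base URL default are illustrative choices, not anything Blekko or Google has published for this list.

```python
from urllib.parse import urlencode

# The 20 domains from Blekko's list above.
BANNED_DOMAINS = [
    "ehow.com", "experts-exchange.com", "naymz.com", "activehotels.com",
    "robtex.com", "encyclopedia.com", "fixya.com", "chacha.com",
    "123people.com", "download3k.com", "petitionspot.com",
    "thefreedictionary.com", "networkedblogs.com", "buzzillions.com",
    "shopwiki.com", "wowxos.com", "answerbag.com", "allexperts.com",
    "freewebs.com", "copygator.com",
]


def site_restricted_queries(terms, domains, batch_size=5):
    """Build one query per batch of domains, each limited to that batch."""
    queries = []
    for start in range(0, len(domains), batch_size):
        batch = domains[start:start + batch_size]
        clause = " OR ".join(f"site:{d}" for d in batch)
        queries.append(f"{terms} ({clause})")
    return queries


def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a URL (the base URL is an assumption)."""
    return base + "?" + urlencode({"q": query})


if __name__ == "__main__":
    for q in site_restricted_queries("fix a leaky faucet", BANNED_DOMAINS):
        print(search_url(q))
```

Running the same search terms with and without the restriction is a quick way to judge for yourself how much these domains contribute to a results page.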

Are These Spam Domains? Maybe Not

A lot of these domains are pretty obnoxious, but that's just my opinion. Other people's opinions are different. People complain about Demand Media's eHow, for example, but the site also has one of the most popular free iPhone apps in the iTunes store. The content is directly useful, highly readable and easy to navigate.

In a semi-literate, post-functional world, people need basic instructions on everyday matters.

Picture the dystopia in the movie Back to the Future Part II, where Biff ends up a powerful media mogul and the world is awash in insipid, screeching, 24-hour infomercials. That's kind of where we live, folks, and our brains have turned a little softer than some of us might like as a result.

Where else are you going to learn about basic things in this world? On Wikipedia? Have you read a Wikipedia entry lately? They trend wonky and over-detailed from the top and, according to a New York Times report yesterday, are written almost entirely by men.

The content on the domains above may seem like spam to the egg-headed geniuses behind Blekko, and the highly discerning early customers of that site, but I don't think they always look like spam to the rest of the people on the web.

Fixya is another domain that's on that list that I guarantee loads of everyday people are thankful for, not calling for banishment of. The site is littered with advertisements and poor writing. News flash: so is the rest of the world. That might offend the sensibilities of enough sophisticated Blekko users to click that Spam button on the Blekko site, but we'll see who wins long-term - Blekko or the content farms.

I use Blekko every day. But I don't use it for "spam control." The determination of whether something is spam or not is really about context. I use Blekko for other types of context filtering. The ability to set up custom lists of domains not to exclude, but to limit a search to, is what's most useful for me about Blekko.

I use the site to limit my searches and see what tech bloggers have written about a subject, or what tech industry analysts have, or bloggers who cover developments in the Middle East, or venture capitalists.

I understand that most people don't want to perform searches limited to contexts as sophisticated as that, perhaps. But those same masses of users who don't want to do anything too sophisticated are also likely to want some easy-to-read tutorial content like what you find on eHow. Is Blekko intending to serve just people who are interested in creating their own topical collections, or are they aimed at mainstream users? Do mainstream users really dislike these sites that Blekko is now banishing? I'm not so sure they do.

Banishing "content farms" may make sense in the minds of the people behind Blekko, but I'm not sure it's the best idea for everyone.

http://www.readwriteweb.com/archives/are_these_the_top_20_content_farms_on_the_web.php http://www.readwriteweb.com/archives/are_these_the_top_20_content_farms_on_the_web.php Analysis Tue, 01 Feb 2011 09:05:29 -0800 Marshall Kirkpatrick
Top 10 RSS and Syndication Technologies of 2010

"RSS is Dead", tech sage Steve Gillmor said in May of 2009. I know that's not true, because I spend a lot of my work and my leisure time reading RSS and other forms of syndicated content feeds.

If you're not familiar with Really Simple Syndication (RSS) - it is, in the simplest of terms, a powerfully simple technology that delivers new content from multiple websites to one single place where you've subscribed to their RSS feeds. RSS has not changed the world in the ways its early adherents hoped it would, but it continues to change dramatically the lives of some of us unafraid to play around with it a little. Below are the 10 most exciting RSS and syndication technologies of the past year.

There are a lot of repeat appearances from 2009 and 2008, but there are some new tools, too. Did we miss anything important or exciting? Any power user tips you'd add?

Flipboard

Selected coverage: Flipboard, New "Social" iPad Magazine will be Powered by Semantic Data

Flipboard is a well-funded iPad app that turns Twitter and other streams of content into a beautiful "customized magazine." Many people have tried to go deep on the visual impact of feed reading on the iPad, but none have embraced the possibilities as gracefully as Flipboard.

You know how I use Flipboard? I read my usual Twitter and Facebook streams through it sometimes, but it's the curated topical Twitter lists that work best on this service. I've got a Twitter list of hundreds of geotechnology pros that serve up incredible topical links. The Twitter list of anthropologists I grabbed from Tlists? What a great magazine they make every Sunday morning!

Web page pre-loading in the background, integrated social media sharing and commenting, video, image collages - the user experience is really hard to beat and it's only getting better. OPML import is the only thing more that the 15 of us in the world who like to play with OPML files could ask for.

Not Dead Yet Factor: Some people have the audacity to complain that this magical creature that turns links to their website into shining, seductive, glossy magazine pages for iPad-using readers to slide right down into their websites... is violating their copyrights! That's the dumbest thing I've heard since someone told me that the tens of thousands of readers a Huffington Post link to our site sends are somehow a case of that site stealing from us, too.

Postrank
Selected coverage: How to Build a Social Media Cheat Sheet on Any Topic

RSS overload getting you down? Give Postrank a feed and it will give you back a brighter day. This service, which has been on our best of list every year we've written one about feeds, is invaluable. You plug in a feed and Postrank will score every item in it based on the relative social media engagement that item has seen (comments, inbound links, mentions on Twitter and a lot more). Then, you can subscribe to a filtered feed of just the most-discussed items on any feed.
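To make the idea of "relative social media engagement" concrete, here is a rough sketch of scoring items against the busiest item in the same feed and keeping only the top of the pile. This is not Postrank's actual formula, which isn't described here; the `FeedItem` fields, the weights, the 0 to 10 scale and the cutoff are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the real service's formula is not public in this article.
WEIGHTS = {"comments": 1.0, "inbound_links": 2.0, "tweets": 0.5}


@dataclass
class FeedItem:
    title: str
    comments: int = 0
    inbound_links: int = 0
    tweets: int = 0


def raw_engagement(item):
    """Weighted sum of the engagement signals for one item."""
    return sum(weight * getattr(item, name) for name, weight in WEIGHTS.items())


def score_feed(items):
    """Score each item from 0 to 10 relative to the most-engaged item in the feed."""
    raws = [raw_engagement(item) for item in items]
    top = max(raws, default=0)
    if top == 0:
        return [(item, 0.0) for item in items]
    return [(item, round(10 * raw / top, 1)) for item, raw in zip(items, raws)]


def filtered_feed(items, min_score=7.0):
    """Keep only the items that score at or above the cutoff."""
    return [item for item, score in score_feed(items) if score >= min_score]
```

The point of scoring relative to the feed itself is that a quiet niche blog and a huge news site can both produce a "best of" feed, even though their absolute numbers are wildly different.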

We use Postrank about 15 different ways here at ReadWriteWeb. Its awesomeness cannot be surpassed. Watch this space, you'll see us use it some more ways in the coming weeks and months.

Not Dead Yet Factor: Postrank's main home page is now a fancy publisher analytics service. If you want to run other people's feeds through it, like a sophisticated strategic thinker able to defer immediate gratification for one technology step in exchange for far greater opportunities, then visit http://postrank.com/main.

Notify.me
Selected coverage: Real Time as a Service? Check Out What Notify.me Is Working On

The battlefield of RSS to IM/SMS/email delivery and alert services is littered with bodies - the field of battle between those services and the cold reality of monetization, that is. There are a small number of people who appreciate the delivery of a substantial number of RSS feeds within minutes of their publication, but it's not an insignificant number. It's services like this that keep all the tech blogs you read feeling fresh, readers. Other people in other fields are learning to appreciate them as well.

Notify.me remains alive, despite its own determination to die this summer. The company is now focused on selling advanced services to large, paying customers; it's expensive shooting RSS feeds all over the web by IM and SMS for free.

In July, 2010, the Notify.me team threw up its hands and said it was shutting down its free consumer service. A minor cry for help arose and thankfully, the company changed its mind. It said it was going to start charging people a small amount of money. It doesn't appear to have done so and the messages are still coming.

Let me tell you what a service like this is good for, outside a journalist's immediate interest: I once led a workshop for non-profit organizations where one participant worked in communications at a local women's advocacy organization. In that workshop, we grabbed the RSS feed for the New York Times and we ran it through a filter, filtering for keywords related to the field she worked in. We then took that filtered feed and we put it through Notify.me, setting it for multiple forms of delivery.
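If you ever have to roll your own version of that pipeline, the filtering step is small. The sketch below assumes a plain RSS 2.0 feed with `title`, `link` and `description` elements and uses simple case-insensitive substring matching. The sample feed, the keywords and the `send` callback are placeholders; the delivery step is left abstract because Notify.me's interface isn't described here.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in feed; in practice you would fetch the real feed's XML.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example News</title>
<item><title>Senate debates pay equity bill</title>
<link>http://example.com/1</link>
<description>Women's advocacy groups respond.</description></item>
<item><title>Markets close higher</title>
<link>http://example.com/2</link>
<description>Stocks rose on Tuesday.</description></item>
</channel></rss>"""


def filter_rss(rss_xml, keywords):
    """Return (title, link) pairs for items mentioning any keyword."""
    wanted = [k.lower() for k in keywords]
    root = ET.fromstring(rss_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = f"{title} {description}".lower()
        if any(k in haystack for k in wanted):
            matches.append((title, item.findtext("link", default="")))
    return matches


def alert_new(matches, seen_links, send=print):
    """Send one alert per link not seen before; `send` stands in for SMS/IM/email."""
    for title, link in matches:
        if link not in seen_links:
            seen_links.add(link)
            send(f"{title} {link}")


if __name__ == "__main__":
    seen = set()
    alert_new(filter_rss(SAMPLE_FEED, ["women", "pay equity"]), seen)
```

Run on a schedule against a freshly fetched feed, with `seen` saved between runs, this sends each matching story once.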

The plan, then, was for her to get an SMS whenever the New York Times wrote a story related to women's issues. She could take a look at it, and if appropriate, could call the local newspaper people she knew. "I don't know if you've heard," she could say (they probably hadn't, so soon), "but there's this story breaking on a national level. If you're interested in a local angle, our Executive Director is an excellent source and would be happy to get on the phone with you if you like." The reporter has been looking for something to write about all day and you lay a timely, high-quality interviewee in their lap. Boom! Now repeat a few times and what have you got? You've got an organization that people in your area associate with the issue because you're regularly cited as a source in the local media - because you were the first to know.

Not Dead Yet Factor: It's not dead yet. Someday it probably will be. Another service will have to take its place, or we'll all have to learn how to roll our own.

SuperFeedr
Selected coverage: The Dream Team Quietly Gathering Behind Real-Time Service SuperFeedr

You've got online content and you want it in real time. You want it in different formats. You want it marked up with geolocation data that corresponds to place names occurring in free text. You want it all and you want it for a fair price. What does it mean? Maybe you want SuperFeedr. It's like FeedBurner was for bloggers, but much more developer-focused. The company adds features all the time and founder Julien Genestoux is one of the most agile technologists you'll find online.

Not Dead Yet Factor: Barely born yet, but backed by BetaWorks and Mark Cuban, that's good for something. Plus Genestoux builds features so fast that he'll likely fit whatever need real-time feed geeks find they have, well into the future.

Google Reader
Selected coverage: Facebook Could Become World's Leading News Reader (Sorry Google)

If you read RSS feeds and you know it, you probably use Google Reader. It's ok. It's pretty good, even. It's not that exciting, but it serves a whole lot of people very reliably and capably. It has survived while everyone else has not. This year we saw former market-leader Bloglines and former innovation leader Newsgator Online close up their RSS readers and send everyone to Google Reader instead. Other services use Google Reader as a place to sync up.

Not Dead Yet Factor: Google almost never kills anything, and there have to be a lot of people internally at the company who depend on Google Reader, too. Unless they've all given it up for Twitter.

My6Sense
Selected coverage: My6Sense & The Geek Who Rode His Blog to the Edge of the World

You're on your phone and you want something good to read? They say that small screens lend themselves to high-quality recommendations of well-targeted content - so why would we read Twitter and Facebook?

My6Sense is a mobile RSS reader that syncs with your Google Reader account (all of it, not just the first one thousand feeds like so many imitations!) and then watches how you interact with the items. It knows when you are reading, it knows when you've shared a link. It then offers two views of all your subscriptions: their most recent posts and the My6Sense recommended posts. The service learns from your behavior over time and offers a quality mobile feed reading experience.

Not Dead Yet Factor: It's probably a slow burn, the company is focusing on monetizing a commercial API. That's a good business to be in.

Blekko
Selected coverage: How to Use Blekko to Rock at Your Job

Blekko calls itself a search spam killer but it's got a whole lot more potential for the power user.

Blekko is a platform for collaboratively edited vertical custom search engines. It eats OPML files, among other things, and its outputs include RSS feeds. You want a feed of updates from 10 key medical sites whenever any news about a particular issue is written about? Blekko can do that. You want to track a collection of blogs that cover a particular topic and get a ping when they write about one company, one concept or one keyword across all their blogs? No problem. It's great.

A custom search engine creation service with RSS feeds. That deserves a place on this list.

Not Dead Yet Factor: It just launched. When it launched, I said it was too beautiful to live long, but its CEO has been around the block many times and tells me he knows what he's doing.

Facebook
All RWW coverage of Facebook

Facebook's syndicated updates from friends, families and media organizations are the single most important way that hundreds of millions of people around the world relate to the power of the feed. The company tried to do a lot this year, but it's hard to know how drastic the users' experience will end up being. Nonetheless: Facebook Places alone represents the introduction of a radical new type of knowledge into many people's lives (where the people you know are right now) - and it's coming to them by feed.

OStatus
Selected coverage: Run Your Own Twitter Clone: Status.net Launches Public Beta

When you hear about Diaspora, when you hear about Status.net, OStatus is what's under the hood. This open-source amalgamation of communication technology standards is like Twitter for networks that are disconnected, but interoperable. "People on Different Networks Following Each Other" is the OStatus slogan.

What does it mean? Interoperability means social networks compete on features, not control over your friends, because switching costs are removed. You lose nothing if you switch networks.

OStatus didn't take off like a Tweeting rocket ship this year, but it saw some continued growth, development and attention. Someday, maybe someday, the asynchronous micro-messaging that so many of us find so much value in will break out of the clutches of one single company (wonderful as you are, dear Twitter) and become a real communication platform like the telephone. That's probably as crazy as imagining a time when AT&T customers can call Verizon customers though, isn't it?

Not Dead Yet Factor: It's not dead yet.

Dapper
Selected coverage: How Yahoo's Latest Acquisition Stole & Broke My Heart

Point and click on almost any field on almost any Web page and Dapper will give you an RSS URL you can use to subscribe to updates from that field, if and when the content there changes. It sounds like a simple thing, but it's incredibly powerful.

Dapper has been one of my favorite services for years and was joined by Needlebase in the DIY data extraction world that has so much potential.

In recent years, the devil bought Dapper's soul, turning it into a semantic advertising platform in order to monetize its core technology. Then Yahoo bought the whole company this Fall, which will allow the core feed-extraction tool to remain open, at least for a while longer. To use this incredible tool, you've just got to sneak in through the back door at Open.Dapper.net.

Not Dead Yet Factor: It's not dead yet. Maybe more alive than it's been in years, in fact.

Honorable mentions:

Yahoo Pipes - definitely not dead yet. The company released an experimental 2.0 version of this wonderful spaghetti pipes tool for RSS magic this year, but few people noticed and the company itself says its products aren't production-ready. YQL is a better bet, if you're comfortable working with that. If not, well - Pipes isn't dead yet.

Twitter - One of these days! Annotations! Meaningful location as a platform! This year had high hopes for Twitter's technology. The year ended up being about better up-time, a prettier Web site and the company's nascent ad sales efforts.

Ogre translates spatial files into GeoJSON using a command line tool for use in JavaScript Web apps. Awesome. Some people are using this for sure, to set proprietary geodata free. Too few people, though.

OneSpot - This content recommendation engine does a lot of things, but my favorite thing it does is look at any set of feeds you give it and then suggest thousands of other feeds it believes are related. It's easy to curate a few hundred top blogs in any field that way.

That's our list - how does it compare to yours? What's coming down the line that you think might shake things up in RSS and syndication in 2011? Let us know in comments.

http://www.readwriteweb.com/archives/top_10_rss_and_syndication_technologies_of_2010.php http://www.readwriteweb.com/archives/top_10_rss_and_syndication_technologies_of_2010.php 2010 in Review Tue, 28 Dec 2010 14:42:41 -0800 Marshall Kirkpatrick
How to Use Blekko to Rock at Your Job

The author of what is believed to be the first large-scale personal computer virus, teamed with a man who dresses as a medieval warrior and goes to battle on the weekends and a woman who follows World of Warcraft, acupuncture and ballet, has raised $24 million to storm the gates of the Google Castle. They got incredible press coverage when their new search engine, called Blekko, launched this week - but they are probably going to get slaughtered.

In the meantime, they have provided an opportunity for countless other freaks and geeks to use the magical tool they've built to grow our stature wherever we work; to cut through information overload, to shine a bright light on opportunities and to augment our minds with the snap of a finger. Read on for my advice about how to use Blekko and we'll use it well - for as long as it lasts.

Blekko CEO Rich Skrenta, photo by Robert Scoble

What is Blekko?

Blekko was the name of company CEO Rich Skrenta's first networked computer. Skrenta was 15 years old when he wrote the Elk Cloner virus that infected Apple II machines in 1982; it is believed to have been the first large-scale self-spreading personal computer virus ever created. Skrenta went on to work on the Amiga at Commodore, then at Sun Microsystems, then co-founded the Netscape-acquired Dmoz and the Tribune/Gannett/Knight Ridder-acquired local news search engine Topix.

Now he's raised funding from a group of investors that includes Marc Andreessen, to build Blekko. His band of freaky geeks includes CTO and Society for Creative Anachronism member Greg Lindahl and community manager Cheralyn Watson, who prioritized being ready for the WoW and ballet communities in the days leading up to Blekko's launch. (I found that quite charming.) The entire Blekko team includes more than 20 people.

What have these people built?

Blekko, simply put, is a social Custom Search Engine creation service with RSS feeds. It lets users curate and subscribe to mini-search engines that return results only from selected websites, thus increasing the signal to noise ratio and tightly controlling the context of search results.

If you've used Google Custom Search, really used it, that very powerful tool has been improved upon in Blekko because the latter was built to search large groups of sites and to have those groups shared and edited.

I use Google Custom Search all day, every day, to query my defined sets of blogs from technology analysts, geolocation specialists, semantic web scientists, youth marketing bloggers, English-language Asian tech bloggers and more, about whatever topics we're writing about here at ReadWriteWeb.

These Custom Search Engines are like dynamic reference books, populated by the blog archives of topical experts. They are things of magic - but Google CSE has never caught on beyond its widespread use as an embedded search box for a single website. That's tragic, if inevitable: the most powerful magic on the web will never be appreciated in its making, only in its results. Those of us who appreciate its making get to make some of it ourselves. But more on that in a minute.

Blekko launched this week to big write-ups in the Wall St. Journal, The New York Times, TechCrunch and a flood of other outlets all over Techmeme. Frankly, I think it got all that press because it's backed by Andreessen, makes big claims about improving on Google, talks smack about red-hot content farm Demand Media as part of its pitch, has founders with very credible pedigrees and has good PR. But the general response to the company's product has been confusion and doubt.

As a regular and heavy user of Custom Search Engines, I've got some ideas about why most people are unlikely to appreciate the beauty and power of Blekko. But cynicism is cheap. Strategic advice on how to most effectively use a good tool is far more valuable and interesting.

How to Be a Blekko Power User, For Fun and Profit

I'd like to share a few thoughts about how I'm planning on using Blekko, based on the strengths and weaknesses of Google Custom Search. But first some assumptions:

  • I'm going to assume that you work in a field where high-quality information is valuable.

  • I'm going to assume that you can imagine the value of first mover's advantage, when it comes to information.

I believe that everyone works in a field like that, most people just don't know it.

First step, compile yourself a magic collection.

A searchable collection of your industry's top blogs and websites allows you to quickly put any topic in context, based on what the most knowledgeable people in your field have written about that topic in the past.

I've built scores of Custom Search Engines, but the first one I put into production is something I call my Magic Search. It's a collection of 25 of the top Web 2.0 news and review blogs on the web - all our competitors here at ReadWriteWeb.

I do a Magic Search whenever I see a new company or website, to see if and what our competitors have written about it. Last week, for example, someone sent us a link to the incredible website MapCrunch (like StumbleUpon for world-wide Google Street View views, try it!) and I was impressed. I did a Magic Search for it and with a snap of my fingers found that two of our biggest competitors had already written about it 3 weeks prior. (I hate to tell you how the sausage is made, readers, but if any of the leading tech blogs finds out that something has been written about by a competitor 3 weeks ago, then that would mean that we are 3 weeks behind if we write about it afterward. There is nothing a leading tech blogger hates more than being considered behind the times. So I tweeted about it, and I'm telling you about that wonderful site now.)

If I'm writing about a company and I am being a good blogger who wants to mention competitors providing the same type of service, I Magic Search it. For example, this week Nick Bilton wrote about the new email snooze-button service Nudgemail at The New York Times and got slammed in comments for failing to mention another company that had been providing the same kind of service for years. I reviewed Nudgemail after Bilton, and I made sure not to make the same mistake. I Magic Searched the neglected competing company and found that it had been written about by my competitors a number of times, along with still other companies providing the same type of service. So I included links to all of them in my write-up and I think it made for a pretty snazzy article.

If I want to know what startups provide a particular kind of service, I Magic Search it. If I want to know what my competitors have written about a person in our sector, I Magic Search them. If I want background on anyone or anything I'm looking at, background drawn from the archives of specialists in my field, I Magic Search it.

It's awesome. I've got it on my phone, I've got it on my browser toolbar and I've got it a couple of other places I won't mention here.

I've built that and my other Custom Search Engines with Google CSE, but Blekko is a very nice alternative. It provides a clean, sharable URL. It can be collaboratively edited. It handles search across hundreds of domains in one collection very nicely. It offers date-based search in a click. Blekko is a better service than Google CSE and I plan on moving all my engines there, as soon as the company offers a way for me to share them privately with my co-workers and no one else. Blekko's management says that's on the development road-map but not available yet.

Quick tips:

  • Build a collection of top blogs on any topic using any of these methods, or others - or just Google for "top blogs in X" and grab someone else's list to turn into a custom search engine.

  • If you find a list on a page - try putting that page through LinkLeecher or some other way to quickly scrape the URLs off the page and into your Custom Search Engine creation tool. Sometimes I'll open a text editor next to a browser page, drag the URLs onto the text editor, then find and replace all instances of "http" with "[linebreak] http".

  • If you've got a collection of the top blogs on any topic built, there are any number of things you can do with that, if you're creative.
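The scrape-a-list-page tip above can also be done with a few lines of script instead of a text editor. This is a minimal sketch using only the Python standard library; it assumes the list is ordinary HTML with `<a href>` links, and the function and class names are my own. It reduces every link to its host name, one per line, which is an assumed input format rather than something any particular custom search tool documents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def domains_from_page(html, base_url):
    """Return the distinct external host names linked from the page, in order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    own_host = urlparse(base_url).netloc.lower()
    domains = []
    for link in parser.links:
        host = urlparse(link).netloc.lower()
        # Skip mailto: links (no host), the list page's own site, and repeats.
        if host and host != own_host and host not in domains:
            domains.append(host)
    return domains


if __name__ == "__main__":
    page = '<ul><li><a href="http://blog-one.example/post">One</a></li></ul>'
    print("\n".join(domains_from_page(page, "http://lists.example/top-blogs")))
```

Expect to prune the output by hand: navigation, ads and social buttons on the list page will contribute hosts you don't want in your collection.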

Next step: Building Collections on the fly.
Content on the public web is searchable and the savvy searcher will manipulate that content to search it the way they want it.

Both Blekko and Google CSE are well-suited for carefully curated collections. Blekko in particular encourages the creation and maintenance of high-quality custom search engines with its easy collaborative editing, subscription to updates and more.

Sometimes you've got to do it quick and dirty, though.

Here's a story. This Spring I was hired to be one of the official event bloggers at a crazy conference called Techonomy. Bizarrely creative and accomplished people from all around the world came to speak at the event about how to use technology to solve the world's great problems (water, food, poverty, pollution, etc.)

The night before the event, a co-worker and I scraped the names and associated URLs from the pages listing the speakers at the event. I then threw those URLs into a Custom Search Engine that I used to do super-fast research while blogging on the conference floor. "Speaker X says A, B and C. That's something that Organization Y, also participating at the event today, has posted the following white papers about in their website archives." That worked really well.

While I was at it, I also put the names and URLs into a .csv file, uploaded them into Amazon's Mechanical Turk and paid $50 to have people around the world collect the Twitter handles, country of origin and gender for each of the several hundred event participants. It took them about 90 minutes. I then created Twitter lists of all the participants, the women participants and the international participants, in part so I could put all three lists into Tweetdeck and make sure that as I was covering the event, I could track what people with different backgrounds were saying about the day. I also envisioned those lists being loaded into Flipboard, so you could have an iPad magazine made up of all the links shared by women who had attended the Techonomy conference, for example.
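Once the results come back, splitting them into the three lists is a short script. The sketch below assumes the returned file is a CSV with columns named `name`, `twitter_handle`, `country` and `gender`; those column names, the "female" label and the `home_country` default are assumptions for illustration, not what Mechanical Turk actually returned.

```python
import csv
import io

# Stand-in for the results file; the rows are invented examples.
SAMPLE_RESULTS = """name,twitter_handle,country,gender
Ada Example,@ada,United Kingdom,female
Bob Example,@bob,United States,male
Cy Example,,United States,male
Dee Example,@dee,United States,female
"""


def build_lists(results_csv, home_country="United States"):
    """Group Twitter handles into all / women / international lists."""
    lists = {"all": [], "women": [], "international": []}
    for row in csv.DictReader(io.StringIO(results_csv)):
        handle = row["twitter_handle"].strip()
        if not handle:
            continue  # no handle found for this participant
        lists["all"].append(handle)
        if row["gender"].strip().lower() == "female":
            lists["women"].append(handle)
        if row["country"].strip() != home_country:
            lists["international"].append(handle)
    return lists


if __name__ == "__main__":
    for name, handles in build_lists(SAMPLE_RESULTS).items():
        print(name, handles)
```

Each list of handles can then be pasted into whatever tool you use to create the Twitter lists themselves.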

A few years prior, I did the same thing (and more) with companies launching at the DEMO conference, so whenever I was thinking about one, I could quickly search the websites of all the others and determine who else was working in the same sector.

Last week my mother emailed me to ask if I knew the educational technology speaker visiting her school that day. I didn't, but it took about 5 minutes to create an EdTech CSE and find what some of the most respected edtech bloggers online had said about the man. He and my mom ended up having lunch together and talking about the internet, which was pretty cool.

These on-the-fly CSEs are particularly well-suited to Blekko because anyone can then suggest revisions later. More importantly: it's a social community where you may find that someone else has already built the engine you're looking for and you can just grab it and go.

Quick Tips:

  • I've trained myself to see a list of anything and think CSE. If you've got, or want to create, folders in an RSS reader - it's relatively simple to export your subscriptions, open the OPML file in a text editor, see how simple it is and delete all the subscriptions except for the topic folder in question. Resave it, then you can upload the new .OPML file into Blekko. That's a very nice feature, OPML import.

  • The more you think in terms of URLs, the content they hold, the other URLs they link to and the feeds that are associated with them - the more permutations for searching you can think of. The sky is the limit and it's a very rational system.
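If hand-editing the OPML file in a text editor sounds tedious, the same trimming can be scripted. This is a minimal sketch that assumes your reader exports folders as top-level outline elements labeled with a title or text attribute, which is common but not guaranteed; the sample file and the keep_folder name are illustrative only.

```python
import xml.etree.ElementTree as ET

def keep_folder(opml_text, folder_title):
    """Return OPML text containing only the named top-level folder."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    for outline in list(body):
        # Feed readers label folders with "title", "text", or both.
        label = outline.get("title") or outline.get("text")
        if label != folder_title:
            body.remove(outline)
    return ET.tostring(root, encoding="unicode")

# Made-up export with two folders; only "EdTech" should survive.
sample = """<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline title="EdTech" text="EdTech">
      <outline type="rss" text="Blog A" xmlUrl="http://example.com/a.xml"/>
    </outline>
    <outline title="Cooking" text="Cooking">
      <outline type="rss" text="Blog B" xmlUrl="http://example.com/b.xml"/>
    </outline>
  </body>
</opml>"""

trimmed = keep_folder(sample, "EdTech")
print(trimmed)
```

Save the printed result as a new .opml file and it is ready to upload.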

Finally, CSEs plus RSS equals a key to any inner sanctum. RSS feeds have been my bread and butter since becoming a professional blogger, and they are really what captured my interest more than anything when I first discovered social media. When I was just getting started (ok, sometimes I still do this), I would take the RSS feed of one or more blogs I really admired and I'd run them through an RSS to Instant Messaging service. These days, my favorite is Notify.me.

I'd get an IM within minutes of any post going up on a blog I really admired, and if I could make a meaningful value-add to the conversation, I'd jump right over and be the first person to leave a comment on the post. I'd do that just often enough to get their attention but not often enough to be annoying. (But where do you find interesting things to add to the conversation? If only there was some way to quickly search for keywords and tidbits in the archives of a collection of topical experts in any field!)

I used to do a whole lot of hitch-hiking and developed the ability to listen and converse with just about anyone, regarding just about anything. Participating in blog conversations is a much safer way to do that.

So - let's say you've got a good collection of topical expert blogs built up and you've got it put into Blekko so you can search against their archives. Guess what? Blekko offers RSS feeds for your search queries! That's awesome.

Take what parts of this you will, but I'll tell you what I'm going to do: I'm going to make sure that all the hundreds of geolocation experts who blog online know that I'm interested in geofencing and history because I'm going to subscribe to a feed for those two search terms in my big Blekko search engine of geolocation blogs. I'm going to read everything they write on the topic and I'll probably leave a comment whenever I can do so appropriately. Thank you, Blekko.
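For anyone who would rather script this than read it in a feed reader, here is a small sketch of the idea: take an RSS 2.0 feed and pull out the items that mention every term you care about. The feed below is an inline made-up sample, not real Blekko output, and I am not assuming anything about the address of Blekko's feeds.

```python
import xml.etree.ElementTree as ET

def matching_items(rss_text, keywords):
    """Return (title, link) for items whose title or description mentions every keyword."""
    root = ET.fromstring(rss_text)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        text = (title + " " + item.findtext("description", default="")).lower()
        if all(k.lower() in text for k in keywords):
            hits.append((title, link))
    return hits

# Made-up feed standing in for a saved search over a set of blogs.
sample_feed = """<rss version="2.0"><channel>
  <title>Sample search feed</title>
  <item><title>Geofencing and local history</title>
    <link>http://example.com/1</link>
    <description>Mapping the history of a neighborhood.</description></item>
  <item><title>Check-in apps compared</title>
    <link>http://example.com/2</link>
    <description>No fences here.</description></item>
</channel></rss>"""

hits = matching_items(sample_feed, ["geofencing", "history"])
for title, link in hits:
    print(title, link)
```

In practice the feed text would come from whatever feed address your search engine hands you, and the hits could be piped into an IM or email alert.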

Google Custom Search does not offer RSS feeds. It has to be given a cookie and told to roll over just to get it to try to show you something close to the most recent search results inside your collection.

What Does it Mean?

These are just the ways that I intend to use Blekko, based on my experience using Google Custom Search and the ways that the two services are different. Maybe you've got other strategies in mind.

But here's how I see the three strategies above:

This is how I will assemble a magic reference book, a Dictionary, Encyclopedia and Atlas of the Past, Present and Future of...any topic I want.

This is how I will see a field, or a group of people and companies described online, and with a quick scrape be able to query deep into their online histories. Imagine looking at a set of people's names online, using find and replace to build a set of Google Social Graph query URLs to discover their blogs, bookmarks, Twitter accounts and more, then putting those URLs into Blekko. If that works, that's like 5 minutes of set-up to enable yourself to ask "who in this room has ever bookmarked or tweeted a web page about topics X, Y or Z?" Hello dreamy social search engine!
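The find-and-replace step is easy to script. This sketch turns a list of names into a list of lookup URLs; the template address is a placeholder I made up, not the real Google Social Graph endpoint, so swap in the actual URL pattern of whatever lookup service you use.

```python
from urllib.parse import quote_plus

# Hypothetical endpoint; substitute the real lookup service's URL pattern.
TEMPLATE = "http://example.com/lookup?q={query}"

def query_urls(names, template=TEMPLATE):
    """Build one URL-encoded query URL per non-empty name."""
    return [
        template.format(query=quote_plus(name.strip()))
        for name in names
        if name.strip()
    ]

# Illustrative names; blank entries from a messy scrape are skipped.
names = ["Ada Lovelace", " Grace Hopper ", ""]
urls = query_urls(names)
print("\n".join(urls))
```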

This is how I will home in on conversations in the future, among any set of experts, on any topics of interest to me. That's going to be very high signal and very low noise. That's like pulling a sliver of knowledge from the future and putting it straight into your brain - then sliding down it to arrive exactly where the conversations emerge, so that you can participate in them.

Will Blekko Succeed?

Clearly, only the ten people still reading this article and I are ever going to do the above - and we're going to have to click on a whole lot of ads to keep this puppy in business!

The rest of the world is unlikely to build Blekko-collections of their own because the cognitive overhead of curation is of a fundamentally different quantity and quality than the simpler requirements of consumption. People have compared Blekko to the recently launched Cuil because of the hype, but better comparisons could be made to the valiant but failed roll-your-own search engines like Rollyo, Swicki and Hittery. Not enough people with good intentions (not spammers) were willing to put the time in to use these tools well.

Blekko will attempt to compensate for this by automatically serving up search results from hand-curated custom search engines on big popular topics. That will help direct searchers to the Mayo Clinic's webpage for "cure for the common cold" instead of Demand Media's eHow page on the subject, which is of clear commercial intent and questionable worth if you ask the Blekko team. Mayo Clinic doesn't have SEO people on staff, but Demand Media does, Blekko points out - so they built a health "slashtag" that includes Mayo's site but excludes Demand sites. Health search queries are performed inside that slashtag by default.

Will that matter, though? Will millions of people rush with open arms into the unfamiliar, even if it's clearly superior upon the slightest investigation? Will people choose what's best for them? Will they choose what's most powerful and beautiful, instead of that which is easy and must remind itself constantly to "Not be Evil?"

I would submit that if people were likely to make such choices, the world would be a very different place than it is today.

But in the meantime, fellow benevolent hackers, fellow dreamers of antiquity who find ourselves on the web, fellow fans of ballet, acupuncture and other romantic, unorthodox pursuits --- when we find ourselves with a need to search, in a cruel and banal world, may we use tools like Blekko. May we use them to dive deep into the archival knowledge of the finest writers and thinkers in our respective fields. To survey quickly what any set of people online have said before. And to track what our most trusted sources say regarding any topic, anytime in the future.

Even if no one else understands. They don't all need to understand, because in the hyper-linked, flash-curated, streaming social world of the future - those of us who know how to dance among the links and the fleeting, beautiful startups that overestimate most of humanity's capacity for intellectual engagement before crumbling into craven ad networks just to keep the lights on - those of us who can do the power user's dance in those circumstances will know that our positions are secure and that the web will be ours to create on. And we'll treasure beautiful things like Blekko, whether the rest of the world does or not.

http://www.readwriteweb.com/archives/how_to_use_blekko_to_rock_at_your_job.php How To Sat, 06 Nov 2010 10:00:26 -0800 Marshall Kirkpatrick
After Cuil, Blekko Will Be More Careful - But Does It Matter?

My first post for ReadWriteWeb, just over 1 year ago, started with the premise that search was “game over”, that Google had won and the only space left was (re)search - what users do after the basic search.

None of the search start-ups since then has made me change my mind. None of the cool new user interface features or ways of expressing your search intentions matter one iota, if the core search proposition is not better from day one. Well, enter the latest contender: Blekko.

When Google launched, 10 years ago in 1998, there was no “new paradigm” or wizzy features - just a search box that worked better than the competition. The search competition bar is now way, way higher than it was back then.

Yet new search start-ups continue to get funded, even in what is a less frothy funding environment. Cuil(l) raised $33m. Looks like they blew it. In contrast, Blekko raised only $5m in two rounds. It is still in stealth mode and one assumes they'll play the hype game a bit more cautiously after the Cuil debacle.

The proposition that launched countless search start-ups was “if we can get just 1% of the search market we will have a very valuable business“. That may be true, but getting 1% has proved elusive. The reality is you either win big or fail totally in this game. There are no hedged positions in search. It is a really “non-trivial” technical problem.

Assuming the game is defined as the search infrastructure game, I think that game has been over for some time. The barriers to entry are just too high. An entrepreneur pitching VCs now has to answer the “how do you avoid the Cuil problem?” question. Yahoo BOSS is the perfect play in the new game, with search infrastructure players offering their platform to developers. Hundreds of start-ups can make a decent business within less than 1% of the search market if the infrastructure is provided by somebody else. You don’t build operating systems, do you? You don’t build search infrastructure, do you?

You don’t, unless you believe that you really have disruptive technology. Blekko is one of the few remaining plays building search infrastructure. They must think they have that disruptive technology.

Blekko seem to understand the complexity of the challenge, from comments on their Blog (as their site says nothing, their founder’s Blog is the best source of insight into what they are up to):

“Search is an absolutely fascinating problem to work on for a bunch of reasons. For one thing you have to scale the thing before getting the first user. You can’t just start with a server or two and add more when the users come. Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it..

The componentry is remarkably deep.

Search is like 7 hard problems wrapped into a stack. Distributed systems, html analytics, text analytics/semantics, anti-spam, AI/ML, frontend/UI. And scale…”

His Blog is well worth a thorough read if you are in the search game or just like hard technical problems. (As a historical footnote, Skrenta’s notes on Cuil, written well before the launch, make interesting reading).

Later on he says:

“you don’t need a million servers and half of the phd’s in the field to build a search app. It takes 20 people and $5M of hardware…if you know what you’re doing.”

I totally buy the “It takes 20 people” bit. All my experience in software has confirmed that Frederick Brooks was totally right in The Mythical Man-Month that small teams always outperform large teams. I cannot imagine what more than 20 people would do other than get in each other’s way.

It’s the “you don’t need a million servers” bit that I am less certain about. Google invests billions of dollars in server farms. You have to have something fundamentally and totally disruptive. P2P enabled Skype to take on AT&T and Verizon. That was fundamentally and totally disruptive technology that enabled such a compelling value proposition that they got millions of consumers using them. That is why I was excited to see Faroo attempt this with P2P, but I can see that they fail at the critical “has to be better than Google the day it launches” test.

Purely incremental improvements to the economics of crawling + indexing will not enable a new consumer search play. Saying “we only need $1 billion in infrastructure cost to compete out of the gate with Google and Google spent $3 billion” does not cut it with investors. Nobody will fund that $1 billion. However, incremental improvement is a great pitch to the big infrastructure players. If you can say “I can take 20% out of your infrastructure costs with my patented technology”, you will get your phone calls returned by Google, Yahoo and Microsoft. And one of them may offer to buy you for a big fat premium to prevent their rivals getting access to the same technology.

That is very, very different from launching a new consumer search engine.

In summary, I see 3 possible plays in search today:

1. Build search applications on top of Yahoo BOSS or equivalent offerings from Google or Microsoft. There is room for hundreds of niche, vertical start-ups, using search as a feature not as the only proposition. I think Yahoo has a great shot at this as Google will suffer from cannibalization fears, so they won’t open up as much as Yahoo. Microsoft will undoubtedly play here as well, they are best at technology for developers.

2. Hard core search infrastructure technology sold to Google, Yahoo and Microsoft. That’s tough to get right as the technology has to be really, really good, the patent has to be rock solid and you have to be good at playing poker with the big guys.

3. The totally disruptive Skype style venture that nobody has heard about.

http://www.readwriteweb.com/archives/after_cuil_blekko_will_be_more_careful.php Analysis Wed, 30 Jul 2008 22:30:00 -0800 Bernard Lunn
11 Search Trends That May Disrupt Google

My first post for ReadWriteWeb (nearly a year ago) started with the premise that search was "game over", that Google had won and the only opportunity left was (re)search - i.e. what one does after the basic search. Unfortunately, none of the search start-ups since then has made a dent in Google's relentless march towards search market dominance. In this article, we outline 11 search trends that may change that.

The proposition that launched countless search start-ups was: "If we can get just 1% of the search market, we will have a very valuable business". That may be true, but getting 1% has proved elusive. It has been an all or nothing game.

That may be about to change.

It is possible that Google will not be beaten by one big competitor. It is possible that they will be pecked at by thousands of tiny start-ups using a new outsourced infrastructure.

But before getting to that punchline, here is my 11 point recap of the search market:

1. Disambiguation is (still) not enough motivation to switch. All those learned PhDs with backgrounds in natural language search and AI explaining that the words "paris" and "apple" have multiple meanings that Google cannot parse from a single search, massively miss the point. The average user has figured that out and either enters multiple words or refines the search based on the first search. Using natural language search - which is complex to code and expensive to process - is a classic "hammer to crack a nut" solution.

2. Webmaster push-back and basic economics will accelerate the trend towards an outsourced crawler market. Webmasters won't accept a proliferation of crawlers as some of them may be malicious and all of them impact performance to some degree. Google Yahoo Microsoft (GYM) will always be accepted as they drive enough SEO, but marginal crawlers will struggle. Basic economics mean that only a very small number of players will be able to afford the giant server farms needed to index the whole Web. The YM parts of GYM (as well as Amazon) will increasingly offer their infrastructure to anybody who can build value on top.

3. Yahoo SearchMonkey may have arisen from desperation, but we may also be witnessing a "Linus moment". SearchMonkey is the most well-defined entry into the outsourced crawler market. It comes from their recognition that it is too late to beat Google in a head to head battle, so it could be dismissed as a sign of desperation. However I prefer to see it as a "Linus moment", that point in time when Linus Torvalds simply said "here is what I have done so far, anybody who can take it to the next step is welcome to try". To be truly disruptive, Yahoo may need to open this up even more than they have to date.

4. There will be many more attempts to monetize Wikipedia. Well-funded search ventures such as Powerset have retreated to the much narrower goal of searching Wikipedia. Freebase also uses Wikipedia as their core data. Walking around the RPI Web Science Research Initiative, I could see many interesting R&D experiments coming out of Academia all of which used Wikipedia as a base. Wikipedia has just enough structure and normalization to be useful. Above all, the History feature makes "data provenance" possible and that is critical for trust.

5. Core search is still getting funded. This is not what one would expect in what is by any definition a consolidated market with one mighty big gorilla sitting on top. Look at Blekko getting $2m without even a prototype to show the world. Are the investors nuts? Possibly, but they include some pretty smart guys like Marc Andreessen and the founder Rich Skrenta is clearly a smart guy (his Blog is a good read). Or look at Cuill, which got $25m as recently as April. Maybe they are idealists tilting at windmills. Maybe they know something that the rest of us don't. Only time will tell. These new entrants will eschew any hype, which they know has not one single point of value in adoption.

6. Image search is another "hammer to crack a nut". Searching images, video and audio is one of those "non-trivial" computer science projects that great engineers love to tackle. However great investors should steer clear. It is hard to code and incredibly expensive to process. The competition is tagging (see next point) which is classic "just good enough and improving all the time at virtually no cost" that is impossible to beat.

7. Tagging is quietly but massively disruptive. The fact that thousands of webmasters and bloggers tag their content so that they can be found by Google is Google's secret weapon. But it could get turned against them. A small incentive to be found by other search engines will change tagging behavior. This is likely to play out in lots of vertical niches, where a small change in tagging behavior can make a huge difference in findability and that can make a big difference to both buyers and sellers. Whether people use RDF or Microformats or some other de facto vertical standard will continue to be the subject of much debate, but the format itself is not the issue. The human drive to tag (to order one's world) is deep and strong and has financial motivations as well.

8. Whitelisting is a good way to kill spam. Spam is the big problem for search as well as email and whitelists work well for both. In search this is done by a site that uses something like Google Custom Search Engine (or SearchMonkey) to define what sites to search within a defined domain. Even if that means defining 1,000 sites and adding new ones every day, that is well within the range that a single human curator can do within a single market domain. The human curator deletes any spam sites manually.

9. P2P search could still be a long-term disrupter and Microsoft's route back to relevance. The only way to do search without putting all the Web's pages into one server farm is via P2P. I have written about Faroo's attempt here. It relies on .Net and this may be Microsoft's card to play but only if Vista gets real traction. This is a real long shot, but an intriguing one.

10. There is tons of great data inside relational databases that is quite easy to search. It is the HTML layer that is getting in the way. As more sites learn how to expose their structured, relational databases as Web Services APIs, a lot more data will be available that does not rely on word search on HTML pages.

11. It's the Adwords, stupid! All the search wizardry doesn't matter a hoot if the monetization is not done right. There is plenty of motivation out there. Sellers want cheaper search words to buy. Publishers want a bigger piece of the cake. Buyers/searchers may even want cash back (we will see if Microsoft's crude tactic, lambasted in the Blogosphere, makes it in the real world).
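As a rough illustration of the whitelist idea in point 8, the sketch below filters a set of result URLs against a curator's list of approved domains. The domains and URLs are placeholders, and this is only the general technique, not how Google Custom Search Engine or SearchMonkey implement it internally.

```python
from urllib.parse import urlparse

def allowed(url, whitelist):
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)

# Hypothetical curated list for one market domain.
whitelist = {"example.org", "example.edu"}

results = [
    "http://www.example.org/cold-remedies",
    "http://spam.example.net/cold-cure",
    "http://notexample.org/page",
    "https://health.example.edu/colds",
]
kept = [u for u in results if allowed(u, whitelist)]
print("\n".join(kept))
```

Matching on a leading dot keeps look-alike hosts such as notexample.org out, and removing a spam site is a one-line change to the curated set.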

Conclusion

Most of these trends point in the direction of search as infrastructure feeding thousands of innovators in niche markets - a long tail approach, in other words. Google will play in this infrastructure game - they already do with Google Custom Search - but it is vendors such as Yahoo, Microsoft and Amazon with equally deep pockets and much more to lose from total Google dominance, who will be the disrupting innovators in this next phase of the search market.

Image credit: davemc500hats

http://www.readwriteweb.com/archives/11_search_trends.php Search Mon, 16 Jun 2008 14:45:57 -0800 Bernard Lunn