Blogcosm is a new company aiming to build a directory of the blogosphere. From the mundane to the esoteric, the company wants to provide users with a rich data set about any particular blog of interest or the vertical market it is in.
I met founder Scott Lawton, an old-time geek from Massachusetts, last night at the first annual Blog World Expo in Las Vegas. As a case study, Blogcosm built a blog directory of all the speakers at Blog World Expo and the blogs they write for. Lawton is a data quality algorithm expert who says his involvement in the web 2.0 scene predates Dave Winer's creation of Radio Weblogs. He started writing scripting utilities professionally for the Mac in 1993. He is nerdy and charming, if you like nerdy innovative types.
The Blog World Expo in Vegas leaves no doubt that blogging is an emerging powerhouse of an industry. Led by professional trade-show organizer Rick Calvert, the event is now expected to have 2000 attendees or more. Two hundred tickets were sold yesterday alone. WordPress founder Matt Mullenweg keynoted this morning, and TechCrunch's Michael Arrington will speak tomorrow. I spoke twice yesterday and the energy here is high.
If you check out the Blogcosm page on the speakers at the expo you'll see that the company so far is pulling in data from Technorati, Alexa and a handful of other sources. The self-funded project is muscling its way through indexing the blogosphere manually. It aims to go well beyond tech blogs and wants to pull data from a long list of available APIs - from Compete to Del.icio.us. The goal is to offer a useful entry about any blog you look up and information about categories of blogs that no one is capturing today.
On the far end of the complexity spectrum, Lawton says he is experimenting with an algorithm that estimates the monetization of any given blog. Looking at the ad technology employed, the probable CPM for a vertical and the estimated traffic of a blog, he says he hopes to be able to automatically provide a rough estimate of how much money any blog is making.
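To make the idea concrete, here is a back-of-the-envelope sketch of how those three inputs could combine. It is not Blogcosm's algorithm, which Lawton did not describe in detail; the names `BlogSignals` and `estimate_monthly_revenue`, and the sample numbers, are hypothetical and chosen only for illustration. It assumes the standard CPM arithmetic: revenue equals ad impressions divided by 1,000, times the CPM.

```python
from dataclasses import dataclass


@dataclass
class BlogSignals:
    """Hypothetical inputs; none of these names come from Blogcosm."""
    monthly_pageviews: int  # estimated traffic for the blog
    ads_per_page: int       # ad units detected on a typical page
    vertical_cpm: float     # assumed CPM in dollars for the blog's vertical


def estimate_monthly_revenue(signals: BlogSignals) -> float:
    """Rough monthly ad revenue: impressions / 1000 * CPM."""
    impressions = signals.monthly_pageviews * signals.ads_per_page
    return impressions / 1000 * signals.vertical_cpm


# Made-up example: 500,000 pageviews, 3 ad units per page, $2 CPM.
example = BlogSignals(monthly_pageviews=500_000, ads_per_page=3, vertical_cpm=2.0)
print(f"${estimate_monthly_revenue(example):,.2f}")
```

A real estimate would also have to account for fill rates, ad networks that pay per click rather than per impression, and how unreliable third-party traffic numbers can be, so a figure like this is an order-of-magnitude guess at best.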
Lawton told me that for now he can answer only simple questions, but that those provide valuable information as well. For example, he told me, there are no parenting blogs in the Technorati 100. That's interesting information. The ability to draw from a standardized taxonomy to discern who the leaders are in any blogging vertical is something that no one automates. As a data quality technical guy, Lawton says the software on the back end of his four-person team should enable information parsing that Technorati, Techmeme and other sites just can't perform. That software, though, will ultimately be assisted by intelligent human intervention.
"Data quality issues in the Technorati 100 are appalling, even today," Lawton told me. "I think the world can afford to have someone look at a list like that before they publish it. Before there is a Blogcosm 100 it will face human judgment. Is a site a blog? Is it on the list for reasons that are correct or because of errors in the algorithm? What is it about? We think there are business models around answering those questions through a combination of automation and human editorial review."
The site is ugly and bare-bones today. The potential, though, is significant. As you can imagine, Lawton watches computer scientist Gabe Rivera's blog-tracking service Techmeme closely as well. "Techmeme is just a pale shadow of what it could be," he told me. "For Gabe's sake I hope it's him that builds what Techmeme could be. If it's not him, it could be us."
As a person who makes his living engaging with sites like Techmeme and Technorati, I am excited to see Blogcosm build its business around offering a high-quality, structured dataset concerning the blogosphere.

Comments
I for one welcome Blogcosm. I feel like other than techie type stuff which is covered by Techmeme or the other blogs, there are no good news source aggregators for other topics. Like high heels for example, =).
Posted by: Heelcandy | November 8, 2007 12:57 PM
It's a truly ugly website. I think that limits how reputable they appear to be...
Posted by: Justin Kistner | November 8, 2007 1:03 PM
Heelcandy, I agree - every vertical needs analytics and this could be a good source once the full service comes to market.
Justin, you should give these folks a break - they are data nerds from back east.
Posted by: Marshall Kirkpatrick | November 8, 2007 1:06 PM
I certainly advocate an advanced system of tracking / analysing the blogosphere that works as expected. It is a definite shame that Technorati has fallen short of what is needed in that area; they had momentum once-upon-a-time.
If Lawton and the folks need a new designer - I'd be more than happy to talk, too :-)
Posted by: Matt Harwood | November 8, 2007 2:31 PM
The first impression is not good. The site looks ugly, in a web 1.9 sort of way, and is very slow.
Posted by: Andrew | November 8, 2007 2:55 PM
Marshall, this is your greatest post ever. It would be leading Techmeme if Techmeme were any good.
Posted by: Gabe | November 8, 2007 3:06 PM
Given how Technorati collapsed, not keeping up with the infrastructure requirement of indexing the blogosphere, good luck to Blogcosm, and his entire inventory of 80 blogs ;-)
Posted by: Zoli Erdos | November 8, 2007 4:52 PM
Cynicism may be appropriate but the vision here is a truly interesting one.
Posted by: Marshall Kirkpatrick | November 9, 2007 1:08 AM
Marshall: I enjoyed our chat; thanks much for the post. Perhaps I spoke a bit too much about possible futures... There are a few items where I may not have been as clear as I could have been. Our target isn't really "any particular blog of interest", instead we focus on the top blogs. That's how we can afford to provide a rich data set. There are plenty of sites that attempt to catalog every blog; we'll leave that business to them.
Justin: Thanks for the feedback. Maybe I should have added the dreaded "beta" label? We launched in August and have been gradually adding more data and features. The "look" is lower priority, but will get attention at some point.
Andrew: the site is still running on modest infrastructure and isn't as fast as we'd like -- but I haven't heard any other complaints of "very slow". If you visit again and it doesn't seem any faster, please send details.
Gabe: I'm glad I ran into you at the Expo party. For anyone who looks beyond Marshall's provocative title, I think the post itself was clear that Blogcosm isn't a meme tracker or news aggregator.
Zoli: Blogcosm isn't a blog search engine and has no plans to track 70M blogs -- or even 1 million. We're not trying to compete against Technorati "on their own turf"; we're doing something different. The story of why there are only 80 blogs on the page you found is actually somewhat interesting -- but I'll save that for another time.
Posted by: Scott Lawton (Blogcosm) | November 9, 2007 1:18 AM
This is a NEW site!? How are they going to challenge Technorati? An animated gif duel?!
When your homepage is 90% Google AdWords and 10% content, you've got problems.
I can't tell if it's an ad or a site menu (an advertiser's dream).
/rant
Posted by: dan | November 9, 2007 5:03 AM
dan: yes, we should expand the home page. Meanwhile, please click on something in the sidebar menu, e.g. profiles, categories, people. Most pages on the site have lots of content AND no more ads than (say) ReadWriteWeb.
Posted by: Scott Lawton (Blogcosm) | November 12, 2007 2:22 PM