Metaweb - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/Metaweb en Copyright 2012 Richard MacManus readwriteweb@gmail.com Tue, 14 Feb 2012 12:45:00 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss

Big Data Giant Joins InfoChimps to Save the World's Structured Information

Sometimes highly accomplished people just have to join crazy little startups. It's always exciting to see what happens when they do. Data scientist Kurt Bollacker is one of those people; he's decided to join Austin-based bulk data marketplace startup Infochimps, one of the most interesting little companies we regularly write about here.

Bollacker's history is intense. He helped build one of the first online search engines for academic research papers and the first prototype of the Wayback Machine at the Internet Archive, where he was the Technical Director. He was a biomedical research engineer at the Duke University Medical Center, researched long-term digital archiving as the Digital Research Director at the Long Now Foundation and was the Chief Scientist at Metaweb, the massively ambitious semantic web project that Google acquired in the summer of 2010. Those are some of the weightiest data projects in the Internet's young history; now he's joined InfoChimps. "The project that is Infochimps is in it for the long haul," Bollacker told ReadWriteWeb. "We're going to make something of lasting value. That's something I can buy into."

InfoChimps is a small startup that provides infrastructure for people to buy and sell large sets of data. We first wrote in-depth about the company when it made a controversial move of putting 1 billion data points from months of the Twitter firehose up for sale. Twitter's legal department quickly took the edge off of what the marketplace was able to offer its customers, but its splash was made and the web suddenly knew about InfoChimps.

InfoChimps offers a wide variety of types of data, however. Among its most popular sets, the company says, is a complete downloadable set of Major League Baseball data concerning every trade, drafting, free agency and other player transaction since 1873. You can also download the raw survey data used for the Zogby International book What Arabs Think, for $999.00.

Who cares about raw data? Data scientists do, of course, but there's ample reason for the rest of us to as well. Our big picture interest was well articulated by Dr Dirk Helbing of the Swiss Federal Institute of Technology, who is leading an effort to build what's being called the Living Earth Simulator (LES), a giant simulation of as many of the earth's natural and social problems as can be simulated at once. His project is big data analysis taken to one of its most extreme conclusions.

"Many problems we have today - including social and economic instabilities, wars, disease spreading - are related to human behavior, but there is apparently a serious lack of understanding regarding how society and the economy work. Revealing the hidden laws and processes underlying societies constitutes the most pressing scientific grand challenge of our century."

That may or may not be overstated, but the point is this: data is essential if we are to develop the full extent of self-awareness that science can offer.

Metaweb

Metaweb, where Bollacker was Chief Scientist, was a company best known for its product Freebase, which it describes as "an entity graph of people, places and things, built by a community that loves open data." Founded by Danny Hillis, a computer scientist whose name is usually said in hushed tones, Metaweb raised nearly $60 million to build its giant structured semantic graph.

Metaweb was acquired by Google in the summer of 2010 for an undisclosed sum, and parts of the Freebase technology have turned into Google Refine, "a power tool for working with messy data."

"At large scale there are classes of applications you can build that you can't do with 50 items in a data set, but with 50 million or 50 billion items," Bollacker explains. "Statistics, searches to find patterns, etc.

"I have no illusions that in 20 years, Google will still be paying to keep Freebase online as a service. I have an interest in making sure these bulk data sets stay alive. I think Infochimps has part of a model that could help that happen.

"One of the things I've learned is data that is loved tends to survive. I think the Freebase data is underloved. I think we can build extracts out of Freebase. They publish regular dumps. We're going to grab sections of those dumps, make them better indexed, better labeled and better described."

Bollacker received a Ph.D. in Computer Engineering from The University of Texas at Austin, and it was on his trips back to Austin that he met Infochimps CTO Flip Kromer, a Cornell-educated Mechanical Engineer, University of Texas physics education specialist and super-geek.

"The knowledge and experience is a huge known quantity," Kromer says of Bollacker's joining the company. "I got into this to build out the open data part of it. The best way to build the open data commons for the world is to do it within the context of a mixed open and commercial thing that makes everybody smarter. We're building out the commercial part, that's what we have to focus on. With Kurt on board, I have no fears that we're ever going to lose our soul. We won't lose sight over the central mission of making everybody smarter."

http://www.readwriteweb.com/archives/data_giant_climbs_aboard_at_infochimps.php Data Services Mon, 03 Jan 2011 18:48:12 -0800 Marshall Kirkpatrick

Google Makes Major Semantic Web Play, Acquires Freebase Operators Metaweb

The Semantic Web is all about structuring data so that humans and computers can more easily interpret the Web and discover relevant data for a wide variety of purposes. Google, a company built on the ability to advertise based on contextual data, announced today a major acquisition in the Semantic Web space. As of today, Metaweb, maker of Freebase and a leader in the Semantic Web, has joined forces with Google.

ReadWriteWeb's Guide to The Semantic Web:
  1. Semantic Web Adoption by Facebook, Best Buy & Others
  2. It's All Semantics: Open Data, Linked Data & The Semantic Web
  3. The State of Linked Data in 2010
  4. Top 10 Semantic Web Products of 2009
  5. ReadWriteWeb Interview With Tim Berners-Lee

Freebase is a massive open-structured database of information about almost anything, including books, movies and music. In fact, Google already has a relationship with Freebase, pulling in its information to provide intelligent search results within Google News. With the acquisition of Metaweb, Google can now leverage the company's tools and data even further, especially within basic Web search results.

"This is a huge win for the Semantic Web," Alex Iskold, founder and CEO of AdaptiveBlue, the semantic technology company behind GetGlue.com (and occasional ReadWriteWeb contributor), told us. "It could not be bigger, because really, we had the biggest company on the Web buy the biggest player in the Semantic Web space."

Google already provides some smart search results, including basic math, sports scores and birthdays of public figures, to name a few. For the most part, however, Google merely serves up links to Web pages; knowing more about what is behind those links could allow the search giant to provide better, more contextual results. To get a better idea of how that could happen, have a look at the video below.

Microsoft made a similar purchase when it acquired Powerset two years ago. Since then, Bing has bested Google in terms of providing smart search results, and has been nibbling at its market share for search. In an effort to keep Bing from eating its semantic lunch, Google is taking Metaweb's technology and data under its wing.

"What about [colleges on the West Coast with tuition under $30,000] or [actors over 40 who have won at least one Oscar]? These are hard questions, and we've acquired Metaweb because we believe working together we'll be able to provide better answers," writes Jack Menzel, Google's director of product management.
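A question like [actors over 40 who have won at least one Oscar] becomes a simple filter once the facts are stored as structured records rather than as text on Web pages. The Python sketch below illustrates that idea with placeholder records. The `Person` fields and the `actors_over_40_with_oscar` function are assumptions made for illustration, not Metaweb's or Google's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    professions: tuple
    oscars_won: int


# Placeholder records for illustration only, not real data.
PEOPLE = [
    Person("Person A", 52, ("actor",), 2),
    Person("Person B", 35, ("actor",), 1),
    Person("Person C", 61, ("director",), 3),
    Person("Person D", 47, ("actor", "director"), 0),
]


def actors_over_40_with_oscar(people):
    """Return names of actors older than 40 with at least one Oscar win."""
    return [
        p.name
        for p in people
        if "actor" in p.professions and p.age > 40 and p.oscars_won >= 1
    ]


print(actors_over_40_with_oscar(PEOPLE))
```

Each condition in the bracketed question maps to one test on a named field, which is what a keyword match against page text cannot do reliably.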

Metaweb says that Freebase will remain free and open as always, and will be improved upon due to the Google acquisition. The service's quarterly downloadable data dumps will now be served up weekly, and the company hopes the acquisition will encourage more companies to contribute to Freebase.

http://www.readwriteweb.com/archives/google_buys_semantic_web_database_metaweb.php Semantic Web Fri, 16 Jul 2010 12:32:00 -0800 Chris Cameron

Metaweb's Freebase Now 60% Larger Than English Wikipedia

Wikipedia is an incredible monument to human creativity and collaboration, but as one era of innovation passes into another, semantic web advocates want to augment the huge human input into the web with machine learning. The semantically enriched common database Freebase announced today that it will soon reach the milestone of 4 million topics added to its collection. That's 60% more than English Wikipedia's 2,445,041 articles and almost half the size of Wikipedia's full 10 million articles in 250 different languages.

What is Freebase? It's a database of information that's organized by people and machines and is particularly well suited for machine reading. You're not a machine - so why should you care? Read on.


What You Can Do With Freebase

Semantic web expert and RWW contributor Alex Iskold spelled out the value of Freebase in great detail here in May. The long and short of it though is that Freebase learns fast through a combination of automated information harvesting and machine and human organization. It collects information from sources like Wikipedia and MusicBrainz and from user uploads and edits.

Programmatic access to that now structured data allows all kinds of mashups to be built that "know things." Check out, for example:

  • Taught or Not - a cute little game that tests your knowledge of who influenced whom throughout the history of thinkers.

  • Shot or Not - another game that tests your knowledge of the causes of death of various famous people throughout history.

  • Random Walk Through Influences - a little app that displays the chain of historical influence around any artist whose name you enter.

  • Pull Quotes - If you have any interest in politics, check this out - it's awesome!

  • Powerset - the Natural Language search engine acquired by Microsoft last week uses Freebase, too.
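To make the idea of a mashup that "knows things" concrete, here is a minimal Python sketch of the kind of lookup an app like Random Walk Through Influences might perform. It does not use Freebase's actual API or schema. The `INFLUENCED` table and the `influence_chain` function are hypothetical stand-ins, with a tiny hand-entered graph in place of real structured data.

```python
# Toy, hand-entered stand-in for structured "influenced" data.
# This is NOT Freebase's schema or API; the names are illustrative only.
INFLUENCED = {
    "Socrates": ["Plato"],
    "Plato": ["Aristotle"],
    "Aristotle": [],
}


def influence_chain(start, graph):
    """Follow the first listed 'influenced' link from start until none remain."""
    chain = [start]
    seen = {start}
    current = start
    while graph.get(current):
        nxt = graph[current][0]
        if nxt in seen:  # guard against cycles in the data
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


print(" -> ".join(influence_chain("Socrates", INFLUENCED)))
```

Because the relationships are stored as data rather than prose, the same few lines work for any name in the graph, which is roughly what lets small apps like the ones listed above be built quickly.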

Seriously, Though

Obviously, most of these are relatively frivolous use cases. Are there serious, powerful use cases for Freebase yet? We're not entirely sure. There are big gaps in the data, which is understandable, but the interface is so much harder to use than Wikipedia's that there's reason to be concerned about expectations of substantial human editing. The interface was much improved this summer and is now far more usable, but it's still harder than it needs to be.

We've certainly got our questions about Freebase, but we're excited about what Metaweb is doing with it. They are smart, well funded and aiming high. The community there deserves congratulations on growing to 4 million reusable articles, something that the celebrated English Wikipedia community can only aspire to.

http://www.readwriteweb.com/archives/metawebs_freebase_now_60_large.php Semantic Web Mon, 07 Jul 2008 14:00:40 -0800 Marshall Kirkpatrick