Wikipedia is an incredible monument to human creativity and collaboration, but as one era of innovation passes into another, semantic web advocates want to augment the huge human input into the web with machine learning. The semantically enriched common database Freebase announced today that it will soon reach the milestone of 4 million topics added to its collection. That's roughly 60% more than English Wikipedia's 2,445,041 articles and about 40% of the size of Wikipedia's full 10 million articles in 250 different languages.
What is Freebase? It's a database of information that's organized by people and machines and is particularly well suited for machine reading. You're not a machine - so why should you care? Read on.

Semantic web expert and RWW contributor Alex Iskold spelled out the value of Freebase in great detail here in May. The long and short of it though is that Freebase learns fast through a combination of automated information harvesting and machine and human organization. It collects information from sources like Wikipedia and MusicBrainz and from user uploads and edits.
Programmatic access to that now structured data allows all kinds of mashups to be built that "know things." Check out, for example:
Obviously most of these are relatively frivolous use cases. Are there serious, powerful use cases for Freebase yet? We're not entirely sure. There are big gaps in the data, which is understandable, but the interface is so much harder to use than Wikipedia's that there's reason to be concerned about expectations of substantial human editing. The interface was much improved this summer and is now far more usable, but it's still harder than it needs to be.
We've certainly got our questions about Freebase, but we're excited about what Metaweb is doing with it. They are smart, well funded and aiming high. The community there deserves congratulations on growing to 4 million reusable articles, something that the celebrated English Wikipedia community can only aspire to.
Comments
Also note that there are an additional 4.1m musical tracks in the system:
http://www.freebase.com/view/music/track
These are all distinct, referenceable entities with connections to the relevant albums and artists, but don't count towards the advertised 3.95m "items" number because they distort it so much.
Posted by: Michael | July 7, 2008 2:27 PM
thanks again marshall, this is going to make for an amazing set of features for our project !!!!
Posted by: Srini Kumar | July 7, 2008 2:59 PM
There are a number of more "serious" applications that might interest you. Perhaps the best known is Powerset, though you might also be interested in Archiportal (a Google Maps mashup about architects and their buildings), Thinkbase (a general-purpose visualiser for Freebase data), or Dipity Timelines (a Facebook app for building and sharing timelines).
Additionally, Jonathan Lowe presented some interesting stuff about Freebase and geodata at Where 2.0 a month or so back, and Toby Segaran has been demoing a mashup that lets you learn about American industry segments, corporations, news coverage, political donations -- all tied together through Freebase data. Unfortunately that one's not publicly available yet (I think he's working out some browser portability issues), but it was presented at Web 2.0 Expo and the recent Freebase user group meeting. A glimpse of what he's been doing can be seen in this visualisation of corporate America, though.
Posted by: Kirrily Robert | July 7, 2008 4:04 PM
It's a rather pointless count, as nearly everything in Freebase is a stub (nearly a blank entry).
Posted by: Yakzoo | July 7, 2008 4:14 PM
I believe wiki has set a pace in this kind of information but any new ideas are welcomed
I made this site
http://www.aguadecalidad.com
to try to emulate wikipedia on water info
Posted by: leo | July 7, 2008 5:29 PM
The comparison of topic numbers with Wikipedia is a little bit unfair, since (on a small random sample) a significant number of Freebase topics seem to be direct summaries of Wikipedia entries.
Surely the important thing about Freebase is not the number of topics stored (though no doubt that will steadily grow) but the fact that it is making semantic links between the topics, allowing a kind of querying that Wikipedia and others don't support.
Having said that, they seem to have a way to go to provide easy to use ways to query all that semantic goodness, but good luck to them as they develop.
Posted by: Bill Roberts | July 8, 2008 1:06 AM
I have used Wikipedia and Freebase in research. Freebase seems to "compress Wikipedia articles"; perhaps in the future they will evolve more independently?
Posted by: German Romance | July 8, 2008 1:31 AM
Another large Semantic wiki directory is found at:
http://www.MyWikiBiz.com
I think it's the seventh-largest Semantic Mediawiki installation, and it may be the only one that aims to have 265 million pages eventually. Of course, that's pie-in-the-sky thinking, but if the ambition is to allow a page for every adult, every business, and every organization in the United States, that's about the right number.
Posted by: Gregory Kohs | July 8, 2008 7:00 AM
@Yakzoo
The induction problem is certainly impeding some of the more serious types of uses.
I'm sure that if anyone can harness implicit or explicit crowdsourcing in such a way as to add lots of good new data, Metaweb will add them to their Top Friends.
Posted by: Jams Levy | July 9, 2008 12:26 PM
If you're looking for another use case, you should check out Entity Describer at http://www.entitydescriber.org. It uses Freebase topics as tags for websites so you can tag a site using the nice Freebase disambiguating form. This allows for auto-complete, disambiguation, and a large publicly generated controlled vocabulary, with semantics. It also stops the need to create tag structure manually (like "Medical Ethics" and "Ethics") in hopes that the recommenders work better (I have StumbleUpon in mind). All the semantic data generated is available on a SPARQL endpoint as well.
Posted by: Andrew McKnight | July 14, 2008 3:39 PM