freebase - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/freebase en Copyright 2009 Richard MacManus readwriteweb@gmail.com Sun, 22 Nov 2009 19:36:29 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss

Factual Makes Publishing Open Data Easy

Factual, a new open data project founded by Gilad Elbaz, just launched its public beta today. Elbaz's last company, Applied Semantics, was acquired by Google in 2003 and became one of the core components of the search giant's AdSense contextual advertising product. Factual, which is mostly geared towards developers, is somewhat similar to Freebase, though Factual allows for a more free-form approach to building a database than Freebase. Factual provides users and developers with tools to create, contribute and mash up open data on any subject.

Factual also announced that Esther Dyson has joined the company's board of advisors.

For now, Factual offers only a relatively small repository of databases; the company's current focus is on getting more developers to use its service and on bringing as much data as possible into the system.

Getting Data into Factual

To enter data, users can type it in field by field, which is tedious, or upload spreadsheets in most of the standard formats. The service also provides a number of easier ways to import data. You can, for example, give Factual the URL of any website or Wikipedia page that includes tables and the service will automatically create a new table based on this data. We tried this with tables from a number of sites and it generally worked well and only required a few edits. Factual also includes a number of more sophisticated extraction tools for advanced users.

Once the data is available on Factual, developers can use the API to read, write and mash this data up in any form they like. Users can also edit tables directly on the site or through an embedded table. In addition, users can mash up and combine existing tables.

Currently, Factual only offers one relatively basic embeddable widget that can only display the table without any graphical embellishments. The company plans to rely on developers to create other ways to access and display the data available on the service.

Not a Wiki

While Factual allows any user to make changes to the database, Factual's model is slightly different from the standard wiki approach where only the last edit is generally visible to the public. Changes made to a fact in a Factual database are more like votes for a certain entry. If three users or data sources say a restaurant doesn't offer vegetarian food, for example, and one user says it does, then the table will display the fact that the majority of users entered. Factual, however, will also display a question mark next to this disputed entry. Users can click on this question mark to see all the editors and data sources.
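The voting model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Factual's actual algorithm or API: the function name resolve_fact and the shape of the vote list are assumptions made for this example, and the article does not say how ties or spam are handled.

```python
from collections import Counter

def resolve_fact(votes):
    """Return (displayed_value, disputed) for one table cell.

    `votes` holds one value per user or data source. Hypothetical helper:
    the article does not describe Factual's real implementation.
    """
    counts = Counter(votes)
    # The value entered by the most users or sources is the one displayed.
    displayed, _ = counts.most_common(1)[0]
    # Any disagreement at all earns the entry a question mark.
    disputed = len(counts) > 1
    return displayed, disputed

# Three sources say the restaurant has no vegetarian food, one says it does.
votes = ["no", "no", "no", "yes"]
value, disputed = resolve_fact(votes)
print(value + (" ?" if disputed else ""))
```

In this sketch an empty vote list raises an error, and a tie simply falls to whichever value was entered first; a real service would need an explicit rule for both cases.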

Factual will try to weed out spam here as well, though given how new the service is, it's hard to evaluate how effective Factual's spam filters are.

License

Users who enter data into a Factual database do not automatically give up their copyright - though given that Factual focuses on facts, which typically can't be copyrighted anyway, this shouldn't be too much of a problem. Users can, however, choose an open license for their work, which might be necessary if the table they used to seed their database was licensed under a Creative Commons license, for example. Factual's FAQ explains this issue in greater detail.

Would You Use an Open Data Service?

With regard to the question of why businesses would open up their data, Gilad Elbaz told us yesterday that he believes open data could eventually go the way of open source, which also had a hard time gaining acceptance among businesses. While open source software is a tool that a lot of companies now use, data is usually at the heart of a company's products, and it remains to be seen how many companies would really want to put their data into an open database. For now, we mostly expect non-profits and government organizations to make use of this service.

http://www.readwriteweb.com/archives/factual_makes_publishing_open_data_easy.php News Tue, 13 Oct 2009 05:00:00 -0800 Frederic Lardinois
Freebase Parallax Taunts Us With Awesome Semantic Web Video

Staff researcher David François Huynh has created an interesting tool for browsing semantic database Freebase, called Freebase Parallax. Written up by ZDNet's Oliver Marks, the video Huynh recorded demonstrating Parallax (below) will knock your socks off.

Unfortunately, actually using Parallax demonstrates just how far from solid Freebase, one of the semantic web's poster children, really is. The idea is to allow you to apply multiple filters to your searches and embed live charts in a blog. It's a beautiful idea; check out the video.

Here's the video below. If you find yourself saying "get to the point already," skip to about 1:30 in the timeline.


Freebase Parallax: A new way to browse and explore data from David Huynh on Vimeo.

Unfortunately, when we tried out a number of searches in Parallax, very few subjects were well populated at all. We found duplicate subject titles where one held solid data and the other didn't, but even that was a best case scenario. In search after search, we found next to nothing in Freebase.

The example above is nice, but let's say I want to find out something about black women scientists. No luck. History of the internet? Not much information there. Venture Capitalists? Blank profile pages.

This ought to work. Freebase has taken more than $50 million in venture investments, they have a small army of volunteer and computer scientist contributors, they've got robots pumping their database with information automatically. There are now 60% more articles in Freebase than there are in English Wikipedia. So what's the problem?

We wrote last week about ontological concerns about the semantic web, but Parallax shows that there are more superficial problems. An unfriendly UI has been Freebase's excuse for a long time, despite recent improvements to it. We love the idea of the semantic web, but give its granddaddy website a usable UI like Parallax and we're left questioning just how much there really is inside Freebase anyway.

For an alternate view, see Alex Iskold's Freebase: Dispelling the Skepticism. Some fault here may lie in the coolness ratio of the video to the Parallax app, but for now we feel inclined to look elsewhere for the "semantic web killer app."

Disclosure: The author has consulting relationships with a number of pre-launched semantic web companies.

http://www.readwriteweb.com/archives/freebase_parallax_taunts_us_wi.php Semantic Web Wed, 13 Aug 2008 17:41:56 -0800 Marshall Kirkpatrick
Metaweb's Freebase Now 60% Larger Than English Wikipedia

Wikipedia is an incredible monument to human creativity and collaboration, but as one era of innovation passes into another - semantic web advocates want to augment the huge human input into the web with machine learning. The semantically enriched common database Freebase announced today that it will soon reach the milestone of 4 million topics added to its collection. That's 60% more than English Wikipedia's 2,445,041 articles and almost half the size of Wikipedia's full 10 million articles in 250 different languages.
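For readers who want to check the comparison, here is a quick back-of-the-envelope calculation using only the figures quoted above. The variable names are ours; the numbers are rounded the way the announcement gives them.

```python
# Figures as quoted in the paragraph above.
freebase_topics = 4_000_000
english_wikipedia_articles = 2_445_041
all_wikipedia_articles = 10_000_000

# How much larger Freebase is than English Wikipedia, as a percentage.
pct_larger = (freebase_topics / english_wikipedia_articles - 1) * 100
# Freebase's size as a share of all Wikipedia language editions combined.
share_of_all = freebase_topics / all_wikipedia_articles * 100

print(f"{pct_larger:.1f}% larger than English Wikipedia")
print(f"{share_of_all:.0f}% of all Wikipedia articles")
```

On these numbers the margin over English Wikipedia works out to a little over 60%, and Freebase comes to about two fifths of Wikipedia's full article count.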

What is Freebase? It's a database of information that's organized by people and machines and is particularly well suited for machine reading. You're not a machine - so why should you care? Read on.

What You Can Do With Freebase

Semantic web expert and RWW contributor Alex Iskold spelled out the value of Freebase in great detail here in May. The long and short of it though is that Freebase learns fast through a combination of automated information harvesting and machine and human organization. It collects information from sources like Wikipedia and MusicBrainz and from user uploads and edits.

Programmatic access to that now structured data allows all kinds of mashups to be built that "know things." Check out, for example:

  • Taught or Not - a cute little game that tests your knowledge of who influenced who throughout the history of thinkers.

  • Shot or Not - another game that tests your knowledge of the causes of death of various famous people throughout history.

  • Random Walk Through Influences - a little app that displays the chain of historical influence around any artist whose name you enter.

  • Pull Quotes - If you have any interest in politics, check this out - it's awesome!

  • Powerset - the Natural Language search engine acquired by Microsoft last week uses Freebase, too.

Seriously, Though

Obviously most of these are relatively frivolous use cases. Are there serious powerful use cases for Freebase yet? We're not entirely sure. There are big gaps in the data, which is understandable, but the interface is so much harder to use than Wikipedia's that there's reason to be concerned about expectations of substantial human editing. The interface was much improved this summer and is now far more usable, but it's still harder than it needs to be.

We've certainly got our questions about Freebase, but we're excited about what Metaweb is doing with it. They are smart, well funded and aiming high. The community there deserves congratulations on growing to 4 million reusable articles, something that the celebrated English Wikipedia community can only aspire to.

http://www.readwriteweb.com/archives/metawebs_freebase_now_60_large.php Semantic Web Mon, 07 Jul 2008 14:00:40 -0800 Marshall Kirkpatrick
Thinkbase: Mapping the World's Brain

If Freebase is an "open shared database of the world's knowledge," then Thinkbase (found via information aesthetics) is a mind map of the world's knowledge. The interesting and incredibly addictive Freebase visualization and search tool is the brainchild of master's degree student Christian Hirsch at the University of Auckland. Thinkbase is one of the cool proof of concept applications built on top of Freebase that we mentioned last week.

As we've mentioned here on RWW, Freebase is best suited for complex inferencing queries -- the type that expose relationships between various entities to figure out an answer. Things like, "What's the name of the actor who was in both 'The Lord of the Rings' and 'From Hell'?" (Answer: Ian Holm.)
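To make the "actor in both films" question concrete, here is a minimal Python sketch that answers it by intersecting cast lists. The cast data is a tiny hand-typed sample, and the function name actors_in_all is an assumption for this example; it says nothing about how Freebase itself stores or queries film data.

```python
# Toy cast lists typed in by hand; a real application would pull these
# from a database such as Freebase rather than hard-coding them.
cast = {
    "The Lord of the Rings": {"Elijah Wood", "Ian McKellen", "Ian Holm"},
    "From Hell": {"Johnny Depp", "Heather Graham", "Ian Holm"},
}

def actors_in_all(films, cast_by_film):
    """Return the set of actors who appear in every film in `films`."""
    sets = [cast_by_film[film] for film in films]
    return set.intersection(*sets)

print(actors_in_all(["The Lord of the Rings", "From Hell"], cast))
```

The answer falls out of the relationship between two entities rather than from any single page of text, which is what separates this kind of query from a keyword search.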

Thinkbase doesn't necessarily answer those questions -- at least not directly, but it does allow people to visually explore the relationships that Freebase can expose. Thinkbase employs the Thinkmap visualization software to visually represent the semantic relationships between objects on Freebase as an interactive mind map. Each object on the map is represented by an icon that corresponds to the type of object it is: for example, person, place, movie, song, or artwork.

The site uses a two-pane display, putting the relationship map in the left pane, and the Freebase entry for the active node in the right pane. Every node on a Thinkbase map can be expanded to see concepts related to that object, or collapsed to clean the graph of relationships you're unconcerned with. Every map you create can also be linked to via a dynamic share URL.

Thinkbase is a really fun visual front end to the Freebase database that exposes the semantic relationships that such a database can reveal in a compelling way. Alex Iskold wrote last week that the problem with semantic search is that we're asking the wrong questions. Tools like Thinkbase can help us start to think about what type of questions we should be asking by clearly showing the type of semantic relationships that databases like Freebase excel at finding.

http://www.readwriteweb.com/archives/thinkbase_mapping_the_worlds_brain.php Products Thu, 05 Jun 2008 10:30:01 -0800 Josh Catone
Weekly Wrapup, 26-30 May 2008

Here are some of the highlights from the week's Web Tech action on ReadWriteWeb. On the product side we covered announcements by Google about Gears and App Engine, we looked at some compelling Yahoo! Pipes apps, we checked out Strands Lifestreaming, and we reviewed promising Semantic Apps Faviki and Freebase. On the trends side we analyzed the contentious Semantic Search market, we looked at Google's Android vs iPhone, we put the Social Networking battle between Google and Facebook in context, and we explored more social media trends.

Web Apps

Google Gears Turns One: Future is in Open Standards

Google Gears, the offline web application API that Google debuted last year at its developer conference, turned one this week. To celebrate, Google dropped the company name from Gears. The name change is a symbolic move aimed at reinforcing Google's commitment to working with existing standards communities and helping them to define better open standards for bridging online applications and the offline world.

See also: Google App Engine Announces Pricing Plan, APIs, Open Access; Why Google is Wooing Web Developers

The Ultimate Yahoo! Pipes Creations List

Yahoo! Pipes is one of the coolest ways to mash up the RSS feeds of various sites and sources to get the data you want. Since our initial coverage of Yahoo! Pipes, thousands of creations have become available. However, finding the best picks can be tough. ReadWriteWeb has done the hardest part and compiled a list of some of the best Yahoo! Pipes created by users. We give you the ultimate Yahoo! Pipes list.

Strands Lifestreaming: What They're Doing and Invites for Readers

Recommendation service Strands.com launched a lifestreaming service this week that aims to pull together the company's wide range of services in particular media and online activity into one central place for users to share socially. The new Strands is a way to share your music, bookmarks, blog posts and other activity with friends, family and groups. It's a major entry into one of the most interesting sectors of the new web. We give it a mixed review...

See also: Recommendation and RSS: A Look at Two Readers Filtering the Noise

Semantic Tagging with Faviki

Faviki is a new social bookmarking tool that offers something that services like Ma.gnolia, del.icio.us, and Diigo do not - semantic tagging capabilities. What this means is that instead of having users haphazardly entering in tags to describe the links they save, Faviki will suggest tags to be used instead. However, unlike other services, Faviki's suggestions don't just come from a community of users and their tagging history, but from structured information extracted straight out of the Wikipedia database.

Freebase: Dispelling The Skepticism

Freebase, the first product of semantic web company Metaweb, is an open, semantically marked up database of information that we called one of the "10 semantic apps to watch" last year. With $57.4 million in funding, a smart team, and a tech legend in Danny Hillis at the helm, Metaweb is considered to be one of the most serious players in the Semantic Web space. Yet the company's efforts to date have been met with skepticism. Particularly, people have asked how is Freebase different to Wikipedia? Jamie Taylor, the Minister of Information at Metaweb, spoke at the SemTech 2008 Conference that took place in San Jose last week in an effort to dispel some of that skepticism.

SEE MORE WEB APPS COVERAGE IN OUR WEB APPS CATEGORY

Web Trends

Semantic Search: The Myth and Reality

For a few years now people have been talking about semantic search. Any technology that stands a chance to dethrone Google is of great interest to all of us, particularly one that takes advantage of long-awaited and much-hyped semantic technologies. But no matter how much progress has been made, most of us are still underwhelmed by the results. In head-to-head comparisons with Google, the results have not come out much different. What are we doing wrong?

See also: Making the Web Searchable: The Story of SearchMonkey

Android Is Out For iPhone Blood

Wednesday, at Google's I/O Event, the company demonstrated their Android prototype phone, a device which has been greatly improved since its last public outing at this year's CES and Mobile World conferences. Today, Android looks classy enough that you half-expected them to pull a Steve Jobs and announce that you could run out and buy it right now. During the demo, the company showed off some of the applications that will run on Android - like a Google Maps Street View app that drew cheers from the crowd. From the buzz surrounding the Google Phone at this event, it's clear that Android has a shot at knocking that other touchscreen phone off its pedestal.

See also: Google's Android: How Will it Compare to iPhone?

The Social Networking Arms Race

Last November, when Google launched Open Social we asked readers if Facebook would join Google's platform. The results were split right down the middle, but as we get farther from the Open Social launch, and the two sites continue to launch competing APIs (Google FriendConnect vs. Facebook Connect, for example -- the former banned by Facebook), that seems less and less likely. This is becoming a social networking cold war.

See also: How Many Friends is Too Many?

The Fork in the Road for Social Media

Social networking is at a major fork in the road. Down one road is adding more features to a walled garden and opening up just enough, so that users seldom need to leave. Most sites are going down this yellow brick road and the prize is clearly a big one. But they may end up back in Kansas. Down the other road, lies a future of being the primary repository for your connections (aka the social graph), but with this data available via open APIs to anybody who needs it. That is a utility type model, and as with any utility, it can be hugely valuable at scale.

See also: Sometimes Crowds Aren't That Wise

Who Are The "Digitally Savvy?"

A new report put out by consumer and media research firm Scarborough Research has revealed some interesting information about the section of the U.S. population that's being called the "digitally savvy." These are the consumers who are more likely to own high-tech items like DVRs, satellite radios, and VoIP phones and are more likely to engage in Internet activities that include blogging, downloading music, and other web 2.0 activities. In other words - they're us.

See also: When User-Generated Content Goes Bad

SEE MORE WEB TRENDS COVERAGE IN OUR TRENDS CATEGORY

That's a wrap for another week! Enjoy your weekend everyone.

http://www.readwriteweb.com/archives/weekly_wrapup_26-30_may_2008.php Weekly Wrapups Sat, 31 May 2008 05:00:01 -0800 Richard MacManus
Semantic Search: The Myth and Reality

For a few years now people have been talking about semantic search. Any technology that stands a chance to dethrone Google is of great interest to all of us, particularly one that takes advantage of long-awaited and much-hyped semantic technologies. But no matter how much progress has been made, most of us are still underwhelmed by the results. In head-to-head comparisons with Google, the results have not come out much different. What are we doing wrong?

For example, when asked "What is the capital of France?" both approaches come back with the correct answer - Paris. Also, a lot of queries that we are used to typing into Google in abbreviated form come back with similar results if we type them using natural language. Clearly something is off. We all know that semantic technologies are powerful, but how and why? In this post we will show that the problem is that we are asking the wrong questions.

The mistake is that semantic search engines present us with a Google-like search box and allow us to enter free-form queries. So we type the things that we are used to asking - primitive queries. It never occurs to us to type in "What actor starred in both Pulp Fiction and Saturday Night Fever?" or "What two US Senators received donations from a foreign entity?" We type simple questions, but this is not where the power of semantic search lies. Let's look at the spectrum of semantic technologies, from Google to SearchMonkey, Powerset, and Freebase, to understand what is going on.

What Problem Are We Trying to Solve?

The first confusion in the space comes from the fact that semantic search is being positioned as the answer to all possible problems - from modern search, currently dominated by Google, to problems that are computationally impossible. The situation is made more difficult by the fact that right now there is only a thin range of problems where semantic search can clearly do better. This range is complex queries involving inferencing and reasoning over a complex data set.

As shown in the diagram above, basic queries are easily handled by Google. Sadly, natural language processing gives little advantage when it comes to this category of problems. Google correctly answers the question about Leonardo Da Vinci's birthday, leaving no opportunity to improve the search by understanding the nouns and the verbs that the user typed in.

Before looking at the problems that are perfect for semantic search, let's look at the hardest problems. These are computationally challenging problems that really have nothing to do with understanding semantics. The misconception has been perpetuated since the early days of the Semantic Web that somehow, because we will annotate the web, we will be able to solve these super complex problems. This is simply not true. There are fundamental limits to what we can compute, and a class of problems that have an exponential number of possible solutions is not going to be magically solved because we represent data as RDF.

The good news is that there is a set of problems that are great for semantic search. These are the problems we have been solving so wonderfully with relational databases. Way too often we forget that semantic technologies are here to help us represent relational data spread over the entire web - so it should be no surprise to us that it is relational queries that semantic search engines would excel at.
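To see why these are relational queries, here is the Pulp Fiction question from earlier in this post written as a self-join in SQL, run through Python's built-in sqlite3 module. The table name appearance and the handful of rows are made up for this sketch; no semantic search engine is claimed to work this way internally.

```python
import sqlite3

# An in-memory database with one hypothetical table of (actor, film) pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appearance (actor TEXT, film TEXT);
    INSERT INTO appearance VALUES
        ('John Travolta', 'Pulp Fiction'),
        ('Samuel L. Jackson', 'Pulp Fiction'),
        ('Uma Thurman', 'Pulp Fiction'),
        ('John Travolta', 'Saturday Night Fever'),
        ('Karen Lynn Gorney', 'Saturday Night Fever');
""")

# Join the table to itself: one side pinned to each film, matched on actor.
rows = conn.execute("""
    SELECT a.actor
    FROM appearance AS a
    JOIN appearance AS b ON a.actor = b.actor
    WHERE a.film = 'Pulp Fiction' AND b.film = 'Saturday Night Fever'
""").fetchall()
print(rows)
```

The query is trivial once the data sits in a table. The hard part, and the part semantic technologies are meant to address, is getting data scattered across the web into a shape where a join like this is possible.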

The Spectrum of Semantic Search Players

But semantic search is not just about the questions that we are asking. Because the web is just a bunch of unstructured HTML pages, semantic search is also about the underlying data. At its most structured extreme we find Freebase - the semantic database of everything. Freebase is accessible via free text search, but more importantly via MQL (Metaweb Query Language). MQL is essentially JSON with wildcards. Using it you can construct any query against Freebase and the result will be the same query with answers filled in.
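The "same query with answers filled in" idea can be imitated with a toy Python function. To be clear, this is not MQL and does not talk to Freebase: the TOPICS list, the fill_in function, and the convention that None or an empty list marks a blank slot are assumptions made for this sketch, loosely modeled on how MQL queries are usually shown.

```python
import json

# Toy in-memory "database" standing in for Freebase.
TOPICS = [
    {"type": "/music/artist", "name": "The Police",
     "album": ["Outlandos d'Amour", "Reggatta de Blanc", "Zenyatta Mondatta"]},
    {"type": "/music/artist", "name": "Sting",
     "album": ["The Dream of the Blue Turtles"]},
]

def fill_in(query, topics):
    """Return a copy of `query` with blank slots filled from the first match.

    A slot is blank when its value is None or an empty list; every other
    slot must equal the topic's value. Returns None when nothing matches.
    """
    known = {k: v for k, v in query.items() if v not in (None, [])}
    for topic in topics:
        if all(topic.get(k) == v for k, v in known.items()):
            return {k: topic.get(k) if v in (None, []) else v
                    for k, v in query.items()}
    return None

# The query states what we know and leaves "album" blank.
query = {"type": "/music/artist", "name": "The Police", "album": []}
print(json.dumps(fill_in(query, TOPICS), indent=2))
```

The result has exactly the shape of the query, which is what makes this style of language easy to use from code: the caller already knows where in the structure each answer will appear.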

Powerset, in a way, is just a relational database. It operates against certain, structured information. On the other end of the spectrum is Google, which is all about statistical frequencies and very little semantics. The recently launched SearchMonkey from Yahoo! is an interesting twist. It does not add anything to the result set, but instead uses semantic annotations to present a richer, more interactive and useful user interface.

Companies like Hakia and Powerset are probably working the hardest. These companies are trying to simultaneously build Freebase-like structures on the fly and then do natural language queries on top of them. The difference is that Hakia is using (likely similar) technology to query over the entire web, while Powerset has (probably shrewdly) chosen to restrict the search to Wikipedia.

Are Hakia, Powerset and Freebase All That Different?

This analysis brings up a question - which of these technologies are different and which are essentially the same? Let's get the easy one out of the way first. Yahoo!'s SearchMonkey is no different from Google or any other search, as far as the core search technology is concerned. The difference is simply in the presentation layer. SearchMonkey is smart about creating a better user experience by letting publishers present the search results to the users in the best possible way.

But when it comes to Hakia, Powerset and Freebase the situation is much more complicated. On the surface all these products are different - Hakia lets you search the whole web, Powerset is restricted to Wikipedia (and Freebase!) and Freebase itself has two search interfaces - the search box and query language. Here is the problem - the natural language interface has nothing to do with the underlying data representation.

The fact is that all of these semantic search technologies allow people to type in arbitrarily complex questions and then interpret these queries and execute them against their databases. Fundamentally, Hakia, Powerset, and Freebase are databases. Fundamentally, all of them have some kind of Natural Language Processing that translates the question into a canonical query over the database.

To gain insight into all of this, think about Freebase and its query language MQL. Unlike natural language, which allows all sorts of constructs, MQL is non-ambiguous. This JSON-like language allows users to construct precise statements against Freebase. The fact that Powerset allows natural language queries does not mean that inside Powerset there is no database. For sure, though, there is a similar kind of database as there is beneath the Freebase search box. What is really different about Freebase and Powerset is the data gathering approach and user experience.

Back to the Future: It's All About UI

Probably the most striking revelation about the semantic search space concerns the user interface. First, to go off on a tangent, Powerset got it right by realizing that semantics needs to be surfaced in the UI. After a user searches Powerset, a contextual gadget, aware of the semantics of the results, helps the user complete the search experience.

Yet the biggest mistake that I think Powerset is making is also in the UI. The search box that everyone is familiar with via traditional web search engines needs to go. Having a simplistic search interface hurts Powerset and Hakia, and to a lesser extent Freebase, which is not positioning itself as generic search.

Think about the recent launch of Powerset. The company released a vastly better way to interact with one of the most important sources of information on the web - Wikipedia. But what did the critics say? "Let's see if this is a Google killer." And the answer to that is "no."

But what if Powerset restricted what can be searched? What if instead of a search box there was another interface, or what if they told users not to look up things that they can find easily on Google? Why is it that new companies are expected to improve on the algorithm that has ruled the web for over a decade? Instead, the expectation should really be to solve the problems that cannot be solved by Google today.

Conclusion

Semantic search is an upcoming technology that has set expectations way too high. We have all been misled into thinking that these technologies are here to dethrone Google by delivering better search results. Neither of those things is true. What is true, however, is that semantic search is going to be big and it is going to help us answer questions that we simply cannot answer today - complex, inferencing queries asked over the entire web as if it were a database.

In order for these semantic search technologies to make a dent in the market, they need to clean up their messaging and, most importantly, their user interface. Presenting a search box is both misleading and detrimental, as people associate it with the simplistic questions that Google solves without any problems. To really showcase semantic search, these companies need to come up with innovative UIs that will help users to understand the power that is being put at their fingertips.

As always, please tell us what you think. What should semantic search companies do to gain their place in the marketplace?

http://www.readwriteweb.com/archives/semantic_search_the_myth_and_reality.php Trends Thu, 29 May 2008 14:15:01 -0800 Alex Iskold
Freebase: Dispelling The Skepticism

Freebase, the first product of semantic web company Metaweb, is an open, semantically marked up database of information that we called one of the "10 semantic apps to watch" last year. With $57.4 million in funding, a smart team, and a tech legend in Danny Hillis at the helm, Metaweb is considered to be one of the most serious players in the Semantic Web space. Yet the company's efforts to date have been met with skepticism. Particularly, people have asked how is Freebase different to Wikipedia? Jamie Taylor, the Minister of Information at Metaweb, spoke at the SemTech 2008 Conference that took place in San Jose last week in an effort to dispel some of that skepticism.

What is Freebase?

Jamie has an interesting title: Minister of Information, and his primary responsibility is to seed Freebase with information and ensure the quality of the data. According to Jamie, Freebase is an "open shared database of the world's knowledge." This sounds the same as Wikipedia, but it is really quite different, because at the heart of Freebase are the ideas of semantics and openness via API.

Unlike Wikipedia, which is a free-form database, Freebase is structured, with concepts and relationships interlinked into a gigantic network or graph. Another important difference is that Freebase is all about its API. Any information contained inside the database is accessible and can be retrieved via queries. In addition, the data in Freebase is under a Creative Commons license - meaning that it is readily exportable and usable by others.

When it comes to defining the meanings of things, Freebase is focused on community, with collective editing, attribution, and collaboratively built semantics. This last point is quite crucial - the founders of Freebase believe that meaning has to emerge from the collaboration between users. As such, Freebase is one of the first experiments in web-scale social contracts. The site is really focused on the notion that information is not encumbered by licenses and is free to use.

What is in Freebase Today?

Data comes into Freebase from many sources: Wikipedia, Flickr, the US Department of Commerce, MusicBrainz, the USGS, SFMOMA, the US Securities and Exchange Commission, Chef Moz, and many other places. Right now the information is mostly about people and places, but the system is engineered to have a wide range of data types. As an example of "People" information, there is a lot of information in Freebase about artists along with their artwork and place in history. More esoteric types of information you might find in the database include airplanes, French cheese, tropical storms in the 90s, oil companies, and candies.

Freebase also contains lots of other kinds of data and has:

  • 3.4 Million Subjects
  • 750K People
  • 450K Locations
  • 50K Companies
  • 40K Movies
  • ... Over 1K Data Types with over 3K Properties

Data Representation in Freebase

While Freebase certainly has a long way to go before it can claim completeness of information, its core idea of object representation and linking seems very solid. Each object in Freebase is unique. As more information comes into the system about an object, more links are created about it in the system. It is particularly interesting how Freebase establishes object identity and decides that two concepts (or subjects) are the same.

The diagram above illustrates the idea. When a new source of information is added to Freebase, it is parsed into entities and facts. The new information is then cleaned up and merged with the existing system. But the merge only occurs if the system determines that the two bits of information are really about the same subject (in this case Leonardo da Vinci). This is a powerful approach which allows Freebase to grow the knowledge around individual subjects. What is also interesting is that Freebase allows human editing to reconcile situations in which the system is unable to link the two concepts together automatically.
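The merge step described here can be sketched roughly as follows. This is a minimal illustration, not Freebase's actual reconciliation algorithm: the `Subject` class, the `same_subject` heuristic (matching names and no conflicting facts) and the sample facts are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    """Hypothetical stand-in for a Freebase subject: a name plus known facts."""
    name: str
    facts: dict = field(default_factory=dict)


def normalize(name):
    # Crude clean-up step: ignore case and surrounding whitespace.
    return name.strip().lower()


def same_subject(a, b):
    """Assumed heuristic: names match and no shared fact disagrees."""
    if normalize(a.name) != normalize(b.name):
        return False
    shared = a.facts.keys() & b.facts.keys()
    return all(a.facts[k] == b.facts[k] for k in shared)


def merge_into(store, incoming):
    """Merge `incoming` into a matching subject, or add it as a new one.

    Returns the subject that now holds the incoming facts.
    """
    for existing in store:
        if same_subject(existing, incoming):
            existing.facts.update(incoming.facts)
            return existing
    store.append(incoming)
    return incoming


store = [Subject("Leonardo da Vinci", {"born": 1452})]
# Same subject, one extra fact: merged into the existing entry.
merge_into(store, Subject("leonardo da vinci", {"born": 1452, "profession": "painter"}))
# Conflicting fact: kept as a separate entry for a human editor to reconcile.
merge_into(store, Subject("Leonardo da Vinci", {"born": 1975}))
```

In this sketch the conflicting record is simply stored separately, which is the point at which a human editor would step in.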

Each permanent object in the system has a GUID - a unique identifier, something like this: #9202a8c040000064..... The identifier can be used to refer to the object via URL and via queries. In addition to the GUID, there are other ways to refer to the object, for example, http://www.freebase.com/view/en/leonardo_da_vinci. Beyond that, there are still other aliases; for example, you can refer to a public company by its stock ticker symbol. But regardless of the reference, the key point is that you end up with the same, unique node in the system.
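The idea that a GUID, a readable path and a ticker symbol all lead to one node can be sketched as a small lookup table. The `NodeStore` class, the GUID, the path and the ticker below are invented for illustration and do not reflect how Freebase stores its data.

```python
class NodeStore:
    """Toy store in which several keys resolve to one node (illustrative only)."""

    def __init__(self):
        self._nodes = {}    # guid -> node data
        self._aliases = {}  # alias -> guid

    def add(self, guid, data, aliases=()):
        self._nodes[guid] = data
        for alias in aliases:
            self._aliases[alias] = guid

    def resolve(self, ref):
        """Return the node for a GUID or any registered alias."""
        guid = ref if ref in self._nodes else self._aliases[ref]
        return self._nodes[guid]


store = NodeStore()
# The GUID, path and ticker are made up for the example.
store.add(
    "#guid-0001",
    {"name": "Example Corp"},
    aliases=["/en/example_corp", "ticker:EXMP"],
)

by_guid = store.resolve("#guid-0001")
by_path = store.resolve("/en/example_corp")
by_ticker = store.resolve("ticker:EXMP")
```

All three lookups return the very same object, which mirrors the "same, unique node" property.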

Freebase also has the ability to create new domains and types that describe new concepts, for example, science fiction movies. There is a way to attach new data types to the existing domains, and then these types can be shared and used by other users. The idea is that you can model things with the fine-grained resolution that you need and then invite people to help you refine and evolve your models. An example is the motorcycle community, which grew out of an effort led by one person who was later joined by others, and which has since been promoted to the top level. The community process is about merging private types to build common models.

What Can You Do With Freebase?

Freebase is not a formal system and it is not a reasoning engine; it is just a knowledge repository, a database. To query Freebase you use the Metaweb Query Language (MQL), which is based on JSON. The language is meant to be very simple, and it is actually very interesting as well. The idea is that you fill out a tree that represents a partial graph with the pieces you know; the system then fills in all the slots that you left blank and delivers back all possible subgraphs.

For example, say you are watching a movie and you can't tell what it is. You know that the movie stars Patrick Swayze and an actress who was also in "Tank Girl." So you create a movie query and express all these facts, using JSON-style syntax. When you run the query, you get back that the actress is Lori Petty and the movie is "Point Break," and you also get links to IMDb. The query and the results have the same structure, so to find matches you simply traverse the set of results that is returned.
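The fill-in-the-blanks style of querying can be imitated in a few lines of Python. This is not MQL and does not call the Freebase service: the `FILMS` list is hand-written sample data, the field names are invented, and `match` is a hypothetical helper that treats `None` as a blank slot. Real MQL would express the whole question as one nested JSON query, whereas this sketch takes two steps.

```python
# Hand-written sample data; field names are assumptions, not the Freebase schema.
FILMS = [
    {"type": "film", "name": "Point Break",
     "starring": ["Patrick Swayze", "Keanu Reeves", "Lori Petty"]},
    {"type": "film", "name": "Tank Girl",
     "starring": ["Lori Petty", "Naomi Watts"]},
    {"type": "film", "name": "Dirty Dancing",
     "starring": ["Patrick Swayze", "Jennifer Grey"]},
]


def match(query, record):
    """Return a filled-in copy of `query` if `record` satisfies it, else None.

    A value of None in the query is a blank to fill in; a list means
    "the record's list must contain all of these values".
    """
    result = {}
    for key, wanted in query.items():
        if key not in record:
            return None
        actual = record[key]
        if wanted is None:
            pass  # blank slot: any value is accepted and copied into the result
        elif isinstance(wanted, list):
            if not all(item in actual for item in wanted):
                return None
        elif wanted != actual:
            return None
        result[key] = actual
    return result


def run(query, records):
    """Return the filled-in query for every record that matches."""
    results = []
    for record in records:
        filled = match(query, record)
        if filled is not None:
            results.append(filled)
    return results


# Step 1: who was in "Tank Girl"?
tank_girl = run({"type": "film", "name": "Tank Girl", "starring": None}, FILMS)[0]

# Step 2: which film stars Patrick Swayze together with one of those actors?
candidates = []
for actor in tank_girl["starring"]:
    candidates += run(
        {"type": "film", "name": None, "starring": ["Patrick Swayze", actor]},
        FILMS,
    )
```

Each result has the same shape as the query that produced it, with the blanks filled in, which is the property the paragraph above describes.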

Building on this example, Freebase is really meant for complex inferencing queries, the sorts of questions that Google has no way of answering using its statistical frequency algorithms. For example, which US senators took money from a foreign entity? It turns out that both Barack Obama and Hillary Clinton received donations from UBS AG, based in Switzerland. That is a complex inferencing query that needs to be expressed in a query language before it can be answered, and so questions of this nature are outside the reach of any search engine -- and Wikipedia too, for that matter.

Resources

There is quite a lot of activity going on around Freebase today. Many enthusiasts are building small proof-of-concept applications showcasing what can be done in the future with this powerful database. You can stay on top of the cutting-edge work coming from both the Freebase team and the community at: http://download.freebase.com and http://research.freebase.com

http://www.readwriteweb.com/archives/freebase_overview.php http://www.readwriteweb.com/archives/freebase_overview.php Products Wed, 28 May 2008 22:10:01 -0800 Alex Iskold
Semantic Web: What Is The Killer App?

The Semantic Web has been in the making for some time and people think it is nearing maturity. We have written about this trend extensively, with our two most notable posts being an analysis of the challenges of the classic bottom-up approach and the promise of the new top-down one. Regardless of how the Semantic Web will come about, for it to flourish it needs to hit the mainstream. There is no way that consumers will appreciate the elegance and mathematical soundness of RDF and OWL. People don't care about math, they care about utility and even more, about fun. What the Semantic Web needs, then, is a killer app.

Whatever it is, it needs to layer an understanding of semantics on top of a consumer application. The consumer application needs to be so cool and so viral that people will be open to learning that it is powered by semantic technologies. In that case, it will be possible to further market applications as Semantic Web apps. Consumers will understand that if one Semantic Web application has potential, so might others. In math, this is called proof by induction. In marketing this is called creating a market. In any case, it needs to be done.

In this post, we analyze several existing and potential applications of semantic technologies and look for the killer app.

Natural Language Understanding

Since the beginning, the Semantic Web has been associated with Artificial Intelligence. The idea of representing information in structured form so that computers can "understand it" and then solve complex problems was one of the keystones of the Semantic Web vision. The problem is that representing billions of existing web documents as RDF is a rather daunting, if not impossible task. An alternative would be to "teach" computers natural language. If an application could read the page the way we read it and interpret what it says, the annotations would not be necessary.

Natural language processing has been the Holy Grail of AI for a while now. However, it is a very difficult problem, because humans are born with the innate ability to understand language and we learn it not in a vacuum, but in the context of life. Certainly if we could replicate that with computers, it would be amazing and it would be the killer app. The problem is that this is not on the horizon. The Semantic Web technologies of today are not able to represent natural language in its entirety, and this is not really even their goal. Even if we could represent each page completely, there is still the matter of interpreting structure into semantics, which is the magic that our brain does so well and so easily.

Genie In The Bottle

Related to natural language understanding is another idea that is not on the horizon. John Markoff called it "the perfect vacation." I call it the "Genie in the Bottle" to illustrate the impossibility of this. There is a misunderstanding floating around that equates the Semantic Web with the ability to solve really hard problems. It is simply not true.

For example, if you go to a new travel agency and ask them to book the perfect vacation for you, the travel agent will not be able to do it, because she does not know you. In order to find the perfect vacation there need to be constraints: where you've been before, who you are going with, what you like to do, what your budget is, etc. Finding the "perfect" vacation is not a one-shot deal; it is a process that leverages iteration and memory.

True, with the Semantic Web the information is structured, but that does not mean that the computer can necessarily solve complex problems. These are two completely different things. Just because you have a map does not mean that you know the best way to get from point A to point B. Having a map is necessary, but it is not sufficient; you need the algorithm to find the best path. There is a big difference between asking what the capital of France is and asking what the cheapest airfare is today from New York to Paris. And the even harder question is: Where should I go on vacation next? Computers are not going to give us an instant, perfect answer to this question anytime soon, if ever. Again, this would be the killer app; it is just not likely to happen.

Semantic Knowledge Databases

So what is realistic and possible today? The first in the list of growing applications are Semantic Knowledge Databases. The two examples that we will look at here are Freebase and Twine. While Freebase is focused on building what is essentially a semantic equivalent of Wikipedia and Twine is focused on a personal semantic database, both are databases, both focus on knowledge management, and both are Wikipedia-like. The advantage of these databases over Wikipedia is that they represent information in a structured way and support queries. To understand the difference, take a look at the Alicia Keys page on Freebase and on Wikipedia. At first glance they are very similar, but Freebase "knows" that Alicia Keys is a blues singer and it also knows other blues singers. For Wikipedia, blues is just another page, not a music genre. So Freebase can potentially answer a request to list all blues singers, while Wikipedia cannot.

This is certainly interesting, but the question is: will people care? Can the end consumer tell the difference? Unlikely. Today Wikipedia contains definitive references on a vast number of topics. Like Google, it is easy to search and find relevant information, and as a result, people are not likely to be in need of a better Wikipedia. With Twine the situation might prove to be different, because personal knowledge management is an important problem. The first question is: Are there enough people who want to be efficient in managing personal knowledge? I think the answer is increasingly likely to be "yes." And the second question is: Does knowing the semantics of knowledge help you build the best application? At the very least Twine has to beat del.icio.us bookmarks and ideally needs to do for personal knowledge management what Highrise is doing for CRM.

But beyond the execution, there is still another problem. For a semantic knowledge base to be the killer app it needs to ignite imagination and capture people's hearts and minds. This is not likely to happen. We appreciate libraries, we cannot live without them, but we take them for granted. Knowledge has been commoditized thanks to Google, Wikipedia, and the blogosphere, and is perceived as abundant and unexciting. For this reason Semantic Databases are not likely to be the killer apps -- but they might become a stepping stone towards one.

Semantic Search

An early candidate for the killer app in the Semantic Web category was search. First Hakia and more recently Powerset marketed the idea that a semantic search engine, one that is based on the understanding of natural language, can beat Google. On top of the pressure to deliver qualitatively better results, Semantic Search companies also have to solve, at least approximately, the problem of natural language understanding, which, as we discussed earlier, is a very difficult one.

As things stand right now, it does not look like search is the killer app for semantics. The understanding of natural language does not seem to give you a noticeable edge in getting better search results. At least in the comparisons that we performed earlier, there was no major difference. The statistical algorithm deployed by Google is precise and good enough, which is why it has been the clear leader in web search for the past 8 years. Unseating Google will require more than incremental improvement in search; it will likely take a paradigm shift and the creation of a different web experience. Below, we discuss how "discovery" could possibly take a bite out of the pie, but as of now Google's algorithm remains good and strong.

Social Graph

After Tim Berners-Lee posted his thoughts on the Social Graph, a discussion began on the web in which people wondered if the Social Graph is in fact the Semantic Web. This, however, is a gross misinterpretation of the post. The Social Graph is not the Semantic Web, nor is it the killer app of the Semantic Web. They are just two separate concepts. The confusion comes from the fact that both are graphs, or networks, in the mathematical sense: the underlying structure of each consists of nodes connected by links. Many things in nature and society are networks, so it is not surprising that meaning and people fall into this category.

If anything, it is more correct to say that the Social Graph is a subset of the giant, all-encompassing Semantic Web. Knowing how people are connected is important in order to solve the perfect vacation problem. After all, a perfect vacation should be taken together with perfect friends, right? But jokes aside, the Social Graph is an interesting and important trend for 2008; however, it is not really related to the Semantic Web.

Shortcuts

Increasingly, we are seeing a new breed of Semantic Applications, which we generalize as shortcuts. This category includes SnapShots from Snap, BlueOrganizer and SmartLinks from AdaptiveBlue, Shortcuts from Yahoo!, and In-text search from Lingospot. What all these technologies have in common is that they leverage the simple semantics of the content to deliver additional information. In the case of Snap and AdaptiveBlue, the semantics are defined by the URL, while Yahoo! and Lingospot perform text analysis.

Regardless of the method, all of these technologies deliver related information via Ajax popups. That is, they leverage semantics to pull the information from the web. This is essentially discovery, or reverse search. When the user is looking at a book, there is a preview with a brief description and the cover image; when the user encounters a stock symbol, he is presented with a stock chart, analysis and additional links to the company; when the user is looking at a music album, there is a play button; and when the user encounters a movie, there is the ability to watch the trailer in place. The shortcuts remove the need to search; instead, the related content from the web comes right into the page.
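For the URL-based variety of shortcut, the underlying idea can be sketched as a lookup from a link's host name to the kind of thing it points at, and from there to a widget. The host-to-type table and the widget descriptions below are hypothetical and are not taken from any of the products mentioned.

```python
from urllib.parse import urlparse

# Hypothetical mapping from host name to the kind of thing a link points at.
HOST_TYPES = {
    "www.amazon.com": "book",
    "www.imdb.com": "movie",
    "finance.yahoo.com": "stock",
}

# Hypothetical widget to show for each kind of thing.
SHORTCUTS = {
    "book": "preview with description and cover image",
    "movie": "inline trailer",
    "stock": "stock chart and company links",
}


def shortcut_for(url):
    """Return (kind, widget) for a link, or None if the host is not recognized."""
    host = urlparse(url).netloc.lower()
    kind = HOST_TYPES.get(host)
    if kind is None:
        return None
    return kind, SHORTCUTS[kind]


movie_link = shortcut_for("http://www.imdb.com/title/example")
unknown_link = shortcut_for("http://example.org/page")
```

A text-analysis approach would replace the host lookup with entity recognition over the page content, but the final step of mapping a recognized type to a widget would look much the same.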

Today's shortcut technologies are simple and still in their infancy, but they are among the most successful examples of semantic applications. However, we cannot call them the killer app, for several reasons.

First, people perceive them as advertising, which is not the point. Snap certainly made an early push into ads, but this is not a representation of what these technologies will look like in the future. Second, in their current implementation, all of these technologies are utilities. For the same reason that people are not going to get emotional about personal knowledge management, they will not be emotional about shortcuts. Shortcuts will also be taken for granted.

Yet, shortcuts hold the most promise. With a few more iterations these technologies are going to get slicker and more precise. They will leverage content and micro-context to reduce the amount of search. They will become more personalized based on user behavior. And once this happens it will be a big deal.

Full Disclosure: Alex Iskold is the founder and CEO of AdaptiveBlue.

Conclusion

We are still waiting for the killer app for the Semantic Web, something that can go viral and turn semantics into a marketing term. Problems like natural language understanding still remain difficult to solve, and the solutions do not appear to be on our horizon right now. It also appears that a semantic search engine, at least based on the ones we have seen to date, does not have a substantial advantage over Google. We are seeing the rise of early Semantic Knowledge Databases, but while we expect them to get better and more interesting, they are more likely to be the stepping stones to the killer app, rather than the app itself.

In the meantime, we are seeing the rise of shortcut technologies, which leverage the basic semantics of the content, like URL and simple context analysis, to deliver relevant information, links, and media directly into the page. While still very early, these technologies hold the most promise because they are simple and useful. We expect that the next generation of these technologies in conjunction with personalization will deliver an interesting alternative to search -- contextual discovery. We will discuss this alternative in more detail in a future post.

Now tell us: what do you think the killer app for the Semantic Web will be? Which of these technologies do you think is the most promising?

http://www.readwriteweb.com/archives/semantic_web_what_is_the_killer_app.php http://www.readwriteweb.com/archives/semantic_web_what_is_the_killer_app.php Trends Wed, 09 Jan 2008 22:22:00 -0800 Alex Iskold