semantic web - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/semantic web
Copyright 2012 Richard MacManus (readwriteweb@gmail.com)
Tue, 14 Feb 2012 18:04:00 -0800

Top 10 Semantic Web Products of 2010

Every year ReadWriteWeb selects the top 10 products or developments across a range of categories. We kick off the 2010 'Best Of' series with our selection of the top 10 Semantic Web products and implementations of the year.

This year we've chosen 5 products by semantically charged startups and 5 implementations by large organizations. The startups represent the cutting edge of Semantic Web. Each has made an impact on the Internet this year, with user growth and innovation. The organizations we've selected - which include Facebook, Google and the BBC - offered the best examples of large scale deployment of semantic technology.

A note on terminology: we are using 'Semantic Web' and 'Semantic technology' somewhat interchangeably, although many people believe that the term Semantic Web (upper case) should only be applied to W3C-approved technologies such as RDF and SPARQL. The fact is that a good portion of our top 10 use technologies that either are not approved by the W3C (the Web's main standards body, led by Sir Tim Berners-Lee) or have been tweaked in some way - for example, Facebook's use of RDFa. So we've chosen to use the term 'Semantic Web' in its broader, more inclusive, sense. In a nutshell, these are products that add meaning and context to data.

Here then is our list of the top 10 Semantic Web products or implementations of 2010 (in no particular order).

Freebase

In July Google acquired one of the leading Semantic Web companies, Metaweb. Metaweb runs Freebase, an open, semantically marked up database of information. It looks similar to Wikipedia, but Freebase is all about structured data and what you can do with it.

Google already had a relationship with Freebase, pulling in its information to provide intelligent search results within Google News. With the acquisition of Metaweb, Google can now leverage the company's tools and data even further, especially within basic Web search results.

Freebase was one of our top 10 Semantic Web products last year and being acquired by Google validates its potential.

GetGlue

This year was a turning point for GetGlue, the service where users "check in" to watching TV shows, reading books, listening to music and more. Last November, GetGlue changed its branding and launched a new website. It changed almost overnight from a geeky browser add-on called Blue Organizer to a destination website called GetGlue. Mobile applications followed soon after, enabling its users to interact with GetGlue while watching TV or at an entertainment venue.

The changes have been good for GetGlue. It's experienced strong growth this year, reaching over 600,000 users by the end of September.

Disclosure: GetGlue's founder and CEO, Alex Iskold, used to be a regular contributor to RWW.

Flipboard

The launch of the iPad in 2010 triggered a new round of innovation in the startup community. Few startups utilized the touchscreen UI to create a unique user experience more than Flipboard, a magazine reading application built specifically for the iPad.

It turns out that Flipboard isn't just a pretty face; it's also using Semantic technologies.

In July, Flipboard acquired semantic technology startup Ellerdale, whose intelligent data-parsing algorithms had previously been used to create a real-time search engine and trends tracker. Ellerdale's technology was used by Flipboard to design a more personalized real-time experience - determining what social updates are important to you and presenting them in its now familiar magazine-like format.

Hunch

Hunch started out as a Q&A service, but in August it re-positioned as a personalization service. It's a recommendation engine that shows you movies you want to see, books you want to read, vacation destinations you want to go to, and much more. The company is on a mission to "map every person on the Internet to every object on the Internet, be that a product, a service, or a person."

Co-founder Caterina Fake told us in October that Hunch uses a decision tree model, as an alternative to search, to provide more personalized information to users.

Apture

Apture is a semantic contextual search service which continues to iterate strongly (it made our top 10 list last year, too). In August, Apture launched Apture Highlights, a plug-in that allows you to dive deep into any topic you discover on almost any page around the web.

When we first noticed Apture several years ago, it was a service that required publishers to load up linked pop-up widgets with multimedia of their own choosing. The company removed that barrier to entry with its August release. Everything is now automated and it's available almost everywhere. Indeed we liked it so much, we started using Apture on ReadWriteWeb (there is no commercial relationship, we just think the product adds to our site's user experience).

Facebook

Arguably the biggest Semantic Web news of the year came in April, when Facebook announced a large-scale new platform called the Open Graph. The stated goal of the Open Graph protocol was to enable publishers to "integrate [their] Web pages into the social graph." Essentially, each web page can now become an 'object' in Facebook's social graph (which is Facebook's term for how people connect to each other in its network). This means that pages can be referenced and connected across social network user profiles, blog posts, search results, Facebook's News Feed, and more.

The Open Graph is a wide-ranging platform which includes features such as 'Like' buttons and publisher plug-ins. It also includes a simple, RDF-based markup. This requires publishers to include at least 4 metadata properties in each object: title, type, image, URL. There are a few additional properties which may be optionally added, such as site_name and description.
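
As a rough illustration of the four required properties, here is a minimal Python sketch that renders a page's Open Graph metadata as `<meta>` tags and rejects an object that lacks any required property. The function name, the sample page data and the example.com URLs are assumptions made up for illustration; only the `og:` property names come from the protocol.

```python
from html import escape

# The four properties the article lists as required for every object.
REQUIRED_PROPERTIES = ("title", "type", "image", "url")


def open_graph_meta_tags(properties):
    """Return <meta> tag strings for a page's Open Graph properties.

    Raises ValueError if any of the four required properties is missing.
    """
    missing = [name for name in REQUIRED_PROPERTIES if name not in properties]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return [
        '<meta property="og:{}" content="{}" />'.format(
            name, escape(str(value), quote=True)
        )
        for name, value in properties.items()
    ]


# Hypothetical page data, for illustration only.
page = {
    "title": "Example Movie",
    "type": "movie",
    "image": "http://example.com/poster.jpg",
    "url": "http://example.com/movies/example",
    "site_name": "Example Site",  # one of the optional properties
}

for tag in open_graph_meta_tags(page):
    print(tag)
```

The resulting tags would sit in the page's `<head>`, which is what lets the page be treated as an object in the graph.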

See also: Facebook Open Graph: The Definitive Guide For Publishers, Users and Competitors

Google Squared

The holy grail in web search technology is to be able to ask a simple question, in natural language, and get a simple answer. In May, Google announced that Google Squared was coming to its search results. Google Squared, which launched in 2009, adds additional information to search results.

The functionality was added to Google's traditional search results in two ways. Firstly, simple queries such as Catherine Zeta-Jones' date of birth elicited useful data within the search results:

[Image: example Google Squared result within search results]

By clicking "show sources" on the Squared-provided result, a list of sources appears showing you how Google arrived at this answer.

Secondly, Google Squared is being used to provide a new feature in Google's sidebar (another innovation by the search giant in 2010): "Something different". This feature provides a list of related searches that may be of interest, determined by looking at your current search term.

This year Google also reported strong growth in its Rich Snippets feature, which adds extra information to Google search results too - in this case, data like review ratings.

Best Buy

One of the themes of 2010 was the increasing usage of Semantic Web technologies by large commercial companies like Facebook and Google. Leading U.S. retailer Best Buy was another large company to impress in 2010 with its adoption of semantic technologies. Specifically, Best Buy used a Semantic Web markup language called RDFa to add semantics to its webpages.

Jay Myers, Lead Web Development Engineer at BestBuy.com, told ReadWriteWeb in an interview earlier this year that the primary goal of using semantic technologies was to increase the visibility of Best Buy's products and services. With data such as store name, address, store hours and GEO data being marked up using RDFa, search engines are now able to identify each of those data components more easily and put them into context. The use of semantic technology, Myers told us, led to increased traffic and better service to its customers.
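
To make the idea concrete, the following Python sketch marks up a store record with RDFa attributes and then reads the labelled values back out, roughly as a crawler might. It is not Best Buy's actual markup: the `ex:` vocabulary URI, the field names and the store details are placeholders invented for this example.

```python
from html import escape
from html.parser import HTMLParser

# Placeholder vocabulary URI (an assumption); a real site would use a
# published vocabulary instead.
VOCAB = "http://example.com/vocab#"


def store_to_rdfa(store):
    """Render a dict of store fields as an RDFa-annotated HTML fragment."""
    spans = "".join(
        '<span property="ex:{}">{}</span>'.format(key, escape(value))
        for key, value in store.items()
    )
    return '<div xmlns:ex="{}" typeof="ex:Store">{}</div>'.format(VOCAB, spans)


class PropertyReader(HTMLParser):
    """Collect property/text pairs from RDFa-style 'property' attributes."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        # Remember which property (if any) the following text belongs to.
        self._current = dict(attrs).get("property")

    def handle_data(self, data):
        if self._current:
            self.found[self._current] = data

    def handle_endtag(self, tag):
        self._current = None


# Hypothetical store record, for illustration only.
store = {"name": "Example Store", "address": "1 Main Street", "hours": "10am-9pm"}

markup = store_to_rdfa(store)
reader = PropertyReader()
reader.feed(markup)
reader.close()
print(reader.found)
```

Because each value carries an explicit property name, a consumer does not have to guess which string is the address and which is the opening hours.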

Data.gov.uk

In January, Data.gov.uk launched to make non-personal data held by the U.K. government available for software developers. It arrived six months after the U.S. government launched its Data.gov site, but from the start the U.K. site had more than three times as much data. At launch, Data.gov.uk had nearly 3,000 data sets available for developers to build mashups with. By the end of the year, that had increased to over 4,600.

Data.gov.uk was one of the highlights of the year in Linked Data, which is when organizations or governments upload data to the Web in a format enabling it to be re-used and built on. Linked Data is a subset of the wider Semantic Web movement.

See also: The State of Linked Data in 2010

BBC World Cup Website

The biggest sporting event of the year was the soccer World Cup, which was widely covered in the media. The BBC World Cup 2010 website used "dynamic semantic publishing" technology to enhance its daily World Cup reporting.

The site featured over 700 webpages and was powered by a semantic publishing framework. It boasted a comprehensive ontology (a map of concepts) that output "automated metadata-driven web pages" created on-the-fly. It was an impressive demonstration of how a large, mainstream website can add meaning and structure.

There you have it, ReadWriteWeb's selection of the top 10 Semantic Web products and implementations of 2010! Let us know in the comments whether you agree or not with our top 10.

http://www.readwriteweb.com/archives/top_10_semantic_web_products_of_2010.php | 2010 in Review | Wed, 29 Dec 2010 15:17:00 -0800 | Richard MacManus

Web Linking Gets Deeper with New Standard for Link Relations

The Internet Engineering Task Force (IETF) has published a Request for Comments on a proposed standard for link relations across multiple web formats. From rel="stylesheet" to rel="bookmark," rel="payment," and rel="me," according to the consensus of the IETF community members, link relations are now first class citizens with a centralized Registry where they can be found. The IETF is a nearly 25-year-old Internet standards body.

What does that mean? "Web linking is the most fundamental web building block," says Yahoo! standards wonk Eran Hammer-Lahav. "Typed links - links with a clear semantic meaning - existed on the web since the very beginning, but for the most part lacked any generally acceptable definition... Agreeing on what a link type means across formats is critical for a semantically rich web, in which links are used to provide a richer user experience, as well as better search and automation features."

[Image: seven of the forty-two Link Relations currently included in the Registry]

IETF RFC 5988, authored by Yahoo's Mark Nottingham, is the document that explains the standard, and the accompanying registry lists the 42 relations that have been accepted so far.

Hammer-Lahav continues:

"What the new RFC does is establish a registry and a simple process for defining new link relation types across formats (HTML5, XRD, Microformats, HTTP headers, ATOM, etc.).

"What is important about the new registry is its lightweight approach, allowing most stable documents to be used as reference specifications for new relation types. The process is used as a sanity check, and not as another bureaucracy slowing down innovation."

Hammer-Lahav says the HTML5 community has been particularly active in submitting Rels for inclusion in the registry. See also the Web Hypertext Application Technology Working Group's HTML5 rel directory.
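
Of the formats listed above, HTTP headers are perhaps the least familiar place to find a typed link. RFC 5988 defines a `Link` header whose value looks like `<target>; rel="relation"`. The Python sketch below parses such a value into targets and parameters. It is a simplified reading of the syntax that assumes no commas or semicolons inside quoted parameter values; the function name and the example.com URLs are illustrative assumptions.

```python
import re

# One link: a <target> followed by zero or more ;name=value parameters.
LINK_PATTERN = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)'
)
# One parameter: ;name="quoted value" or ;name=token
PARAM_PATTERN = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for target, raw_params in LINK_PATTERN.findall(value):
        params = {}
        for name, quoted, bare in PARAM_PATTERN.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


# Hypothetical header value carrying two typed links.
header = (
    '<http://example.com/TheBook/chapter2>; rel="previous"; '
    'title="previous chapter", <http://example.com/license>; rel=license'
)

for target, params in parse_link_header(header):
    print(target, params)
```

The same relation names ("previous", "license") would carry the same meaning in an HTML `rel` attribute or an Atom link, which is the point of having a shared registry.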

Rich links, expressed across multiple languages, in a standardized semantic format, promise to act as a platform where programmatic analysis can be performed at scale - making it far easier than ever before to bring together diverse resources from all around the web to create new experiences for application users.

Below: The Firefox extension Identify uses the rel="me" code to string together all the social networks a person uses when looking at their profile on a single network.

The rel="me" link, for example, has enabled services like the Google Social Graph API to string together semantically marked-up profile pages owned by a single person across multiple different sites and social networks. That makes it easy to draw a picture of who a person is across different services they use, because their profile pages link out to their blogs or Twitter accounts, for example, using the rel="me" link relation.
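
A small Python sketch shows how little code is needed to follow these links. It collects every `rel="me"` target from a profile page using only the standard library. The class name and the sample page, with its example URLs, are made up for illustration, and this is not how the Google Social Graph API itself was implemented.

```python
from html.parser import HTMLParser


class RelMeCollector(HTMLParser):
    """Collect href values from <a> and <link> elements marked rel="me"."""

    def __init__(self):
        super().__init__()
        self.profile_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated relation types.
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "me" in rel_values and href:
            self.profile_urls.append(href)


# Hypothetical profile page, for illustration only.
sample_profile = """
<html><head><link rel="me" href="http://example.com/blog" /></head>
<body>
  <a rel="me" href="http://twitter.example/someone">Twitter</a>
  <a rel="nofollow me" href="http://photos.example/someone">Photos</a>
  <a rel="bookmark" href="http://example.com/post/1">A post</a>
</body></html>
"""

collector = RelMeCollector()
collector.feed(sample_profile)
collector.close()
print(collector.profile_urls)
```

Running the same collector on each discovered page, and keeping only pairs of pages that link to each other, would be one way to approximate the cross-site picture described above.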

That kind of cross-site functionality could be built for everything from bookmarks to content licenses to payments and more if the IETF's new web link relations markup proliferates.

http://www.readwriteweb.com/archives/web_linking_gets_deeper_with_new_standard_for_link.php | News | Fri, 29 Oct 2010 11:42:18 -0800 | Marshall Kirkpatrick

Semantic Startup Evri Goes Mobile

Evri, a semantic content discovery engine for real-time content, has decided to switch gears and change its focus. "Going forward, we consider ourselves a mobile company," said Evri CEO Will Hunsinger. To that end, the company is now launching a handful of new mobile applications that use Evri's core technology to enable the discovery of relevant news and media on the topics you care about.

Currently, the mobile lineup includes apps for tech, football, baseball, celebrity gossip and rock music, but dozens more are in the works. There's even an iPad app coming, which Evri describes as a "smarter Flipboard."

To be clear, Hunsinger says Evri isn't walking away from the Web - "We love the Web," he says. But for Evri, as for many companies today, the future is in mobile. "Mobile devices are ideally suited for what we're trying to do," he explained. Mobile users are consuming content and Evri is a company whose goal is to improve content consumption. Going mobile just makes sense.

How Evri Uses Semantic Tech to Deliver the News

Evri's new apps aim to bring you the content you're most interested in and passionate about, with the signal filtered from the noise and the content distilled down to what really matters.

The semantic technology Evri is known for enables this, as it helps determine not only what content is popular but also what it means. For example, its football application wouldn't confuse Will Smith, the actor, with Will Smith, the football player for the New Orleans Saints. And after discovering the content, Evri can then rank it based on recency, relevancy and popularity.
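
Evri has not published how it combines those three signals, so the following Python sketch is purely an assumption about what such a ranking could look like: a weighted sum in which recency decays with a half-life. The `Story` fields, the weights and the 24-hour half-life are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    age_hours: float   # time since publication
    relevancy: float   # 0..1, how well the story matches the topic
    popularity: float  # 0..1, normalized measure of attention


def score(story, half_life_hours=24.0, weights=(0.4, 0.4, 0.2)):
    """Combine recency, relevancy and popularity into a single number."""
    # Recency halves every half_life_hours: 1.0 when new, 0.5 after a day.
    recency = 0.5 ** (story.age_hours / half_life_hours)
    w_recency, w_relevancy, w_popularity = weights
    return (
        w_recency * recency
        + w_relevancy * story.relevancy
        + w_popularity * story.popularity
    )


def rank(stories):
    """Return stories ordered from highest to lowest score."""
    return sorted(stories, key=score, reverse=True)


# Hypothetical stories, for illustration only.
stories = [
    Story("Day-old but on topic", age_hours=24, relevancy=0.9, popularity=0.5),
    Story("Fresh and on topic", age_hours=0, relevancy=0.9, popularity=0.5),
    Story("Fresh, popular, off topic", age_hours=0, relevancy=0.1, popularity=1.0),
]

for story in rank(stories):
    print(round(score(story), 2), story.title)
```

With these made-up weights a fresh, relevant story outranks both an older one and a popular but off-topic one; different weights would express a different editorial judgment.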

It also doesn't require a large group of curators to make this happen. Instead, the ratio is more like one curator per hundreds, maybe thousands, of pieces of content. The curator's job consists only of pointing the technology in the right direction. This is curation at scale.

Mobile Apps Available Now: iPhone, Android (iPad Coming Soon)

Each mobile app features multiple views of the content it provides: a news view, a video view, a Twitter view and an "EvriThing" view, which is all the views combined.

The Twitter view could be an intriguing alternative to using Twitter lists. Although lists are currently one of Twitter's best features for curating and filtering content, Evri's "Twitter" view on a topic functions like a dynamic Twitter list of what's interesting, current and relevant.

The upcoming iPad app will do much of the same, except on a larger form factor. It will also let you add content from sources that matter to you - like your own Twitter and Facebook accounts. "Flipboard doesn't get popularity," said Hunsinger. It doesn't know what content should be featured bigger or smaller within the app based on popularity and other factors.

But that's only a temporary glitch - Flipboard acquired semantic data-analysis company Ellerdale and is in the process of integrating that tech into its backend to better determine the relevance of the information it displays.

Design Needs Work

Flipboard has great design, too. And Hunsinger says his company has learned from that design and will likely implement similar interactions. Flipboard didn't invent the magazine, after all. It just got it right on the iPad.

But when it comes to design, this is an area where Evri has some serious ramping up to do, especially if it wants to take on Flipboard. The mobile apps are somewhat garish in their color choices and include hard-to-read fonts. GigaOm's Liz Gannes even said "the Evri app design currently hurts my eyes." It's hard to disagree, and that doesn't bode well for the company's iPad plans.

But like Gannes, who said the idea has "promise," we see the potential here. Semantic technology plus content discovery plus mobile is a recipe for success, is it not?

The Evri apps are being announced today at GigaOm's Mobilize conference in San Francisco. They will be available on both iPhone and Android. The apps are monetized through affiliate advertising partnerships with Amazon, Apple and others so they're available for free.

http://www.readwriteweb.com/archives/semantic_startup_evri_goes_mobile.php | Mobile | Thu, 30 Sep 2010 07:05:20 -0800 | Sarah Perez

Yahoo Kills SearchMonkey, Rolls Back BOSS, Says YQL Will Live

One year ago, Yahoo announced that it had signed a deal to replace its own search engine with Microsoft's Bing - but the big question for us was what that meant for all the incredible search-related programming infrastructure Yahoo makes available to outside developers. Today Yahoo began to answer that question.

In a post on the Yahoo Developer Network blog, VP Social Platforms at Yahoo Neal Sample broke the news.

  • Yahoo's semantic search enrichment program SearchMonkey will be closed down October 1st. High hopes that the company's throwing its weight behind structured markup for web pages would herald a new era of a ubiquitous semantic web never panned out. In March of 2008 we wrote "And Nerds Became Kings: Yahoo! to Announce Semantic Web Support." Sorry! SearchMonkey was to be "a component of a major overhaul at Yahoo! across all of its properties to 'rewire' for the social graph and data portability."
  • White label Build Your Own Search Service (BOSS) may no longer be free and will begin to show Bing results. BOSS has incredible potential and if it lives on, that's good news. Unfortunately, instead of people all around the world singing from the rooftops about this super-cool program, 18 months after BOSS launched the public reaction remains tepid.
  • Geo: "We will be evaluating all our Geo, Maps, and Local APIs--updating or shutting down some of them, and working with our strategic partner, Nokia, on others. We will work with our developer community to ensure a smooth transition in all instances and we will share more details about these decisions in September." Bummer.
  • MyBlogLog APIs will be shut down. The future of the service is unclear, Yahoo says. Because, you know, streams of data made up of the web history of people, tied to their associated social networks and even their faces - that's just not very valuable data. (I'm rolling my eyes and crying at the same time while typing this.)
  • YQL, the powerful Yahoo Query Language favored by developers who want to pipe data from one API around the web to another, is safe because Yahoo used it extensively on its own home page. That's good. People would freak out if YQL shut down.
  • Social bookmarking service Delicious was one service we were concerned about last year but it's no longer ruled by the Search team. In fact, we're told that Delicious has seen a fresh infusion of new blood and has big plans for the near-term future.

It's hard not to be disappointed by news like this, but perhaps some innovative engineers will be set free to work on other things. And perhaps some unfulfilled dreams will be allowed to die, so that they might be reborn to try again elsewhere.

Paul Graham's essay about what happened to Yahoo is worth reading, as well. (As is this counter argument from Yahoo evangelist Tom Hughes-Croucher.)

There are other cool projects in the works at Yahoo. I hope they find more success than these ones have, with the exception of YQL.

    http://www.readwriteweb.com/archives/yahoo_kills_searchmonkey_rolls_back_boss_says_yql.php | News | Tue, 17 Aug 2010 10:53:06 -0800 | Marshall Kirkpatrick

    Beyond Social: Read/Write in The Era of Internet of Things

    This blog was founded in 2003 on the philosophy of a read/write Web - a Web in which people can create content as easily as they consume it. This trend eventually came to be known as Web 2.0 - although others preferred Social Web - and was popularized by activities like blogging and social networking.

    It would be easy to say that the 'social' element is still the primary part of today's Web, since the popular products of this era enable you to say what's on your mind (Facebook), what's happening (Twitter), or where you are (Foursquare). All of these are mostly social activities. But more significantly, these and other products output data that will increasingly be used to build personalized services for you.

    The more data there is, the better Web services will be at delivering personal value to you. While part of this increase in data is coming from social data from the likes of Facebook and Twitter, much of it is coming from the Internet of Things and data uploaded by governments and organizations. In short: the read/write Web is now much more than the Social Web.

    How We Went Beyond Social

    So how did we arrive at a Web that is less about social and more about you?

    After the peak of Web 2.0, we (meaning all of us) began to get overwhelmed with the choice of content available. We thought we had to actually 'read' as much of that content as possible. So we watched YouTube, chatted on MySpace and Facebook, read blogs, followed lots of people on this new thing called Twitter, and so on. By the end of 2008, we were exhausted by all of this CONTENT. How could we possibly keep up?!

    In 2010, we're still struggling to digest all of what social media throws at us. However, a shift has been happening since 2009 which alleviates the problem. We've begun to realize that it's not how much content we consume that is important: it's what we do with all of the social and other data available to us. The social is still important, but the resulting data is - slowly - becoming more important because it can be analyzed, filtered, mashed up and personalized.

    Structured Data & Internet of Things

    Two relatively new trends are driving this change.

    The first is the increasing amount of data being uploaded to the Web by governments, organizations and people. Much of this data is being structured using Semantic Web technologies like RDFa or microformats. In other words, it is categorized and encoded with meaning that machines can process. Recent examples include U.S. and U.K. government data, Best Buy's store and product data and Facebook's Open Graph.

    And then we have the Internet of Things: an evolving trend where real-world objects and 'things' are connected to the Internet via technologies such as sensors and RFID tags - everything from cars to houses to roads and more. The upshot is that the Web is about to experience a data explosion, as billions of sensors and other data input and output devices upload exabytes of new data to the Web.

    How do We Use This Data?

    If we add together social data from the likes of Facebook and Twitter, data from governments and businesses, and data from sensors and RFID, this is a huge amount of data. Most of it isn't for "consuming." Rather, the value of all of this new Web data will be in how it's filtered, mixed together ("mashed up") and personalized in new Web services - most of which haven't yet been built.

    Adam Greenfield is one of the leading thinkers of the Internet of Things; I interviewed him earlier this year about his book called Everyware. Greenfield recently wrote a post describing a near future scenario for non-technical people using the Web. He posited a use case where his mother would be able to plan a train trip to see her son, by creating an "ad-hoc service" that tapped into the Web and utilized real-time data sources.

    In 2010, his mother would have to find and "read" several different applications in order to plot her travel schedule, and some of that information isn't even currently on the Web. Greenfield envisions a near future where his mother can essentially "write" her requirements into her mobile or other device, and the Web will deliver a personalized schedule to "read." You can view a diagram of Adam's concept here (PDF).

    Don't Think Social, Think Data

    Successful products in the Web 2.0 era had a strong social element: YouTube, MySpace and Flickr were a few relatively early examples. In the current era of the Web, which began to form in early 2009, the focus has shifted from social to data-driven software. Successful products of this era of the Web will be ones that filter, structure and personalize this vast amount of data coming onto the Web.

    So if I was an entrepreneur or developer wondering what to build for this era of the Web, I wouldn't be thinking social. I'd be thinking: How can I use all of this data and build on top of it? There are incredible opportunities out there for you.

    This current era of the Web doesn't have a name, which is probably a good sign! One thing is for sure though: It's still a read/write Web - only now you're reading and writing data from much more than just social services. You're increasingly interacting with "things," organizations, governments - virtually anything that can connect to the Web.

    http://www.readwriteweb.com/archives/beyond_social_web_internet_of_things.php | Internet of Things | Mon, 19 Jul 2010 02:39:39 -0800 | Richard MacManus

    Google Makes Major Semantic Web Play, Acquires Freebase Operators Metaweb

    The Semantic Web is all about structuring data so that humans and computers can more easily interpret the Web and discover relevant data for a wide variety of purposes. Google, a company built on the ability to advertise based on contextual data, announced today a major acquisition in the Semantic Web space. As of today, Metaweb, maker of Freebase and a leader in the Semantic Web, has joined forces with Google.

    ReadWriteWeb's Guide to The Semantic Web:
    1. Semantic Web Adoption by Facebook, Best Buy & Others
    2. It's All Semantics: Open Data, Linked Data & The Semantic Web
    3. The State of Linked Data in 2010
    4. Top 10 Semantic Web Products of 2009
    5. ReadWriteWeb Interview With Tim Berners-Lee

    Freebase is a massive open-structured database of information about almost anything, including books, movies and music. In fact, Google already has a relationship with Freebase, pulling in its information to provide intelligent search results within Google News. With the acquisition of Metaweb, Google can now leverage the company's tools and data even further, especially within basic Web search results.

    "This is a huge win for the Semantic Web," Alex Iskold, founder and CEO of AdaptiveBlue, the semantic technology company behind GetGlue.com (and occasional ReadWriteWeb contributor), told us. "It could not be bigger, because really, we had the biggest company on the Web buy the biggest player in the Semantic Web space."

    Google already provides some smart search results, including basic math, sports scores and birthdays of public figures, to name a few. For the most part, however, Google merely serves up links to Web pages; knowing more about what is behind those links could allow the search giant to provide better, more contextual results. To get a better idea of how that could happen, have a look at the video below.

    Microsoft made a similar purchase when it acquired Powerset two years ago. Since then, Bing has bested Google in terms of providing smart search results, and has been nibbling at its market share for search. In an effort to keep Bing from eating its semantic lunch, Google is taking Metaweb's technology and data under its wing.

    "What about [colleges on the West Coast with tuition under $30,000] or [actors over 40 who have won at least one Oscar]? These are hard questions, and we've acquired Metaweb because we believe working together we'll be able to provide better answers," writes Jack Menzel, Google's director of product management.
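
Questions like these are hard for keyword search but straightforward once the underlying facts are stored as structured records. The Python sketch below illustrates the principle with a toy in-memory data set. It does not use Freebase's own query language or data: the `select` helper, the field names and the people are invented for illustration.

```python
# Toy structured records (hypothetical people, not real data).
people = [
    {"name": "Actor A", "profession": "actor", "age": 52, "oscars": 1},
    {"name": "Actor B", "profession": "actor", "age": 35, "oscars": 2},
    {"name": "Director C", "profession": "director", "age": 61, "oscars": 3},
    {"name": "Actor D", "profession": "actor", "age": 47, "oscars": 0},
]


def select(records, **predicates):
    """Return the records whose fields satisfy every given predicate."""
    return [
        record
        for record in records
        if all(test(record[field]) for field, test in predicates.items())
    ]


# "Actors over 40 who have won at least one Oscar" as three field conditions.
matches = select(
    people,
    profession=lambda value: value == "actor",
    age=lambda value: value > 40,
    oscars=lambda value: value >= 1,
)

print([person["name"] for person in matches])
```

The query only works because "age" and "oscars" are explicit, typed fields, which is exactly what a page of unstructured text lacks.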

    Metaweb says that Freebase will remain free and open as always, and will be improved upon due to the Google acquisition. The service's quarterly downloadable data dumps will now be served up weekly, and the company hopes the acquisition will encourage more companies to contribute to Freebase.

    http://www.readwriteweb.com/archives/google_buys_semantic_web_database_metaweb.php | Semantic Web | Fri, 16 Jul 2010 12:32:00 -0800 | Chris Cameron

    Extractiv Launches "Semantics as a Service" Platform

    Extractiv has quietly launched a service that crawls the Web for text on a specific topic, then transforms it into "structured semantic data." It's a direct competitor to Thomson Reuters' Calais product, which has been doing this for a couple of years now. This type of service is potentially valuable to media companies, search services and monitoring applications - because it turns messy, unorganized HTML content into data that is organized into categories and given other semantic 'meaning.'

    I sat down with Extractiv CEO Shion Deysarkar at the recent Semantic Technology conference in San Francisco, to find out how Extractiv intends to compete with the more well-known and big media backed Calais.

    How Extractiv Works

    Extractiv is a joint venture between Houston-based web crawling service 80legs and natural language processing company LCC (which created Swingly, a Q&A service).

    Deysarkar explained that Extractiv uses technology from both of its parent companies, to crawl the Web for content on a particular topic and then - using natural language processing - transform it into structured data. This video, produced by Extractiv, explains how the service might be used to crawl the Web for stories about smart phones over the past month.

    The output of the crawl and analysis can be JSON or XML, two formats commonly used for structured data. Support for RDFa, a popular Semantic Web standard, will be available "soon" according to the company. Extractiv also offers an API, allowing customers to bypass the web site.
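
For readers unfamiliar with the two formats, this Python sketch serializes the same small set of extracted entities as both JSON and XML using the standard library. The record layout, element names and entity values are assumptions chosen for illustration and do not reflect Extractiv's actual output schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical entities, as an extraction service might report them.
entities = [
    {"text": "Example Phone X", "type": "PRODUCT"},
    {"text": "Example Corp", "type": "ORGANIZATION"},
]


def to_json(records):
    """Serialize entity records as a JSON document."""
    return json.dumps({"entities": records}, indent=2)


def to_xml(records):
    """Serialize entity records as an XML document."""
    root = ET.Element("entities")
    for record in records:
        node = ET.SubElement(root, "entity", type=record["type"])
        node.text = record["text"]
    return ET.tostring(root, encoding="unicode")


print(to_json(entities))
print(to_xml(entities))
```

Either form carries the same information: the text that was found and the category it was assigned.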

    Extractiv is free to try, but if you'll be a moderate or heavy user of the service then you'll have to pay (the pricing is as yet unavailable on the web site).

    Extractiv vs Calais

    Deysarkar told ReadWriteWeb that Extractiv is targeting "mid-market Calais customers" - such as media companies or those developing search applications, monitoring services, recommendation engines or aggregators. He also claimed that Extractiv goes beyond what Calais offers, because it can mine sentiment data (which is data about how people feel about products and services).

    Extractiv also wants to "provide access to more types of semantic information than any other provider." As CEO of partner company LCC, Andrew Hickl, put it, "if you're interested in baseball pitchers, a generic type like PERSON just won't cut it."

    At launch, Extractiv offers about 250 different types of named entities, but it aims to have more than 3000 different entity types by the end of the U.S. summer.

    Preparing For the Future of the Web

    The product is not aimed at the consumer market, so it's not for the faint hearted and you need to know what to do with all of that XML or JSON data! It also remains to be seen how competitive it is with Calais, which is a proven performer and has many reputable companies as its customers. Some startups have taken on Calais before, but fallen short.

    However, there is undoubtedly a need for products like Extractiv and Calais that turn the Web's unstructured data into meaningful, organized content. This is the future of the Web, because there is going to be a large increase in the quantity of data online over the next 5-10 years - and all of that data will need to be structured if we're going to make the best use of it.

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/extractiv_launches_semantics_as_a_service_platform.php http://www.readwriteweb.com/archives/extractiv_launches_semantics_as_a_service_platform.php Structured Data Mon, 12 Jul 2010 01:58:13 -0800 Richard MacManus
    W3C Pleased With Semantic Web Adoption by Facebook, Best Buy & Others At the Semantic Technology conference in San Francisco last week, I met up with two W3C representatives to discuss the current state of the Semantic Web - a Web of added meaning and structured data. W3C, the World Wide Web Consortium, is the official standards organization of the Web and is led by Sir Tim Berners-Lee. I spoke with W3C Semantic Web Activity Lead Ivan Herman and W3C eGovernment Interest Group leader Sandro Hawke.

    The main takeaway from the conversation was the rapid adoption of RDFa, by big commercial companies such as Facebook and Best Buy. It's come as a "very pleasant surprise" to Ivan Herman.

    ]]> RDFa Adoption in 2010

    RDFa is a simpler version of the primary language of the Semantic Web: RDF (Resource Description Framework). RDF is a complex and production-heavy language, so it has struggled to gain adoption over the past decade. The main purpose of RDFa is to add metadata to existing HTML or XHTML webpages, so it is easier to deploy than RDF.
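
To make the idea of "adding metadata to existing HTML" concrete, here is a minimal sketch in Python. It is not a conforming RDFa processor: it only collects `property` attributes and their values from a page, ignoring subjects, prefixes and the rest of the RDFa processing rules. The sample markup, the class name `RDFaPropertyCollector` and the function `extract_properties` are illustrative assumptions, not taken from the article.

```python
from html.parser import HTMLParser


class RDFaPropertyCollector(HTMLParser):
    """Collects (property, value) pairs from RDFa-style attributes.

    Simplified on purpose: a real RDFa parser also resolves subjects,
    prefixes and datatypes, which this sketch does not attempt.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its element text

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("property")
        if prop is None:
            return
        if "content" in attr:
            # Value given explicitly in the content attribute.
            self.pairs.append((prop, attr["content"]))
        else:
            # Value is the text inside the element.
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


# Hypothetical page fragment: ordinary HTML with RDFa attributes layered on top.
SAMPLE_PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
  <h1 property="dc:title">Semantic Web Basics</h1>
  <span property="dc:creator">Jane Example</span>
  <meta property="dc:date" content="2010-06-29" />
</div>
"""


def extract_properties(html):
    collector = RDFaPropertyCollector()
    collector.feed(html)
    return collector.pairs


if __name__ == "__main__":
    for prop, value in extract_properties(SAMPLE_PAGE):
        print(prop, "=", value)
```

The visible page is unchanged for human readers; the extra attributes are what a machine reads.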

    I opened by saying that at last year's SemTech event, adoption of Open Data was the big theme. This year, adoption of RDFa seemed to attract the most chatter in the hallways.

    Ivan Herman agreed, saying that it was a "very pleasant surprise [that] there is a buzz around RDFa." Herman remarked that "RDFa is suddenly picking up and it may become the single biggest source of RDF data, aside from relational databases." He added that RDFa is "easy to add and when you see Facebook or others adding RDFa data it's really exciting."

    How Facebook is Using RDFa

    Indeed, Facebook's adoption of RDFa is exciting. However it should be noted that Facebook is not using pure RDFa; and this is where a new standard called RIF comes in.

    At SemTech, W3C announced RIF: Rule Interchange Format. According to Ivan Herman, it is "two standards in one." Firstly, it's a format for exchanging rules between one rules system and another. For example, a set of email spam rules could be exported for another person to use. Secondly, RIF defines a rule language for semantic web data - similar to what can be done with ontologies. Herman said that it enables "simpler things than major ontologies."

    Simplicity is a key attribute in the adoption of RDFa. It's also something that Facebook emphasizes (which we will explore more in a follow-up post based on interviews with Facebook people).

    According to Sandro Hawke from W3C, Facebook's Open Graph platform uses RDFa "in an abbreviated, not really good modeling way." He said it's because "they [Facebook] need to make it simple enough that everyone can use it." He thinks though that Facebook made the right choice. Hawke explained that RIF "is a way to bridge from that [Open Graph markup] to the more standard modeling that we see in the rest of the Semantic Web."

    Hawke sees Facebook's Open Graph as "the real killer app for RIF right now."

    Others Adopting RDFa

    Another example of RDFa adoption is Best Buy adding RDFa to their entire product catalog, which has resulted in benefits in SEO and cost savings. We will write more about this in a follow-up post.

    UK retailer Tesco is doing the same as Best Buy. Drupal 7 is also adding significant support for RDFa. It's the next version of Drupal, a publishing system used by websites like the White House and World Health Organization. So if you're the manager of a site that runs on Drupal 7, you won't have to do anything - data will automatically be in RDFa format. Other adopters of RDFa include the Library of Congress and eGovernment.

    W3C started a new RDFa working group at the end of January, to make a 1.1 version of RDFa. The main goal is to simplify the job of authoring RDFa within HTML. Also, an API for RDFa will be defined.

    So overall, the W3C is very pleased with RDFa adoption - although Herman added with a smile that "we are never pleased enough."

    Image credit: Semantic Web Rubik's Cube, dullhunk

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/w3c_pleased_with_semantic_web_adoption.php http://www.readwriteweb.com/archives/w3c_pleased_with_semantic_web_adoption.php SemTech 2010 Tue, 29 Jun 2010 22:36:21 -0800 Richard MacManus
    How Twitter Annotations Could Bring the Real-Time and Semantic Web Together Just because the new iPhone arrived in stores today doesn't mean the rest of the technology world shut down. In fact, today in San Francisco the 2010 Semantic Technology Conference continued its week-long series of talks and sessions about the semantic Web - the ability to understand and intelligently interpret content from the Web. One fascinating example of how the semantic Web is colliding with the real-time Web is Twitter and its impending release of annotations - and Ph.D. student Joshua Shinavier provided some intriguing semantic scenarios for their use.

    ]]> Twitter posts already contain plenty of metadata that allows for smart filtering and organization, including date and location. With annotations, however, the metadata possibilities will be literally endless. Tweet metadata could eventually contain information or links based on words or phrases in the tweet itself, other options added to the tweet, or even other external data like the weather in the sender's location at the time it was sent. Imagine being able to add an infinite number of hashtags to a post without wasting precious characters.
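
As a rough illustration of filtering on metadata that lives outside the tweet text, here is a small Python sketch. Annotations had not shipped at the time of writing, so the data shape below (a list of `{type: {attribute: value}}` entries under an `annotations` key) is purely an assumption for the sake of the example, as are the function name and the sample tweets.

```python
def tweets_with_annotation(tweets, annotation_type, **attributes):
    """Return tweets carrying an annotation of the given type.

    Any keyword arguments must match attribute values inside that
    annotation. The annotation layout is hypothetical.
    """
    matches = []
    for tweet in tweets:
        for annotation in tweet.get("annotations", []):
            values = annotation.get(annotation_type)
            if values is None:
                continue  # this annotation is of a different type
            if all(values.get(k) == v for k, v in attributes.items()):
                matches.append(tweet)
                break  # one matching annotation is enough
    return matches


# Made-up sample data; none of this comes from a real Twitter payload.
SAMPLE_TWEETS = [
    {"text": "Loved it!",
     "annotations": [{"movie": {"title": "Toy Story 3", "language": "en"}}]},
    {"text": "Rainy commute",
     "annotations": [{"weather": {"condition": "rain"}}]},
    {"text": "No metadata here"},
]

if __name__ == "__main__":
    for tweet in tweets_with_annotation(SAMPLE_TWEETS, "movie", language="en"):
        print(tweet["text"])
```

Note that the first tweet never mentions the movie by name in its 140 characters; the filter works entirely from the attached metadata.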

    As Shinavier points out in his presentation (see slides above), Semantic databases could then plug into the annotation metadata and provide real-time semantic information to those who seek it. Using existing databases like GeoNames, Linked Movie Database and FOAF (Friend of a Friend), very specific searches for genres of tweets can be collected. Searchers could ask for tweets about "places in developing countries," "English-language movies starring Chinese actors," or "songs by artists my friends like," says Shinavier.

    Shinavier likens annotations to the real-time version of attributes from RDF (Resource Description Framework), which provide websites with extended semantic metadata. Since Twitter's annotations will be easy to implement for developers, the sheer size of the network of use will create the "long tail" of real-time semantic data, he says. The application of the semantic Web to annotations will make it easier for developers to create richer applications, which benefits the end user.

    In basic terms, the Web is getting smarter. Not Skynet smart, but smart, and with the mashup of the real-time fire-hose of information coming from services like Twitter, the semantic Web can provide even deeper and richer interactions for users. Personally, I am highly anticipating the release of annotations because I know brilliant developers are going to create amazing applications that leverage metadata. Throwing in semantic recognition only sweetens the pot.

    Image from Flickr user Colectivo Mambembe.

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/how_twitter_annotations_could_bring_the_real-time_semantic_web_together.php http://www.readwriteweb.com/archives/how_twitter_annotations_could_bring_the_real-time_semantic_web_together.php Real-Time Web Thu, 24 Jun 2010 18:40:00 -0800 Chris Cameron
    Google's Semantic Web Push: Rich Snippets Usage Growing At the Semantic Technology conference in San Francisco today, Google gave an update of its rich snippets initiative - which adds extra information, such as restaurant review ratings, to Google search results. It's an experimental Semantic Web feature, but today's update shows that usage is increasing and Google wants to ramp it up significantly.

    Rich snippets was announced in May last year and began to be seen in results around October. At the SemTech panel today, Google's Pravir Gupta noted that rich snippets impressions have grown four-fold globally since October 2009, with a two-fold increase on the US/English Web. Rich snippets is available in more than 40 languages.

    ]]>

    Gupta told the SemTech audience that there are now more than 50 review sites using rich snippets, for example sites that offer restaurant reviews. There has also been uptake on social networking sites, like Facebook and LinkedIn.

    The most common use cases are events (which was added in January) and recipe formats. Google is adding support for more formats, such as video, local businesses and shopping.

    Google is using structured data open standards such as microformats and RDFa to power the rich snippets feature. As the below chart shows, microformats is more common than RDFa for this feature.

    Google spent a good deal of today's panel continuing its drive to get webmasters to adopt rich snippets. It has a tool called the Rich Snippets Testing Tool, which helps publishers utilize rich snippets.

    Finally, Kavi Goel from Google talked about how Google can accelerate growth of the ecosystem, noting that less than 5% of webpages currently have semantic markup. Google wants to see this rise to 50% or more. It is looking for critical mass, which includes adding more formats and encouraging more "beneficial peer pressure" for companies to support rich snippets. Goel cited restaurant review sites as an example - it's not just Yelp which supports it, but other restaurant review services too.

    Rich snippets is an example of how the Semantic Web is being adopted by large and powerful Internet companies, so it's encouraging to see that Google is pushing for rapid adoption.

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php Google Thu, 24 Jun 2010 14:50:54 -0800 Richard MacManus
    Primal: Publishing at its Most Basic Tomorrow at the 2010 Semantic Technology Conference, Primal will launch a new publishing platform. It's grandly described as a "semantic synthesis platform," but simply put it's a publishing platform that automates the production of content. What's more, the resulting web pages include no original content. It's all aggregated from other sources.

    So in many ways this is reducing Web publishing to its most basic form, devoid of new content. Is this "automated content manufacturing," as founder Paul Sweeney described it to me today, useful to people?

    ]]> The stated goal of Primal is to deliver a "personalized content experience that is based directly on [a user's] individual thoughts and ideas." Primal Pages, the first application of this platform, is a webpage builder that enables a user to create a web presence based on their topics of interest. The content sources include Wikipedia, Yahoo! and Flickr.

    The use cases of Primal, according to Sweeney, include a teacher building a website of course materials for their students and a small business providing information to support their product.

    In my initial tests today, Primal seemed a little raw - although the UI is slick. The brainstorming and 'find content' aspects of the product are essentially search features that surface keywords and media from sites like Wikipedia and Flickr.

    What's most interesting about Primal is the publishing aspect, the webpage builder. This is well designed and easy to use. Within a matter of minutes I was able to 'author' a webpage about my favorite band, The Velvet Underground.

    However, as noted above, it had no original content on it - which means it doesn't add much value to the Web as a whole.

    Primal appears to be competing with other lightweight publishing services, such as Tumblr and Posterous, and even more so with the so-called Geocities 2.0 startups like Weebly and Yola. The difference is that Primal is much more automated than any of those services, which takes a lot of creativity out of publishing.

    I asked Sweeney how he thought Primal compared to Demand Media, the content farm that is pumping thousands of pieces of content onto the Web each day. He acknowledged that Primal will also pump a lot of (unoriginal) new pages onto the Web, but he said that Primal content is architected by the end user and not the company.

    Despite the rather hyperbolic terminology in the company's press release (an upcoming product called 'Primal Thought Networking' apparently "supercharges your thinking by remembering, organizing and connecting your ideas in your own machine-readable thought network"), the product itself is interesting because it takes Web publishing down to its very basic bare bones. Whether this is something that enough consumers need or want - and whether it's good for the Web - is yet to be determined.

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/primal_publishing_at_its_most_basic.php http://www.readwriteweb.com/archives/primal_publishing_at_its_most_basic.php Publishing Services Tue, 22 Jun 2010 21:00:01 -0800 Richard MacManus
    The Fate of the Semantic Web This month, the Pew Research Center's Internet & American Life Project released a study on the semantic Web. The Web will get smarter. It will become more useful. But will the "semantic Web" become the reality that many envision?

    Lee Rainie of Pew and Janna Quitney Anderson of Elon University's Imagining the Internet project asked 895 experts to "predict the likely progress toward achieving the goals of the semantic web by the year 2020."

    ]]> Some 47% agreed with the statement: "By 2020, the semantic web envisioned by Tim Berners‐Lee will not be as fully effective as its creators hoped and average users will not have noticed much of a difference."

    Some 41% agreed with the opposite statement, which posited: "By 2020, the semantic web envisioned by Tim Berners‐Lee and his allies will have been achieved to a significant degree and have clearly made a difference to average internet users."

    Among the more interesting results to me is how "critics noted that human uses of language are often illogical, playfully misleading, false or nefarious, thus human semantics can never be made comprehensible to machines."

    How much of the "tedium" of, well, human understanding, can machines take away? How much would we want them to? I'm not alone in asking this. Two gentlemen with appreciably more confidence have addressed the question: Cory Doctorow and Clay Shirky. The latter said the semantic Web "requires too much coordination and too much energy to effect in the real world, where deductive logic is less effective and shared worldview is harder to create than we often want to admit."

    On the other side of the semantic aisle is Bryan Trogdon, president of The Semantic Group, who declared that "within the next 10 years, the semantic web will take us from the age of information to the age of knowledge. Simple tools and services will allow individuals, corporations and governments to quickly glean meaning from the vast amounts of data they have compiled."

    O brave new world! That has such self-organizing data in't!

    The lists of respondents, pro and con, are extremely readable, though a reader will probably find one table-slappingly accurate and the other either a seaweed dance of credulous dreaminess or a dramatic monologue of surly nay-saying. (In other words, it's super fun.)

    TragicComicMasksHadriansVillamosaic.jpg
    Hence loathed melancholy... Hence vain deluding joyes...

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/the_fate_of_the_semantic_web.php http://www.readwriteweb.com/archives/the_fate_of_the_semantic_web.php Semantic Web Thu, 13 May 2010 17:30:00 -0800 Curt Hopkins
    Startup Rolls Out Facebook Open Graph Markup for 300 Major Sites Last month Facebook announced a new standardized way to mark up web pages concerning things like books, movies, music and more. It was called the Open Graph Protocol and was ostensibly intended to make the web comprehensible to computers building a profile of your interests across many different websites.

    Unfortunately, it wasn't implemented very well, according to GetGlue CEO Alex Iskold. Iskold, a long-time contributor to this site, penned the most extensive guide to understanding Facebook's Open Graph and a critique of how it was constructed, implemented by launch partners and by Facebook itself. Yelp, IMDB and Pandora for example were all launch partners but have implemented the system incompletely or not at all, even several weeks after launch. Now Iskold has taken his own company's competing semantic markup of pages around the web and used it to build a replacement for a large part of the Facebook code in the wild - using Facebook's own format. Developers interested in understanding the content across 300 major websites, in Facebook's own terms, can now find a robust source of data at GetGlue.

    ]]> Iskold says of the struggles to roll out Facebook's protocol:
    "We saw that a lot of initial partners didn't implement Facebook Open Graph protocol correctly. GetGlue already has over 15 million entities and all these pages indexed. So we decided to add the adapter and use our experience with semantics to help people get the markup done right. What we've done is not really a replacement, it is more like an implementation of Facebook Open Graph based on GetGlue index and database."

    GetGlue is a 3 year old New York City startup backed by Union Square Ventures and RRE Ventures. Its core product provides recommendations of music, books, movies and more based on semantic analysis of the sites a user visits around the web.

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/facebook_open_graph_protocol_implementation.php http://www.readwriteweb.com/archives/facebook_open_graph_protocol_implementation.php Social Networks Mon, 10 May 2010 11:28:07 -0800 Marshall Kirkpatrick
    Does Facebook Really Want a Semantic Web? Two weeks ago, Facebook announced a major new initiative called Facebook Open Graph. This is an attempt not only to re-imagine Facebook but, in a lot of ways, to re-define how the Web works. We wrote in detail about the implications of this move for all interested parties.

    A big part of the announcement is Facebook's vision of a consumer Semantic Web. In this new world, publishers have an incentive to annotate pages by marking up activities, events, people, movies, books, music and more. The proper markup, would in turn, lead to a much more interconnected Web - people would be connected with each other across websites and around the things they are interested in.

    ]]> Directionally, this vision is both correct and important. We've been talking about a pragmatic approach to the Semantic Web for some time, and we're excited at the possibility of it finally happening. Yet, two weeks after the announcement it is becoming more and more apparent that there are gaps in Facebook's offering and intentions. A close look reveals that perhaps Facebook's intent is not to make the Web more structured, but instead to engineer a way for more data - mostly unstructured - to flow into Facebook databases.

    "Instead, it appears that semantics is an afterthought in the race to capture user identity and information, in exchange for sending publishers the traffic."

    As you will see from the rest of the post, it appears that getting semantics right has not been a big priority for Facebook, at least not prior to the announcement. Here are the issues we identify:

    1. Open Graph Protocol does not support object disambiguation
    2. Open Graph Protocol does not support multiple objects on the page
    3. Launch partners have not implemented Open Graph Protocol correctly on their sites
    4. Facebook does not have the markup on its own pages that it asks the world to adopt
    5. A growing amount of user profile data is full of duplicates and ambiguity

    Concerns with Open Graph Protocol

    A week ago, we complimented Open Graph Protocol for its simplicity, but upon closer look we are seeing a couple of flaws. First of all, there is no way to disambiguate objects. For example, two movies that have the same name would be considered to be the same movie. A proper way to deal with this sort of thing is to introduce secondary attributes, like a director or a year, that can help identify a specific object, but the protocol does not define secondary attributes.
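
The disambiguation problem can be sketched in a few lines of Python. This is only an illustration of the argument: the field names `title`, `year` and `director` are hypothetical, since the protocol as described here defines no secondary attributes, and `group_likes` is a made-up helper.

```python
from collections import defaultdict


def group_likes(likes, key_fields):
    """Group liked objects by the given identifying fields."""
    groups = defaultdict(list)
    for like in likes:
        key = tuple(like.get(field) for field in key_fields)
        groups[key].append(like)
    return dict(groups)


# Two different films that share a title.
LIKES = [
    {"title": "Crash", "year": 1996, "director": "David Cronenberg"},
    {"title": "Crash", "year": 2004, "director": "Paul Haggis"},
]

if __name__ == "__main__":
    # Title alone: both films collapse into one object.
    print(len(group_likes(LIKES, ["title"])))
    # Title plus a secondary attribute: they stay distinct.
    print(len(group_likes(LIKES, ["title", "year"])))
```

With the title as the only key, the two films are indistinguishable; adding one secondary attribute is enough to separate them.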

    The second issue is that there is no way to mark up the objects inside the page. In its current version, the protocol only supports declaring that the entire page is about a person, a news event, a musician or a movie, but there is no way to identify objects inside the page. This is a big use case for bloggers and review sites - each blog post typically mentions many entities, and it would be nice to support this use case from the start.

    Both of these shortcomings are easy to correct. The nice thing is that the protocol is simple and minimalistic, so adding the bits to handle disambiguation and multiple entities is straightforward. The other things that we are going to discuss are much more troublesome.

    Launch Partners - Why No Markup?

    The truism of making the Web more structured is adding more markup. No matter how limited, having markup on the pages is always better than not having it. When Facebook announced the Open Graph Protocol, it highlighted several sites that are already using it. Among them were Yelp, IMDB and Pandora. We took a look to see how exactly these sites are marking up their pages. What we found is rather surprising - none of these sites implemented markup correctly. We looked at the How to Train Your Dragon movie on IMDB, Brad Pitt's page on IMDB, the Muse page on Pandora and the Acquagrill page on Yelp.

    This is what Facebook defines as required properties:

    fb_protocol_ai.png

    And this is what we found on the actual partner pages:

    fb_partners_ai.png

    So what does this mean? It means that Facebook implemented special handling for these sites. When a user likes a movie on IMDB or when she likes a movie star, Facebook can't really tell the difference since IMDB is not passing correct information via the protocol. The only reason it works is because IMDB is explicitly hard coded by Facebook.
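
A check like the one we did by hand can be sketched in Python. The sketch assumes the four properties that the Open Graph protocol specification lists as required (`og:title`, `og:type`, `og:image`, `og:url`); the sample pages and the `example.com` URLs are invented, and the helper names are our own.

```python
from html.parser import HTMLParser

# Properties the Open Graph protocol specification lists as required.
REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphMetaParser(HTMLParser):
    """Collects og:* properties from <meta property=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content"):
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, attr["content"])


def missing_required(html):
    """Return the required Open Graph properties absent from the page."""
    parser = OpenGraphMetaParser()
    parser.feed(html)
    return [prop for prop in REQUIRED if prop not in parser.properties]


# Hypothetical pages: one fully marked up, one with partial markup.
COMPLETE_PAGE = """
<head>
  <meta property="og:title" content="How to Train Your Dragon" />
  <meta property="og:type" content="movie" />
  <meta property="og:image" content="http://example.com/poster.jpg" />
  <meta property="og:url" content="http://example.com/movies/dragon" />
</head>
"""

PARTIAL_PAGE = """
<head>
  <meta property="og:title" content="How to Train Your Dragon" />
  <meta property="og:image" content="http://example.com/poster.jpg" />
</head>
"""

if __name__ == "__main__":
    print(missing_required(COMPLETE_PAGE))
    print(missing_required(PARTIAL_PAGE))
```

A page missing `og:type` gives a consumer no generic way to tell a movie from a movie star, which is exactly the gap that special-case handling has to paper over.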

    The roll-out for launch was not generic, but custom, targeted more towards PR than correctness. Why Facebook would allow this instead of having partners implement correct markup is unclear. It is so easy to implement, and the partner pages already have all the necessary information. We conclude that enforcing correctness was simply not a priority for the launch.

    Eating Your Own Dog Food

    As it turns out, not only did publishers not mark up pages - neither did Facebook. At the time of writing of this post, none of the entity pages on Facebook.com have Open Graph markup. So much for being open - Facebook's own pages remain closed. Ironically, it might not be because the company does not want to mark up the pages, but instead that it can't. At least not yet.

    Figuring out what is on the page is actually not a trivial problem. It is the problem that semantic technologies from Freebase, Powerset, Open Calais, Evri, Zemanta and GetGlue, among others, have been built to solve over the past several years. To be able to mark up the pages correctly, especially the ones created by users, Facebook needs to run them through some form of semantic processing and disambiguation.

    Unstructured data on user profiles

    fb_movies_ai.png

    All of this comes full circle to impact the users. As the Like buttons spread through the Web, so is the unstructured, duplicated data spreading through user profiles. Absence of semantics creates fragmented connections and noise around the Web.

    Below is the list of movies that I liked, fetched via the Facebook Open Graph API. How to Train Your Dragon shows up twice, because I liked it once on IMDB and then again on Fandango. The friends that I see on the Fandango page are different from the ones I see on IMDB. And worst of all, all of this uncleaned data is showing up on my profile - the movie title contains a year in one case and the originating site in the other.

    So right now Facebook does not correlate things across sites. Instead, it just captures the information as is, hoping to maybe clean it up later.
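
The kind of clean-up this would require can be sketched in Python. The title strings below are invented to resemble the two cases described (a year suffix in one, a site suffix in the other), and the normalization rules are a naive assumption rather than anything Facebook is known to do. Matching on title alone also inherits the disambiguation flaw discussed earlier.

```python
import re

# Naive patterns for the two kinds of noise described in the text.
YEAR_SUFFIX = re.compile(r"\s*\(\d{4}\)\s*$")
SITE_SUFFIX = re.compile(r"\s*[-|]\s*(IMDb|Fandango)\s*$", re.IGNORECASE)


def normalize_title(raw):
    """Strip a trailing site name and a trailing (year), then lowercase."""
    title = SITE_SUFFIX.sub("", raw)
    title = YEAR_SUFFIX.sub("", title)
    return title.strip().lower()


def merge_duplicate_likes(likes):
    """Merge likes whose normalized titles match, keeping every source."""
    merged = {}
    for like in likes:
        key = normalize_title(like["name"])
        entry = merged.setdefault(key, {"name": key, "sources": []})
        entry["sources"].append(like["source"])
    return list(merged.values())


# Hypothetical profile data, shaped after the duplicates described above.
PROFILE_LIKES = [
    {"name": "How to Train Your Dragon (2010)", "source": "imdb.com"},
    {"name": "How to Train Your Dragon | Fandango", "source": "fandango.com"},
]

if __name__ == "__main__":
    for movie in merge_duplicate_likes(PROFILE_LIKES):
        print(movie["name"], movie["sources"])
```

Once the two likes are merged, the friends attached to each source could be combined too, which is the cross-site correlation that is missing today.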

    Conclusion: A different goal?

    All of these facts when added together lead to the obvious conclusion: Facebook's goal is not to create a better, more structured Web. Instead, it appears that semantics is an afterthought in the race to capture user identity and information, in exchange for sending publishers the traffic.

    As more and more data flows into Facebook via the Like buttons, Facebook and publishers are getting the benefit of recycling friends through the content on sites around the Web. But at the same time, the data in user profiles is becoming more and more noisy. Since not many users are paying attention yet, it just looks silly under closer inspection.

    But to be able to power recommendations, to make social plugins a success and to facilitate a good user experience, Facebook will literally need to clean up its act. Duplicate and dirty data will be a big turn-off for users, and the longer this problem goes on, the more difficult it is going to be for Facebook to deal with it.

    We will see in the coming weeks and months how the social networking giant will handle this issue. In the meantime, we'd like to turn the tables. Please tell us what you think about Facebook's semantic Web ambitions. Should they have gotten the core bits right before the launch, or is this fine and will they be able to quickly catch up?

    Disclaimer: Alex Iskold is a founder and CEO of GetGlue.com, a social network for entertainment. GetGlue developed the ability to connect users across different sites through a combination of browser addons and semantic databases in the cloud.

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/does_facebook_really_want_a_semantic_web.php http://www.readwriteweb.com/archives/does_facebook_really_want_a_semantic_web.php Semantic Web Thu, 06 May 2010 14:07:00 -0800 Alex Iskold
    Facebook Open Graph: The Definitive Guide For Publishers, Users and Competitors Facebook just shook the tech world by announcing several major initiatives that collectively constitute an aggressive move to weave the social net on top of the existing Web. The rumors were that the leading social network would launch a "Like" button for the entire Web. Instead, Zuckerberg & Co. unveiled a bold and visionary new platform that cannot be ignored.

    The bits of this platform bring together the visions of a social, personalized and semantic Web that have been discussed since del.icio.us pioneered Web 2.0 back in 2004. Facebook's vision is both minimalistic and encompassing - but its ambition is to kill off its competition and use 500 million users to take over the entire Web.

    ]]> Whether we like it (pun intended) or not, we have to understand what this move means. It impacts users, publishers, competitors and, of course, Facebook itself. In this post, we summarize what Facebook announced and ponder the impact this will have on everyone.

    Facebook Open Graph: Publisher Plugins

    The Open Graph is a combination of publisher plugins, semantic markup and a developer API.

    "This new API turns Facebook into a read/write storage of user's tastes."

    Login with Faces & Facepile: The simpler publisher plugins enhance Facebook Connect. They make it easy and compelling to sign in by leveraging Facebook cookies and showing faces of Facebook friends who are already members of the service.

    Like Button and Like Box: These plugins add the liking feature to any content, typically the whole page. Both can be enhanced with semantic markup, described below. But the very basic intent for these is to get users to Like on the site and post a link to Facebook, which is then permanently stored on a user's profile and points back to the original site.

    Activity Feed and Live Stream: These plugins show static and dynamic activity on the site. Activity Feed lists recent likes and comments from the site, while Live Stream shows a real-time view of activity on the site and is intended for interactive events.

    Recommendations: This plugin surfaces personalized recommendations for the user based on what friends and everyone else is liking on the site. It is intended to drive the users to other pages on the site.

    Facebook Open Graph: Semantic Markup

    Facebook announced simple, RDF-based markup to make the plugins smarter. In a nutshell, the markup enables publishers to say what object is on the page - a movie, a book, a recording artist, an event, a sports team, etc. This automatically enables semantics, that is, an understanding that the user is not just interacting with a webpage, but that he or she is liking a specific kind of thing. Semantics then leads to bucketing of the objects into categories like books, movies, music, etc., and gives rise to all sorts of applications, including personalized recommendations.

    Perhaps even more importantly, the markup helps Facebook connect users around common interests across different websites. For example, if both Pandora and Last.fm annotate a page about The Beatles using Facebook's markup, then users will be able to see their friends who like The Beatles across different sites. This is very significant, because the data around friends is sparse and scattered around the sites. Previously, Facebook would surface this data in the stream without persisting it. Now, the information about a friend's likes of movies, music, books, recording artists, events, sports teams, etc. will be permanent on Facebook profiles and readily available in context around the Web.
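
The cross-site effect can be sketched in Python. The sketch assumes that each like carries the object's type and title as declared in the page markup, so likes from different sites can be keyed on the same object; the record fields, user names and the `friends_by_object` helper are all illustrative.

```python
from collections import defaultdict


def friends_by_object(likes):
    """Index users by the (type, title) of the thing they liked.

    Because the key comes from the page markup rather than the page URL,
    likes made on different sites land in the same bucket.
    """
    index = defaultdict(set)
    for like in likes:
        index[(like["type"], like["title"])].add(like["user"])
    return {obj: sorted(users) for obj, users in index.items()}


# Made-up likes collected from two different music sites.
LIKES = [
    {"user": "alice", "site": "pandora.com", "type": "band", "title": "The Beatles"},
    {"user": "bob", "site": "last.fm", "type": "band", "title": "The Beatles"},
    {"user": "carol", "site": "last.fm", "type": "band", "title": "Muse"},
]

if __name__ == "__main__":
    for obj, users in friends_by_object(LIKES).items():
        print(obj, users)
```

Without shared markup, the only available key would be the page URL, and the Pandora and Last.fm fans would never be connected.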

    Facebook Open Graph: New API

    The new Facebook API is elegant and streamlined. It makes it easy to access user information (with permission of course) such as profile, friends, etc. All of the calls are REST-based and return JSON objects. For example, my profile information can be fetched like this: http://graph.facebook.com/alexiskold. The authentication is based on the OAuth 2.0 protocol and makes it simple not only to connect, but also to prompt for permissions to access user information.
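
    As a rough sketch of what such a call looks like from code, the Python below builds the profile URL shown above and decodes a JSON body. The base URL comes from the example in this article; the function names and the fields in the sample response are hypothetical, and the live endpoint may require an access token or behave differently, so the network call is defined but not executed here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

GRAPH_BASE = "http://graph.facebook.com"  # base URL as written in the article


def profile_url(username):
    """Build the URL for a user's public profile object."""
    return f"{GRAPH_BASE}/{quote(username, safe='')}"


def parse_profile(body):
    """Decode a JSON response body into a Python dict."""
    return json.loads(body)


def fetch_profile(username, timeout=10):
    """Fetch and decode a profile over HTTP (assumes the endpoint allows it)."""
    with urlopen(profile_url(username), timeout=timeout) as response:
        return parse_profile(response.read().decode("utf-8"))


# Hypothetical response body, used here instead of a live request.
sample_body = '{"id": "12345", "name": "Example User"}'
profile = parse_profile(sample_body)
```

    Because every object is addressed by a plain URL and comes back as JSON, the client side needs nothing beyond an HTTP library and a JSON decoder.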

    This new API turns Facebook into a read/write storage of users' tastes. And not just one user - all Facebook users.

    Implications for the Users

    With this release, Facebook asks users if they are willing to trade off privacy for personalization. To be clear, no personalization is ever possible without users telling a system about their tastes. What Facebook is asking for is necessary in order to then create a personalized Web experience. Whether users want this sort of thing is a different question, but assuming that you want to know more about your friends, you will.

    Friends' interests around entertainment, sports, travel, etc. will be categorized and available. It will be easy to figure out what your friends are into both on Facebook and around the Web. In addition, Facebook is going to be using its own engine to bring you recommendations for related content. This will further accelerate the discovery and cross linking between friends. This will likely further impact the amount of search people do around the Web. As Fred Wilson pointed out - passed links replace search.

    Yet, the crux of user implications is neither of the above, but one single issue: privacy. It is unclear at this point whether this issue is a concern for actual Facebook users, but it is clear that the tech world is raising its eyebrows: Marshall Kirkpatrick, Dave Winer, Jeff Jarvis and many others expressed their concerns. People are saying that Facebook will not only know too much about us (Google is already there today), but will also be able to control too much.

    Personally, I am skeptical that the average Facebook user is going to care all that much. People are notoriously naive about being watched on the Web, and this is likely to be no exception. More likely than not, Facebook users will enjoy the personalization aspects of the new platform and won't think much about it - until Facebook starts openly targeting them.

    This was not part of f8 of course, but Facebook is likely to use the information for targeting. After all, advertising is a major part of its monetization already, so why wouldn't it make it even better? If this targeting is too spot on, lots of users will probably get annoyed. Facebook is likely to soothe them via Facebook credits and heavy discounts, negotiated because of its massive volume.

    How exactly users react remains to be seen, but they will probably like the new Facebook more because of increased relevancy and interaction with friends around the Web.

    Implications for Publishers

    On the surface, this Facebook offering is a no-brainer for publishers. Who does not want more social activity on their site? However, in reality this is far from a slam dunk. To understand why, consider two types of sites: sites that are either social networks or have social networking integrated, and sites that have their own commenting and ratings systems. In the first camp you will find Last.fm, Flixster, Goodreads, etc. None of these sites was a launch partner, understandably so. Social connections around music, movies and books are their bread and butter, as are the ratings, reviews and recommendations. If they switch to Facebook for all of this, what do they have left?

    So any site that already has social networking built in has to decide to abandon that before jumping into the Facebook Open Graph. An even worse problem is the ownership of ratings and comments. Are publishers really ready to give that up? Nobody seriously thinks that users are going to be rating through Facebook and then through the site again. So how is this going to work? It is unclear at this point, but it's likely publishers will ask for ways to replicate or export comments and likes that users sent to Facebook via their site. Perhaps an open API that allows publishers to manipulate the data is the answer, but it is easy to see how some publishers would be very concerned.

    However, if you run an e-commerce site, a blog, or a service like Pandora that currently does not have much social functionality built in, this offering is a no-brainer, as it will instantly start recycling your pages through the massive Facebook power of passed links.

    Implications for Competitors

    This is an aggressive and brilliant move by Facebook - and Twitter, Google, Yahoo, MySpace, AOL, eBay, Amazon and others, except for Microsoft, should be really worried. It appears that Microsoft is content with just partnering with Facebook, perhaps rightly so. Possibly a Bing deal is in the works, which would make a lot of sense.

    For all other players on the Web, the worry is that Facebook is trying to close the loop in exclusively owning user eyeballs. Apparently Facebook is not content with just connecting people; it wants to connect people and things. And not only that, it wants to do it around the Web. And not just any people - friends. You don't need to look too closely to see that Facebook is creating a feedback loop, which includes it, users and the rest of the Web and excludes its competitors.

    There are several things that other big players might try to do, the worst of which is to try to mimic Facebook. The "me too" approach that we've seen way too many times recently has not worked, and will not work now. The second option is to try to block it. As strange as it sounds, it might just work. Between publisher and user issues there are a lot of concerns, and a carefully orchestrated and coordinated campaign may seriously hurt this initiative. Remember, Beacon was brought down fairly quickly by a combination of user backlash and negative press.

    The third option - to embrace and extend this platform, to innovate on top of it - is likely to be the best move. Innovation has always trumped stagnation on the Web. The problem is that it might not be that easy to embrace this initiative. After all, it does not look like Facebook asked everyone to gather around the table and cooperate on this. It might not be open to cooperation, but if it is then this is the way forward.

    Technically speaking, what Facebook has done is elegant and correct. From markup, to plugins, to API, all of it is modern and awesome. The missing bit is that Facebook appears to be the only repository of data in this equation - and that makes the whole offering seriously closed. Publishers and users don't have a choice as to where to store the data. It is going to Facebook and Facebook alone. Perhaps there is a way to rework the system in a way that fixes that. We look forward to seeing how this unfolds.

    Implications for Facebook

    Clearly this announcement is yet another turning point for Facebook. Before the conference Facebook was the biggest social network on the planet. If its vision actually happens, Facebook will be the biggest network of people and things on the planet, or to put it differently, it will be the taste graph of the planet.

    Obviously there is new technology that Facebook will need to build. It has already perfected the social networking part, but semantic analysis, recommendation systems, vertical categories like movies and books, and completely open read/write storage of tastes are all new to the team. The biggest challenge that Facebook will face is to ingest, re-deliver and, most importantly, make use of the data that is flowing into it.

    Facebook will be doing some serious number crunching and UI revamps to prepare for this next phase of its life. But perhaps the biggest experiment and test will be delivering relevancy. Google succeeded with this in search; Facebook will now have the challenge of bringing relevancy to the recommendations and taste-based advertising arena.

    Implications For the Semantic Web

    One of the most exciting parts of the Facebook announcement to me personally is the possible breakthrough in semanticizing the Web. We've written previously about the Semantic Web here, and it has been a personal passion of mine. What Facebook has done has a chance to make vast parts of the consumer Web, including movies, books, music, events, sports, and news, semantically tagged. Publishers and websites finally have a strong incentive to mark things up and get return traffic from Facebook.

    The actual protocol that Facebook suggested is very simple. To describe the object on the page, the site owner needs to specify the title, the type of the object, an image, the URL and the name of the site using simple meta tags. The format is extensible and additional tags can be added. For example, for a book a site can add an ISBN. However, this format leaves room for ambiguity. The goal of classic semantic markup has traditionally been to refer to entities precisely, for example by adding the director to a movie, or a year to distinguish remakes. The Facebook protocol does not seem to provide for this.
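
    For a sense of how little machinery the format needs, here is a minimal sketch in Python that reads the five fields listed above out of a page's meta tags. It assumes the tags use the og: property prefix from the Open Graph protocol; the sample page, its values and the helper names are hypothetical, and the parser is an illustration rather than a complete implementation.

```python
from html.parser import HTMLParser

# The five fields this article lists; the names follow the og: convention.
ARTICLE_FIELDS = ("title", "type", "image", "url", "site_name")


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            self.properties[prop[3:]] = attr["content"]


def parse_open_graph(html):
    """Return a dict of og: properties found in the HTML text."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


def missing_fields(props):
    """List which of the article's five fields are absent."""
    return [name for name in ARTICLE_FIELDS if name not in props]


# Hypothetical page head with placeholder values.
page = """
<html><head>
<meta property="og:title" content="Example Movie"/>
<meta property="og:type" content="movie"/>
<meta property="og:url" content="http://example.com/example-movie"/>
<meta property="og:image" content="http://example.com/poster.jpg"/>
<meta property="og:site_name" content="Example Movies"/>
</head></html>
"""

props = parse_open_graph(page)
```

    Note that nothing in these five fields distinguishes a film from its remake, which is the ambiguity described above; an extra tag would have to carry that information.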

    There were lots of previous efforts to mark up the Web. To name a few: RDF, microformats, Google Rich Snippets, Yahoo's SearchMonkey (based on RDF and microformats), and lastly, abmeta, which was developed by me with help from Peter Mika at Yahoo. Of all these formats, Facebook's is most similar to abmeta because the markup is placed into meta tags, and is simple and human readable. This simplicity is the key to broad adoption.

    So all around, this is a great chance for the Semantic Web to finally hit consumer verticals and become real.

    Implications for Developers

    Every new rich platform that has been rolled out in the past couple of years presented a big opportunity for developers and this one will be no exception. While we do not know exactly what sort of applications will be built on top of the new Facebook, we know that they will be very powerful. This platform has the potential to give rise to a new kind of personalization and attention economy that people have been talking about for years. It has, of course, a chance to backfire badly, but I am optimistic.

    This will be a gold rush for applications that is likely to last for at least a year, like the last one did. It's too early to tell whether this will be a platform that survives and does not hurt its participants. However, it is very likely that the best applications built on this platform will be owned by Facebook. Still, there is a huge new opportunity here for developers and the sky is the limit.

    Checkmate?

    Facebook made a major chess move. It might have checkmated its competitors, or perhaps it might have to lose another piece like it lost Beacon. Whichever is the case, right now there are deep implications for Facebook and its competitors, publishers, users and the Web at large. What Facebook has announced cannot be ignored and cannot be undone. Everyone needs to figure out the next steps and understand what to do.

    Time will tell where we land, but my gut is that positive things will come out of this. If nothing else, let's give Facebook credit for innovation and for reimagining the Web.

    http://www.readwriteweb.com/archives/facebook_open_graph_the_definitive_guide_for_publishers_users_and_competitors.php http://www.readwriteweb.com/archives/facebook_open_graph_the_definitive_guide_for_publishers_users_and_competitors.php Facebook Fri, 23 Apr 2010 10:50:00 -0800 Alex Iskold