calais - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/calais en Copyright 2012 Richard MacManus readwriteweb@gmail.com Mon, 13 Feb 2012 17:00:00 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss

Extractiv Launches "Semantics as a Service" Platform

Extractiv has quietly launched a service that crawls the Web for text on a specific topic, then transforms it into "structured semantic data." It's a direct competitor to Thomson Reuters' Calais product, which has been doing this for a couple of years now. This type of service is potentially valuable to media companies, search services and monitoring applications - because it turns messy, unorganized HTML content into data that is organized into categories and given other semantic 'meaning.'

I sat down with Extractiv CEO Shion Deysarkar at the recent Semantic Technology conference in San Francisco to find out how Extractiv intends to compete with the better-known, big-media-backed Calais.

How Extractiv Works

Extractiv is a joint venture between Houston-based web crawling service 80legs and natural language processing company LCC (which created Swingly, a Q&A service).

Deysarkar explained that Extractiv uses technology from both of its parent companies, to crawl the Web for content on a particular topic and then - using natural language processing - transform it into structured data. This video, produced by Extractiv, explains how the service might be used to crawl the Web for stories about smart phones over the past month.

The output of the crawl and analysis can be JSON or XML, two formats commonly used for structured data. Support for RDFa, a popular Semantic Web standard, will be available "soon" according to the company. Extractiv also offers an API, allowing customers to bypass the web site.
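
To make the idea of structured output concrete, here is a minimal Python sketch of how a customer might consume JSON of this kind. The field names (`entities`, `type`, `text`) and the sample document are assumptions made up for illustration; they are not Extractiv's documented schema.

```python
import json
from collections import defaultdict

# Hypothetical sample of structured output. The schema is assumed for
# illustration and is NOT taken from Extractiv's documentation.
SAMPLE_OUTPUT = """
{
  "url": "http://example.com/smartphone-story",
  "entities": [
    {"type": "COMPANY", "text": "Apple"},
    {"type": "PRODUCT", "text": "iPhone 4"},
    {"type": "COMPANY", "text": "Google"}
  ]
}
"""


def group_entities_by_type(raw_json):
    """Parse one JSON document and group entity strings by their type."""
    document = json.loads(raw_json)
    grouped = defaultdict(list)
    # A document without an "entities" key simply yields an empty result.
    for entity in document.get("entities", []):
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


if __name__ == "__main__":
    print(group_entities_by_type(SAMPLE_OUTPUT))
```

Grouping by type is just one plausible first step; a monitoring application or aggregator would likely go on to count or index these entities across many crawled pages.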

Extractiv is free to try, but if you'll be a moderate or heavy user of the service then you'll have to pay (the pricing is as yet unavailable on the web site).

Extractiv vs Calais

Deysarkar told ReadWriteWeb that Extractiv is targeting "mid-market Calais customers" - such as media companies or those developing search applications, monitoring services, recommendation engines or aggregators. He also claimed that Extractiv goes beyond what Calais offers, because it can mine sentiment data (which is data about how people feel about products and services).

Extractiv also wants to "provide access to more types of semantic information than any other provider." As Andrew Hickl, CEO of partner company LCC, put it, "if you're interested in baseball pitchers, a generic type like PERSON just won't cut it."

At launch, Extractiv offers about 250 different types of named entities, but it aims to have more than 3000 different entity types by the end of the U.S. summer.
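
One way to picture why fine-grained entity types matter is as a type hierarchy, where a specific type still counts as an instance of its more generic ancestors. The sketch below is purely illustrative: the type names and the parent mapping are hypothetical and do not come from Extractiv's actual list of entity types.

```python
# Hypothetical hierarchy: each fine-grained type maps to its parent type.
# These names are illustrative assumptions, not Extractiv's real types.
PARENT_TYPE = {
    "BASEBALL_PITCHER": "ATHLETE",
    "ATHLETE": "PERSON",
    "PERSON": None,
}


def type_chain(entity_type):
    """Return the type followed by its ancestors, most specific first."""
    chain = []
    current = entity_type
    while current is not None:
        chain.append(current)
        current = PARENT_TYPE.get(current)
    return chain


def is_a(entity_type, ancestor):
    """True if entity_type equals ancestor or descends from it."""
    return ancestor in type_chain(entity_type)


if __name__ == "__main__":
    print(type_chain("BASEBALL_PITCHER"))
    print(is_a("BASEBALL_PITCHER", "PERSON"))
```

With a structure like this, a search for PERSON still finds pitchers, while a search for BASEBALL_PITCHER skips everyone else.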

Preparing For the Future of the Web

The product is not aimed at the consumer market: it's not for the faint-hearted, and you need to know what to do with all of that XML or JSON data! It also remains to be seen how competitive it is with Calais, which is a proven performer and has many reputable companies as its customers. Some startups have taken on Calais before, but fallen short.

However, there is undoubtedly a need for products like Extractiv and Calais that turn the Web's unstructured data into meaningful, organized content. This is the future of the Web, because there is going to be a large increase in the quantity of data online over the next 5-10 years - and all of that data will need to be structured if we're going to make the best use of it.

http://www.readwriteweb.com/archives/extractiv_launches_semantics_as_a_service_platform.php Structured Data Mon, 12 Jul 2010 01:58:13 -0800 Richard MacManus
Ten Companies Twitter Should Consider Acquiring Next

If you were a little blue bird, with a good pile of money and a whole lot of hype, what would you buy to spice up your nest? There are so many little services being built on top of Twitter that we wouldn't be surprised to see some more of them acquired by the company soon. That would mean more features for everyday users and more usefulness for features loved by loyal early adopters.

Twitter has acquired two other companies so far, that we know of. Search engine and sentiment analysis service Summize became Twitter's own search engine and Values of N sold its assets so engineer Rael Dornfest could be brought into the company. Here are ten other startups we think that Twitter should consider acquiring next. Which kind of company would you most like to see become part of Twitter itself? We've got a poll below.

Is Twitter in a position to make more acquisitions? We suspect so. It has cash but more importantly it has stock. Think of it this way: Google is afraid of Facebook and Facebook is afraid of Twitter. Would startups bend over backwards to become a part of Twitter? We suspect most would.

Some of these we think are likely acquisitions, some less so. In making this list we considered both functionality that would be helpful to have added to Twitter's own site and technology that would be worth buying instead of just building in-house. Whenever a platform company builds technology that a number of other startups offer, there is a risk of scaring other people away from investing in development that the platform could just reproduce. Acquisitions of startups on a platform probably increase the appeal of development though, as it's a chance to get in on the game.

Quite Likely, if It Hasn't Happened Already

Bit.ly is the most full-featured and popular URL shortener on the market right now and was recently selected as Twitter's own shortener of choice, dethroning TinyURL. Bit.ly offers all kinds of smart analytics, from real-time click tracking to semantic analysis of topic keywords from the links that people tweet.

One trusted industry source speaking on the condition of anonymity told us that Bit.ly servers "were moved into Twitter's racks months ago in preparation for this change" [of becoming the default shortener]. Bit.ly is becoming too important to Twitter to keep that functionality outside the company's own shop and the two companies share some investors. We will not be surprised at all if a Bit.ly acquisition by Twitter is announced sometime in the near future.

Could Happen...

Tweetmeme is another fast growing Twitter analytics service that tracks sharing on the service. With another chunk of new features just added today, the service is looking a whole lot like "Feedburner for Twitter" but with even more viral distribution possibilities. The Tweetmeme API is quite interesting and could complement Bit.ly quite well.

Twitpic is a popular way to share images on Twitter. The site faces a strong challenge from ImageShack's YFrog, but independent Twitpic would be a cheaper acquisition and is already well known among Twitter users. (Twitter should probably look at Enjoysthin.gs; it's got the best user experience.) An increase in imagery on Twitter would probably offer the company a lot more advertising real-estate.

Tweepz is a fascinating Twitter search engine that acts like a directory that lets you parse your results using various metrics gleaned from Twitter. Check out this search, for example. Twitter could benefit from making this kind of search available to users, advertisers and researchers - and Tweepz has already built it. See also Twazzup, another company doing interesting things with Twitter data.

Longer Shots

An iPhone app company could be a good buy for Twitter; there are certainly plenty of options. M.Twitter.com is a good mobile service already but someone specializing in super high-quality Twitter apps for the iPhone, Android and Pre could be good to bring in house. It could be AteBits, makers of Tweetie. There may not be enough reason for Twitter to buy one of these companies, though.

A desktop Twitter app company could help Twitter increase user engagement. Many of the most serious Twitter users (though not all) swear by desktop access. Twitter could acquire the most popular and arguably most innovative desktop app, Tweetdeck, or it could bring Seesmic in house. Tweetdeck would be cheap and shares investors with Twitter. Desktop apps may be too limited in appeal to be a compelling acquisition target.

Geo-location could be a good feature to add to Twitter. Search by user location could be made much more meaningful and the list of things that could be done with it is very long. Brightkite is popular and well developed, Shizzow is pretty and wouldn't be expensive. On the other hand, browsers themselves will likely all become more location aware in the near future and Twitter may be satisfied with its current location data.

A semantics company could bring structure to the Tweets, making them more useful and easier to advertise against. Right now links Tweeted are semantically analyzed by Reuters' Calais and sent to Bit.ly, but we wouldn't be surprised if Twitter was interested in scooping up a small semantics shop and helping it scale so that analysis was being done in house. Twitter may feel like semantics don't need to get that close to consumer users, though. (Disclosure: Calais is a ReadWriteWeb sponsor.)

Topify is a widely loved service that intercepts your new Twitter follower notification emails and sends you much more useful ones. It's great but probably too easy for Twitter to just reproduce itself.

FriendFeed plus Twitter would be a match made in heaven. It would be an engineering powerhouse. It would be a step towards mainstream user adoption of FriendFeed, a service that can't make up its mind which end of the sophistication spectrum it's targeting. It's also quite unlikely to happen. If there's one related startup we can imagine turning down a Twitter acquisition offer, it's probably FriendFeed. (Though the investment-laden and highly ambitious OneRiot is a close second.) Nonetheless, it would be awesome if FriendFeed's cross-network aggregation, threaded conversations, groups, media support, search and more joined forces with Twitter.

Ultimately, it may be most likely that Twitter's next acquisition will be something vapid. A service that aggregates shopping Tweets, or celebrity Tweets, or something else that will fall short of taking advantage of the Twitter platform's huge potential to change the world. Twitter staff makes relatively simple use of its own service, so hoping that it will acquire companies that make it all the more powerfully sophisticated may be an early adopter's pipe dream. [Update: After some discussion this afternoon, I am thinking it's time to reconsider this position I've held for some time. Twitter staff is not full of dummies, I'm sure, and it has probably been inappropriate of me to write as if that's the case.]

Maybe not, though. We wouldn't be shocked to see Twitter pick up at least a few of the companies above. What do you think? Are there other services you'd like to see become part of the Twitter team even more than the above? It's a wild and woolly micro-content ecosystem out there - anything could happen.

You can find ReadWriteWeb on Twitter, as well as the entire RWW Team: Marshall Kirkpatrick, Bernard Lunn, Alex Iskold, Sarah Perez, Frederic Lardinois, Doug Coleman, Jolie O'Dell, Dana Oshiro, Lidija Davis and Steven Walling.

http://www.readwriteweb.com/archives/ten_companies_twitter_should_consider_acquiring_ne.php Analysis Fri, 03 Jul 2009 12:20:19 -0800 Marshall Kirkpatrick
The State of the Market in Semantic Technologies

Tom Tague from Thomson Reuters' OpenCalais team did a keynote speech today at SemTech in San Jose. His presentation was a wonderful wrapup of current semantic technology trends, and what we can expect over the next few years.

To open, he said that where we are now in the evolution of the Web is content rich, but information poor - plus "experientially deficient". He suggested that 'web 3.0' is about cleaning up the mess of web 2.0 and improving interfaces. In terms of semantic technology, he explained that over the past 5 years it has evolved from invention of standards to a period of commercial innovation on top of those inventions. While standards are still being worked on, now "we are at an inflection point where innovation is exploding."

Tague called Calais, the project he leads at Thomson Reuters, "a web service a.k.a. plumbing". They've had 13 releases, talked with 100+ customers about Calais, and have 13,000 registered developers. He put the ideas that he's been talking about with customers and developers into 6 buckets, which we've listed with sub-categories below.

Tools

  • Semantic data mgmt
  • Semantic data generation
  • Databases
  • Integration and workflow

Tague said that tools are important, particularly in the enterprise. He sounded a note of caution to tools vendors: they need to simplify their stories and offer "simple basic tools."

Social

  • Semantics-powered link sharing
  • Network mining
  • News sharing
  • Tweet mining

Tague said that we shouldn't focus on providing "frosting" on top of current social Web tools. He advised focusing on commercial imperatives, such as the categories above.

Advertising

  • Semantic ad placement
  • Contextual ad placement
  • Semantically driven landing pages
  • Mashup ads

There are clearly opportunities to improve advertising using semantic technology, said Tague.

Search

Tague noted that semantic search may be "the answer to the question nobody is asking." He said that we should look at general "semantic search" vs domain specific semantically-enhanced search. The latter is where the commercial opportunity actually is, but he questioned the economics of general semantic search.

Publishing

He put this into 3 sub-categories:

  • A-Content Producers - from back office to user experience
  • B-Editorial + Aggregation Publishing Models
  • C-Robotic publishing - aggregation only

Tague explained that Calais has really focused on this over the last 8-9 months. He said that classic publishers can get an enormous amount of value from this. Right now the big focus is "back in the boiler room," for example to cut editors from 3 to 2. He expects that later on more focus will go on enhancing the user experience.

Tague thinks that B is the biggest opportunity, using Huffington Post as an example. He said that it gives a "near newspaper like experience" at perhaps a 5th of the cost. It's an area where they're seeing adoption of Calais.

Interface

Tague noted that gaming is a huge industry that the semantic technology industry can learn from. He listed these attributes:

  • Great story line
  • High interactivity, immediate responsiveness
  • No interruptions
  • Graphically engaging
  • Seamless
  • Fun

He then asked who out there is really trying to change the user experience in semantic technology. He listed 4 companies (all of whom we've profiled on ReadWriteWeb):

  • Zemanta
  • Apture
  • Feedly
  • Glue

Tague told the audience that the next big innovation in interface will be something that stays with the user where they are, which will be mobile and in the browser.

To sum up, Tague suggested that semantic technologies vendors should decide whether they care about semantics or about user value. If it's semantics, then be a tools vendor. He said the basic building blocks are out there already, so focus on user experience.

Disclosure: SemTech has been a recent sponsor of ReadWriteWeb

http://www.readwriteweb.com/archives/the_state_of_the_market_in_semantic_technologies.php Conferences Tue, 16 Jun 2009 09:23:17 -0800 Richard MacManus
12 Companies Targeting Early Tech Adopters

Our mission at ReadWriteWeb is to explore the latest Web technology products and trends. We're fortunate to have a great group of sponsors who support this goal. So, once a week, we write a post about them; about who they are, what they do, and what they've been up to lately. We hope you'll pay them a visit as a way to show your appreciation for their sponsorship of this site.

Interested in being a ReadWriteWeb sponsor? ReadWriteWeb is one of the most popular blogs in the world and is read by a sophisticated audience of thought leaders and decision-makers. We have several innovative new features in our sponsor packages that we'd love to tell you about. Email our COO Bernard Lunn for all the details.

Ready to learn more about the smart companies that support this site you love to read? Read on...


Skip to info about: Calais: semantic Web API | Socialtext: enterprise 2.0 | Mashery: API management services | Rackspace: cloud computing experts | Aplus.net: Web hosting | Crowd Science: demographic data | Smub: mobile sharing | 2009 Semantic Technology Conference: semantic search and tech | Hakia: semantic search | Media Temple and SixApart: our hosts and blogging software



Calais

Calais, powered by Thomson Reuters, brings state-of-the-art semantic functionality into your blog, content management system, site or application. Calais 4.0 was released in January, for the first time allowing publishers to connect to the Linked Data Web standard. Calais 4.0 goes beyond meta-tagging and enables publishers to integrate their content with Linked Data assets from Wikipedia, GeoNames, the Internet Movie Database (IMDB), Shopping.com, and others. Calais 4.0 also lets publishers share semantic meta-data about their content with "content consumers" such as search engines, news aggregators, related stories recommendation services, and more.

Check out the incredible work being done at Calais and let us know what you think.

Socialtext

Socialtext provides an enterprise wiki platform for organizations who want to accelerate knowledge sharing, foster collaboration, or build online communities.

Socialtext is currently offering a free white paper entitled "5 Best Practices for Enterprise Collaboration." It explains how collaboration solutions (a.k.a. Enterprise 2.0) can "dramatically reduce enterprise cycle times and costs. These results may be critical to survival in difficult economic times, and the right collaboration solution is the easiest, most cost effective way to achieve them."

Download Socialtext's free white paper at http://socialtext.com.

Crowd Science

Crowd Science gives online publishers reports on the demographics and attitudes of their audience. We at ReadWriteWeb have signed up to this new service, because demographic data is something we've struggled to get in the past. It's important for any online business to know their audience, so Crowd Science is a welcome addition to the stats armory that most of us in the Internet biz use.

You can sign up to get demographic data by clicking here.

Mashery

Mashery is a platform for Web services, allowing companies to manage their APIs using Mashery's expertise. At the "Business of APIs" conference, Mashery CEO Oren Michels explained to the audience that while APIs are a technology, their use is a business decision. He went on to say that Mashery has helped customers such as WhitePages.com, Thumbplay, Compete.com, and Calais. Check out the white paper "Five steps to scaling your business development using Web services" to discover how you can use APIs for your business.

You can find out more about APIs and their business use at www.mashery.com.

Rackspace

Rackspace is one of the world's largest hosting providers, but it's also competing in the cloud computing arena. In October Rackspace announced two major acquisitions: SliceHost and JungleDisk. Slicehost is a popular cloud computing and hosting provider with about 15,000 users, while JungleDisk is one of our favorite online backup services. JungleDisk used to rely on Amazon's S3 storage solution, but it now also supports Rackspace's cloud storage solution. At the same time, Rackspace also announced a new suite of services, Rackspace Cloud Hosting, which combines a hosting platform (CloudSites) with a cloud storage solution (CloudFS), and, in the long run, a tight integration with Slicehost's services.

Click here to explore Rackspace's hosting and cloud computing solutions.

Aplus.net

Aplus.net offers a variety of services relating to Web hosting, including shared hosting, dedicated server, managed hosting, Web design, marketing and online advertising services, search engine optimization, e-commerce solutions, and domain registration.

You can register for Aplus.net here.

Smub

Smub is the first truly mobile bookmarking, link-sharing tool. Smub lets you share and save any link easily from your iPhone, Mac, or PC without a plugin or application.

Type smub.it/ to the left of http:// on any link to save or share, and Smub will automatically take you through the process. Make the link public to share with others, or keep it private just for yourself. Smub has built-in sharing to Facebook, Twitter, MySpace, and more.
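
The prefixing step described above is simple enough to sketch in a few lines of Python. The function name `smub_link` is hypothetical, and accepting `https://` links is an assumption on our part, since the description only mentions `http://`.

```python
def smub_link(url):
    """Build the address you would type: 'smub.it/' placed before the full link.

    Hypothetical helper for illustration only. Accepting https:// links is
    an assumption; the description above only mentions http://.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected a full link starting with http:// or https://")
    return "smub.it/" + url


if __name__ == "__main__":
    print(smub_link("http://www.readwriteweb.com/"))
```

This only builds the address; saving or sharing the link still happens on Smub's own pages.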


2009 Semantic Technology Conference

What are the big players doing in semantic search? Which startups are challenging them? How does semantic technology change search results? What key advantages and new opportunities does semantics provide in consumer and business search markets?

At the 2009 Semantic Technology Conference, taking place from 14 to 18 June 2009 in San Jose, semantic applications and usage cases will be presented by product developers and technical experts in such fields as advertising, business process management, cloud computing, digital asset management, and e-commerce.

Hakia

Hakia is a general purpose "semantic" search engine that delivers a search experience based on focus, clarity, and credibility. Today's search engines retrieve popular results via statistical ranking, but popular websites are not always credible and credible websites are not always popular.

Hakia's semantic technology provides a new search experience based on quality, not popularity. Its search results come from credible websites recommended by librarians; they represent the most recent information available and remain absolutely relevant to the query.

Our Gracious Hosts and Blogging Software

ReadWriteWeb is hosted by Media Temple and is published using SixApart's Movable Type.

If you've ever wondered what ReadWriteWeb looks like behind the scenes, or if you've never seen the Movable Type publishing interface - that's it on the left. We recently upgraded to MT 4.23, which is the latest version. We got onto this release as soon as it was available - in fact our contacts at Six Apart emailed the actual code to us before it was up on their website. That's customer service for you!

The companies above pay our rents or mortgages and we appreciate it. We hope you'll stop by their sites and see what they've got to offer.

Have you got a smart company that could use some more visits by the sophisticated readers of a blog like ReadWriteWeb's? Drop us a line and let's talk.

Thanks to all our sponsors and our readers for your support!

http://www.readwriteweb.com/archives/sponsors_post_31may09.php Sponsors Sun, 31 May 2009 15:45:31 -0800 Admin
12 Companies Targeting Early Tech Adopters

Our mission at ReadWriteWeb is to explore the latest Web technology products and trends. We're fortunate to have a great group of sponsors who support this goal. So, once a week, we write a post about them; about who they are, what they do, and what they've been up to lately. We hope you'll pay them a visit as a way to show your appreciation for their sponsorship of this site.

Interested in being a ReadWriteWeb sponsor? ReadWriteWeb is one of the most popular blogs in the world and is read by a sophisticated audience of thought leaders and decision-makers. We have several innovative new features in our sponsor packages that we'd love to tell you about. Email our COO Bernard Lunn for all the details.

Ready to learn more about the smart companies that support this site you love to read? Read on...


Skip to info about: Calais: semantic Web API | Socialtext: enterprise 2.0 | Mashery: API management services | Rackspace: cloud computing experts | Aplus.net: Web hosting | Crowd Science: demographic data | Smub: mobile sharing | 2009 Semantic Technology Conference: semantic search and tech | Hakia: semantic search | Media Temple and SixApart: our hosts and blogging software



Calais

Calais, powered by Thomson Reuters, brings state-of-the-art semantic functionality into your blog, content management system, site or application. Calais 4.0 was released in January, for the first time allowing publishers to connect to the Linked Data Web standard. Calais 4.0 goes beyond meta-tagging and enables publishers to integrate their content with Linked Data assets from Wikipedia, GeoNames, the Internet Movie Database (IMDB), Shopping.com, and others. Calais 4.0 also lets publishers share semantic meta-data about their content with "content consumers" such as search engines, news aggregators, related stories recommendation services, and more.

Check out the incredible work being done at Calais and let us know what you think.

Socialtext

Socialtext provides an enterprise wiki platform for organizations who want to accelerate knowledge sharing, foster collaboration, or build online communities.

Socialtext is currently offering a free white paper entitled "5 Best Practices for Enterprise Collaboration." It explains how collaboration solutions (a.k.a. Enterprise 2.0) can "dramatically reduce enterprise cycle times and costs. These results may be critical to survival in difficult economic times, and the right collaboration solution is the easiest, most cost effective way to achieve them."

Download Socialtext's free white paper at http://socialtext.com.

Crowd Science

Crowd Science gives online publishers reports on the demographics and attitudes of their audience. We at ReadWriteWeb have signed up to this new service, because demographic data is something we've struggled to get in the past. It's important for any online business to know their audience, so Crowd Science is a welcome addition to the stats armory that most of us in the Internet biz use.

You can sign up to get demographic data by clicking here.

Mashery

Mashery is a platform for Web services, allowing companies to manage their APIs using Mashery's expertise. At the "Business of APIs" conference, Mashery CEO Oren Michels explained to the audience that while APIs are a technology, their use is a business decision. He went on to say that Mashery has helped customers such as WhitePages.com, Thumbplay, Compete.com, and Calais. Check out the white paper "Five steps to scaling your business development using Web services" to discover how you can use APIs for your business.

You can find out more about APIs and their business use at www.mashery.com.

Rackspace

Rackspace is one of the world's largest hosting providers, but it's also competing in the cloud computing arena. In October Rackspace announced two major acquisitions: SliceHost and JungleDisk. Slicehost is a popular cloud computing and hosting provider with about 15,000 users, while JungleDisk is one of our favorite online backup services. JungleDisk used to rely on Amazon's S3 storage solution, but it now also supports Rackspace's cloud storage solution. At the same time, Rackspace also announced a new suite of services, Rackspace Cloud Hosting, which combines a hosting platform (CloudSites) with a cloud storage solution (CloudFS), and, in the long run, a tight integration with Slicehost's services.

Click here to explore Rackspace's hosting and cloud computing solutions.

Aplus.net

Aplus.net offers a variety of services relating to Web hosting, including shared hosting, dedicated server, managed hosting, Web design, marketing and online advertising services, search engine optimization, e-commerce solutions, and domain registration.

You can register for Aplus.net here.

Smub

Smub is the first truly mobile bookmarking, link-sharing tool. Smub lets you share and save any link easily from your iPhone, Mac, or PC without a plugin or application.

Type smub.it/ to the left of http:// on any link to save or share, and Smub will automatically take you through the process. Make the link public to share with others, or keep it private just for yourself. Smub has built-in sharing to Facebook, Twitter, MySpace, and more.


2009 Semantic Technology Conference

What are the big players doing in semantic search? Which startups are challenging them? How does semantic technology change search results? What key advantages and new opportunities does semantics provide in consumer and business search markets?

At the 2009 Semantic Technology Conference, taking place from 14 to 18 June 2009 in San Jose, semantic applications and usage cases will be presented by product developers and technical experts in such fields as advertising, business process management, cloud computing, digital asset management, and e-commerce.

Hakia

Hakia is a general purpose "semantic" search engine that delivers a search experience based on focus, clarity, and credibility. Today's search engines retrieve popular results via statistical ranking, but popular websites are not always credible and credible websites are not always popular.

Hakia's semantic technology provides a new search experience based on quality, not popularity. Its search results come from credible websites recommended by librarians; they represent the most recent information available and remain absolutely relevant to the query.

Find out what makes Hakia different.

Our Gracious Hosts and Blogging Software

ReadWriteWeb is hosted by Media Temple and is published using SixApart's Movable Type.

If you've ever wondered what ReadWriteWeb looks like behind the scenes, or if you've never seen the Movable Type publishing interface - that's it on the left. We recently upgraded to MT 4.23, which is the latest version. We got onto this release as soon as it was available - in fact our contacts at Six Apart emailed the actual code to us before it was up on their website. That's customer service for you!

The companies above pay our rents or mortgages and we appreciate it. We hope you'll stop by their sites and see what they've got to offer.

Have you got a smart company that could use some more visits by the sophisticated readers of a blog like ReadWriteWeb's? Drop us a line and let's talk.

Thanks to all our sponsors and our readers for your support!

http://www.readwriteweb.com/archives/sponsors_post_23may09.php Sponsors Sun, 24 May 2009 16:15:32 -0800 Admin
12 Companies Targeting Early Tech Adopters

Our mission at ReadWriteWeb is to explore the latest Web technology products and trends. We're fortunate to have a great group of sponsors who support this goal. So, once a week, we write a post about them; about who they are, what they do, and what they've been up to lately. We hope you'll pay them a visit as a way to show your appreciation for their sponsorship of this site.

Interested in being a ReadWriteWeb sponsor? ReadWriteWeb is one of the most popular blogs in the world and is read by a sophisticated audience of thought leaders and decision-makers. We have several innovative new features in our sponsor packages that we'd love to tell you about. Email our COO Bernard Lunn for all the details.

Ready to learn more about the smart companies that support this site you love to read? Read on...


Skip to info about: Calais: semantic Web API | Socialtext: enterprise 2.0 | Mashery: API management services | Rackspace: cloud computing experts | Aplus.net: Web hosting | Crowd Science: demographic data | Smub: mobile sharing | Web 3.0 Conference: next-era technology | 2009 Semantic Technology Conference: semantic search and tech | Hakia: semantic search | Media Temple and SixApart: our hosts and blogging software



Calais

Calais, powered by Thomson Reuters, brings state-of-the-art semantic functionality into your blog, content management system, site or application. Calais 4.0 was released in January, for the first time allowing publishers to connect to the Linked Data Web standard. Calais 4.0 goes beyond meta-tagging and enables publishers to integrate their content with Linked Data assets from Wikipedia, GeoNames, the Internet Movie Database (IMDB), Shopping.com, and others. Calais 4.0 also lets publishers share semantic meta-data about their content with "content consumers" such as search engines, news aggregators, related stories recommendation services, and more.

Check out the incredible work being done at Calais and let us know what you think.

Socialtext

Socialtext provides an enterprise wiki platform for organizations who want to accelerate knowledge sharing, foster collaboration, or build online communities.

Socialtext is currently offering a free white paper entitled "5 Best Practices for Enterprise Collaboration." It explains how collaboration solutions (a.k.a. Enterprise 2.0) can "dramatically reduce enterprise cycle times and costs. These results may be critical to survival in difficult economic times, and the right collaboration solution is the easiest, most cost effective way to achieve them."

Download Socialtext's free white paper at http://socialtext.com.

Crowd Science

Crowd Science gives online publishers reports on the demographics and attitudes of their audience. We at ReadWriteWeb have signed up to this new service, because demographic data is something we've struggled to get in the past. It's important for any online business to know their audience, so Crowd Science is a welcome addition to the stats armory that most of us in the Internet biz use.

You can sign up to get demographic data by clicking here.

Mashery

Mashery is a platform for Web services, allowing companies to manage their APIs using Mashery's expertise. At the "Business of APIs" conference, Mashery CEO Oren Michels explained to the audience that while APIs are a technology, their use is a business decision. He went on to say that Mashery has helped customers such as WhitePages.com, Thumbplay, Compete.com, and Calais. Check out the white paper "Five steps to scaling your business development using Web services" to discover how you can use APIs for your business.

You can find out more about APIs and their business use at www.mashery.com.

Rackspace

Rackspace is one of the world's largest hosting providers, but it's also competing in the cloud computing arena. In October Rackspace announced two major acquisitions: Slicehost and JungleDisk. Slicehost is a popular cloud computing and hosting provider with about 15,000 users, while JungleDisk is one of our favorite online backup services. JungleDisk used to rely on Amazon's S3 storage solution, but it now also supports Rackspace's cloud storage solution. At the same time, Rackspace also announced a new suite of services, Rackspace Cloud Hosting, which combines a hosting platform (CloudSites) with a cloud storage solution (CloudFS), and, in the long run, a tight integration with Slicehost's services.

Click here to explore Rackspace's hosting and cloud computing solutions.

Aplus.net

Aplus.net offers a variety of services relating to Web hosting, including shared hosting, dedicated server, managed hosting, Web design, marketing and online advertising services, search engine optimization, e-commerce solutions, and domain registration.

You can register for Aplus.net here.

Web 3.0 Conference

The core idea behind Web 3.0 is to extract much more meaningful, actionable insight from information. At the Web 3.0 Conference, taking place in New York City from 19 to 20 May 2009, participants will explore how companies are using the emerging technology collectively known as Web 3.0 for significant bottom-line impact in areas like marketing, corporate information management, customer service, and personal productivity.

ReadWriteWeb readers save 15% with the discount code XRWW.

Smub

Smub is the first truly mobile bookmarking, link-sharing tool. Smub lets you share and save any link easily from your iPhone, Mac, or PC without a plugin or application.

Type smub.it/ to the left of http:// on any link to save or share, and Smub will automatically take you through the process. Make the link public to share with others, or keep it private just for yourself. Smub has built-in sharing to Facebook, Twitter, MySpace, and more.


2009 Semantic Technology Conference

What are the big players doing in semantic search? Which startups are challenging them? How does semantic technology change search results? What key advantages and new opportunities does semantics provide in consumer and business search markets?

At the 2009 Semantic Technology Conference, taking place from 14 to 18 June 2009 in San Jose, semantic applications and usage cases will be presented by product developers and technical experts in such fields as advertising, business process management, cloud computing, digital asset management, and e-commerce.

Hakia

Hakia is a general purpose "semantic" search engine that delivers a search experience based on focus, clarity, and credibility. Today's search engines retrieve popular results via statistical ranking, but popular websites are not always credible and credible websites are not always popular.

Hakia's semantic technology provides a new search experience based on quality, not popularity. Its search results come from credible websites recommended by librarians; they represent the most recent information available and remain absolutely relevant to the query.

Find out what makes Hakia different.

Our Gracious Hosts and Blogging Software

ReadWriteWeb is hosted by Media Temple and is published using Six Apart's Movable Type.

If you've ever wondered what ReadWriteWeb looks like behind the scenes, or if you've never seen the Movable Type publishing interface - that's it on the left. We recently upgraded to MT 4.23, which is the latest version. We got onto this release as soon as it was available - in fact our contacts at Six Apart emailed the actual code to us before it was up on their website. That's customer service for you!

The companies above pay our rents or mortgages and we appreciate it. We hope you'll stop by their sites and see what they've got to offer.

Have you got a smart company that could use some more visits by the sophisticated readers of a blog like ReadWriteWeb's? Drop us a line and let's talk.

Thanks to all our sponsors and our readers for your support!

http://www.readwriteweb.com/archives/sponsors_post_16may09.php http://www.readwriteweb.com/archives/sponsors_post_16may09.php Sponsors Sun, 17 May 2009 18:00:48 -0800 Admin
12 Companies Targeting Early Tech Adopters Our mission at ReadWriteWeb is to explore the latest Web technology products and trends. We're fortunate to have a great group of sponsors who support this goal. So, once a week, we write a post about them: who they are, what they do, and what they've been up to lately. We hope you'll pay them a visit as a way to show your appreciation for their sponsorship of this site.

Interested in being a ReadWriteWeb sponsor? ReadWriteWeb is one of the most popular blogs in the world and is read by a sophisticated audience of thought leaders and decision-makers. We have several innovative new features in our sponsor packages that we'd love to tell you about. Email our COO Bernard Lunn for all the details.

Ready to learn more about the smart companies that support this site you love to read? Read on...


Skip to info about: Calais: semantic Web API | Socialtext: enterprise 2.0 | Adobe: Flash media server | Wistia: video for business | Mashery: API management services | TaxACT: online tax filing | Rackspace: cloud computing experts | Aplus.net: Web hosting | Crowd Science: demographic data | Eurekster: custom topic portals | Smub: mobile sharing | Media Temple and SixApart: our hosts and blogging software



Calais

Calais, powered by Thomson Reuters, brings state-of-the-art semantic functionality into your blog, content management system, site or application. Calais 4.0 was released in January, for the first time allowing publishers to connect to the Linked Data Web standard. Calais 4.0 goes beyond meta-tagging and enables publishers to integrate their content with Linked Data assets from Wikipedia, GeoNames, the Internet Movie Database (IMDB), Shopping.com, and others. Calais 4.0 also lets publishers share semantic meta-data about their content with "content consumers" such as search engines, news aggregators, related stories recommendation services, and more.

Check out the incredible work being done at Calais and let us know what you think.

Socialtext

Socialtext provides an enterprise wiki platform for organizations who want to accelerate knowledge sharing, foster collaboration, or build online communities.

Socialtext is currently offering a free white paper entitled "5 Best Practices for Enterprise Collaboration." It explains how collaboration solutions (a.k.a. Enterprise 2.0) can "dramatically reduce enterprise cycle times and costs. These results may be critical to survival in difficult economic times, and the right collaboration solution is the easiest, most cost effective way to achieve them."

Download Socialtext's free white paper at http://socialtext.com.

Crowd Science

Crowd Science gives online publishers reports on the demographics and attitudes of their audience. We at ReadWriteWeb have signed up to this new service, because demographic data is something we've struggled to get in the past. It's important for any online business to know their audience, so Crowd Science is a welcome addition to the stats armory that most of us in the Internet biz use.

You can sign up to get demographic data by clicking here.

Mashery

Mashery is a platform for Web services, allowing companies to manage their APIs using Mashery's expertise. At the "Business of APIs" conference, Mashery CEO Oren Michels explained to the audience that while APIs are a technology, their use is a business decision. He went on to say that Mashery has helped customers such as WhitePages.com, Thumbplay, Compete.com, and Calais. Check out the white paper "Five steps to scaling your business development using Web services" to discover how you can use APIs for your business.

You can find out more about APIs and their business use at www.mashery.com.

TaxACT

TaxACT is an efficient way to file your taxes online, in either desktop or web-based versions. It offers two ways to enter data: the interview format, or the forms-based entry method. TaxACT also provides its users a highly reliable and robust alerts system to prevent costly mistakes generally caused by omissions or missed opportunities to maximize deductions. Regardless of the TaxACT version, all forms are IRS and State approved. The software was developed by professional accountants and CPAs.

You can see a tour of TaxACT online by clicking here.

Rackspace

Rackspace is one of the world's largest hosting providers, but it's also competing in the cloud computing arena. In October Rackspace announced two major acquisitions: Slicehost and JungleDisk. Slicehost is a popular cloud computing and hosting provider with about 15,000 users, while JungleDisk is one of our favorite online backup services. JungleDisk used to rely on Amazon's S3 storage solution, but it now also supports Rackspace's cloud storage solution. At the same time, Rackspace also announced a new suite of services, Rackspace Cloud Hosting, which combines a hosting platform (CloudSites) with a cloud storage solution (CloudFS), and, in the long run, a tight integration with Slicehost's services.

Click here to explore Rackspace's hosting and cloud computing solutions.

Adobe Flash Media Interactive Server 3.5

Adobe Flash Media Interactive Server 3.5 offers powerful streaming with a flexible environment for creating and delivering rich, interactive, multi-way social media experiences to a broad audience. You'll find a superior video experience, with new features such as Dynamic Streaming, DVR functionality, HTTP delivery support, and H.264 enhancements.

Check out the Adobe Flash Media Interactive Server 3.5 to add interactivity and media streaming to your social media applications.

Wistia

Wistia is a provider of secure video sharing and collaboration tools for business. The company says that "the use of video in business has grown immensely as cameras and video production have become significantly more accessible. However, sharing and collaborating on this content with your team still has many challenges, including large file sizes, numerous video formats, privacy and security, and lack of collaboration environment." Wistia aims to solve those challenges.

You can get a free 15-day trial of Wistia by clicking here.

Aplus.net

Aplus.net offers a variety of services relating to Web hosting, including shared hosting, dedicated server, managed hosting, Web design, marketing and online advertising services, search engine optimization, e-commerce solutions, and domain registration.

You can register for Aplus.net here.


Eurekster

Eurekster is the developer of the swicki that we use on ReadWriteWeb, a custom social search portal on the topic of your choice (in our case, Web tech), powered by the community.

People build swickis on all kinds of topics, and some people build a lot of them. Alex Holmes, for example, builds really nice looking swickis on topics like the 2008 Election, Ocean Animals and Home Buying.

Smub

Smub is the first truly mobile bookmarking, link-sharing tool. Smub lets you share and save any link easily from your iPhone, Mac, or PC without a plugin or application.

Type smub.it/ to the left of http:// on any link to save or share, and Smub will automatically take you through the process. Make the link public to share with others, or keep it private just for yourself. Smub has built-in sharing to Facebook, Twitter, MySpace, and more.

Our Gracious Hosts and Blogging Software

ReadWriteWeb is hosted by Media Temple and is published using Six Apart's Movable Type.

If you've ever wondered what ReadWriteWeb looks like behind the scenes, or if you've never seen the Movable Type publishing interface - that's it on the left. We recently upgraded to MT 4.23, which is the latest version. We got onto this release as soon as it was available - in fact our contacts at Six Apart emailed the actual code to us before it was up on their website. That's customer service for you!

The companies above pay our rents or mortgages and we appreciate it. We hope you'll stop by their sites and see what they've got to offer.

Have you got a smart company that could use some more visits by the sophisticated readers of a blog like ReadWriteWeb's? Drop us a line and let's talk.

Thanks to all our sponsors and our readers for your support!

http://www.readwriteweb.com/archives/sponsors_post_10may09.php http://www.readwriteweb.com/archives/sponsors_post_10may09.php Sponsors Sun, 10 May 2009 18:59:13 -0800 Admin
Media Cloud Leverages Calais to Track News Trends Media Cloud, a new project from the Berkman Center at Harvard University, has an ambitious goal: It will do the heavy lifting of analyzing stories from thousands of traditional news sources, analyzing the semantics of the content through Calais (covered here and here), and then providing tools to quickly get trending results. This approach promises to bring what used to be an expensive and laborious process to anyone who has a need for this type of data but lacks the means to get it.

At launch, Media Cloud will offer three primary trend visualization tools and a discussion forum for anyone to use. In addition, a news RSS feed and mailing list are available. We'll now take a moment to review how these visualization tools work, and we'd also like to point you to a very illustrative video interview of project developer Ethan Zuckerman, by the Nieman Journalism Lab.

Disclosure: Reuters Calais is an RWW sponsor. And they are awesome.

Top 10 Chart

This tool lets you compare up to three media sources, generating a list of the top ten most mentioned terms for that source and relative frequency of use for each term. This chart can be useful in a number of ways, indicating not only what terms are considered most important by each news source at the moment you generate the chart, but also showing if there is a clear standout term that may indicate a very hot topic. Also, when comparing two similar media sources, say for example the New York Times and the Washington Post, the resulting chart can give you an idea of what each paper considers more important leading topics.
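As a rough illustration of what a chart like this computes, the sketch below counts terms across a few made-up headlines and reports each term's share of all terms seen. The sample data, the `top_terms` name, and the simple regex tokenizer are our own assumptions; Media Cloud's actual pipeline, which relies on Calais for its semantic analysis, is not shown here.

```python
import re
from collections import Counter


def top_terms(stories, n=10):
    """Return the n most frequent terms with their relative frequency.

    `stories` is a list of plain-text strings. Tokenizing on runs of
    letters is a simplification; no stop words are filtered out.
    """
    counts = Counter()
    for text in stories:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, count / total) for term, count in counts.most_common(n)]


# Hypothetical headlines standing in for one media source.
SAMPLE_STORIES = [
    "Congress debates budget",
    "Budget vote delayed in Congress",
    "Congress returns",
]

if __name__ == "__main__":
    for term, share in top_terms(SAMPLE_STORIES, n=3):
        print(f"{term}: {share:.2f}")
```

Running the same function over two sources and comparing the resulting lists side by side would give a crude version of the comparison described above.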

Top 10 Term Pivot Chart

You can enter your own search term and up to three media sources in this tool to see which terms are most frequently mentioned alongside the search term in those sources' stories. This gives you insight into how frequently related terms cluster together. So, for example, if you search for Obama, you might find that, while United States is the most common related term, CNN's focus is more on Congress while FOX News writes more about the White House.
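The pivot idea can be sketched as a co-occurrence count: keep only the stories that mention the search term, then count the other terms that appear in them. Everything here (the `pivot_terms` name, the sample stories, counting each term once per story) is an assumption made for illustration, not a description of how Media Cloud implements it.

```python
import re
from collections import Counter


def pivot_terms(stories, query, n=10):
    """Count terms that appear in the same story as `query`.

    Each term is counted at most once per story, so the result is the
    number of stories in which the term co-occurs with the query.
    """
    query = query.lower()
    counts = Counter()
    for text in stories:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if query in words:
            counts.update(words - {query})
    return counts.most_common(n)


# Hypothetical stories from a single source.
SAMPLE_STORIES = [
    "Obama Congress",
    "Obama Congress budget",
    "Senate budget",
]

if __name__ == "__main__":
    print(pivot_terms(SAMPLE_STORIES, "Obama"))
```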

World Map Chart

This tool shows global coverage of all terms in the Media Cloud database for the selected media sources. Naturally, a newspaper that is focused on national US news will not have the depth of coverage of a source that has an international perspective. But even when comparing similar international sources, the weight each source gives to news from different regions can differ greatly. Take the New York Times versus BBC News: darker colors mean heavier coverage, and the map shows that the BBC covers European affairs more heavily, while the NYT has stronger coverage of Canada and Mexico.

Discussion

Media Cloud is a project that developed from discussions around where story trends came from. This tool attempts to serve as a foundation to help move these conversations forward, and the Berkman Center is keeping the door open for new ideas and ways of using this data. To that end, they also have a discussion forum where people can contribute suggestions, thoughts and ideas to the project. Media Cloud also provides an RSS feed and an email list you can subscribe to if you want to stay in the loop for any new developments.

Like our coverage of the New York Times R&D Labs, we see this as an example of how the Internet is driving traditional media to change and respond in new ways. We are excited by the scope and potential that Media Cloud brings to anyone interested in following news and media trends.

http://www.readwriteweb.com/archives/media_cloud_leverages_calais_to_track_news_trends.php http://www.readwriteweb.com/archives/media_cloud_leverages_calais_to_track_news_trends.php News Wed, 11 Mar 2009 17:05:00 -0800 Phil Glockner
Calais 4.0 Released: Linked Data Meets the Commercial Web Thomson Reuters is today launching the latest version of its Calais web service and open API, Calais 4.0. Calais is a toolkit of products that enables publishers to incorporate semantic functionality within their properties - enabling them to categorize content as people, places, companies, facts, events, and more. Calais 4.0 is perhaps the most significant version since the launch of Calais one year ago, because it enables publishers to connect to the Linked Data web standard that Sir Tim Berners-Lee and others in the Semantic Web community have been promoting over the past few years.

Up till now, we have yet to see much commercial activity in Linked Data - developments have been largely confined to the academic and scientific communities. So we think Calais 4.0 represents an important move forward in the commercial Semantic Web - and we expect to see some big media companies using it before long.

Specifically, Calais 4.0 goes beyond metatagging and enables publishers to integrate their content with Linked Data assets from Wikipedia, GeoNames, the Internet Movie Database (IMDB), Shopping.com and others. Calais 4.0 also lets publishers share semantic metadata about their content with "content consumers" such as search engines, news aggregators, related stories recommendation services and more.

ReadWriteWeb named Calais as one of our top 10 Semantic Web Apps of 2008, due to the progress it made last year. Since launching the Open Calais API early in 2008, over 9,000 developers have registered with it and Calais has processed 200+ million articles.

What's New in 4.0

We spoke with Thomas Tague, Calais lead at Thomson Reuters, about what specifically is new with Calais 4.0 and what use cases we might see over the coming year for it.

Tague explained to ReadWriteWeb that there are three pillars to the Calais initiative:

1. Getting semantic data out of text, which is what the first three versions of Calais focused on.
2. Connecting that semantic data to the linked data world.
3. Providing some way for people to share metadata, for example syndicating it - which Tague termed the "transport" pillar.

Calais 4.0, explained Tague, fills in the final two of those pillars. It supports approximately 25 entity types in Linked Data, with URIs that are dereferenceable to Calais RDF pages. Thomson Reuters is also publishing their ontology in RDFS. Calais will contribute data too, which Thomson Reuters claims is "the first contribution to the Linked Data cloud made by a major publisher." The data that Thomson Reuters is giving to the Linked Data world includes company descriptions, stock tickers, management teams and more. This data will be available to external developers to programmatically use in their apps.
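To make "dereferenceable" concrete: a Linked Data client typically requests an entity's URI over HTTP and asks for an RDF representation through the `Accept` header. The sketch below only builds such a request and does not send it. The URI is a placeholder on example.org, and whether Calais responds to this particular header value is an assumption on our part, not something confirmed by Thomson Reuters.

```python
from urllib.request import Request

# Placeholder URI; real Calais entity URIs are not reproduced here.
EXAMPLE_ENTITY_URI = "http://example.org/id/company/acme"


def build_rdf_request(uri):
    """Build (but do not send) an HTTP request asking for RDF/XML.

    Content negotiation via the Accept header is the usual Linked Data
    convention; treating it as what Calais expects is an assumption.
    """
    return Request(uri, headers={"Accept": "application/rdf+xml"})


if __name__ == "__main__":
    req = build_rdf_request(EXAMPLE_ENTITY_URI)
    print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup against a real endpoint.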

Thomas Tague told ReadWriteWeb that Thomson Reuters has some big data assets and that over time "we're going to populate linked data endpoints with Thomson Reuters data". We asked Tague whether he thinks Calais 4.0 is the biggest commercial use of the Linked Data standard yet. He thinks it is; in his opinion, Linked Data has mostly been used so far for open data projects and relatively small sets of data. Tague said that "we fundamentally believe that companies need to jump into this [Linked Data]".


The Linking Open Data dataset cloud; by Richard Cyganiak

In terms of pillar 3, the metadata transport, Tague explained to us that a document gets a unique identifier - and to syndicate content, publishers just need to make that unique identifier available to external parties.

Conclusion

It will be interesting to see what companies make use of Calais over 2009. Last year we noted that IBM was using Calais - and we presume that with the extra Linked Data and transport functionality, other big companies will want to make use of Calais data too. Thomas Tague told us that they hope to announce 2 big product partners soon. He also said that they're seeing major traction around Drupal. Healthcare IT News from MedTech Publishing, a site developed in Drupal, features the full Calais suite for publishers including "More Like This", their related content plugin.

As we noted at the beginning of this post, we've been impressed with the progress Calais has made since its launch at the start of 2008. With 4.0, we expect to see it gain more traction among commercial publishers in 2009. Indeed as a (we like to think) ahead-of-the-curve 'new media' company ourselves, we're about to embark on our own project using Calais! Stay tuned for more information on that.

http://www.readwriteweb.com/archives/calais_4_linked_data.php http://www.readwriteweb.com/archives/calais_4_linked_data.php Product Reviews Thu, 15 Jan 2009 05:00:00 -0800 Richard MacManus
Semantic Web Patterns: A Guide to Semantic Technologies In this article, we'll analyze the trends and technologies that power the Semantic Web. We'll identify patterns that are beginning to emerge, classify the different trends, and peek into what the future holds.

In a recent interview Tim Berners-Lee pointed out that the infrastructure to power the Semantic Web is already here. ReadWriteWeb's founder, Richard MacManus, even picked it to be the number one trend in 2008. And rightly so. Not only are the bits of infrastructure now in place, but we are also seeing startups and larger corporations working hard to deliver end user value on top of this sophisticated set of technologies.

Editor's note: Looking back over 2008, there were some posts on ReadWriteWeb that did not get the attention we felt they deserved - whether because of timing, competing news stories, etc. So in this end-of-year series, called Redux, we're resurrecting some of those hidden gems. This is one of them; we hope you enjoy (re)reading it!

The Semantic Web means many things to different people, because there are a lot of pieces to it. To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence - computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.

The disagreement is not accidental, because the technology and concepts are broad. Much is possible and much is to be imagined.

1. Bottom-Up and Top-Down

We have written a lot about the different approaches to the Semantic Web - the classic bottom-up approach and the new top-down one. The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as is, to derive meaning automatically. Both approaches are making good progress.

A big win for the bottom-up approach was the recent announcement from Yahoo! that their search engine is going to support RDF and microformats. This is a win-win-win for publishers, for Yahoo!, and for customers - publishers now have an incentive to annotate information because Yahoo! Search will be taking advantage of it, and users will then see better, more precise results.

Another recent win for the bottom-up approach was the announcement of the Semantify web service from Dapper (previous coverage). This offering will enable publishers to add semantic annotations to existing web pages. The more tools like Semantify that pop up, the easier it will be for publishers to annotate pages. Automatic annotation tools, combined with the incentive to annotate pages, are going to make the bottom-up approach more compelling.

But even if the tools and incentive exist, making the bottom-up approach widespread is difficult. Today, the magic of Google is that it can understand information as is, without asking people to fully comply with W3C standards or SEO optimization techniques. Similarly, top-down semantic tools are focused on dealing with imperfections in existing information. Among them are the natural language processing tools that do entity extraction - such as the Calais and TextWise APIs that recognize people, companies, places, etc. in documents; vertical search engines, like ZoomInfo and Spock, which mine the web for people; technologies like Dapper and BlueOrganizer, which recognize objects in web pages; and Yahoo! Shortcuts, Snap and SmartLinks, which recognize objects in text and links.

[Disclosure: Alex Iskold is founder and CEO of AdaptiveBlue, which makes BlueOrganizer and SmartLinks.]

Top-down technologies are racing forward despite imperfect information. And, of course, they benefit from the bottom-up annotations as well. The more annotations there are, the more precise top-down technologies will get - because they will be able to take advantage of structured information as well.

2. Annotation Technologies: RDF, Microformats, and Meta Headers

Within the bottom-up approach to annotation of data, there are several choices for annotation. They are not equally powerful, and in fact each approach is a trade-off between simplicity and completeness. The most comprehensive approach is RDF - a powerful, graph-based language for declaring things, and attributes and relationships between things. In a simplistic way, one can think of RDF as the language that allows expressing truths like: Alex IS human (type expression), Alex HAS a brain (attribute expression), and Alex IS the father of Alice, Lilly, and Sofia (relationship expression). RDF is powerful, but because it is highly recursive, precise, and mathematically sound, it is also complex.
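The three kinds of statements about Alex can be written down as subject-predicate-object triples, which is the shape of data RDF is built on. The sketch below uses plain Python tuples rather than real RDF syntax or URIs; the predicate names (`type`, `has`, `fatherOf`) and the `objects` helper are made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# Plain strings stand in for the URIs that real RDF would use.
TRIPLES = {
    ("Alex", "type", "Human"),        # type expression
    ("Alex", "has", "Brain"),         # attribute expression
    ("Alex", "fatherOf", "Alice"),    # relationship expressions
    ("Alex", "fatherOf", "Lilly"),
    ("Alex", "fatherOf", "Sofia"),
}


def objects(triples, subject, predicate):
    """Return, sorted, every object linked to `subject` by `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


if __name__ == "__main__":
    print(objects(TRIPLES, "Alex", "fatherOf"))
```

Because every fact has the same three-part shape, the set of triples forms a graph: subjects and objects are nodes, and predicates are the labeled edges between them.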

At present, most use of RDF is for interoperability. For example, the medical community uses RDF to describe genomic databases. Because the information is normalized, the databases that were previously silos can now be queried together and correlated. In general, in addition to semantic soundness, the major benefit of RDF is interoperability and standardization, particularly for enterprises, as we will discuss below.
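A toy version of that interoperability benefit: if two separately maintained datasets use the same identifier for the same thing, their triples can simply be merged and queried as one. The two datasets, the identifier scheme, and the `describe` helper below are all illustrative assumptions, not a real genomic schema.

```python
# Two hypothetical "silos" that happen to share the identifier gene:BRCA1.
GENE_LOCATIONS = {
    ("gene:BRCA1", "locatedOn", "chromosome:17"),
}
DISEASE_LINKS = {
    ("gene:BRCA1", "associatedWith", "disease:breast_cancer"),
}


def describe(triples, subject):
    """Collect every predicate/object pair known about `subject`."""
    return {p: o for s, p, o in triples if s == subject}


# Because both sets use the same triple shape and the same identifier,
# merging them is a plain set union.
MERGED = GENE_LOCATIONS | DISEASE_LINKS

if __name__ == "__main__":
    print(describe(MERGED, "gene:BRCA1"))
```

No schema migration is needed for the merge; the shared identifier does the work of correlating the two sources.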

Microformats offer a simpler approach by adding semantics to existing HTML documents using specific CSS class names. The metadata is compact and is embedded inside the actual HTML. Popular microformats are hCard, which describes personal and company contact information, hReview, which adds meta information to review pages, and hCalendar, which is used to describe events.

Microformats are gaining popularity because of their simplicity, but they are still quite limiting. There is no way to describe type hierarchies, which the classic semantic community would say is critical. The other issue is that microformats are somewhat cryptic, because the focus is to keep the annotations to a minimum. This, in turn, brings up another question of whether embedding metadata into the view (HTML) is a good idea. The question is: what happens if the underlying data changes when someone makes a copy of the HTML document? Nevertheless, despite these issues, microformats are gaining popularity because they are simple. Microformats are currently used by Flickr, Eventful, and LinkedIn; and many other companies are looking to adopt microformats, particularly because of the recent Yahoo! announcement.
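To show how little machinery a microformat needs, here is a minimal sketch that reads a few hCard properties out of ordinary HTML using only Python's standard library. The markup, the names in it, and the small set of properties handled (`fn`, `org`, `tel`) are chosen for illustration; a real hCard parser covers many more properties and nesting rules.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with hCard class names.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Alex Example</span>
  <span class="org">Example Corp</span>
  <span class="tel">555-0100</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard property."""

    PROPERTIES = {"fn", "org", "tel"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # An element that closed without text leaves nothing pending.
        self._current = None


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.card


if __name__ == "__main__":
    print(parse_hcard(SAMPLE_HTML))
```

The annotations live in the same markup a browser renders, which is both the appeal of microformats and the source of the concern about mixing metadata into the view.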

An even simpler approach is to put metadata into the meta headers. This approach has been around for a while and it is a shame that it has not been widely adopted. As an example, the New York Times recently launched extended annotations for its news pages. The benefit of this approach is that it works great for pages that are focused on a topic or a thing. For example, a news page can be described with a set of keywords, geo location, date, time, people, and categories. Another example would be for book pages. O'Reilly.com has been putting book information into the meta headers, describing the author, ISBN, and category of the book.
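Reading meta headers takes only a few lines. The sketch below collects `name`/`content` pairs from `<meta>` tags using Python's standard library. The sample page and its field names (`author`, `isbn`, `category`) are hypothetical and are not taken from any actual O'Reilly or New York Times page.

```python
from html.parser import HTMLParser

# Hypothetical book page head; the field names are illustrative only.
SAMPLE_PAGE = """
<html><head>
  <title>An Example Book</title>
  <meta name="author" content="A. Writer">
  <meta name="isbn" content="0000000000">
  <meta name="category" content="Programming">
</head><body></body></html>
"""


class MetaParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes and "content" in attributes:
            self.meta[attributes["name"]] = attributes["content"]


def parse_meta(html):
    parser = MetaParser()
    parser.feed(html)
    return parser.meta


if __name__ == "__main__":
    print(parse_meta(SAMPLE_PAGE))
```

This works well when the whole page is about one thing, since the headers describe the page rather than individual elements within it.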

Despite the fact that all these approaches are different, they are also somewhat complementary; and each of them is helpful. The more annotations there are in web pages, the more standards are implemented, and the more discoverable and powerful the information becomes.

3. Consumer and Enterprise

Yet another dimension of the conversation about the Semantic Web is the focus on consumer and enterprise applications. In the consumer arena we have been looking for a Killer App - something that delivers tangible and simple consumer value. People simply do not care that a product is built on the Semantic Web; all they are looking for is utility and usefulness.

Up until recently, the challenge has been that the Semantic Web focused on rather academic issues - like annotating information to make it machine-readable. The promise was that once the information is annotated and the web becomes one big giant RDF database, then exciting consumer applications would come. The skeptics, however, have been pointing out that first there needs to be a compelling use case.

Some consumer applications based on the Semantic Web: generic and vertical search, contextual shortcuts and previews, personal information management systems, semantic browsing tools. All of these applications are in their early days and have a long way to go before being truly compelling for the average web user. Still, even if these applications succeed, consumers will not be interested in knowing about the underlying technology - so there is really no marketing play for the Semantic Web in the consumer space.

Enterprises are a different story for a couple of reasons. First, enterprises are much more used to techno speak. To them utilizing semantic technologies translates into being intelligent and that, in turn, is good marketing. 'Our products are better and smarter because we use the Semantic Web' sounds like a good value proposition for the enterprise.

But even above the marketing speak, RDF solves a problem of data interoperability and standards. This "Tower of Babel" situation has been in existence since the early days of software. Forget semantics; just a standard protocol, a standard way to pass around information between two programs, is hugely valuable in the enterprise.

RDF offers a way to communicate using an XML-based language that also has sound mathematical underpinnings to enable semantics. This sounds great, and even the complexity of RDF is not going to stop enterprises from using it. However, there is another problem that might stop it - scalability. Unlike relational databases, which have been around for ages and have been optimized and tuned, XML-based databases are still not widespread. In general, the problem is in the scale and querying capabilities. Like object-oriented database technologies of the late '90s, XML-based databases hold a lot of promise, but we have yet to see them in action in a big way.

4. Semantic APIs

With the rise of Semantic Web applications, we are also seeing the rise of Semantic APIs. In general, these web services take unstructured information as input and find entities and relationships. One way to think of these services is as mini natural language processing tools, which are only concerned with a subset of the language.

The first example is the Open Calais API from Reuters that we have covered in two articles here and here. This service accepts raw text and returns information about people, places, and companies found in the document. The output not only returns the list of found matches, but also specifies places in the document where the information is found. Behind Calais is a powerful natural language processing technology developed by ClearForest (now owned by Reuters), which relies on algorithms and databases to extract entities out of text. According to Reuters, Calais is extensible, and it is just a matter of time before new entities are added.
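To make the shape of such output concrete, here is a toy Python sketch that finds known names in raw text and reports where each one occurs. It does not call Calais and is not how Calais works internally; the lookup table, entity types, and sample sentence are all made up for illustration, and a real service relies on far richer language processing than a fixed list.

```python
import re

# Tiny hand-made lookup table (an assumption for this example).
GAZETTEER = {
    "Reuters": "Company",
    "Paris": "City",
    "Tim Berners-Lee": "Person",
}


def extract_entities(text):
    """Return dicts with entity type, matched text, and character offset."""
    found = []
    for name, entity_type in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text):
            found.append(
                {"type": entity_type, "text": name, "offset": m.start()}
            )
    # Report matches in the order they appear in the document.
    return sorted(found, key=lambda e: e["offset"])


SAMPLE_TEXT = "Tim Berners-Lee spoke in Paris. Reuters covered the talk in Paris."
entities = extract_entities(SAMPLE_TEXT)
```

Each result carries both the entity and its position, so a caller can link or highlight the exact spot in the original text.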

Another example is the SemanticHacker API from TextWise, which is offering a one million dollar prize for the best commercial semantic web application developed on top of it. This API classifies information in documents into categories called semantic signatures. Given a document, it outputs entities or topics that the document is about. It is kind of like Calais, but also delivers a topical hierarchy, where the actual objects are leaves.

Another semantic API is offered by Dapper - a web service which facilitates the extraction of structure from unstructured HTML pages. Dapper works by enabling users to define attributes of an object based on the bits of the page. For example, a book publisher might define where the information about author, ISBN and number of pages is on a typical book page and the Dapper application would then create a recognizer for any page on the publisher site and enable access to it via REST API.

While this seems backwards from an engineering point of view, Dapper's technology is remarkably useful in the real world. In a typical scenario, for websites that do not have clean APIs to access their information, even non-technical people can build an API in minutes with Dapper. This is a powerful way of quickly turning websites into web services.
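The book publisher example can be sketched in a few lines of Python. This is not Dapper's actual mechanism, which is not described here; it only illustrates the idea of a "recognizer" that maps field names to places on a page. The regular expressions, element ids, and sample page are hypothetical.

```python
import re

# A "recognizer" here is a mapping from field name to a regex with one
# capture group. Patterns and page layout are assumptions for illustration.
BOOK_RECOGNIZER = {
    "author": r'<span id="author">(.*?)</span>',
    "isbn": r'<span id="isbn">(.*?)</span>',
    "pages": r'<span id="pages">(\d+)</span>',
}


def apply_recognizer(recognizer, html):
    """Extract each field from the page; missing fields map to None."""
    record = {}
    for field, pattern in recognizer.items():
        m = re.search(pattern, html, re.DOTALL)
        record[field] = m.group(1).strip() if m else None
    return record


SAMPLE_BOOK_PAGE = """
<h1>Example Book</h1>
<span id="author">Jane Example</span>
<span id="isbn">0000000000</span>
<span id="pages">320</span>
"""

book = apply_recognizer(BOOK_RECOGNIZER, SAMPLE_BOOK_PAGE)
```

Once defined, the same recognizer can be applied to every page that follows the same layout, which is what turns a site into something that behaves like an API.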

5. Search Technologies

Perhaps the first significant blow to the Semantic Web has been the inability thus far to improve search. The premise that a semantic understanding of pages leads to vastly better search has yet to be validated. The two main contenders, Hakia and PowerSet, have made some progress, but not enough. The problem is that Google's algorithm, which is based on statistical analysis, deals just fine with semantic entities like people, cities, and companies. When asked "What is the capital of France?", Google returns a good enough answer.

There is a growing realization that marginal improvement in search might not be enough to beat Google or to declare search the killer app for the Semantic Web. Likely, understanding semantics is helpful but not sufficient to build a better search engine. A combination of semantics, innovative presentation, and memory of who the user is, will be necessary to power the next generation search experience.

Alternative approaches also attempt to overlay semantics on top of the search results. Even Google ventures into verticals by partitioning the results into different categories. The consumer can then decide which type of answer they are interested in.

Yet search is a game that is far from won and a lot of semantic companies are really trying to raise the bar. There may be another twist to the whole search play - contextual technologies, as well as semantic databases, could lead to qualitatively better results. And so we turn to these next.

6. Contextual Technologies

We are seeing an increasing number of contextual tools entering the consumer market. Contextual navigation does not just improve search, but rather shortcuts it. Applications like Snap, Yahoo! Shortcuts, and SmartLinks "understand" the objects inside text and links and bring relevant information right into the user's context. The result is that the user does not need to search at all.

Thinking about this more deeply, one realizes that contextual tools leverage semantics in a much more interesting way. Instead of trying to parse what a user types into the search box, contextual technologies rely on analyzing the content. So the meaning is derived in a much more precise way - or rather, there is less guessing. The contextual tools then offer the users relevant choices, each of which leads to a correct result. This is fundamentally different from trying to pull the right results from a myriad of possible choices resulting from a web search.

We are also seeing an increasing number of contextual technologies make their way into the browser. Top-down semantic technologies need to work without publishers doing anything; and so to infer context, contextual technologies integrate into the browser. Firefox's recommended extensions page features a number of contextual browsing solutions - Interclue, ThumbStrips, Cooliris, and BlueOrganizer (from my own company).

The common theme among these tools is the recognition of information and the creation of specific micro contexts for the users to interact with that information.

7. Semantic Databases

Semantic databases are another breed of semantic applications focused on annotating web information to be more structured. Twine, a product of Radar Networks and currently in private beta, focuses on building a personal knowledge base. Twine works by absorbing unstructured content in various forms and building a personal database of people, companies, things, locations, etc. The content is sent to Twine via a bookmarklet, via email, or manually. The technology needs to evolve more, but one can see how such databases can be useful once the kinks are worked out. One of the very powerful applications that could be built on top of Twine, for example, is personalized search - a way to filter the results of any search engine based on a particular individual.

It is worth noting that Radar Networks has spent a lot of time getting the infrastructure right. The underlying representation is RDF and is ready to be consumed by other semantic web services. But a big chunk of the core algorithms, the ones that are dealing with entity extraction, are being commoditized by Semantic Web APIs. Reuters offers this as an API call, for example, and so moving forward, Twine won't need to be concerned with how to do that.

Another big player in the semantic databases space is a company called Metaweb, which created Freebase. In its present form, Freebase is just a fancier and more structured version of Wikipedia - with RDF inside and less information in total. The overall goal of Freebase, however, is to build a Wikipedia equivalent of the world's information. Such a database would be enormously powerful because it could be queried exactly - much like relational databases. So once again the promise is to build much better search.

But the problem is, how can Freebase keep up with the world? Google indexes the Internet daily and grows together with the web. Freebase currently allows editing of information by individuals and has bootstrapped by taking in parts of Wikipedia and other databases, but in order to scale this approach, it needs to perfect the art of continuously taking in unstructured information from the world, parsing it, and updating its database.

The problem of keeping up with the world is common to all database approaches, which are effectively silos. In the case of Twine, there needs to be continuous influx of user data, and in the case of Freebase there needs to be influx of data from the web. These problems are far from trivial and need to be solved successfully in order for the databases to be useful.

Conclusion

With any new technology it is important to define and classify things. The Semantic Web is offering an exciting promise: improved information discoverability, automation of complex searches, and innovative web browsing. Yet the Semantic Web means different things to different people. Indeed, its definitions in the enterprise and consumer spaces are different, and there are different means to a common end - top-down vs. bottom-up and microformats vs. RDF. In addition to these patterns, we are observing the rise of semantic APIs and contextual browsing tools. All of these are in their early days but hold a big promise to fundamentally change the way we interact with information on the web.

What do you think about Semantic Web Patterns? What trends are you seeing and which applications are you waiting for? And if you work with semantic technologies in the enterprise, please share your experiences with us in the comments below.

http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php Trends Fri, 26 Dec 2008 09:00:00 -0800 Alex Iskold
SemanticProxy: Jump-Starting the Semantic Web While it has great potential, the Semantic Web has failed to live up to its promises so far. Part of the problem, as Thomson Reuters sees it, is that developers will not add a lot of semantic features to their products until publishers start publishing more semantic data. Reuters' OpenCalais represents one way around this problem. But starting today, Reuters' newest project SemanticProxy will give developers an easier way to extract semantic data from any web site.

Even though SemanticProxy is geared towards developers, Reuters has created a demo site that you can try out on the web by just copying and pasting the URL of any web page into a simple form. We tested it with articles on CNN, Wikipedia, and a number of blogs, and it always returned a highly relevant set of results (as long as the page was not excessively long). The service is optimized for performance on 30 of the world's largest news sites, but it also works just as well for other sites.

For a news story, for example, SemanticProxy will identify politicians, cities, countries, etc. that are mentioned in the article. Once parsed, the service returns the semantic metadata of the page in three possible formats: RDF, MicroFormats, or standard HTML.

As the name implies, SemanticProxy acts as a proxy and aggressively caches all its data, which should make it easy for a developer to scale a project that relies on this service.
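The caching idea can be sketched as follows. This is not SemanticProxy's implementation, which is not documented here; it is a minimal Python illustration of a wrapper that runs the expensive analysis once per URL and serves repeats from memory. The class name, the `fake_extract` stand-in, and the example URL are assumptions, and nothing here touches the network.

```python
class CachingExtractor:
    """Wraps an extraction function and remembers results per URL."""

    def __init__(self, extract):
        self._extract = extract  # callable: url -> metadata
        self._cache = {}
        self.misses = 0          # how many times the real work was done

    def get(self, url):
        if url not in self._cache:
            self.misses += 1
            self._cache[url] = self._extract(url)
        return self._cache[url]


def fake_extract(url):
    # Stand-in for the expensive fetch-and-analyze step; no network access.
    return {"url": url, "entities": ["placeholder"]}


proxy = CachingExtractor(fake_extract)
first = proxy.get("http://example.com/story")
second = proxy.get("http://example.com/story")  # served from the cache
```

A production cache would also need an expiry policy so that pages which change are eventually re-analyzed; that is left out of this sketch.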

Catalyst

SemanticProxy is part of Reuters' attempt to jump-start the semantic web. As Tom Tague, the leader of the Calais initiative at Reuters, points out, SemanticProxy can hopefully act as a catalyst and get more developers to look at semantic data, which, in turn, will give more developers a reason to publish this data themselves.

Disclosure: Calais is a RWW sponsor

http://www.readwriteweb.com/archives/reuters_semanticproxy_jump-start.php http://www.readwriteweb.com/archives/reuters_semanticproxy_jump-start.php Product Reviews Tue, 23 Sep 2008 08:19:34 -0800 Frederic Lardinois
Hakia Announces Semantic API Semantic search engine Hakia today announced a set of APIs that opens up their natural language processing and search platform to developers. Hakia's Syndication Web Services really come in two parts: search queries, which allow developers to add web search functionality leveraging Hakia's five billion page index, and XML feed calls, which give developers access to Hakia's underlying natural language processing technology. The latter of the two is clearly the more compelling of the offerings.

Mobile video firm Berggi released Berggi Search, a mobile search application that lets users search Hakia's index via the API from mobile phones. Berggi is leveraging the part of Hakia's API that lets developers lean on the company's search platform -- that, however, is not the part that really interests us.

What is more interesting are the XML feed calls that Hakia is offering that give access to their underlying NLP engine. Right now, only the "Summarizer" element is available. Summarizer, which Hakia says can be used to suggest tags or abstracts, analyzes and extracts meaning from large blocks of text or the contents of URLs. Other elements that are not yet available are Categorizer, which identifies "categorical phrases" in text, Characterizer, which "identifies and expands descriptive keywords or tags," and Text Meaning Representation.

Hakia has an XML testing form up on their Club Hakia page, and in our testing it seemed a little rough around the edges. Compared to our testing of Open Calais from Reuters (our coverage), the summaries and tags the XML testing form returned using the Summarizer element weren't very impressive. Mostly, it seemed to just return the headline or first sentence as the summary for articles we threw at it. And for RWW articles, Hakia Summarizer would suggest as tags the tags that we entered by hand in MovableType.

Hakia's Syndication Web Services are free for up to 30,000 requests per day for search services (unlimited free queries for Quotes and Cartoons), and free for up to 1,000 requests per day for XML feed calls. Have you had a chance to play with Hakia's new semantic API? If so, what did you think? How does it compare to Calais or Semantic Hacker? Let us know in the comments below.

Full Disclosure: Occasional ReadWriteWeb contributor Emre Sokullu is a technology evangelist at Hakia.

http://www.readwriteweb.com/archives/hakia_announces_semantic_api.php http://www.readwriteweb.com/archives/hakia_announces_semantic_api.php Semantic Web Thu, 19 Jun 2008 12:56:42 -0800 Josh Catone
Semantic Feed Reading With FeedzZ At first glance, the social news aggregation site called FeedzZ appears to be nothing more than an Alltop clone with fewer categories. But look again - FeedzZ is actually doing something quite different than Alltop, OriginalSignal, Shyftr, or any other news aggregation web site - it's using the Calais API to offer a semantic component to the feed reading experience. This semantic technology is combined with Digg-like voting buttons and an online feed reader which you can use with your own OPML file, all of which lays the groundwork for a unique feed-reading experience.

From the FeedzZ homepage, you have access to main category pages: Science, Technology, Celebrity, Film, Health, Business, Sports, Music, and Politics. Click on any of these headers to see the feeds listed. Only a handful of popular feeds are listed on each category page, but to the left is a list of feeds under the heading "Incoming," meaning feeds that are gaining in popularity.

When you're reading any item from a particular feed, you'll notice thumbs up/thumbs down buttons at the top for voting and a button that keeps track of how many votes a particular post has received. There's also an option to email the article to a friend or bookmark it for yourself.

Viewing a Post on FeedzZ

However, what's really interesting are the tags at the bottom of the post. These tags aren't generated by people, but by the underlying semantic technologies. For example, our recent post "Watch Out Silicon Valley: Here Comes NYC" was tagged: new york michael bloomberg internet week web-oriented technologies seed-stage technology fund. There's also a "related entries" link which displays a list of posts with at least one of the same tags. In this example, thanks to the tag "New York," there were several unrelated entries listed here, but there was also a link to an article about the NYC Seed Fund. So in this case, the more accurate results came from just viewing the "internet week" tag.
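The "related entries" behavior described above, and the reason a broad tag like "new york" pulls in unrelated posts, can be sketched in Python. This is not FeedzZ's code; the post ids and tag sets are invented, and ranking by the number of shared tags is an addition for illustration rather than something the site is known to do.

```python
# Hypothetical posts and their automatically generated tags.
POSTS = {
    "nyc-tech": {"new york", "internet week", "seed-stage technology fund"},
    "nyc-seed-fund": {"new york", "seed-stage technology fund"},
    "nyc-weather": {"new york", "weather"},
    "gadgets": {"smart phones"},
}


def related_entries(post_id, posts):
    """Posts sharing at least one tag, most shared tags first."""
    tags = posts[post_id]
    scored = [
        (len(tags & other_tags), other_id)
        for other_id, other_tags in posts.items()
        if other_id != post_id and tags & other_tags
    ]
    # Sort by overlap (descending), then by id for a stable order.
    ordered = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [other_id for _, other_id in ordered]


related = related_entries("nyc-tech", POSTS)
```

With a plain "at least one shared tag" rule, the weather post counts as related just because it mentions New York; ordering by overlap at least pushes the closer match to the top.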

In addition to the tags on each post, every page of FeedzZ has an automatically generated, semantically created tag cloud on the left which you can use to see all the posts about a particular subject (Example: Bill Gates).

Issues With FeedzZ

Of course, these related entries and tags could become infinitely more useful if you were to upload your own OPML file. Unfortunately, for true feed junkies that's probably something that will have to wait, since FeedzZ currently imposes a limit on OPML file sizes, restricting them to 100 KB or less. (At 142 KB for my subscription list, I was out of luck).

FeedzZ is certainly an interesting experiment in semantics, but that being said, the site still needs a lot more finesse to really be successful. The OPML restriction is only one of the issues. Even if you manage to get your OPML uploaded, it's difficult to determine how to proceed with the data you've imported. You have to find your way into your profile section (no link is provided) and then you have to create a folder structure and classify your feeds. Shouldn't a semantic system know where the feeds belong? When I tried this, I couldn't even classify my feeds manually. Although I clicked the "Classify" button, there was never a feed in the drop-down list to select (see below), so I couldn't proceed. It's as if that piece of the web site was not even built yet.

Attempting to Classify a Feed

These types of issues are major problems in terms of usability, so it's hard to truly recommend the site at this time. However, if these problems were resolved, FeedzZ could then have a shot at being a useful online feed aggregator or even a great research tool for finding related news items on the topics that interest you. It's great that FeedzZ has managed to get the semantic RSS technologies working, but now they need to turn their attention to the user experience and UI design so we all can appreciate their efforts.

http://www.readwriteweb.com/archives/semantic_feed_reading_with_feedzz.php http://www.readwriteweb.com/archives/semantic_feed_reading_with_feedzz.php Product Reviews Wed, 04 Jun 2008 05:00:00 -0800 Sarah Perez
Reuters Launches Calais 2.0 - Now With Pop-Culture Thomson Reuters' Calais, a semantic markup API that we first reviewed in February, has reached its 2.0 release. The latest version aims to fix one of the main issues with Calais -- that it was too focused on business. Because Calais has its roots in ClearForest, the rules it applies while parsing text are biased toward the language of business, which meant that its utility was limited. Version 2.0 has added new semantic entity types in an effort to rectify that.

Calais 2.0 has a dozen new semantic entity types, which Reuters says will increase its utility for "pop-culture publishers and bloggers covering media, music, entertainment and sports, as well as those covering pharmaceuticals, medicine and healthcare." In addition to expanded semantic identification capabilities, Calais 2.0 can now print results in the Simple Tags format and Microformats, as well as the original RDF.

More than 3,200 developers have signed up to work with Calais since launch, according to product lead Thomas Tague, who said in a press release that Calais and plugins and services built on the API will "make it easy to kick-start metatagging and enter the era of the Semantic Web."

Along with an updated web site and a handful of new code samples and libraries, Thomson Reuters is announcing three new plugins that utilize Calais.

  • Calais Marmoset is a tool that enables developers to automatically create metadata for use with Yahoo!'s open search platform, Search Monkey (our coverage).
  • Calais is also announcing the official release of Tagaroo, a WordPress plugin that allows bloggers to automatically tag relevant people, places and things in their posts, as well as pull in semantically relevant Flickr photos. We wrote recently about an unofficial WordPress plugin for Calais, and noted that its utility would be limited mainly to business and tech bloggers because those were the API's strengths. Calais 2.0 should theoretically improve the utility of both plugins for a wider variety of bloggers.
  • Though they've been out since last month, Thomson Reuters is also officially introducing its Calais plugins for Drupal, a popular content management system, which it developed with Phase2Technology.

Calais is an awesome top-down semantic API that can help fuel the bottom-up approach by combing unstructured data and spitting out structured tags. We're excited for the second version of Reuters' product and the added utility that new semantic entity types should bring.

http://www.readwriteweb.com/archives/calais_20_launches.php http://www.readwriteweb.com/archives/calais_20_launches.php Product Reviews Sun, 18 May 2008 21:01:01 -0800 Josh Catone
Semantic Web Patterns: A Guide to Semantic Technologies In this article, we'll analyze the trends and technologies that power the Semantic Web. We'll identify patterns that are beginning to emerge, classify the different trends, and peek into what the future holds.

In a recent interview Tim Berners-Lee pointed out that the infrastructure to power the Semantic Web is already here. ReadWriteWeb's founder, Richard MacManus, even picked it to be the number one trend in 2008. And rightly so. Not only are the bits of infrastructure now in place, but we are also seeing startups and larger corporations working hard to deliver end user value on top of this sophisticated set of technologies.

The Semantic Web means many things to different people, because there are a lot of pieces to it. To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence - computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.

The disagreement is not accidental, because the technology and concepts are broad. Much is possible and much is to be imagined.

1. Bottom-Up and Top-Down

We have written a lot about the different approaches to the Semantic Web - the classic bottom-up approach and the new top-down one. The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically. Both approaches are making good progress.

A big win for the bottom-up approach was the recent announcement from Yahoo! that their search engine is going to support RDF and microformats. This is a win-win-win for publishers, for Yahoo!, and for customers - publishers now have an incentive to annotate information because Yahoo! Search will be taking advantage of it, and users will then see better, more precise results.

Another recent win for the bottom-up approach was the announcement of the Semantify web service from Dapper (previous coverage). This offering will enable publishers to add semantic annotations to existing web pages. The more tools like Semantify that pop up, the easier it will be for publishers to annotate pages. Automatic annotation tools combined with the incentive to annotate the pages is going to make the bottom-up approach more compelling.

But even if the tools and incentive exist, making the bottom-up approach widespread is difficult. Today, the magic of Google is that it can understand information as is, without asking people to fully comply with W3C standards or SEO optimization techniques. Similarly, top-down semantic tools are focused on dealing with imperfections in existing information. Among them are the natural language processing tools that do entity extraction - such as the Calais and TextWise APIs that recognize people, companies, places, etc. in documents; vertical search engines, like ZoomInfo and Spock, which mine the web for people; technologies like Dapper and BlueOrganizer, which recognize objects in web pages; and Yahoo! Shortcuts, Snap and SmartLinks, which recognize objects in text and links.

[Disclosure: Alex Iskold is founder and CEO of AdaptiveBlue, which makes BlueOrganizer and SmartLinks.]

Top-down technologies are racing forward despite imperfect information. And, of course, they benefit from the bottom-up annotations as well. The more annotations there are, the more precise top-down technologies will get - because they will be able to take advantage of structured information as well.

2. Annotation Technologies: RDF, Microformats, and Meta Headers

Within the bottom-up approach to annotation of data, there are several choices for annotation. They are not equally powerful, and in fact each approach is a tradeoff between simplicity and completeness. The most comprehensive approach is RDF - a powerful, graph-based language for declaring things, and attributes and relationships between things. In a simplistic way, one can think of RDF as the language that allows expressing truths like: Alex IS human (type expression), Alex HAS a brain (attribute expression), and Alex IS the father of Alice, Lilly, and Sofia (relationship expression). RDF is powerful, but because it is highly recursive, precise, and mathematically sound, it is also complex.
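The three kinds of statements above can be sketched as subject-predicate-object triples. The snippet below is a toy in-memory model in plain Python, not real RDF syntax and not an RDF library; the predicate names (type, has, fatherOf) are made up for illustration.

```python
# Each fact is a (subject, predicate, object) triple.
TRIPLES = {
    ("Alex", "type", "Human"),       # type expression
    ("Alex", "has", "Brain"),        # attribute expression
    ("Alex", "fatherOf", "Alice"),   # relationship expressions
    ("Alex", "fatherOf", "Lilly"),
    ("Alex", "fatherOf", "Sofia"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Who are Alex's children?" becomes a pattern with the object left open.
children = [o for _, _, o in match(TRIPLES, "Alex", "fatherOf")]
```

Because every fact has the same three-part shape, the set of triples forms a graph that can be queried by pattern, which is where both the power and the complexity of RDF come from.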

At present, most use of RDF is for interoperability. For example, the medical community uses RDF to describe genomic databases. Because the information is normalized, the databases that were previously silos can now be queried together and correlated. In general, in addition to semantic soundness, the major benefit of RDF is interoperability and standardization, particularly for enterprises, as we will discuss below.

Microformats offer a simpler approach by adding semantics to existing HTML documents using specific CSS class names. The metadata is compact and is embedded inside the actual HTML. Popular microformats are hCard, which describes personal and company contact information, hReview, which adds meta information to review pages, and hCalendar, which is used to describe events.
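To show how compact this is, here is a minimal Python sketch that reads a few hCard properties out of ordinary HTML by looking at class names. It handles only a small, flat subset of hCard (fn, org, tel) and is not a full microformats parser; the sample markup and contact details are invented.

```python
from html.parser import HTMLParser

# Small subset of hCard property class names handled by this sketch.
HCARD_PROPERTIES = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collects the text of elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_PROPERTIES:
                self._current = name
                self.card.setdefault(name, "")

    def handle_data(self, data):
        if self._current:
            self.card[self._current] += data

    def handle_endtag(self, tag):
        # Flat markup only: any closing tag ends the current property.
        self._current = None


SAMPLE_HCARD = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Corp</span>,
  <span class="tel">555-0100</span>
</div>
"""


def parse_hcard(html):
    """Return a dict of hCard property -> text for the supported subset."""
    parser = HCardParser()
    parser.feed(html)
    return {key: value.strip() for key, value in parser.card.items()}
```

The page still renders as a normal contact block for people, while the class names carry the meaning for machines.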

Microformats are gaining popularity because of their simplicity, but they are still quite limiting. There is no way to describe type hierarchies, which the classic semantic community would say is critical. The other issue is that microformats are somewhat cryptic, because the focus is to keep the annotations to a minimum. This, in turn, brings up another question of whether embedding metadata into the view (HTML) is a good idea. The question is: what happens if the underlying data changes when someone makes a copy of the HTML document? Nevertheless, despite these issues, microformats are gaining popularity because they are simple. Microformats are currently used by Flickr, Eventful, and LinkedIn; and many other companies are looking to adopt microformats, particularly because of the recent Yahoo! announcement.

An even simpler approach is to put meta data into the meta headers. This approach has been around for a while and it is a shame that it has not been widely adopted. As an example, the New York Times recently launched extended annotations for its news pages. The benefit of this approach is that it works great for pages that are focused on a topic or a thing. For example, a news page can be described with a set of keywords, geo location, date, time, people, and categories. Another example would be for book pages. O'Reilly.com has been putting book information into the meta headers, describing the author, ISBN, and category of the book.

Despite the fact that all these approaches are different, they are also somewhat complementary; and each of them is helpful. The more annotations there are in web pages, the more standards are implemented, and the more discoverable and powerful the information becomes.

3. Consumer and Enterprise

Yet another dimension of the conversation about the Semantic Web is the focus on consumer and enterprise applications. In the consumer arena we have been looking for a Killer App - something that delivers tangible and simple consumer value. People simply do not care that a product is built on the Semantic Web; all they are looking for is utility and usefulness.

Up until recently, the challenge has been that the Semantic Web focused on rather academic issues - like annotating information to make it machine-readable. The promise was that once the information is annotated and the web becomes one big giant RDF database, then exciting consumer applications would come. The skeptics, however, have been pointing out that first there needs to be a compelling use case.

Some consumer applications based on the Semantic Web: generic and vertical search, contextual shortcuts and previews, personal information management systems, semantic browsing tools. All of these applications are in their early days and have a long way to go before being truly compelling for the average web user. Still, even if these applications succeed, consumers will not be interested in knowing about the underlying technology - so there is really no marketing play for the Semantic Web in the consumer space.

Enterprises are a different story for a couple of reasons. First, enterprises are much more used to techno speak. To them utilizing semantic technologies translates into being intelligent and that, in turn, is good marketing. 'Our products are better and smarter because we use the Semantic Web' sounds like a good value proposition for the enterprise.

But even above the marketing speak, RDF solves a problem of data interoperability and standards. This "Tower of Babel" situation has been in existence since the early days of software. Forget semantics; just a standard protocol, a standard way to pass around information between two programs, is hugely valuable in the enterprise.

RDF offers a way to communicate using an XML-based language that also has sound mathematical foundations to enable semantics. This sounds great, and even the complexity of RDF is not going to stop enterprises from using it. However, there is another problem that might stop it - scalability. Unlike relational databases, which have been around for ages and have been optimized and tuned, XML-based databases are still not widespread. In general, the problem lies in scale and querying capabilities. Like the object-oriented database technologies of the late nineties, XML-based databases hold a lot of promise, but we have yet to see them in action in a big way.
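To make the interoperability point a little more concrete, here is a minimal sketch in Python. It does not use a real RDF library; it simply models statements as (subject, predicate, object) tuples, which is the basic shape of RDF data. The two "systems", the example.com URIs, and the property names are all made up for illustration.

```python
# Illustrative only: RDF-style statements modeled as plain
# (subject, predicate, object) tuples. All URIs below are hypothetical.

ACME = "http://example.com/id/acme"
VOCAB = "http://example.com/vocab/"

# Facts exported by a hypothetical CRM system.
crm_facts = {
    (ACME, VOCAB + "name", "Acme Corp"),
    (ACME, VOCAB + "city", "Boston"),
}

# Facts exported by a hypothetical billing system about the same company.
billing_facts = {
    (ACME, VOCAB + "name", "Acme Corp"),
    (ACME, VOCAB + "balance", "1200"),
}


def merge(*sources):
    """Merge triple sets; identical statements collapse because sets dedupe."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def describe(triples, subject):
    """Collect predicate -> object for one subject.

    Simplification: assumes at most one value per predicate.
    """
    return {pred: obj for subj, pred, obj in triples if subj == subject}


combined = merge(crm_facts, billing_facts)
profile = describe(combined, ACME)
```

Because both programs agree on the statement format and on the identifier for the company, merging their data is just a set union; neither needs to know about the other's internal schema.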

4. Semantic APIs

With the rise of Semantic Web applications, we are also seeing the rise of Semantic APIs. In general, these web services take unstructured information as input and find the entities and relationships in it. One way to think of these services is as mini natural language processing tools, which are only concerned with a subset of the language.

The first example is the Open Calais API from Reuters, which we have covered in two earlier articles. This service accepts raw text and returns information about people, places, and companies found in the document. The output not only lists the matches found, but also specifies the places in the document where each one occurs. Behind Calais is a powerful natural language processing technology developed by ClearForest (now owned by Reuters), which relies on algorithms and databases to extract entities from text. According to Reuters, Calais is extensible, and it is just a matter of time before new entity types are added.
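As a rough picture of what "entities plus their positions" looks like, here is a toy sketch in Python. It is not the Calais API and does nothing like real natural language processing; it just looks up names in a small hand-made table (the `GAZETTEER` dictionary, the `extract_entities` function, and the sample sentence are all hypothetical) and reports each match with its type and character offsets.

```python
import re

# Hypothetical lookup table, invented for this sketch; not Calais data.
GAZETTEER = {
    "Reuters": "Company",
    "Paris": "Place",
    "Jane Doe": "Person",
}


def extract_entities(text):
    """Return entity matches with their type and character offsets."""
    matches = []
    for name, entity_type in GAZETTEER.items():
        # \b keeps a name from matching inside a longer word.
        for hit in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            matches.append(
                {
                    "text": name,
                    "type": entity_type,
                    "start": hit.start(),
                    "end": hit.end(),
                }
            )
    # Sort by position so the output follows document order.
    return sorted(matches, key=lambda m: m["start"])


sample = "Jane Doe of Reuters visited Paris."
entities = extract_entities(sample)
```

The offsets are what let a client application highlight or link the entity in place, instead of just knowing that it appears somewhere in the document.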

Another example is the SemanticHacker API from TextWise, which is offering a one million dollar prize for the best commercial semantic web application developed on top of it. This API classifies information in documents into categories called semantic signatures. Given a document, it outputs entities or topics that the document is about. It is kind of like Calais, but also delivers a topical hierarchy, where the actual objects are leaves.

Another semantic API is offered by Dapper - a web service which facilitates the extraction of structure from unstructured HTML pages. Dapper works by enabling users to define attributes of an object based on the bits of the page. For example, a book publisher might define where the information about author, ISBN and number of pages is on a typical book page, and the Dapper application would then create a recognizer for any page on the publisher's site and enable access to it via a REST API.

While this seems backwards from an engineering point of view, Dapper's technology is remarkably useful in the real world. In a typical scenario, for web sites that do not have clean APIs to access their information, even non-technical people can build an API in minutes with Dapper. This is a powerful way of quickly turning web sites into web services.
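The book-page example can be sketched in a few lines of Python. This is only an illustration of the general idea of a user-defined recognizer, not how Dapper itself is implemented: the `BOOK_RULES` mapping, the CSS class names, and the sample page are all assumptions, and the matching is deliberately naive (an element must carry exactly one class).

```python
from html.parser import HTMLParser

# Hypothetical rule set: field name -> CSS class that marks it on the page.
BOOK_RULES = {
    "author": "book-author",
    "isbn": "book-isbn",
    "pages": "book-pages",
}


class Recognizer(HTMLParser):
    """Pull out the text of elements whose class attribute matches a rule."""

    def __init__(self, rules):
        super().__init__()
        self.class_to_field = {css: field for field, css in rules.items()}
        self.current_field = None
        self.record = {}

    def handle_starttag(self, tag, attrs):
        # Naive match: the whole class attribute must equal the rule's class.
        css_class = dict(attrs).get("class")
        self.current_field = self.class_to_field.get(css_class)

    def handle_data(self, data):
        if self.current_field and data.strip():
            self.record[self.current_field] = data.strip()
            self.current_field = None

    def handle_endtag(self, tag):
        self.current_field = None


def extract(html, rules):
    """Apply the rules to one page and return the structured record."""
    parser = Recognizer(rules)
    parser.feed(html)
    parser.close()
    return parser.record


page = """
<div class="book">
  <span class="book-author">Jane Doe</span>
  <span class="book-isbn">000-0-00-000000-0</span>
  <span class="book-pages">320</span>
</div>
"""
book = extract(page, BOOK_RULES)
```

Once the rules exist, the same `extract` call works on every page that follows the template, which is what turns a site into something that behaves like an API.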

5. Search Technologies

Perhaps the first significant blow to the Semantic Web has been the inability thus far to improve search. The premise that semantic understanding of pages leads to vastly better search has yet to be validated. The two main contenders, Hakia and Powerset, have made some progress, but not enough. The problem is that Google's algorithm, which is based on statistical analysis, deals just fine with semantic entities like people, cities, and companies. When asked "What is the capital of France?", Google returns a good enough answer.

There is a growing realization that marginal improvement in search might not be enough to beat Google, or to declare search the killer app for the Semantic Web. Most likely, understanding semantics is helpful but not sufficient to build a better search engine. A combination of semantics, innovative presentation, and memory of who the user is will be necessary to power the next-generation search experience.

Alternative approaches also attempt to overlay semantics on top of the search results. Even Google ventures into verticals by partitioning the results into different categories. The consumer can then decide which type of answer they are interested in.

Yet search is a game that is far from won and a lot of semantic companies are really trying to raise the bar. There may be another twist to the whole search play - contextual technologies, as well as semantic databases, could lead to qualitatively better results. And so we turn to these next.

6. Contextual Technologies

We are seeing an increasing number of contextual tools entering the consumer market. Contextual navigation does not just improve search, but rather shortcuts it. Applications like Snap or Yahoo! Shortcuts or SmartLinks "understand" the objects inside text and links and bring relevant information right into the user's context. The result is that the user does not need to search at all.

Thinking about this more deeply, one realizes that contextual tools leverage semantics in a much more interesting way. Instead of trying to parse what a user types into the search box, contextual technologies rely on analyzing the content. So the meaning is derived in a much more precise way - or rather, there is less guessing. The contextual tools then offer the users relevant choices, each of which leads to a correct result. This is fundamentally different from trying to pull the right results from a myriad of possible choices resulting from a web search.

We are also seeing an increasing number of contextual technologies make their way into the browser. Top-down semantic technologies need to work without publishers doing anything, and so, to infer context, contextual technologies integrate into the browser. Firefox's recommended extensions page features a number of contextual browsing solutions - Interclue, ThumbStrips, Cooliris, and BlueOrganizer (from my own company).

The common theme among these tools is the recognition of information and the creation of specific micro contexts for the users to interact with that information.

7. Semantic Databases

Semantic databases are another breed of semantic applications focused on annotating web information to be more structured. Twine, a product of Radar Networks and currently in private beta, focuses on building a personal knowledge base. Twine works by absorbing unstructured content in various forms and building a personal database of people, companies, things, locations, etc. The content is sent to Twine via a bookmarklet, via email, or manually. The technology needs to evolve more, but one can see how such databases can be useful once the kinks are worked out. One of the very powerful applications that could be built on top of Twine, for example, is personalized search - a way to filter the results of any search engine based on a particular individual.

It is worth noting that Radar Networks has spent a lot of time getting the infrastructure right. The underlying representation is RDF and is ready to be consumed by other semantic web services. But a big chunk of the core algorithms, the ones that are dealing with entity extraction, are being commoditized by Semantic Web APIs. Reuters offers this as an API call, for example, and so moving forward, Twine won't need to be concerned with how to do that.

Another big player in the semantic databases space is a company called Metaweb, which created Freebase. In its present form, Freebase is just a fancier and more structured version of Wikipedia - with RDF inside and less information in total. The overall goal of Freebase, however, is to build a Wikipedia equivalent of the world's information. Such a database would be enormously powerful because it could be queried exactly - much like relational databases. So once again the promise is to build much better search.

But the problem is, how can Freebase keep up with the world? Google indexes the Internet daily and grows together with the web. Freebase currently allows editing of information by individuals and has bootstrapped by taking in parts of Wikipedia and other databases, but in order to scale this approach, it needs to perfect the art of continuously taking in unstructured information from the world, parsing it, and updating its database.

The problem of keeping up with the world is common to all database approaches, which are effectively silos. In the case of Twine, there needs to be continuous influx of user data, and in the case of Freebase there needs to be influx of data from the web. These problems are far from trivial and need to be solved successfully in order for the databases to be useful.
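To illustrate what "queried exactly" means for a structured database of facts, here is a minimal Python sketch. It is not the Freebase query API and the data is not taken from Freebase; the `FACTS` list and the `query` helper are invented for illustration. The point is only that when information is stored as discrete facts, a question becomes a pattern match with a precise answer instead of a ranked list of documents.

```python
# Made-up facts in (subject, property, value) form; not real Freebase data.
FACTS = [
    ("France", "type", "country"),
    ("France", "capital", "Paris"),
    ("Germany", "type", "country"),
    ("Germany", "capital", "Berlin"),
    ("Paris", "type", "city"),
]


def query(facts, subject=None, prop=None, value=None):
    """Return facts matching every field that is not None (None = wildcard)."""
    return [
        (s, p, v)
        for s, p, v in facts
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (value is None or v == value)
    ]


# "What is the capital of France?" as an exact lookup.
capital_of_france = query(FACTS, subject="France", prop="capital")

# "Which things are countries?" as a pattern over type facts.
countries = [s for s, _, _ in query(FACTS, prop="type", value="country")]
```

The answers are only as good as the facts that have been loaded, which is exactly the keeping-up problem described above.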

Conclusion

With any new technology it is important to define and classify things. The Semantic Web is offering an exciting promise: improved information discoverability, automation of complex searches, and innovative web browsing. Yet the Semantic Web means different things to different people. Indeed, its definition in the enterprise and consumer spaces is different, and there are different means to a common end - top-down vs. bottom-up and microformats vs. RDF. In addition to these patterns, we are observing the rise of semantic APIs and contextual browsing tools. All of these are in their early days, but hold great promise to fundamentally change the way we interact with information on the web.

What do you think about Semantic Web Patterns? What trends are you seeing and which applications are you waiting for? And if you work with semantic technologies in the enterprise, please share your experiences with us in the comments below.

http://www.readwriteweb.com/archives/semantic_web_patterns.php http://www.readwriteweb.com/archives/semantic_web_patterns.php Trends Tue, 25 Mar 2008 15:20:45 -0800 Alex Iskold