open calais - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/open calais en Copyright 2009 Richard MacManus readwriteweb@gmail.com Tue, 24 Nov 2009 12:40:23 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss

Calais Gets a Wordpress Plugin

Open Calais, a semantic markup API from Reuters that we've written about on ReadWriteWeb before, has finally gotten the Wordpress plugin it has been looking for since January, when it started a bounty program seeking one. The new plugins come from developer Dan Grossman and represent one of the first public-facing applications of the API (as opposed to private uses like that of the Powerhouse Museum).

As we reported in March, even with a $5000 bounty, Calais didn't receive much of a response. "Unfortunately - and unexpectedly - we haven't seen any reasonable applications for the bounty process so we'll most likely be contracting for the development of the WordPress plugin," wrote Reuters' Tom Tague at the time. We speculated then that the relatively small size of the bounty may have been the issue. However, Grossman's plugins took just "a few hours" to complete, and though they don't technically meet all of the bounty requirements (they don't do tag clouds or have GUIDs on the RSS items), Grossman estimates that "it'd take only a few hours more to have met all the bounty conditions."

Grossman's plugins, which are available as an auto tagger and an archive tagger (to go back and tag old posts), received over 500 downloads in the first two days. The plugins work by sending post text to Calais and retrieving a list of suggested tags. The plugins rely on an Open Calais PHP class, also written by Grossman. Eventually, the plugins will be released under a Creative Commons license. Grossman tells us he's waiting until the next Calais feature update, scheduled for May 1st, before adding any more features to his plugins.
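The send-text, get-tags flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not Grossman's PHP code: `auto_tag`, `suggest_tags` and `fake_suggester` are hypothetical names, and the real request to Calais is replaced by a local stand-in so that no details of the Calais API are assumed.

```python
from typing import Callable, Iterable, List


def auto_tag(
    post_text: str,
    suggest_tags: Callable[[str], Iterable[str]],
    existing_tags: Iterable[str] = (),
) -> List[str]:
    """Merge tags suggested by an external service into a post's tag list.

    `suggest_tags` is a hypothetical callable standing in for the remote
    tagging call; it takes the post text and returns suggested tags.
    """
    merged: List[str] = []
    seen = set()
    # Keep the author's own tags first, then append new suggestions,
    # skipping blanks and case-insensitive duplicates.
    for tag in list(existing_tags) + list(suggest_tags(post_text)):
        key = tag.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(tag.strip())
    return merged


def fake_suggester(text: str) -> List[str]:
    """Stand-in for the remote call: returns known names found in the text."""
    known = ["Reuters", "Speedo", "Sydney"]
    return [name for name in known if name in text]


if __name__ == "__main__":
    post = "Reuters opened an office in Sydney"
    print(auto_tag(post, fake_suggester, existing_tags=["news", "reuters"]))
```

An archive tagger would presumably run the same step in a loop over older posts; that part is left out here.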

As we've noted, because of Calais' roots as Clearforest, the rules it applies while parsing text are biased toward the language of business. That means that business or tech bloggers will likely find more utility in Calais for the time being. If you're writing about Fortune 500 companies, the Calais Wordpress Auto Tagger plugin might be very useful; if you routinely write about sewing teddy bears, its usefulness might be dubious.

Unfortunately for Grossman, the application deadline for the $5000 bounty passed in March and Reuters has since farmed out the work of creating a Wordpress plugin to a commercial firm. Though work on that plugin continues, we're told that people at Calais have expressed interest in working with Grossman on future Calais-related projects. Open Calais is one of the most interesting new semantic APIs, and we're keen to see developers finally start to embrace it and make some useful mashups.

http://www.readwriteweb.com/archives/calais_gets_a_wordpress_plugin.php http://www.readwriteweb.com/archives/calais_gets_a_wordpress_plugin.php Semantic Web Tue, 15 Apr 2008 11:25:48 -0800 Josh Catone
Australian Museum Uses Open Calais to Tag Collection

The Powerhouse Museum of Science and Design in Sydney, Australia has begun to utilize the Reuters Open Calais API (our coverage) to tag their collection. The museum's online collection database houses some 66,303 objects, so tagging them all by hand would be quite a task. By using the Open Calais web service, the museum is able to automate much of the process.

That the museum has so much of its collection online is actually quite impressive in its own right. About 70% of the museum's electronically documented collection is online in the database, which went live in June 2006. Museum objects are searchable, taggable (by humans) and painstakingly described.

However, there are so many objects that, even though users can help to tag them, many haven't yet been tagged. Sebastian Chan, who is the Manager of Web Services at the museum, told us that Open Calais is being used to complement the people-powered tagging they've had running for two years. "What Open Calais lets us do now is connect people, places and companies across our collection and has already revealed many new pathways through our dataset (navigating by designer or inventor is now much easier for example)," he said.

The automatically generated tags at right were created by the API for some swim wear designed by Speedo for the 1991 Australian swimming team that competed at the World Swimming Championships in Perth. Open Calais was correctly able to identify some important locations in the document -- Perth where the competition took place, and Sydney where Speedo is based -- as well as an important corporation (Speedo). It also picked up the name of the designer, and the name of the person who owned the suits before the museum.

However, as you can see, the API made some mistakes too -- it classified "World Championships" as a company, and mistook the general text "international swimming organisation" as an actual organized body. It missed the actual organization (FINA) and probably should have picked up the MacRae Knitting Mills company, which was a predecessor to Speedo. Further, because Open Calais is built around people, places, and companies, general information about items may be lost on it. Tags that would be obvious to humans, such as swimming, swim wear, Olympics, or the year 1991, are beyond the scope of Open Calais.

"These errors and other like them reveal Open Calais' history as Clearforest in the business world," said Chan. "The rules it applies when parsing text as well as the entities that it is 'aware' of are rooted in the language of enterprise, finance and commerce." On the other hand, according to Chan, the technology has already revealed "many new connections between objects," even though it has so far been deployed only very sparingly across the collection.

Powerhouse's use of Open Calais may be the first large-scale deployment of the technology across a large public data set. It will be interesting to see the results as they evolve. "It is important to remember that there is no way that this structured data could be generated manually - the volume of legacy data is too great and the burden on curatorial and cataloguing staff would be too great," Chan reminded us.

http://www.readwriteweb.com/archives/australian_museum_uses_open_calais.php http://www.readwriteweb.com/archives/australian_museum_uses_open_calais.php Trends Tue, 01 Apr 2008 16:45:34 -0800 Josh Catone
Aggregate Knowledge's Content Discovery - How Good is it, Really?

Aggregate Knowledge, which operates a content discovery network under the brand name Pique, today announced a deal with BusinessWeek to deliver "user-driven content suggestions" on their website. It's the latest in a string of similar deals - Aggregate Knowledge powers "discovery" of both editorial content and product recommendations for over 100 websites, with a particular focus on retail and media. In this post we take a closer look at the implementation at BusinessWeek - and ask if the results come up to scratch.

At last year's Supernova, Aggregate Knowledge CEO Paul Martino referred to his company as the "world's largest implicit social network." The company told ReadWriteWeb today that media sites like BusinessWeek.com, WashingtonPost.com and LATimes.com are using Aggregate Knowledge's Pique Discovery Network "to help users discover new and exciting content on their site." The company has some high-powered backing, including uber VC firm Kleiner Perkins.

How Well Does it Work?

Here's how Aggregate Knowledge describes the system for BusinessWeek.com:

"When a reader clicks on a breaking news story on the site, the Aggregate Knowledge Pique Discovery Window automatically provides user-driven content suggestions in the form of “More from BusinessWeek.” These suggestions are based on what visitors are actually reading across BusinessWeek.com."

I clicked some stories on the BusinessWeek.com homepage, and noticed a "More from BusinessWeek" list of links to the right of each story. However, none of these links seemed very relevant to the story. Check out this example from a story about Apple iTunes:

No Apple or even tech stories are linked to. Here's another example - about Russian police visiting BP offices. Curiously, this one lists an Apple story!

No Actual Content Analysis?

So based on my tests, it doesn't seem like there is much - if any - semantic analysis of the page content in order to come up with the "More from BusinessWeek" links. Reading between the lines of the AK quote above, this discovery system is based on clicks and not content.

It appears as if this is collaborative filtering - i.e., users who clicked X also clicked Y. This is basically the system that Amazon and Netflix use. For Aggregate Knowledge, collaborative filtering is still going to give interesting results. But how is it better than, for example, the 'Related Entries' plugin that we use here on ReadWriteWeb (which is based on tags, and so is much more closely aligned to the content itself)? See the bottom of this post for an example.
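To make the contrast concrete, here is a rough sketch in Python of the two approaches: "users who clicked X also clicked Y" counted from click sessions, versus related stories ranked by shared tags. It is an illustration only, not how Aggregate Knowledge or the 'Related Entries' plugin is actually built; the function names, story names and data are invented for the example.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, List, Set


def co_click_counts(sessions: Iterable[Set[str]]) -> Dict[str, Counter]:
    """Count, for each story, how often each other story shared a session."""
    counts: Dict[str, Counter] = {}
    for clicked in sessions:
        # Every ordered pair of distinct stories in one session is a co-click.
        for a, b in permutations(sorted(clicked), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def also_clicked(story: str, counts: Dict[str, Counter], k: int = 3) -> List[str]:
    """Click-based suggestions: the stories most often co-clicked with `story`."""
    return [name for name, _ in counts.get(story, Counter()).most_common(k)]


def related_by_tags(story: str, tags: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Content-based suggestions: stories sharing the most tags with `story`."""
    own = tags[story]
    scored = [
        (len(own & other), name)
        for name, other in tags.items()
        if name != story and own & other
    ]
    # Most shared tags first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # Hypothetical data: many readers of the iTunes story also read an oil story.
    sessions = [
        {"itunes", "oil-prices"},
        {"itunes", "oil-prices"},
        {"itunes", "iphone"},
    ]
    tags = {
        "itunes": {"apple", "music"},
        "iphone": {"apple", "mobile"},
        "oil-prices": {"energy"},
    }
    print(also_clicked("itunes", co_click_counts(sessions), k=1))
    print(related_by_tags("itunes", tags))
```

With this made-up data the click-based list surfaces an unrelated but popular story, while the tag-based list stays on topic, which is the kind of mismatch described above.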

Surely for media sites a content discovery system that analyzes the content of a page, as Reuters' Open Calais does, would give better results. Please let us know your opinion in the comments.

http://www.readwriteweb.com/archives/aggregate_knowledge_businessweek.php http://www.readwriteweb.com/archives/aggregate_knowledge_businessweek.php Products Wed, 19 Mar 2008 21:20:56 -0800 Richard MacManus