ReadWriteWeb

Australian Museum Uses Open Calais to Tag Collection

Written by Josh Catone / April 1, 2008 4:45 PM / 6 Comments

The Powerhouse Museum of Science and Design in Sydney, Australia, has begun using the Reuters Open Calais API (our coverage) to tag its collection. The museum's online collection database houses some 66,303 objects, so tagging them all by hand would be quite a task. By using the Open Calais web service, the museum can automate much of the process.

That the museum has so much of its collection online is quite impressive in its own right. About 70% of the museum's electronically documented collection is online in the database, which went live in June 2006. Museum objects are searchable, taggable (by humans), and painstakingly described.

However, there are so many objects that, even though users can help tag them, many have not yet been tagged. Sebastian Chan, the Manager of Web Services at the museum, told us that Open Calais is being used to complement the people-powered tagging the museum has had running for two years. "What Open Calais lets us do now is connect people, places and companies across our collection and has already revealed many new pathways through our dataset (navigating by designer or inventor is now much easier for example)," he said.

In one example, the API automatically generated tags for swimwear designed by Speedo for the 1991 Australian swimming team that competed at the World Swimming Championships in Perth. Open Calais correctly identified some important locations in the document -- Perth, where the competition took place, and Sydney, where Speedo is based -- as well as an important corporation (Speedo). It also picked up the name of the designer and the name of the person who owned the suits before the museum.

However, the API made some mistakes too -- it classified "World Championships" as a company, and mistook the general text "international swimming organisation" for an actual organized body. It missed the actual organization (FINA) and probably should have picked up the MacRae Knitting Mills company, which was a predecessor to Speedo. Further, because Open Calais is built around people, places, and companies, general information about items may be lost on it. Tags that would be obvious to humans, such as swimming, swimwear, Olympics, or the year 1991, are beyond the scope of Open Calais.

"These errors and others like them reveal Open Calais' history as ClearForest in the business world," said Chan. "The rules it applies when parsing text as well as the entities that it is 'aware' of are rooted in the language of enterprise, finance and commerce." On the other hand, according to Chan, the technology has already revealed "many new connections between objects," even though it has so far been deployed only very sparingly across the collection.
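To make the idea of "connections between objects" concrete, here is a minimal sketch of how shared entity tags can link catalogue records. It is not the museum's implementation and does not call the Open Calais API: the object IDs, the (type, name) tag format, and both function names are assumptions made up for illustration. Only the Speedo, Perth, and Sydney tag values come from the example above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: object IDs mapped to (entity type, name) tags.
# The IDs and the tag format are invented; they are not museum or Calais output.
object_tags = {
    "swimsuit-1991": {("Company", "Speedo"), ("City", "Perth"), ("City", "Sydney")},
    "poster-001": {("City", "Perth")},
    "swimsuit-002": {("Company", "Speedo"), ("City", "Sydney")},
}


def build_entity_index(tags_by_object):
    """Invert object -> tags into tag -> set of object IDs."""
    index = defaultdict(set)
    for object_id, tags in tags_by_object.items():
        for tag in tags:
            index[tag].add(object_id)
    return index


def connections(tags_by_object):
    """Map each pair of objects that share a tag to the tags they share."""
    links = defaultdict(set)
    for tag, objects in build_entity_index(tags_by_object).items():
        # Every pair of objects carrying the same tag is connected by it.
        for a, b in combinations(sorted(objects), 2):
            links[frozenset((a, b))].add(tag)
    return dict(links)


if __name__ == "__main__":
    for pair, shared in connections(object_tags).items():
        print(sorted(pair), "share", sorted(shared))
```

In a sketch like this, a wrongly typed tag (such as "World Championships" classified as a company) would still link objects, which is one reason letting users reject irrelevant tags matters.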

Powerhouse's use of Open Calais may be the first large-scale deployment of the technology across a public data set. It will be interesting to see the results as they evolve. "It is important to remember that there is no way that this structured data could be generated manually - the volume of legacy data is too great and the burden on curatorial and cataloguing staff would be too great," Chan noted.

Comments


  • Wow - this is awesome!

    Posted by: Alex Iskold | April 1, 2008 6:24 PM


  • Agreed - amazing!

    Posted by: Terra Andersen | April 1, 2008 6:41 PM


  • Many congratulations to the Australian Museum for being on the bleeding edge of semantic tagging. It's particularly interesting to see the results given that the knowledge domain in the museum's collection is vastly different than the ones Calais was originally built upon. The "errors" that are pointed out in the post underscore the major challenge Calais faces: how to get the system to learn more semantic knowledge about specialized domains. It would be excellent if the Calais API had functions that allowed people to upload vertical-specific knowledge dictionaries. Once such functionality is tied in with Metaweb's Freebase, then things will get _very_ exciting!

    Posted by: Samidh Chakrabarti | April 1, 2008 10:34 PM


  • Samidh

    We're actually the Powerhouse Museum, one of many Australian museums . . . but not the 'Australian Museum' (they are the natural history museum in Sydney).

    We included the red X beside each Calais tag so that users can help us eliminate the irrelevant tags as they go. The Calais tags are used to complement social tagging, search recommendations and several internal taxonomies which all run in parallel.

    Excitingly for us we are starting to see connections between collection objects that were previously not visible - especially between people across the collection.

    We will be adding more features shortly.

    Seb Chan
    Powerhouse Museum

    Posted by: Seb Chan | April 1, 2008 11:52 PM


  • Hi all, Tom Tague from Calais here.

    As I commented on the Museum blog - we're really excited by this. When someone has gone to the effort of laying an exceptional foundation (user driven tagging, great collection documentation, etc) and we're able to add value to it - cool.

    Of course it's not surprising we had some difficulties in this domain - but we try to remind ourselves that the 80% that is great is just as important as the 20% that isn't.

    In our next few releases you'll start to see distinctly better metadata generation outside of a directly business related domain. We're starting with sports and entertainment - but there will be new domains on a regular basis.

    Some of these improvements will be due to enhancements in our natural language processing and some will be due to our rapidly increasing incorporation of external data assets as lexicons. Basically - Calais is getting broader and deeper in its capabilities on a monthly basis.

    Congratulations to the Powerhouse Museum for putting such a great tool in front of your users.

    Regards

    Posted by: Tom Tague | April 2, 2008 6:13 AM


  • This is one of the most breathtaking "power of us" applications to potentially attract and involve a diverse mix of us amateurs and experts.

    What a public service - to expand access to these collections and enrich the knowledge of them.

    I can hardly wait to see other kinds of orgs. use this.

    Kudos to Samidh, Tom and the rest of the team. Bet you learned a lot from each other as you did this project.

    Kare

    Posted by: Kare Anderson | April 2, 2008 10:01 AM



