reuters - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/reuters en Copyright 2009 Richard MacManus readwriteweb@gmail.com Sun, 22 Nov 2009 19:36:29 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss

News Pro: Reuters App for the iPhone Disappoints

Thomson Reuters, the U.K.-based news service, released News Pro today, a new application for the iPhone, iPod touch, and BlackBerry that gives users easy, almost real-time access to Reuters' news wire. We spent some time with the iPhone version of the application today (iTunes link), and while we like the fact that the app gives us easy access to a lot of great content, the application itself could use a lot of polishing, especially when compared to some of its closest competitors from the Associated Press (iTunes link) and Bloomberg (iTunes link).

It should be noted that the iPhone and BlackBerry apps are a bit different. According to PaidContent's David Kaplan, the BlackBerry version is more text-centric, while the iPhone app puts more emphasis on Reuters' video and photo content. The BlackBerry app can be found here.

Both apps are currently free (with ads), but Thomson Reuters is also looking into a subscription model; according to PaidContent, it will be a few months before we hear more details about this.

iPhone App Needs Polish

We have seen a number of impressive news applications for the iPhone from prominent players like the New York Times (our review), Wall Street Journal, the Associated Press, the BBC, and Bloomberg. Sadly, the Reuters app turned out to be one of the weakest applications in this group.

Let's start with the good news. While a lot of the other apps tend to take a long time to start up and update their news feeds (though the Bloomberg app also starts up and updates quite quickly), the Thomson Reuters app is ready to go within seconds.

Unlike all of its competitors, though, the Reuters app starts up with the Top News feed by default, and presents the rest of the news categories in a long list, without the ability to customize the order of these categories. If you want the app to show you the latest 'Internet News' when you start the program, for example, you are out of luck, as you have to flick past the stories in the 'Top News' section first.


All the other apps also allow users to set shortcuts to their favorite sections, while the shortcut menu in the Reuters app is static (News, Pictures, Video, Markets, and Stocks).

In terms of presentation, the image section is nicely done, but the videos look blocky (even with a fast connection) and the Bloomberg app does a way better job at showing information about the stock market.

Verdict

The app has a lot of potential, especially thanks to the excellent and timely content that Reuters is able to offer. Sadly, the iPhone app currently falls short and doesn't quite deliver the experience to complement the quality of the available content. It should be relatively easy to rectify some of these problems, though, and we hope to see a new and better version of the app soon.

http://www.readwriteweb.com/archives/news_pro_reuters_on_the_iphone.php http://www.readwriteweb.com/archives/news_pro_reuters_on_the_iphone.php Products Mon, 11 May 2009 10:39:14 -0800 Frederic Lardinois
Calais 4.0 Released: Linked Data Meets the Commercial Web

Thomson Reuters is today launching the latest version of its Calais web service and open API, Calais 4.0. Calais is a toolkit of products that enables publishers to incorporate semantic functionality within their properties - enabling them to categorize content as people, places, companies, facts, events, and more. Calais 4.0 is perhaps the most significant version since the launch of Calais one year ago, because it enables publishers to connect to the Linked Data web standard that Sir Tim Berners-Lee and others in the Semantic Web community have been promoting over the past few years.

Up till now, we have yet to see much commercial activity in Linked Data - developments have been largely confined to the academic and scientific communities. So we think Calais 4.0 represents an important move forward in the commercial Semantic Web - and we expect to see some big media companies using it before long.

Specifically, Calais 4.0 goes beyond metatagging and enables publishers to integrate their content with Linked Data assets from Wikipedia, GeoNames, the Internet Movie Database (IMDB), Shopping.com and others. Calais 4.0 also lets publishers share semantic metadata about their content with "content consumers" such as search engines, news aggregators, related stories recommendation services and more.

ReadWriteWeb named Calais as one of our top 10 Semantic Web Apps of 2008, due to the progress it made last year. Since launching the Open Calais API early in 2008, over 9,000 developers have registered with it and Calais has processed 200+ million articles.

What's New in 4.0

We spoke with Thomas Tague, Calais lead at Thomson Reuters, about what specifically is new with Calais 4.0 and what use cases we might see over the coming year for it.

Tague explained to ReadWriteWeb that there are 3 pillars to the Calais initiative:

1. Getting semantic data out of text, which is what the first 3 versions of Calais focused on.
2. Connecting that semantic data to the linked data world.
3. Providing some way for people to share metadata, for example syndicating it - which Tague termed the "transport" pillar.

Calais 4.0, explained Tague, fills in the final 2 of those pillars. It supports approximately 25 entity types in Linked Data - URIs are dereferenceable to Calais RDF pages. Thomson Reuters is also publishing their ontology in RDFS. Calais will contribute data too, which Thomson Reuters claims is "the first contribution to the Linked Data cloud made by a major publisher." The data that Thomson Reuters is giving to the Linked Data world includes company descriptions, stock tickers, management teams and more. This data will be available to external developers to programmatically use in their apps.
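To make "dereferencing" a little more concrete, here is a minimal sketch of how a client might ask a Linked Data URI for its RDF description rather than an HTML page, using HTTP content negotiation. The entity URI below is a made-up placeholder (this post does not give the actual Calais URI scheme), and whether a given server honors the Accept header is an assumption.

```python
from urllib.request import Request

# Hypothetical entity URI, used only for illustration; not a real Calais URI.
ENTITY_URI = "http://example.org/entity/acme-corp"


def build_rdf_request(uri: str) -> Request:
    """Build a GET request that asks the server for RDF/XML instead of HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


request = build_rdf_request(ENTITY_URI)
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To actually fetch the description, pass the request to
# urllib.request.urlopen(request) and parse the response body as RDF.
```

The sketch stops short of sending the request, so it runs without network access.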

Thomas Tague told ReadWriteWeb that Thomson Reuters has some big data assets and that over time "we're going to populate linked data endpoints with Thomson Reuters data". We asked Tague whether he thinks Calais 4.0 is the biggest commercial use of the Linked Data standard yet. He thinks it is; in his opinion, Linked Data has mostly been used so far for open data projects and relatively small sets of data. Tague said that "we fundamentally believe that companies need to jump into this [Linked Data]".


The Linking Open Data dataset cloud; by Richard Cyganiak

In terms of pillar 3, the metadata transportation, Tague explained to us that a document gets a unique identifier - and to syndicate content, publishers just need to make that unique identifier available to external parties.
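As a rough illustration of that "transport" idea, the sketch below stores a document's metadata under a generated identifier, so that anyone who is handed the identifier can look the metadata up. The in-memory dictionary, the function names, and the use of UUIDs are all our own assumptions for the example, not a description of how Calais implements it.

```python
import uuid
from typing import Dict, Optional

# In-memory stand-in for a metadata service; purely illustrative.
metadata_store: Dict[str, dict] = {}


def publish(metadata: dict) -> str:
    """Store metadata and return the identifier a publisher would share."""
    doc_id = str(uuid.uuid4())
    metadata_store[doc_id] = metadata
    return doc_id


def lookup(doc_id: str) -> Optional[dict]:
    """Fetch the metadata for an identifier, or None if it is unknown."""
    return metadata_store.get(doc_id)


shared_id = publish({"companies": ["Thomson Reuters"], "topics": ["Linked Data"]})
print(lookup(shared_id))
```

The publisher only has to pass `shared_id` along with the syndicated story; the metadata itself stays in one place.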

Conclusion

It will be interesting to see what companies make use of Calais over 2009. Last year we noted that IBM was using Calais - and we presume that with the extra Linked Data and transport functionality, other big companies will want to make use of Calais data too. Thomas Tague told us that they hope to announce 2 big product partners soon. He also said that they're seeing major traction around Drupal. Healthcare IT News from MedTech Publishing, a site developed in Drupal, features the full Calais suite for publishers including "More Like This", their related content plugin.

As we noted at the beginning of this post, we've been impressed with the progress Calais has made since its launch at the start of 2008. With 4.0, we expect to see it gain more traction among commercial publishers in 2009. Indeed as a (we like to think) ahead-of-the-curve 'new media' company ourselves, we're about to embark on our own project using Calais! Stay tuned for more information on that.

http://www.readwriteweb.com/archives/calais_4_linked_data.php http://www.readwriteweb.com/archives/calais_4_linked_data.php Products Thu, 15 Jan 2009 05:00:00 -0800 Richard MacManus
AP: The Modern Newsroom Looks Like a Little RSS Reader

The 20th century news and stock ticker used to be one of the most archetypal images of newsrooms all around the world. It was timely and exciting, if a bit impersonal, for editors to watch the wires for breaking news from the big news syndicates and select stories to run in the local paper. That ticker doesn't print everything out any more, though, and a constant stream of news is something that millions of consumers now see for themselves inside their RSS feed readers.

How are newspapers adapting to digital syndication? Today the Associated Press announced that more than 500 newspapers are using their service called the AP Member Marketplace. To web savvy consumers, the Marketplace might look like an RSS reader that publishes selected stories to a webpage built out of Del.icio.us badges. It's a pretty interesting program.

The Interface

The AP Marketplace interface looks like a sophisticated, multi-media RSS reader but with limited sources. Publishers set up a workflow that lets editors send selected media items directly from the reader out onto the paper's website.

Below, the AP newsreader, click to view full screen image.


It's very reminiscent of the CMS built by the Crowd Fusion team, which we profiled last week. There's one huge difference though between the AP's project and things like the Crowd Fusion project, the red-hot world of cool-hunting aggregation and even the new publishing strategy of web giants like Yahoo and AOL. The AP service finds and publishes AP stories, not content from around the whole web.

There was a time when it must have been hard to imagine getting more news to choose from than what the wires brought publishers each day. That time has passed and while the small Midwestern US newspapers that the AP highlights as happy users of the Marketplace may be on board - it's hard to say for how long readers will remain excited about AP-fueled news websites. Especially once they discover a little more about how the internet works. (We don't mean to be critical of Midwesterners, they were just the demographic of several AP demo sites.)

The online research tools used by financial professionals, for example, could probably slap this service both ways to Sunday before it knew which way was up. The AP says, though, that many local papers find their readers overjoyed with the breadth of topical AP content published to content sections or niche websites.

Left: The North West Arkansas biker scene had nothing like this news site before the AP Exchange came to town, the AP says. This kind of site does look like a good idea for everyone.

Training Component

One very interesting part of the AP Marketplace is that it's very search-centric and the wire service offers weekly 30 minute-long classes in online search skills. The AP Exchange School of Search is a great idea.


Not all parts of the program are working well, admittedly. The Exchange "blog" and community on Ning are dead, for example. Perhaps early participants learned enough to escape out into the web at large.

News Publishing Around the Web

A year ago media analyst Jeff Jarvis wrote an excellent post about what Editor 2.0 jobs are shaping up to look like. Two years ago we wrote here about some of the exciting things that AP competitor Reuters is doing. [Disclosure: the Reuters semantic web project Calais is now an RWW sponsor.] The media business blog PaidContent says that the AP Marketplace/Exchange service is pitted against new aggregation services explicitly aimed at replacing the AP, like Politico.

It's a time of deep change in the news media world and though we love the feel of a good local paper and its website - their ongoing success cannot be taken for granted. Tools like the AP Exchange look like a great step to take and we enjoy getting to see what the RSS reader equivalent is inside hundreds of local newsrooms.

http://www.readwriteweb.com/archives/ap_the_modern_newsroom_looks_like_a_little_rss_reader.php http://www.readwriteweb.com/archives/ap_the_modern_newsroom_looks_like_a_little_rss_reader.php Publishing Services Mon, 29 Sep 2008 10:43:07 -0800 Marshall Kirkpatrick
SemanticProxy: Jump-Starting the Semantic Web

While it has great potential, the Semantic Web has failed to live up to its promises so far. Part of the problem, as Thomson Reuters sees it, is that developers will not add a lot of semantic features to their products until publishers start publishing more semantic data. Reuters' OpenCalais represents one way around this problem. But starting today, Reuters' newest project SemanticProxy will give developers an easier way to extract semantic data from any web site.

Even though SemanticProxy is geared towards developers, Reuters has created a demo site that you can try out on the web by just copying and pasting the URL of any web page into a simple form. We tested it with articles on CNN, Wikipedia, and a number of blogs, and it always returned a highly relevant set of results (as long as the page was not excessively long). The service is optimized for performance on 30 of the world's largest news sites, but it also works just as well for other sites.


For a news story, for example, SemanticProxy will identify politicians, cities, countries, etc. that are mentioned in the article. Once parsed, the service returns the semantic metadata of the page in three possible formats: RDF, Microformats, or standard HTML.

As the name implies, SemanticProxy acts as a proxy and aggressively caches all its data, which should make it easy for a developer to scale a project that relies on this service.
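To show why caching makes scaling easier, here is a minimal sketch of a caching proxy wrapped around an extraction function: repeated requests for the same URL are answered from memory and never reach the extractor again. The `fake_extract` function and its canned results are stand-ins we made up for the example; the sketch says nothing about how SemanticProxy itself is built.

```python
from typing import Callable, Dict, List


def make_caching_proxy(extract: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
    """Wrap an extractor so each URL is only processed once."""
    cache: Dict[str, List[str]] = {}

    def proxy(url: str) -> List[str]:
        if url not in cache:
            cache[url] = extract(url)  # expensive step, done once per URL
        return cache[url]

    return proxy


calls: List[str] = []


def fake_extract(url: str) -> List[str]:
    """Hypothetical extractor that records each call and returns canned entities."""
    calls.append(url)
    return ["Person: Jane Doe", "City: Springfield"]


proxy = make_caching_proxy(fake_extract)
first = proxy("http://example.org/story")
second = proxy("http://example.org/story")
print(len(calls))  # the extractor ran only for the first request
```

A real cache would also need an expiry policy so that updated pages get re-parsed.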

Catalyst

SemanticProxy is part of Reuters' attempt to jump-start the semantic web. As Tom Tague, the leader of the Calais initiative at Reuters, points out, SemanticProxy can hopefully act as a catalyst and get more developers to look at semantic data, which, in turn, will give more developers a reason to publish this data themselves.

Disclosure: Calais is a RWW sponsor

http://www.readwriteweb.com/archives/reuters_semanticproxy_jump-start.php http://www.readwriteweb.com/archives/reuters_semanticproxy_jump-start.php Products Tue, 23 Sep 2008 08:19:34 -0800 Frederic Lardinois
Reuters Launches Calais 2.0 - Now With Pop-Culture

Thomson Reuters' Calais, a semantic markup API that we first reviewed in February, has reached its 2.0 release. The latest version aims to fix one of the main issues with Calais -- that it was too focused on business. Because Calais has roots as Clearforest, the rules it applies while parsing text are biased toward the language of business, which meant that its utility was limited. Version 2.0 has added new semantic entity types in an effort to rectify that.

Calais 2.0 has a dozen new semantic entity types, which Reuters says will increase its utility for "pop-culture publishers and bloggers covering media, music, entertainment and sports, as well as those covering pharmaceuticals, medicine and healthcare." In addition to expanded semantic identification capabilities, Calais 2.0 can now print results in the Simple Tags format and Microformats, as well as the original RDF.

More than 3,200 developers have signed up to work with Calais since launch, according to product lead Thomas Tague, who said in a press release that Calais and plugins and services built on the API will "make it easy to kick-start metatagging and enter the era of the Semantic Web."

Along with an updated web site, a handful of new code samples and libraries, Thomson Reuters is announcing three new plugins that utilize Calais.

  • Calais Marmoset is a tool that enables developers to automatically create metadata for use with Yahoo!'s open search platform, Search Monkey (our coverage).
  • Calais is also announcing the official release of Tagaroo, a WordPress plugin that allows bloggers to automatically tag relevant people, places and things in their posts, as well as pull in semantically relevant Flickr photos. We wrote recently about an unofficial WordPress plugin for Calais, and noted that its utility would be limited mainly to business and tech bloggers because those were the API's strengths. Calais 2.0 should theoretically improve the utility of both plugins for a wider variety of bloggers.
  • Though they've been out since last month, Thomson Reuters is also officially introducing their Calais plugins for Drupal, a popular content management system, that it developed with Phase2Technology.

Calais is an awesome top-down semantic API that can help fuel the bottom-up approach by combing unstructured data and spitting out structured tags. We're excited for the second version of Reuters' product and the added utility that new semantic entity types should bring.

http://www.readwriteweb.com/archives/calais_20_launches.php http://www.readwriteweb.com/archives/calais_20_launches.php Products Sun, 18 May 2008 21:01:01 -0800 Josh Catone
Aggregate Knowledge's Content Discovery - How Good is it, Really?

Aggregate Knowledge, which operates a content discovery network under the brand name Pique, today announced a deal with BusinessWeek to deliver "user-driven content suggestions" on their website. It's the latest in a string of similar deals - Aggregate Knowledge powers "discovery" of both editorial content and product recommendations for over 100 websites, with a particular focus on retail and media. In this post we take a closer look at the implementation at BusinessWeek - and ask if the results come up to scratch.

At last year's Supernova, Aggregate Knowledge CEO Paul Martino referred to his company as the "world's largest implicit social network." The company told ReadWriteWeb today that media sites like BusinessWeek.com, WashingtonPost.com and LATimes.com are using Aggregate Knowledge's Pique Discovery Network "to help users discover new and exciting content on their site." The company has some high powered backing, including uber VC firm Kleiner Perkins.

How Well Does it Work?

Here's how Aggregate Knowledge describes the system for BusinessWeek.com:

"When a reader clicks on a breaking news story on the site, the Aggregate Knowledge Pique Discovery Window automatically provides user-driven content suggestions in the form of “More from BusinessWeek.” These suggestions are based on what visitors are actually reading across BusinessWeek.com."

I clicked some stories on the BusinessWeek.com homepage, and noticed a "More from BusinessWeek" list of links to the right of each story. However, none of these links seemed very relevant to the story. Check out this example from a story about Apple iTunes:

No Apple or even tech stories are linked to. Here's another example - about Russian police visiting BP offices. Curiously, this one lists an Apple story!

No Actual Content Analysis?

So based on my tests, it doesn't seem like there is much - if any - semantic analysis of the page content in order to come up with the "More from BusinessWeek" links. Reading between the lines of the AK quote above, this discovery system is based on clicks and not content.

It appears as if this is collaborative filtering - i.e. users who clicked X also clicked Y. This is basically the system that Amazon and Netflix use. For Aggregate Knowledge, collaborative filtering is still going to give interesting results. But how is it better than - for example - the 'Related Entries' plugin that we use here on ReadWriteWeb (which is based on tags, and so is much more closely aligned to the content itself)? See bottom of this post for an example.
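To illustrate the "users who clicked X also clicked Y" idea, here is a small sketch of click-based collaborative filtering. The session data and story names are invented for the example, and this is only our guess at the general technique, not Aggregate Knowledge's actual system.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical click logs: each set holds the stories one visitor read.
sessions = [
    {"itunes-pricing", "oil-prices", "bp-russia"},
    {"itunes-pricing", "oil-prices"},
    {"bp-russia", "oil-prices"},
]


def build_co_clicks(sessions):
    """Count how often each pair of stories was read by the same visitor."""
    co_clicks = defaultdict(Counter)
    for stories in sessions:
        for a, b in permutations(stories, 2):
            co_clicks[a][b] += 1
    return co_clicks


def also_read(co_clicks, story, n=3):
    """Return up to n stories most often read alongside the given one."""
    return [other for other, _ in co_clicks[story].most_common(n)]


co_clicks = build_co_clicks(sessions)
print(also_read(co_clicks, "itunes-pricing", n=2))  # ['oil-prices', 'bp-russia']
```

Note that the top suggestion for the iTunes story is an oil story: nothing in this approach looks at what the articles are about, only at who clicked what.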

Surely for media sites a content discovery system that analyzes the content of a page, such as Reuters Open Calais does, would give better results. Please let us know your opinion in the comments.

http://www.readwriteweb.com/archives/aggregate_knowledge_businessweek.php http://www.readwriteweb.com/archives/aggregate_knowledge_businessweek.php Products Wed, 19 Mar 2008 21:20:56 -0800 Richard MacManus
Reuters Open Calais Update: Apps Progress, Interview

A month ago we wrote about Reuters launching an API called Open Calais, a technology that "does a semantic markup on unstructured HTML documents - recognizing people, places, companies, and events." I mentioned Calais in my Media08 presentation last week entitled Web Technology Trends for 2008 and Beyond. It generated interest in the media-focused audience I presented to, so in this post we follow up with Reuters and ask what progress is being made. Specifically we look at what apps have been built so far on Calais and get feedback from Reuters' Tom Tague.

Quick Recap of Open Calais

Open Calais is a Semantic Web technology - and in this case the next generation of the Clear Forest product, which Reuters acquired in April '07 (see our Dec '06 review). Alex Iskold's post last month is 'must read' to understand what Open Calais is and why Reuters bought it. This diagram summarizes:

The API is free for both commercial and non-commercial use and Reuters told us last month that it is prepared to scale for a massive concurrent demand. The API is great for third party developers, because it gives them access to Reuters data. And it benefits Reuters, because it enables Reuters to aggregate metadata for its own uses.

Alex listed some possible uses: intelligent search engines that look for related content, automatically inserting links into raw text, structured alerts, on-the-fly text analysis within your browser.

Example Apps?

So it sounds great in theory, but are there any examples of Open Calais apps so far? Reuters has a "bounty" program set up, whereby developers are invited to create Open Calais applications and Reuters will pay for that. However, it seems there has been little - if any - takeup of the bounties.

Top of the list of wanted apps was a WordPress plugin. Tom Tague, who is leading the Calais initiative at Reuters, noted in the forum that "unfortunately - and unexpectedly - we haven't seen any reasonable applications for the bounty process so we'll most likely be contracting for the development of the WordPress plugin." Perhaps the amount of the bounty in this case was an issue - Reuters only offered $5000 for the WordPress plugin, which doesn't seem like much of an incentive.

So Reuters has been forced to take the initiative and release some apps of their own. One is a new web based document submission tool and viewer. There is some sign of action in the Open Calais forum, on a page where developers can list what they're working on. A developer named Craig has built an example of Calais semantics using pure PHP and Abhay Kumar has a similar service. These are all 'data input' tools. For an 'output' example, check out Mark Choate's RSS implementation of Calais data (example below).

Interview with Reuters' Tom Tague

Clearly, it's early days. I asked Open Calais lead Tom Tague how the initiative is progressing. Tom replied that "we’re about where we expected to be in terms of applications for Calais." He told us that the service is "just a little over 45 days old and much of the effort we’re seeing is in building tools to explore the capabilities themselves."

At this time Open Calais has just over 1,500 developers signed up; with about 30% of those developers actually making calls to and experimenting with the service. "One of the more exciting things that’s going on," Tom Tague told us, "are several community-led efforts to build Calais libraries for Ruby, PHP, ASP.NET and others. These will provide a great accelerant for developers to gain access to the service."

How is Reuters using Calais In-house?

So, at this point there is nothing to see for non-developers - the apps that have come out so far are developer-focused and not something the rest of us can use. So my next question to Tom was: how is Reuters itself using the Calais technology?

Tom replied that Reuters has several things underway:

"We're in the process of adding rich metadata to over 20 years of historical news archives (many millions of articles) to improve searchability and organization. We’re doing a lot of work in automating and generally improving the efficiency of a massive real time content ingestion process. We’re working with one of the community platforms deployed for Reuters customers to improve the tagging and classification of user generated content. And, of course, we have significant efforts under way to generate “machine readable news” to drive low-latency algorithmic trading. All of these efforts are based on the same technology platform driving the Calais initiative."

Conclusion: Show Us The Apps!

I must admit that I was expecting to see some working apps by now. Perhaps it is a similar case to Marshall Kirkpatrick's experience of Twine (published earlier today), the Semantic knowledge management service that received much early hype. Marshall thinks that Twine is underdone at this time and that the 'consumer' experience is lacking. Calais is much newer of course and, as Tom Tague said, it has only been out in the open for 45 days. So it would be unfair to compare the two efforts. Nevertheless, it would be great to see some compelling consumer-facing apps for Open Calais; even better would be to see something from Reuters that shows the public the benefits of semantic technologies.

Alex Iskold listed a number of consumer apps that could be built using Calais, by Reuters or external parties. I think people need to see at least one of those pretty soon - in order to translate the interest that Open Calais is generating from media and other people, into something non-geeks can see working on the Web and producing noticeably better information results. To paraphrase the famous Jerry Maguire quote, 'Show me the apps!'.

http://www.readwriteweb.com/archives/reuters_open_calais_apps_interview.php http://www.readwriteweb.com/archives/reuters_open_calais_apps_interview.php Analysis Tue, 11 Mar 2008 14:37:08 -0800 Richard MacManus
Weekly Wrapup, 4-8 Feb 2008

Here is a summary of the week's Web Tech action on ReadWriteWeb. For those of you reading this via our website, note that you can subscribe to the Weekly Wrapups, either via the special RSS feed or by email.

Highlights this week: Josh explores Super Tuesday on the Web and pinpoints why Obama and Paul are the Internet kings; Marshall dives deep into the MySpace and Facebook platforms, and ponders the privacy implications of Google's Social Graph API; Alex analyses Reuters' new Semantic Web initiative; Sarah looks at MySpace's partnership with web browser Flock; and Bernard tells us why the current recession isn't our bubble.

Web News

This week Microsoft's $44.6 Billion bid for Yahoo! continued to make headlines and keep bloggers busy with pontifications on what may or may not eventuate. So far there is no official word on whether Yahoo will accept the offer, but probably the most telling development of the week was Google raising questions about the deal. In a blog post, Google wondered whether Microsoft could "now attempt to exert the same sort of inappropriate and illegal influence over the Internet that it did with the PC?". Specifically Google is worried about a potential monopoly in web email, IM, and web-based services. The search giant claims that the Microsoft bid for Yahoo! threatens “the underlying principles of the Internet: openness and innovation.”

In a RWW poll, we asked whether Google had lost the plot. 33% of respondents said that Google is fear-mongering, and a further 24% said that it smacks of desperation. So public opinion seems to be against Google on this one, although not overwhelmingly so.

See also: Microsoft, Yahoo! and the Effect on OpenID

This week was significant because it was 'Super Tuesday' in the US presidential race. There are a number of tools on the web to make election watching easier, and we rounded up some of our favorite Super Tuesday websites.

Josh Catone also wrote an analysis of why Barack Obama and Ron Paul are the kings of US politics on the Internet. Josh wrote that "they both command the lion's share of their party's attention online and seem to dominate social networking and social media sites." But he wondered: why is only one of those campaigns actually working?

Web Trends

Web 3.0: Is It About Personalization?

On the UK's Guardian newspaper site today, writer Jemima Kiss suggested that Web 3.0 will be about recommendation. "If web 2.0 could be summarized as interaction, web 3.0 must be about recommendation and personalization," she wrote. Using Last.fm and Facebook's Beacon as examples, Kiss painted a picture of a web where personalized recommendation services can feed us information on new music, new products, and where to eat. It's a marketer's dream and it's really not far off from the definitions we've come up with in the past here on ReadWriteWeb.

NOTE: check the comments of this post, as it sparked a very interesting discussion.

Is Google's Social Graph API a Creeping Privacy Violation?

The new Google Social Graph API lets developers draw connections between your friends on one service and your friends on another. It indexes XFN (XHTML Friends Network) and FOAF (Friend of a Friend) data, standard microformats that publishers like Twitter or Facebook can append to your friend relationships inside their services.

Though in most cases the API pulls in publicly available information explicitly marked up with one of two microformats, there is no standard yet developed for user opt-in or opt-out. Google's Social Graph API is also not limited to XFN and FOAF data. MySpace CTO Aber Whitcomb told Marshall Kirkpatrick this week that the API includes a custom mechanism to extract social connections between friends on MySpace, though that social network does not yet publish XFN/FOAF.
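For readers who have not seen XFN in the wild: it expresses relationships as values in the `rel` attribute of ordinary links. The sketch below pulls those values out of a snippet of HTML with Python's standard library. The sample markup and URLs are made up, and this only illustrates the markup itself, not how Google's indexer works.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collect (href, [xfn values]) pairs from <a> tags that carry XFN markup."""

    def __init__(self):
        super().__init__()
        self.relationships = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel_values = (attributes.get("rel") or "").split()
        xfn = [value for value in rel_values if value in XFN_VALUES]
        if href and xfn:
            self.relationships.append((href, xfn))


# Made-up sample page: one XFN link, one plain link, one non-XFN rel value.
sample = (
    '<a href="http://example.org/alice" rel="friend met">Alice</a> '
    '<a href="http://example.org/about">About</a> '
    '<a href="http://example.org/ad" rel="nofollow">Ad</a>'
)

parser = XFNParser()
parser.feed(sample)
print(parser.relationships)  # [('http://example.org/alice', ['friend', 'met'])]
```

Because this information sits in plain public HTML, any crawler can collect it, which is exactly what makes the opt-in and opt-out question pressing.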

This Is Not Our Bubble

Back in early October Bernard Lunn posted about coming economic storms and what entrepreneurs could do to prepare. Given recent news, it is now almost certain that we are in recession. The bad news from financial institutions and credit markets is like a steady drumbeat, so it would be easy to write about “battening down the hatches” or even jumping for the lifeboats.

Far from it, wrote Bernard. These are great times for entrepreneurs. Really. This is not our bubble. We had our bubble and it burst in March 2000.

SEE MORE WEB TRENDS COVERAGE IN OUR TRENDS CATEGORY

Web Products

MySpace Platform Aims to Pick Up Where Facebook Left Off

MySpace launched its developer platform this week and went to great lengths to highlight the ways it's different from the Facebook Platform. That's ironic given that the dominant reaction to the Facebook Platform, from users at least if not the press, is that it's made the site too much like MySpace.

Nonetheless, there are some very interesting details available about the MySpace Platform. After all, that is where the action is - there's far more traffic to MySpace than Facebook.

See also: Your MySpace Web Browser Is Coming

Facebook to Punish Stupid Applications, Reward Good Ones

On the same night the sophisticated MySpace Application Platform was released to developers, Facebook announced an important forthcoming development that should make FB apps a whole lot less annoying. Let the Platform Wars begin!

Starting next week, Facebook apps that get good user responses from Newsfeed messages (clickthroughs, app installs) will be allowed to send more notifications and apps that get fewer user responses to their notices will have the number of notices they can send cut down. Metered messaging based on user engagement could save the Facebook Platform from a growing sense of app fatigue.
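As a back-of-the-envelope illustration of metered messaging, the sketch below raises or lowers an app's notification quota depending on how often users respond. The target rate, step size and limits are numbers we picked for the example; Facebook has not published its actual formula.

```python
def adjust_quota(current_quota: int, sent: int, responses: int,
                 target_rate: float = 0.05, step: float = 0.25,
                 floor: int = 1, ceiling: int = 40) -> int:
    """Return next period's quota based on this period's response rate.

    All thresholds here are hypothetical values chosen for illustration.
    """
    if sent == 0:
        return current_quota  # nothing sent, so there is nothing to judge
    rate = responses / sent
    factor = 1 + step if rate >= target_rate else 1 - step
    return max(floor, min(ceiling, round(current_quota * factor)))


print(adjust_quota(20, sent=1000, responses=80))  # engaging app: quota rises to 25
print(adjust_quota(20, sent=1000, responses=10))  # ignored app: quota drops to 15
```

The floor and ceiling keep a single good or bad week from swinging an app's quota too far.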

See also: When Facebook Ads Go Wrong

Reuters Wants The World To Be Tagged

As we recently predicted, in 2008 we'll witness the rise of semantic web services. From the native support for Microformats in Firefox 3, to the New York Times' utilization of rich headers metadata, to this week's release of the Social Graph API by Google, semantics are starting to slip onto the web. The impact is being felt because large companies are really starting to focus on structured information.

In the same vein, last week Reuters - an international business and financial news giant - launched an API called Open Calais. Alex Iskold analyzed it in this post.

SEE MORE WEB PRODUCTS COVERAGE IN OUR PRODUCTS CATEGORY

That's a wrap for another week! Enjoy your weekend everyone.

http://www.readwriteweb.com/archives/weekly_wrapup_4-8_feb_2008.php http://www.readwriteweb.com/archives/weekly_wrapup_4-8_feb_2008.php Weekly Wrapups Sun, 10 Feb 2008 12:04:01 -0800 Richard MacManus
Reuters Wants The World To Be Tagged

As Richard MacManus recently predicted, in 2008 we'll witness the rise of semantic web services. From the native support for Microformats in Firefox 3, to the New York Times' utilization of rich headers metadata, to this week's release of the Social Graph API by Google, semantics are starting to slip onto the web. The impact is being felt because large companies are really starting to focus on structured information.

In the same vein, last week Reuters - an international business and financial news giant - launched an API called Open Calais.

The API performs semantic markup of unstructured HTML documents - recognizing people, places, companies, and events. This technology is the next generation of the Clear Forest offering, which Reuters acquired last year. We have profiled Clear Forest on ReadWriteWeb, and in this post we will look at what Reuters opened up and why.

Open Calais API Basics

The idea behind Calais is simple - identify the interesting bits in documents and turn them into metadata. In this implementation the focus is on People, Companies, Places, and Events, but surely the technology can be adapted to other entities. The heavy lifting is done by the combination of a natural language processing engine and a massive hard-coded, learning database that Clear Forest has built.

For any document submitted into Calais, entities are identified, extracted and annotated. For example, when the press release about the acquisition of Clear Forest is analyzed, the following metadata is identified:

  • Relations: Acquisition, CompanyInvestment, PersonProfessionalPast
  • Organization: Palo Alto Research Center
  • IndustryTerm: broader search development effort, text search, text analytics software, ...
  • Company: Time Warner Inc., Reuters, Pitango Venture Capital, Inxight, ClearForest Ltd, ...
  • Person: Gerry Campbell
  • Country: United States, Israel
  • City: Tel Aviv, SAN FRANCISCO, Waltham

This is a rather impressive set of information. According to the documentation page, the response is delivered in under one second for larger documents, and much faster for smaller ones - in other words, real time or near to it.
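To make the shape of this metadata concrete, here is a minimal Python sketch that groups extracted entities by type. The flat list of (type, name) pairs and the function name group_entities are assumptions made for illustration, with sample values taken from the list above. This is not the actual Calais response format.

```python
from collections import defaultdict

# Hypothetical flat list of (entity type, name) pairs, using values from
# the example above. This is NOT the real Calais output format.
EXTRACTED = [
    ("Company", "Reuters"),
    ("Company", "ClearForest Ltd"),
    ("Person", "Gerry Campbell"),
    ("Country", "United States"),
    ("Country", "Israel"),
    ("City", "Tel Aviv"),
]


def group_entities(pairs):
    """Group entity names by type, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for entity_type, name in pairs:
        if name not in grouped[entity_type]:
            grouped[entity_type].append(name)
    return dict(grouped)


if __name__ == "__main__":
    for entity_type, names in group_entities(EXTRACTED).items():
        print(f"{entity_type}: {', '.join(names)}")
```

A per-type grouping like this is the form most of the applications discussed later would probably want to work with.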

What was not quite clear from the documentation is whether Calais can deal with raw HTML pages. It appears that the API requires an XML document, where the main text is marked differently from the header and footer. Ideally, an API like this should be able to accept URLs, because distilling structure from HTML would not be trivial for developers. Another thing that we noticed is that the resulting document is extensively marked up. What the developers get back is literally the output of the Calais engine. It would be good to be able to get a lighter version, which simply identifies entities and their positions in the text.

Currently the API is free for both commercial and non-commercial use, and Reuters says it is prepared to scale for massive concurrent demand. The question then is: how can this be used?
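As a rough picture of what such a lighter response could look like, the sketch below records each entity mention with its type and character offsets. Everything here is hypothetical: the EntityMention type, the locate_mentions function, and the sample sentence are invented for illustration, and a plain substring search stands in for the real engine, which does far more than string matching.

```python
from typing import NamedTuple


class EntityMention(NamedTuple):
    entity_type: str
    name: str
    start: int  # offset of the first character of the mention
    end: int    # offset one past the last character


def locate_mentions(text, entities):
    """Find every occurrence of each known entity name in the text.

    `entities` is a list of (type, name) pairs. A plain substring search
    is used here as a stand-in for real entity recognition.
    """
    mentions = []
    for entity_type, name in entities:
        start = text.find(name)
        while start != -1:
            mentions.append(EntityMention(entity_type, name, start, start + len(name)))
            start = text.find(name, start + len(name))
    return sorted(mentions, key=lambda m: m.start)


if __name__ == "__main__":
    sample = "Reuters acquired ClearForest. Gerry Campbell works at Reuters."
    known = [("Company", "Reuters"), ("Company", "ClearForest"), ("Person", "Gerry Campbell")]
    for mention in locate_mentions(sample, known):
        print(mention)
```

A list of this kind would be enough for a client to highlight or link entities without parsing the full engine output.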

What is Calais Good For?

There are quite a few interesting applications for this technology. First - better search. Knowing the kinds of entities in the text allows developers to build intelligent search engines that look for related content. For example, imagine a page on Reuters with this press release and in the sidebar links to learn more about Clear Forest, Reuters, Inxight, etc. Similarly, Calais could enable links to countries and cities mentioned in the document. And these searches need not be generic searches, but rather specific vertical ones.

Another application would be to build engines like Inform, which automatically inserts links into raw text. By automatically identifying entities in the document, Calais also identifies what should be linked. So a big piece of Inform's secret sauce is trivialized. The rest is basically a raw search through the archive, which can be done with a Google custom search engine, for example. It is possible that more tech savvy media companies could leverage Calais in exactly this way.
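The linking step can be sketched in a few lines of Python. This assumes the entity names have already been identified by some service; the link_entities function and the /search?q= URL scheme are made up for the example and do not describe how Inform or Calais actually work.

```python
import html
import re
from urllib.parse import quote_plus


def link_entities(text, names, search_base="/search?q="):
    """Wrap each known entity name in the text with a link to a search for it.

    `search_base` is a hypothetical archive search URL prefix.
    """
    if not names:
        return html.escape(text)
    # Try longer names first so "ClearForest Ltd" wins over "ClearForest".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then emit the link.
        parts.append(html.escape(text[last:match.start()]))
        name = match.group(0)
        parts.append(f'<a href="{search_base}{quote_plus(name)}">{html.escape(name)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(link_entities("Reuters acquired ClearForest Ltd.",
                        ["ClearForest", "ClearForest Ltd", "Reuters"]))
```

In a real deployment the link target would presumably be a vertical search over the publisher's own archive rather than a generic query string.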

Another application is structured alerts. Modern alert systems are keyword based and suffer from false positives. Using Calais it is possible to build precise alerts for people, companies, places and events like corporate acquisitions. With the flood of junk in our RSS readers this is rather welcome news.

Yet another application would be to incorporate on-the-fly text analysis into browsers. In a way, this is not much different from having Microformat annotations on the page, except that the annotations are delivered on the fly. For example, a browser could call Calais on document load and obtain a list of people, places, companies, etc. which are embedded in the document. With this information the browser would be able to create a more interesting, more contextual, and relevant experience.
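The difference between the two kinds of alert can be shown with a small Python sketch. The two sample documents, their entity annotations, and both function names are invented for illustration; the annotations stand in for what an entity extraction service might return and are not real Calais output.

```python
# Invented sample documents. The "entities" sets imitate annotations an
# extraction service could provide; they are not real Calais output.
DOCUMENTS = [
    {
        "title": "Reuters buys ClearForest",
        "text": "Reuters announced the acquisition of ClearForest.",
        "entities": {
            ("Relation", "Acquisition"),
            ("Company", "Reuters"),
            ("Company", "ClearForest"),
        },
    },
    {
        "title": "How children learn to speak",
        "text": "Researchers study language acquisition in toddlers.",
        "entities": set(),
    },
]


def keyword_alert(documents, keyword):
    """Titles of documents whose text contains the keyword (case-insensitive)."""
    return [d["title"] for d in documents if keyword.lower() in d["text"].lower()]


def entity_alert(documents, required):
    """Titles of documents whose annotations include every required (type, name) pair."""
    return [d["title"] for d in documents if required <= d["entities"]]


if __name__ == "__main__":
    # The keyword alert fires on both documents, including the false positive.
    print(keyword_alert(DOCUMENTS, "acquisition"))
    # The entity alert fires only on the corporate acquisition involving Reuters.
    print(entity_alert(DOCUMENTS, {("Relation", "Acquisition"), ("Company", "Reuters")}))
```

The keyword version cannot tell a corporate acquisition from language acquisition, while the entity version only matches documents annotated with the event and company of interest.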

What's In It For Reuters?

Reuters has opened up a generous API, but why? During our interview, Gerry Campbell, the President/Global Head of Search & Content Technologies at Reuters, explained that Reuters wants the world to be tagged. When the world's content is quickly and readily accessible to their customers, Reuters wins. Semantic technologies result in better, faster, more precise and relevant information, and Reuters, as a big player in the information space, wants to be one of the first companies delivering this kind of experience.

Beyond an outstanding customer experience, Calais leads to a unique, attractive set of assets. First - a growing semantic database of people, places, companies and events. With each new document submitted into Calais the database gets richer and more complete. This is a roadmap to a semantic business powerhouse, which is clearly a great position to be in for any business media company. And in a way, what grows beneath Calais will not be that unlike Freebase. Except of course, it is happening completely automatically.

The second big advantage of having an open API is training the system. Any AI-based solution like Clear Forest is in constant need of tuning and evolution. Having other companies use the system would allow the engineers to run into cases that they have not thought about and broaden the capabilities of the system. Campbell told us that Calais is already processing a significant subset of Reuters information in nearly real time. This is both impressive technically and smart from an engineering point of view - it is an "eat your own dog food" approach to building a great piece of software.

Conclusion

The Calais API is another big win for top-down semantic web technologies. Using a mix of natural language processing, AI techniques, and a massive database, Reuters' solution extracts important bits of information from raw HTML pages. People, Companies, Places, and Events are really at the heart of many business articles, so being able to instantly identify them in the text is a big deal. From better search to better cross-linking and more intelligent browsing, the Calais API is an invitation to tap into one of the most powerful and pragmatic semantic platforms that exists and works today.

What sort of things do you envision to be possible with Calais? What applications would you like to see built with this platform?

http://www.readwriteweb.com/archives/reuters_calais.php http://www.readwriteweb.com/archives/reuters_calais.php Products Wed, 06 Feb 2008 01:47:18 -0800 Alex Iskold