<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2009:/1/tag:www.readwriteweb.com,2009://1.15437-</id>
  <updated>2009-11-23T16:51:03Z</updated>
  <title>Comments for Tags as Far as the Eye Can See: New York Times to Publish Index as Linked Data</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.23-en</generator>
  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15437</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=15437" title="Tags as Far as the Eye Can See: New York Times to Publish Index as Linked Data" />
    <published>2009-06-18T19:43:03Z</published>
    <updated>2009-06-19T01:28:50Z</updated>
    <title>Tags as Far as the Eye Can See: New York Times to Publish Index as Linked Data</title>
    <summary>Today, at the Semantic Technology Conference, Rob Larson and Evan Sandhaus of the New York Times announced together that the Times will soon be publishing its copious index as Linked Data. The Times&apos; data will join content from Project Gutenberg, a vast online library of text from public domain books, data from the U.S. census,...</summary>
    <author>
      <name>Jolie O&apos;Dell</name>
      
    </author>
    
    <category term="Semantic Web" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><img src="http://www.readwriteweb.com/timesopen.jpg">Today, at the <a href="http://www.semantic-conference.com/">Semantic Technology Conference</a>, Rob Larson and Evan Sandhaus of the <a href="http://www.nytimes.com/"><em>New York Times</em></a> announced together that the Times will soon be publishing its copious index as <a href="http://linkeddata.org/">Linked Data</a>.</p>

<p>The <em>Times</em>' data will join content from <a href="http://www.gutenberg.org/wiki/Main_Page">Project Gutenberg</a>, a vast online library of text from public domain books, data from the U.S. census, and information from many other formative and vital entities in the semantic web space. Larson and his team intend to make available hundreds of thousands of tags for content dating back to 1851. This will give developers an invaluable, automatically navigable roadmap for the publication's vast directory of knowledge and will link that data to existing pages, people, and content around the web.</p>]]>
      <![CDATA[<p>In his keynote address, Larson emphasized "How deeply we [at the <em>Times</em>] care about metadata."</p>

<p>"It's been fundamental to what we do for a long time. We feel we're good at it, but our content is an island... we want to announce our intention to publish our thesaurus to the community under a license that will allow you to use it and contribute your improvements... The results of this effort will in time take the shape of the Times entering this <a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html">Linked Data cloud</a>. This is wholly consistent with our open strategy... to facilitate access to slices of our data for those who want to include it in their applications."</p>

<p>Larson likened the <em>Times</em> corpus to a quarry of data. He said that the <a href="http://www.readwriteweb.com/archives/times_open_developers_new_york_times_api.php">newspaper's API</a> provided the picks and shovels to mine data, and the Linked Data initiative would be the map.</p>

<p><object width="610" height="493.74"><param name="movie" value="http://www.youtube.com/v/ZYmNuIlBlB0&hl=en&fs=1&color1=0x5d1719&color2=0xcd311b"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ZYmNuIlBlB0&hl=en&fs=1&color1=0x5d1719&color2=0xcd311b" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="610" height="493.74"></embed></object></p>

<p>The timing, licensing, format, and other factors of the project are yet to be determined.</p>

<p>This announcement comes on the heels of <a href="http://www.readwriteweb.com/archives/cnet_partners_with_thomson_reuters_on_linked_data.php">CNET's partnership with Reuters</a> to publish data to the Linked Data cloud. Moreover, exactly one month ago, <a href="http://www.readwriteweb.com/archives/linked_data_is_blooming_why_you_should_care.php">we wrote</a> that Linked Data was a concept "whose time has come" and gave a thorough overview of the concepts and standards it entails, for curious readers who would like to drill deeper on the subject.</p>

<p>In another <a href="http://www.youtube.com/watch?v=TjUVd-WaCo0">recent interview</a>, Sandhaus detailed the tagging process for the <a href="http://groups.google.com/group/nytnlp/web/%20Corpus%20Overview%20Page"><em>Times</em>' corpus</a>, both for print and online articles:</p>

<blockquote><p>"There are two types of tagging that go on at the <em>Times</em>... Every day, indexers take the paper and go article by article and associate each article with subject keywords. Then they manually summarize it. It's like a Google list, but in dead tree form.</p>

<p>Another type of tagging we do is... when an article goes from the newsroom to the web, it's put there by a producer who will augment the article with any number of rich features like images, multimedia... and subject keywords. Unlike the indexers, who do this completely by hand, the producers are assisted in their tagging by an automated classification system which suggests tags to be applied to the data and which are ultimately approved by the producer."</p></blockquote>

<p>An official announcement is expected at the <a href="http://open.nytimes.com/"><em>Times</em>' Open blog</a> tomorrow, with details on the project to follow.</p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15437-comment:143132</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15437" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php#c143132" />
    <title>Comment from Elliot on 2009-06-18</title>
    <author>
        <name>Elliot</name>
        <uri>http://blog.alchemyapi.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://blog.alchemyapi.com/">
        <![CDATA[<p>A big day for the semantic web!  Orchestr8 also announced Linked Data support in its AlchemyAPI semantic tagging solution:</p>

<p><a href="http://news.prnewswire.com/DisplayReleaseContent.aspx?ACCT=104&STORY=/www/story/06-18-2009/0005046519&EDATE=" rel="nofollow">http://news.prnewswire.com/DisplayReleaseContent.aspx?ACCT=104&STORY=/www/story/06-18-2009/0005046519&EDATE=</a></p>]]>
    </content>
    <published>2009-06-19T03:20:12Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15437-comment:143189</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15437" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php#c143189" />
    <title>Comment from Ben Werdmuller on 2009-06-19</title>
    <author>
        <name>Ben Werdmuller</name>
        <uri>http://benwerd.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://benwerd.com/">
        <![CDATA[<p>It's great to see the semantic web find its feet. I've always been skeptical about it as something that will find its way into the whole Internet (or even into most user-generated data in web applications) - but as a way to allow automated functionality around structured data like the New York Times index, it is huge.</p>]]>
    </content>
    <published>2009-06-19T12:02:31Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.15437-comment:144183</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.15437" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/nytimes_linked_data.php#c144183" />
    <title>Comment from Eric Hellman on 2009-06-26</title>
    <author>
        <name>Eric Hellman</name>
        <uri>http://go-to-hellman.blogspot.com/</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://go-to-hellman.blogspot.com/">
        <![CDATA[<p>Just as interesting as what was said at the announcement is what was not said. It seems that they are not releasing their occurrence data (their index), which would be the logical thing to do if this release was meant to drive traffic to their site. More comment from the conference at <a href="http://go-to-hellman.blogspot.com/2009/06/new-york-times-and-infrastructure-of.html" rel="nofollow">http://go-to-hellman.blogspot.com/2009/06/new-york-times-and-infrastructure-of.html</a></p>]]>
    </content>
    <published>2009-06-26T16:29:27Z</published>
  </entry>

</feed>