ReadWriteWeb

NYTimes Exposes 2.8 Million Articles in New API

Written by Marshall Kirkpatrick / February 4, 2009 2:54 PM / 21 Comments

What do you do when your industry is shifting under your feet? Taking the lead with radical steps is one strategy. The New York Times did just that this afternoon when it announced a new Application Programming Interface (API) offering every article the paper has published since 1981, some 2.8 million articles in all. The API includes 28 searchable fields, and its content is updated every hour.

This is a big deal. A strong press organ with open data is to the rest of the web what basic newspaper delivery was to otherwise remote communities in another period of history. It's a transformative moment towards interconnectedness and away from isolation. A quality API could throw the doors wide open to a future where "newspapers" are important again.

What does that mean? It means that sites around the web will be able to add dynamic links to New York Times articles, or excerpts from those articles, to pages on their own sites. The ability to enrich other content with high quality Times supplementary content is a powerful prospect.

The Times has opened a wide variety of APIs over the last year; they are making "the newspaper as platform" (as journalist Mathew Ingram put it today) a major part of the company's bid for the future. We discussed the significance of this strategy when the Times opened its first API in October. As we wrote then,

Reporting is no longer a scarce commodity. It's hard for these huge news organizations to do it faster, cheaper or even as well as a whole web of new media producers around the world. They may be among the top sources for original content still today, but considering the direction technology is moving in - that's not a safe bet for the future.

One thing that big media still does have a particularly good share of, though, is information processing resources and archival content. The Times' campaign contribution API is a good example of this. The newspaper is far better prepared to organize that raw information, and perhaps offer complementary content, than any individual blogger or small news publisher.

We're excited to see how this API gets put to use and we look forward to seeing it develop all the more.

What could come next? We'd love to see some semantic parsing of all this content. As semantic web aficionado Tom Morris wrote today, "[These] Could be signs of something very good - imagine if the New York Times were to join the web of Linked Data, pointing from articles out to all sorts of distributed resources. The amount of information stored up inside an institution like the New York Times would be really interesting if it were linked together with other data on the Web. A search API isn't tremendously interesting, but it is interesting to see someone like the NYT do this, rather than just Web 2.0 sites and hosts of user-contributed material publishing this kind of data."

Or, as Tim Berners-Lee reportedly told attendees of the TED conference today, the time for "database hugging" is over: don't just make your own website. Especially when it comes to government data, we should all demand raw data now.

Full raw data, semantically marked-up data, linked data: there are a number of options. This is an informational currency that could mean as much to the world of the future as mere delivery of the paper press used to in an otherwise isolated world. We hope this effort will succeed and be another model for more of the same from other companies.

Disclosure: The NYTimes is a syndication partner of ReadWriteWeb.


Comments

  1. This is absolutely great,

    now if only they start using Freebase or DBpedia as strong identifiers for their entities. Then the mash-ups can really start happening.

    bye
    Andraz Tori, Zemanta

    Posted by: Andraz Tori | February 4, 2009 3:45 PM



  2. Brilliant!

    For the life of me, I can't understand why websites such as WSJ.com still limit access to registered readers (even if registration is free).

    Print media needs to start setting new paradigms for their industry - they're providing news/content - not just newspapers any more.

    Good stuff - great write-up! Thanks!

    Posted by: faryl Posted on FriendFeed   | February 4, 2009 3:54 PM



  3. NYT gets it. Tremendous opportunities here for data viz and BI. This is the future of information: open & visual.

    Posted by: chris arkenberg | February 4, 2009 3:54 PM



  4. This is great news...the NYT is the best paper in the country, IMO.

    I will be curious to see how it works. Currently on their site you have to have a free user profile to read most of their content. I have tried emailing articles to friends in the past and they never read them because they don't want yet another username/password.

    Hopefully they don't put the available API content partially or entirely behind a wall.

    Posted by: Jeff | February 4, 2009 4:34 PM



  5. I can remember back in high school trying to search for news articles relating to my topic, and every time: "This is an excerpt; to view in full you must be a registered member." It was always a pain. Better late than never, though!

    Posted by: Robert | February 4, 2009 5:58 PM



  6. I think that nearly all the entities in the NYTimes are probably already in existing semantic repositories/ontologies.

    When you think about Calais culled from Reuters articles or KIM from Ontotext (which we use for Imindi) - the underlying repositories/ontologies like Owlim or Sesame were created from munching on data from news services.

    So I think that the parsing and semantic annotation would be quite easy.

    Posted by: Adam Lindemann | February 4, 2009 6:07 PM




  7. We are well aware and very interested in the Linked Data initiatives and have had lots of conversations with various folks. This is just one step on our journey. Stay tuned.

    Posted by: Derek Gottfrid | February 4, 2009 6:25 PM



  8. Great, great news. Can't wait to see what's created with this. Agree with the sentiment that using standard semantic markup would be great.

    Was just reading the TOS. Actually very reasonable. My favorite clause:

    "use the NYT APIs to operate nuclear facilities, life support, or other mission critical application where human life or property may be at stake. You understand that the NYT APIs are not designed for such purposes and that their failure in such cases could lead to death, personal injury, or severe property or environmental damage for which NYT is not responsible"

    Gotta love lawyers.

    Posted by: graham mudd Posted on FriendFeed   | February 4, 2009 6:45 PM



  9. New York Times is very smart for their API move. Will bring them more traffic and increase global reach. So I wonder how long until someone writes a nifty PHP cover for the API.

    Posted by: Mike Reynolds Posted on FriendFeed   | February 4, 2009 6:57 PM



  10. Finally.

    Posted by: Stephan Miller Posted on FriendFeed   | February 4, 2009 7:55 PM



  11. @derek I am very interested if you'll do anything on linking your entities to LOD entities! Let me/us know at Zemanta. We'd then be able to suggest links from blogs to your pages and resources.

    @adam It's not a question of having stuff in some kind of semantic repository. I am more concerned about reconciling entities so they can be cross-linked and cross-system queries are possible.
    (send me a mail if you want a small demo of what kind of visual journalistic auto-research becomes possible if you do that)

    Posted by: Andraz Tori | February 5, 2009 1:50 AM



  12. Very smart move.

    Posted by: Ken Kennedy Posted on FriendFeed   | February 5, 2009 6:47 AM




  13. I am excited about this too. For $1 million, who can answer: how will the Times and others make money from this? Comments regarding data visualization are cool, like this one: http://marumushi.com/apps/newsmap/newsmap.cfm Thus, if EVERYTHING is just plain FREE, then who wins? Just a thought for those out there with the answer who are willing to share it for FREE!

    Posted by: L. Howell | February 5, 2009 8:01 AM



  14. This is excellent. NY Times is smart in catering to the developer community thru' this API. A couple of weeks ago they also announced the availability of an annotated corpus (contains metadata from NYTimes articles, people, places, etc.)
    http://open.blogs.nytimes.com/2009/01/12/fatten-up-your-corpus/

    Kudos to them for all the open data initiatives.

    thanks
    nagaraju

    Posted by: Nagaraju | February 5, 2009 9:19 AM



  15. The enthusiasm of you and most commenters reminds me of a bunch of '60s types who have just torn down the fences to a big rock concert, thus liberating the music from the capitalist pigs. Hmmm. Who will pay the musicians now, and their electric bill and equipment and transportation costs, etc etc?

    Posted by: Michael Hill | February 5, 2009 1:37 PM



  16. Better classification leads to better targeting rules and higher CPMs (more segmented content rather than remnant), so possibly everybody wins with this move. Happy to see it.

    Posted by: Jonathan Mendez | February 5, 2009 2:15 PM



  17. This is less cool than it sounds. Their documentation makes numerous mentions that the API can only be used for non-commercial use. Their actual terms of use leave some room for commercial use, but only if it doesn't compete with NYT, which is sufficiently ambiguous that it will surely scare off anyone smart enough to read the legalese before they start writing code.

    They want to have their cake and eat it too. I think they'd end up getting a lot more value (a much bigger pie) if they actually let developers monetize the data provided by the API in some way.

    Of course, they still have the problem of figuring out how to make enough off their archives. If their strategy here is to see what independent developers come up with for free, and then either clone it or acquire their projects on the cheap, well, I don't wish them well with that.

    Posted by: eas | February 5, 2009 3:45 PM



  18. NYTimes going open : this is great news !! One more step towards the "websites as webservices" Alex Iskold presented in 2007 in RWW.
    As for earning money, this will require agility and new skills, something I am confident major newspapers can get if they are willing to. They already made the shift to web writing, to video content, partnerships, CMS / SEO, etc... This will only get the web / technical aspect more important, but many strategies are possible. Carefully designing TOS the same way most websites offer free service up to a certain level, and freemium/paid service above.
    The key is to identify which content / service people will agree to pay for in the future : mobile internet and location-based-services will surely be part of it, as live events / social networking (see the recent CNN / Facebook event for the presidential inauguration).
    Tracking content use via trackbacks or more complex stuff may allow for usage billing, some kind of CPA the way affiliation and price comparators work.
    I am just starting to imagine mashup services ... think about all the great mashups that were done after Twitter or Googlemaps, for enhanced access (read contextual and accessible) to all that great content ...
    Whaow !! Can't wait !!

    Posted by: Amaury de Buchet | February 9, 2009 3:41 AM



  20. I agree with your second paragraph -- in fact, I quoted it in an article I wrote about this for Poynter Online. Times CTO Marc Frons talked with me about how the article search API factors into the Times' vision of itself as a news and information source.

    Posted by: Steve Myers | February 9, 2009 11:59 AM



  21. Very nice but...

    Is it just me or is this less than it appears? The NYT API (and several others) don't allow me to place their content on my website. It's basically just a glorified Google search that gives me a URL to their content!

    I can see the advantage of that to the NYT, but why should I as a developer spend the time building to their API just to send my users to a different site?

    Of course, I understand why the NYT would want that, but why would I or my website's users? If all my users want is a link to other content, they can use their favorite search engine to get a lot more than just the NYT information.

    I'm more than willing to attribute, brand, place logos, etc. on my site for the NYT content (à la Netflix), but just giving my users a link to their site is a waste of time.

    Posted by: royce | February 28, 2009 9:49 AM


