ReadWriteWeb

Journalism Needs Data in 21st Century

Written by Guest Author / August 5, 2009 2:00 AM / 12 Comments

Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will have to increasingly rely on "data" to feed our stories, to the point that "data-driven reporting" becomes second nature to journalists.

The shift from facts to data is subtle and makes perfect sense. You could say that data are facts, with the difference that they can be computed, analyzed, and made use of in a more abstract way, especially by a computer.

With this mindset, finding mainstream data-driven stories doesn't take long at all. A quick scan of the Guardian's home page tells us that swine flu cases are up by 50%, according to "fresh figures...[that] will be released this afternoon." The story here is that we're in danger because swine flu is on the rise. Reporting the current figures available for swine flu alone wouldn't be all that interesting. The news comes from comparing the current figures to last week's, which is a very simple form of data analysis. By making use of published data and running one's own analysis (and building on the analysis of others), we get something very news-worthy indeed. It moves the definition ever so slightly, from "saying and asserting" to "analyzing and publishing." But it obviously works only for data that is accessible.
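To make that "very simple form of data analysis" concrete, here is a minimal sketch of the week-over-week comparison behind a headline like "cases are up by 50%." The case counts and the `percent_change` helper are invented for illustration; they are not the figures the Guardian reported.

```python
def percent_change(previous: float, current: float) -> float:
    """Return the change from previous to current as a percentage of previous."""
    if previous == 0:
        raise ValueError("previous must be non-zero to compute a percentage change")
    return (current - previous) / previous * 100.0


# Hypothetical weekly case counts, invented for illustration only.
last_week_cases = 200
this_week_cases = 300

change = percent_change(last_week_cases, this_week_cases)
print(f"Cases changed by {change:+.0f}% week over week")
```

Neither number is a story on its own; the comparison is. The same calculation only becomes possible for readers and other reporters when both weeks' figures are published.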

There is nothing new about pointing out the importance of public data being made available. Sir Tim Berners-Lee has discussed at length the importance of governments and institutions putting their data online, making it accessible and useful. His TED talk and interviews with ReadWriteWeb and Talis (disclosure: I am a blogger at Talis) all explain his belief that by publishing linked data we can begin to solve many of the problems the world faces. Innovations in medicine, science, and development could all be achieved if only currently hidden data were made available. Data-driven journalism could be the first step in realizing this dream. The best stories would then come from innovators who read about trends reported in news media and are then able to draw new conclusions and solve bigger problems. In his recent discussion with the BBC, Berners-Lee said that the next step is to go for low-hanging fruit by just getting the data out there.

Thus far, this has made a lot of sense to me, and I have been tracking the publication of linked data and increasing access to public knowledge as emerging trends over at Talis. But my perspective has shifted a bit in the past few weeks.

First, there was data.gov and President Obama's call for more access to government data. A sitting head of state (and one of some significance) was clearly calling for public access to government data: this was news! But the idea has been discussed, praised, and debated for a while since then and may have lost some of its luster.

Then about a month ago, UK Prime Minister Gordon Brown made it part of his digital strategy to prioritize the publication of government information. He asked Sir Tim personally "to help us drive the opening up of access to Government data in the web over the coming months" and appointed Berners-Lee an official governmental adviser. By now, neither of these stories is news and comparisons between the initiatives have been made.

The Guardian newspaper recently launched its own Data Blog, with the intention of letting readers access, mash up, and reuse much of its information in the form of data, which could in turn drive stories.

What is perhaps not as explicitly recognized is the voracious appetite for data that has been apparent for months. It is less about turning good ideas into stories and more about seeing how data informs our understanding of events happening right now. Each new initiative is another piece of low-hanging fruit picked.

Access to data is important: it drives innovation and even social change. Governments that publish their data have to become more transparent. Humanitarian organizations that make their findings known could spark bigger projects and source innovative solutions from their communities. Scientific findings and raw information could be used to solve bigger problems than the result of a single experiment or trial could ever manage. Even the simple comparison of two or more facts can lead to new insight, and all of these things happen only when the walls around an institution become porous.

2009 could become known as the year of data, the year of open access, or the year of the semantic Web (see links above for how this relates), and it may also be the first year when it becomes news that data wasn't published in a story when it should have been. That a government body isn't being transparent or is blocking access by publishing its findings in PDF or other non-linking formats would make a very interesting story indeed. We can expect to see more and more organizations and public bodies remove their own barriers through initiatives and legislation. Examples have been set, and seeing excuses die along with barriers is not far-fetched.

Do you know of other data-driven stories? We'd love to hear about any insights that were made through publicly accessible data or where this data might come from next.

Guest author: Zach Beauvais is a Platform Evangelist for Talis and editor of Nodalities Magazine.


Comments


  1. Timely post. CAR experts have been mashing and crunching data for a long time, but I believe data will have a new mainstream role, as you say here. In the UK some of the best developments are external to journalism (MySociety and other transparency projects), but the Guardian is doing great stuff with OpenPlatform so far. Let's hope other publications take its lead. I've been following developments on Journalism.co.uk: #DataJourn stories and am very excited about the coming year. Let's get more people talking about #datajourn. The UK government is actively seeking programmers and developers to engage with its data, a call made at the recent OpenTech conference in London.

    Posted by: JTownend | August 5, 2009 4:15 AM



  2. We at reliable hosts appreciate the knowledge the blog owner has about this topic.
    http://www.reliable-hosts.com

    Posted by: web hosting | August 5, 2009 5:21 AM



  3. However, journalists are not hired for their abilities to handle or analyse data.

    I recently looked at the graduates of the top journalism post-grad program in the UK. Of the 20 or so CVs/backgrounds I saw, I did not see a single journalist with an undergraduate degree in a quantitative background (science, statistics, etc).

    What I saw were journalists with degrees in English literature or modern languages. The course in question had no modules on data analysis or handling data or even basic numeracy.

    So you can have all the data you like, but if your journalists do not have the skills or training to deal with it, what are you going to do?
    And handling data is not just about training. It is also about gait, the cut of your jib.

    If you get to 25 yrs old without having a love for data and analysis, handling it will (for the foreseeable future) be an alien skill. 10,000 hours to be an expert, 1,000 hours to become relatively good or 25 weeks full-time doing nothing but data analysis. Find me a journalist out of a J-school who has done that?

    Posted by: azeem | August 5, 2009 5:27 AM



  4. I'm not so sure... you see many of the bigger news sources already have a pretty good grasp of handling data—it's hardly new for a story to report figures or uncover a problem with published information. I think the skillsets exist, though you're right to point out that many of the current generation of journalists aren't explicitly data-handlers.

    But they are fact-handlers, analysts of events and reporters; and I think that's also important. You could conversely say that no data-lovers, hackers and statisticians are trained in narrative or working with [human] languages.

    Either way, it's a problem, and you're definitely right. However, I'm sure many of the larger news sources have many people trained in both, and it may be a steep learning curve to get the mix of their best people right. Data mashups will not kill journalism; they are part of reporting, a source. A good journalist can glean facts from the sources, and the geeks can work on providing better tools for the journalists to use.

    Posted by: https://creativecommons.net/zach/ | August 5, 2009 5:50 AM



  5. Very interesting post. Cheers! http://AppUseful.com

    Posted by: NMN | August 5, 2009 6:04 AM



  6. Well articulated post, Zach.

    As government agencies and institutions start to publish their data, let's remind them that the data needs to be published in both human-comprehensible and machine-readable forms. It's no longer sufficient to post a 45-page PDF with all sorts of tables and charts. Atomize the inputs into the PDF and put the tables up themselves. Do so with software that can generate the charts. It's equally no longer acceptable to merely post a link to a CSV, or worse, a self-extracting executable with a 40MB CSV.

    Data needs to be readily accessible by all 5 of the core constituent groups: programmers, scientists, journalists, researchers and non-technical but interested citizens. Some of these groups want APIs. Some want the full dataset in a variety of download formats (XML, JSON, CSV, XLS, KML, etc.). Others need to be able to access the data interactively with software that can help them draw meaning from the data.

    Posted by: Kevin Merritt | August 5, 2009 6:53 AM



  7. In response to azeem and zach's comments, some papers do make data a journalistic priority -- I've met the Washington Post's "Data Editor" for example (who has a background in traditional journalism and straddles the two disciplines nicely), and the New York Times data graphics crew is large.

    The missing piece isn't always competence with data, per se, it's having tools to analyze and present it to readers, especially online. A content platform and a word processor are all the technology you really need to publish prose stories -- but to gather, analyze, present, and link data you need custom-built tools in many cases, and newspapers don't have those resources to spare.

    My job at Verifiable involves building an embeddable data visualization tool (with linked data!) with an eye towards online newspapers, among others. I'd love to hear your (and others') feedback. http://verifiable.com/

    -Peter C.

    Posted by: Peter Couvares | August 5, 2009 7:25 AM



  8. The Guardian newspaper recently launched its own Data Blog, with the intention of letting readers access, mash up, and reuse much of its information in the form of data, which could in turn drive stories.

    Hi there - I'm Andrew Walkingshaw from Timetric, and we've been working with the Guardian on their Data Blog. A number of stories there are using our online statistics service: ones on public debt, nuclear weapons and MPs' expenses, for instance.

    As Peter said, part of this is about building really easy-to-use tools for virtual newsrooms: it's a really exciting space to be working in, and there are a lot of innovative ideas being tried. The work happening at the NYT, WaPo and Guardian is really impressive! We'd love to talk with other folk working in the field and see what we have in common.

    Posted by: Timetric | August 5, 2009 9:02 AM




  9. I noticed someone mentioned the future of journalism. If our elected officials keep forgetting that we are a Republic with a Constitution, which they've all sworn to defend against all enemies foreign and domestic, freedom of speech will be a memory! Between the Houses of Congress and the courts they have been running roughshod over "we the people" trying to centralize Government in D.C.
    The old time journalists used to keep a pretty good eye on big brother. Now they just get their info off the wires and on Friday it's see ya' Monday!

    Posted by: Thomas | August 10, 2009 9:50 AM



  10. Thanks for this post, very informative!

    Posted by: Tech Guy | December 6, 2009 11:50 AM



  11. We've heard plenty about how great journalism will be when statistics and relational database design and administration are required training for any journalism degree, and that's dandy. We've heard less about how humanities-oriented journalists can be, and anyone in a newsroom lives with that reality every day. So much of journalism training is training people to tell stories — and it's reasonable that things turned out that way. Audiences retain information better if it's part of a story.

    In these conversations, we hear the words "interactive" and "visualization" constantly. Journalism can't afford to invest so much energy in so few silver bullets. Data journalism won't be portable enough until you can print it on an unmoving page, broadcast it over radio, whisper it to your confidants and scratch it in the dirt.

    Statistics, the framework we have to fall back on for describing and analyzing data, has its own vocabulary; and you can communicate some very nuanced information — if your target audience already knows what you're talking about. Statistics was never designed to tell stories. That works for the sciences, but journalism has more responsibility to be approachable.

    Communicating the content and implications of large sets of data is something journalism hasn't figured out yet, because the tools developed for science don't serve the same purposes or standards of truth and validity. Journalism tries to tell stories with facts and to spread the Truth-with-a-capital-T. Science is not interested in discovering what's true so much as finding plausible explanations for how the world works.

    Statistics is great for the latter, and absolutely no good for the former. Every single figure derived in the service of inferential statistics has the added caveat of quantifiable uncertainty, declaring that we don't, and can't, know any such Truth-with-a-capital-T by perpetually reminding us how probably we could be so wrong. The savvy journalist will know that, even without the statistics to quantify it. Realistic and useful journalism isn't about revealing the "Truth" — it's about informing decisions.

    Before data-informed journalism can achieve its potential, journalists need to embrace the question "How true is true enough?" as part of their daily practice, and in a way that their traditional values would consider preposterous. Statistics does not have answers to that question. Game theory does, and enterprising journalists wanting to inform their readers should look there for insights.

    Quality interpretation of data begets quantifiable uncertainty, which begets a rejection of knowable Truth-with-a-capital-T. Journalists should learn to accept that, then learn to communicate quantifiable uncertainty to an uneducated audience in the media they know best before creating new media.

    Posted by: Jeremy | December 25, 2009 9:06 PM



  12. It's a nice tutorial; some of these can be truly useful.

    Posted by: seo | January 1, 2010 3:19 AM


