George Bush signed a $555 billion omnibus spending bill yesterday that included a huge victory for advocates of open science on the internet. All research funded by the US National Institutes of Health (NIH), an agency with a $29 billion research budget, will now be required to be published online, free to the public, within 12 months after publication in any scientific journal.
This should open up a whole world of new opportunities for online research. Readers outside the academic world who are aware of the commercial promise of online health information can imagine the analogous excitement this announcement is generating among academic researchers.
Researchers, academics and others have loudly criticized the soaring prices of academic journals - which make access to publicly funded research cost-prohibitive for all but the largest institutions and double-charge institutions that have already paid the researchers' salaries.
The blog Open Access News has a good roundup of science blog responses to the news.
PubMed is the likely home for much of the research, though the law is likely to breathe more life into online sites of scientific activity like the Nature Publishing Group, the science blog search engine PostGenomic and the Public Library of Science.
Data miner Peter Murray-Rust from the Unilever Cambridge Centre for Molecular Informatics discusses just one of many reasons this is exciting news.
The hard work continues. But now all fulltext derived from NIH work will be available on PubMed. Other funders will follow suit (if they are not ahead). So our journal-eating-robot OSCAR will have huge amounts of text to mine. The good news is that we believe that this text-mining will, in itself, uncover new science. How much we don’t know, but we hope it’s significant. And if so, that will be a further argument for freeing the fulltext of every science publication.
In related science news, tech and science lovers (many of whom have libertarian sensibilities) should take note of a new video floating around the interwebs - Ron Paul doesn't believe in evolution.
Comments
There are a lot of implications of having all NIH-funded papers available via PubMed Central, although it is but a start. Open publishing has long been a mantra of the open science crowd. Being able to mine papers is one part. The other is being able to access science more easily. Right now, it can be very difficult to access papers unless you have institutional access. I wonder how publishers like Elsevier, ACS, and even Nature will respond (all historically closed). Will they switch to a PNAS-like policy of making all papers open after a one-year period? I have a feeling more people will end up publishing in PLoS and BMC, which would be good for both publishers.
Next step ... revamp peer review, apply the new Science Commons protocols to data access, microformats, etc. Like you mention, Nature and PLoS have made some great moves in the past year. Hopefully other publishing companies will follow suit.
Thanks for your comment, Deepak - I was hoping you specifically would reply here. It seems that PubMed has done some work with microformats but that was one of my first questions too. I also hope that the time before research is available is less than one year!
Deepak said...
Open publishing has long been a mantra of the open science crowd.
There is no such thing as an open science crowd. Peer-reviewed scientific publications have been available to anyone who is interested in them since the dawn of academic publishing. You can read these publications for free at any university library. I visit my local university library weekly (at most 3 hours) to scour the newly arrived journals in computing & engineering, looking for interesting materials & articles that may be useful for commercial application development.
The only closed publications are the ones that private corporations produce. They write up their R&D but don't release it to the public (by publishing in any peer-reviewed journal); they keep it as their intellectual property because it is an immense commercial advantage over their competitors. This is understandable, because you need to keep your commercial secrets to yourself.
You must differentiate between open & closed publications. An open publication is any publication that is publicly available (regardless of whether it is free or paid). So Elsevier, Springer, ACM, SIAM and IEEE are all open, since their materials are accessible to the public (although mostly paid). You can subscribe (paid) to these publications or get access to them in any local university library. Closed publication is what the Google R&D team is doing, for example. Any improvement (speed or accuracy) in their PageRank algorithm is published and kept to themselves, and no one from the public is ever able to see or read about the improved PageRank algorithm. Note that Google is known for its closed publication, since you hardly see any peer-reviewed papers from them, whether submitted to an IEEE journal, ACM, Springer, Elsevier, etc. Microsoft is different: you see their R&D being published in various journals all the time. I happened to implement a data-mining algorithm that was published by Microsoft researchers in a 2004 article for the IEEE Datamining journal (Mining ratio rules via principal sparse non-negative matrix factorization). So Microsoft is both open & closed. There are certain publications that they keep to themselves, and some are released to the public (via peer-reviewed journals).
Deepak said...
I wonder how publishers like Elsevier, ACS, and even Nature will respond (all historically closed).
I doubt that this will affect them at all. I think that it will be business as usual for them. For example, the popular NIPS (Neural Information Processing Systems) journal has been freely available for over a decade now, but that didn't stop the established publishers (Elsevier, ACM, Springer, etc.) from publishing journals in that field (machine learning - artificial intelligence), nor did it push them to make their machine learning journals freely available (these journals are still paid ones).
Falafulu, while I appreciate your thoughtful and informed reply very much, surely you can see the difference between studies published online with cost-free access vs. a status quo that requires you to physically visit the facility of an institution capable of paying for a particular journal subscription. I think if you ask almost any librarian they will tell you there is a big difference.
Marshall, always good to see someone from the tech blogosphere find something that interests us biogeeks.
Falafalu
I must beg to differ ... on many grounds. One is a fundamental difference in the concept of open science, which goes a long way beyond just open access to published research. Yes, you can always go to a library and access journals, but not everyone has access to a library with every publication, nor does everyone have the time. Plus, in an online world, being able to access electronic versions is very important: electronic versions that can be searched, mined and extended with support for things like microformats, etc. (the kind of stuff Nature and PLoS have been pioneering).
Another aspect here is the lack of transparency in the peer-review process. Open is not just about making papers freely available. It's a lot more. It's about making science lot more accountable. One of the problems with the publish or perish approach is that too many people publish for the wrong reasons. In an open world that hopefully goes away.
Perhaps the most important aspect of open science for some of us is transparency in data. Data that can be re-used by others to come to additional conclusions that the original author may not be interested in, the publication of scientific protocols online (openwetware), the publication of dark data, etc, interoperability between datasets, etc. That's why we are so excited about the recent Science Commons protocol proposal.
Anyway, I could go on forever, so will stop here.
As far as companies go, that's an argument for another day.
This should have been done long ago. Tax dollars at work.
As someone who works on PubMed, I see this as terribly exciting. This gives us a chance to do some very interesting and creative things. I've personally worked on a PubMed design with microformats. Just wait and see.
Oh, yes, and I'd like to say "woo hoo, job security!" too, while I'm at it.
Sweet, thanks for the comment, Edward! Nice to know that - and yes, congrats on the news on the job security front too :)
Deepak said...
Yes, you can always go to a library and access journals, but not everyone has access to a library with every publication, nor does everyone have the time.
The only people who go to the library to get access to those journals are those who are interested in them (researchers & technology implementers), and this is an undeniable fact. Give me an example of why any random reader here at ReadWriteWeb would be interested in any of those journals unless he/she is a researcher or a technology implementer. Researchers & technology implementers know where to get those research papers, and they are the only ones interested enough either to look for them in the first place or to pay for a subscription to access them online.
I think that you're arguing for free access (not paying to get access) as opposed to public accessibility. These are very different. You might as well argue that open source software is free, which you know is not true in the literal sense of the word. I do use some open source software, but I am entitled to charge for my modified version if anyone requests a copy from me.
I ask you this: why don't you subscribe online (paid) to any journal you want? This will solve your problem of not having the time to go to the library. Your whole argument is about a free gift (not having to pay) rather than free access. These are different. If you want free access to those research papers and journals, just reach for your credit card and subscribe online, problem solved.
Deepak said...
Another aspect here is the lack of transparency in the peer-review process.
No, wrong. Papers that are submitted for publication in refereed journals are either accepted (because the work is original, a modification of an old method, algorithm or technique that is better than the existing one, a new domain of application, blah, blah, blah, ...) or rejected because they don't meet the requirements just stated. For papers rejected by any journal, the reviewers (or publishers) explain clearly to the authors why their work is being denied publication. Don't you think that this is transparency? If it is not, then explain what you mean by transparency.
Deepak said...
Open is not just about making papers freely available.
That's exactly what it means. It means publicly accessible, regardless of whether money is paid or not. So all peer-reviewed publications that are offered for sale (Elsevier, Nature, IEEE, ACM, SIAM, etc.) are publicly accessible, because if they were not open and publicly accessible, these journals would not have been for sale to the public in the first place, would they? So don't confuse publicly accessible literature (whether you pay to get access to it or read it for free in a library) with freely gifted literature (literature fallen from the sky, which you need not pay for).
Deepak said...
It's about making science lot more accountable.
I don't follow this comment. What is your definition of accountable here? When you publish your paper in a refereed journal, you're only accountable for what you publish. Your peers will try to refute what you publish, or counter it if they find faults, by submitting their own findings to any refereed journal. This process of argument for or against a specific topic in any discipline goes on and on.
Deepak said...
Perhaps the most important aspect of open science for some of us is transparency in data.
Now, I am lost here. What do you mean by transparency in data? Again, for peer-reviewed publications, the authors give out their original data to anyone who requests it. The idea is that anyone can reproduce what you claim in your paper. I have requested data & code (if the authors already have an implementation) from many authors, and most of the time I am given the data & code. The only code that I have requested and not been given was for published work done for a commercial vendor (but hey, the algorithm is there in the publication if I or anyone else wants to implement it and check the claim in the paper). Eg: I requested the code for this algorithm from the author, but he declined my request because the algorithm was implemented for the merchant bank Credit Suisse First Boston. However, he corrected lots of logic errors in my implementation of his algorithm, which in the end produced exactly the same input and output as stated in his paper (Numerical pricing of discrete barrier and lookback options via Laplace transforms). He confirmed that the input & output of my code (I sent him a copy) are the same as described in his paper, but he wouldn't tell me whether my implementation is efficient or not, since his work was commercial.
PS: Bioinformatics is one of my areas of interest; I have done algorithm development in that area in the past for protein folding, DNA sequencing and bio-medical data-mining. The popular algorithm that I quoted in my previous message, non-negative matrix factorisation (NNMF), has many variants available in the literature. I implemented one variant based on the following paper from Oxford's Bioinformatics journal (Improving molecular cancer class discovery through sparse non-negative matrix factorization). I used to browse PubMed looking for interesting titles. I am all for easy access to research materials such as is proposed for the NIH; however, I am against whingers who say that everything should be free (not paid for). If you want access to publications, just pay for a subscription to get access to them.
@Edward.
Nice to see input here from someone at the NIH!
--
Deepak's comments make a hell of a lot of sense to me.
One logical way forward is to look at how things have progressed with UK PubMed Central. They are performing an audit next month (was already planned) to see what impact the OA Mandates have had here in the UK.
Them folks over at Science Commons will certainly be rejoicing in light of the freeing up of data, and I for one applaud their pioneering work.