Secrecy in science makes it non-science. If your claims cannot be tested and falsified, they are not scientific; hidden data or techniques prevent replication.
I think this is an overstatement. If I say "the mass of an electron is X and this is how I did the experiment to determine X", you can repeat all the steps of the same experiment without looking at my original lab notebook.
The more dubious situation is when I say "based on my analysis of a massive quantity of data, which took my grad students a whole summer of full-time work to collect and which I won't publish here, I observe the following pattern which is evidence in favor of theory Y". But even in this case, a competing scientist with his or her own graduate students can collect data from different sources, or a different kind of data, and publish a paper saying "hey, the pattern in this other data provides evidence against theory Y".
I seem to remember reading, maybe about a year ago, of some movement trying to promote the practice of supplementing scientific papers with reproducible experiments and derivations, i.e. source code and data. Perhaps it was just in the field of computer science? To a non-academic like me it sounded like a completely obvious thing to do, yet you hardly ever see it. Surely, given the tools the internet offers these days (look at the wonders of GitHub and its ilk for sharing and modifying code), we should be moving academic publishing forward to take advantage of them?
In general you are required to put enough into the methods section of a paper for others to reproduce the results, but as the computational side gets more complex and supplementary data sets get bigger and more complicated, making the raw data and (if possible) the code available, so that people can reproduce the results and test the assumptions, is becoming almost a necessity.
A number of folks are thinking along the publishing side as well. There's PLoS One (http://plosone.org), and Cell has been doing some very interesting prototyping on the "paper of the future". People have already mentioned JoVE and OpenWetWare. It will happen, but it's going to take a few years. Too many years of established practice.
jove.com is a good example of using the web to better share science; so is openwetware.org.
The people running labs these days are usually 35+; they grew up before Google. In a sense, they see the web as a way to do what they've always done, only faster, as opposed to doing new things. As younger people replace them this will change.
Scientists spend loads of time worrying about what data to release and how to present it, because that is necessary for communicating their findings effectively to the rest of the scientific community. The problem with the "release everything" approach is that so much data is generated every day that it would be impossible for anyone to make sense of it all. When journalists or other non-technical people try to analyze raw data or scientists' communications on their own, they are very likely to misinterpret it or emphasize the wrong things.
You seem to be assuming that scientists are interpreting their own data correctly in the first place. Reproducibility isn't just a check on outright fraud; it's also a way to catch mistakes in the original findings. The data isn't for laypeople and journalists, it's for other researchers.
As somebody who has had to compile (economic) datasets for public release, I understand that there is cleaning of raw data that needs to happen. However, this needs to happen in a transparent manner, and any transformations should be clearly called out or explained.
It is important to automate the downstream processing so that you can perform it on both the raw data and the cleaned data. This lets you investigate the sensitivity of your conclusions to your various assumptions.
For example, if the conclusions are supported by the cleaned data but not the raw data, you need to worry about whether you have accidentally written your conclusions in by hand during the cleaning process. One way to investigate this is to have two cleaning scripts: a lax one that makes only no-brainer corrections, and a strict one that makes the delicate judgment calls. Then you can reprocess. If even the laxly cleaned data supports your conclusions, they are fairly secure. If this sensitivity analysis reveals that the difference between lax and strict cleaning matters to your conclusions, then you groan, because you have waded into muddy waters and things are much harder than you initially thought. Maybe you have to recruit an assistant to clean the data blind, without knowing what the "right" answer is, or maybe you need to go on field trips to do direct checks on instruments.
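A minimal sketch of what that automation might look like, assuming the cleaning steps and the downstream analysis are plain Python functions (the column name, file name, and thresholds here are hypothetical):

    import pandas as pd

    def clean_lax(df):
        # No-brainer corrections only: drop rows with impossible values.
        return df[df["measurement"] >= 0]

    def clean_strict(df):
        # Delicate judgment calls on top of the lax cleaning,
        # e.g. dropping points more than 3 standard deviations out.
        df = clean_lax(df)
        mean, std = df["measurement"].mean(), df["measurement"].std()
        return df[(df["measurement"] - mean).abs() <= 3 * std]

    def analyze(df):
        # Whatever statistic the conclusion actually rests on (here just a mean).
        return df["measurement"].mean()

    raw = pd.read_csv("raw_data.csv")

    # Run the identical downstream analysis on every version of the data,
    # so any difference in the result is attributable to the cleaning itself.
    for label, cleaner in [("raw", lambda d: d),
                           ("lax", clean_lax),
                           ("strict", clean_strict)]:
        print(label, analyze(cleaner(raw)))

If the raw, lax, and strict results all point the same way, the conclusion doesn't hinge on the cleaning; if they diverge, that is the muddy-water case described above.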
If your research has important and expensive implications for public policy, other people will want to do this kind of sensitivity analysis themselves and reach their own conclusions.
Part of the problem is the way science is funded. I am sure you have heard of "publish or perish". There is a lot of competition in science for grants. I have seen obvious cases of free-riding on someone else's hard work without crediting them for the data, for example; and if you are not listed as a co-author, or at least acknowledged as the source of the data, it is harder to get your next round of funding.
I also see people being hesitant to publish their data: data collection is often a hard and lengthy process, and they are afraid that someone else will scoop them on the discoveries in their data. This means that people end up being cautious about releasing it.
Now, I am not defending this behaviour, but I can understand what is happening. Personally I think that science grants should come with requirements for publishing raw data and methods completely openly.
Perhaps, in the case of data-intensive work, grants should be given in pairs: one research team gets paid to collect the data, and another team gets paid to analyze it.
If you've done a unique and time-consuming experiment and gathered important data, then that raw data has real tradable value. It can be bartered with other scientists for access to grants, co-author status, experiments, different data, and a host of other favours. Simply giving it away can put you at a real financial or academic disadvantage.
In a private research lab your point is well taken. When the work is funded by the public (e.g. the NIH) or a charitable foundation you may have other obligations. In many cases the funding agencies now want to see the data shared easily so that the results can be reproduced.
What model, what theory, what principles determine how the raw data should be filtered and processed? If no one can make sense of the data then the data is useless. If there are objective measures for filtering and processing the data and making sense of it, then other experts can critique those methods and test them, along with others, on new and original data to determine how well they work, and thereby reproduce or refute results.
This is science.
The risk of public science being misinterpreted by the public is less serious than the risk of private science being inaccessible to other scientists. The first risk is that the public will make incorrect decisions based on misunderstandings; the second risk is that the "science" itself will be wrong, which can lead to the public making incorrect decisions which they believe to be backed by the leading scientists in the field. This second risk is far more dangerous. Whether non-technical people (including journalists) will misinterpret the data is less relevant. Does it matter whether non-technical people can make sense of Quantum Chromodynamics? Or General Relativity? Or 2D COSY NMR spectroscopy? What matters is that scientific theories and the data supporting them are available openly, so that others who can understand them can make use of them and can verify them. Anything else is just opinion masquerading as science.
IIUC this has long been an issue in archeology: for example, a small committee of scholars had exclusive access to the Dead Sea Scrolls for decades, and the scholars dragged their feet on publishing the complete text, until someone reverse-engineered much of the text from a concordance.