Hacker News

In principle this should allow readers to completely understand and reproduce the processing of the raw data, rather than relying on a few paragraphs summarizing the processing done by the authors. Using a closed source tool makes that kind of inspection much harder.

But consider http://www.bbc.co.uk/news/science-environment-39054778

"Science is facing a "reproducibility crisis" where more than two-thirds of researchers have tried and failed to reproduce another scientist's experiments"

I don't think that can be hand-waved away as "OMG closed source software!", especially since all the scientists in a given field will have access to the same software anyway. Give them open source and the issue will persist, and we both know it, because the root cause has nothing to do with the license of the software.



Reducing the barriers for doing reproduction studies will likely increase how often they happen.

Or just consider peer review. How often does a reviewer today actually review the code and data used in a study? As far as I know this is essentially never done. That would involve seeing the code, running it, maybe messing around with it a bit.


Open source software, in general, tends to make things more reproducible. Sure, software licensing might not be the singular root cause, but why does that mean we shouldn't capitalise on the improvements available?


Open source software, in general, tends to make things more reproducible

Citation very much needed for that. Because you can very easily find that 6 months or a year later you update your dependencies and everything is now broken. I recently came back to a Python project after a year, updated my packages then realized: I simply cannot be bothered to unpick the mess that resulted just to add one trivial feature. Whereas the poster child for backwards compatibility is closed-source and proprietary.

Science is not reproducible because there are no incentives for it to be so, despite everyone paying lip service to it. It's extra work and helps those who are competing with you for grants, after all. That's a social problem, not a software one. The software is irrelevant.


Open source software is important for reproducibility for a couple of reasons. Firstly, if you record that you've done your analysis with Python 3.6.3 and Numpy 1.14.2, and it later breaks on some newer version, it's relatively easy to get the same versions you were using. Commercial software vendors are usually not keen on you downloading and running a version of their product which was superseded four years ago.
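One common way to make those versions recoverable later is to pin them, e.g. in a requirements file (the versions shown are just the ones mentioned above, as an illustration):

```
# requirements.txt -- exact versions used for the analysis
numpy==1.14.2
# (The Python interpreter version, 3.6.3 here, is recorded separately,
#  e.g. via pyenv, a Dockerfile, or a note in the paper's repository.)
```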

Secondly, of course, open source means that if you're not sure why two versions/functions/libraries are giving you different answers, you can go and find out. I accept that a lot of people may not have time for that, but I don't think you can fix that problem unless it's possible to dig down and follow the working.

Finally, 'reproducible by anyone with a computer' is a lot better than 'reproducible by people who buy a license for the tool I used to do it'.


If you record that you've done your analysis with Python 3.6.3 and Numpy 1.14.2, and it later breaks on some newer version, it's relatively easy to get the same versions you were using

Better record which compiler you used too, and what flags, and every version of every library and everything else. It’s not as simple as you make out and it’s far from guaranteed that all those packages will still be available or compile on your OS.
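A partial mitigation for "I don't even know my configuration myself" is to snapshot the interpreter, platform, and installed package versions alongside the results. This is only a sketch, and it deliberately won't capture compilers, build flags, or OS libraries; the output file name environment.json is made up for illustration:

```python
# Sketch: record interpreter, platform, and installed-package versions
# next to the analysis output. "environment.json" is a hypothetical name.
import json
import platform
import sys
from importlib import metadata  # stdlib since Python 3.8

snapshot = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {
        # .get() guards against rare distributions with broken metadata
        dist.metadata.get("Name", "unknown"): dist.version
        for dist in metadata.distributions()
    },
}

with open("environment.json", "w") as f:
    json.dump(snapshot, f, indent=2, sort_keys=True)
```

It won't make the run bit-for-bit reproducible, but it turns "which versions did I use?" from guesswork into a lookup.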

Commercial software vendors are usually not keen on you downloading and running a version of their product which was superseded four years ago.

I guess you must not deal with vendors much because generally they are fine with this. It’s part of the support agreement usually, just another service. Getting an “obsolete” version for whatever reason has never been a problem for me.

By “anyone with a computer” you mean “anyone who can exactly reproduce my configuration which I don’t even know myself for certain”


Because the reproducibility crisis has very little to do with the difficulty in re-running the same code. These are almost orthogonal concerns.

The typical problem paper has a small data set, on which the authors tried 20 different things, one of which achieved p<0.05 and got published. The result tells you nothing meaningful about the world (or more often, tells you something about how its authors wish the world worked). But re-running their code on their data set will not reveal the problem.
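To put a number on that "tried 20 different things" pattern: even if every hypothesis is false, the chance that at least one of 20 independent tests clears p < 0.05 is substantial. A quick back-of-the-envelope calculation:

```python
# Probability that at least one of n independent tests on pure noise
# reaches significance at level alpha (the familywise error rate).
alpha = 0.05
n_tests = 20
p_at_least_one = 1 - (1 - alpha) ** n_tests
print(f"{p_at_least_one:.1%}")  # about 64%
```

Which is exactly why re-running the authors' code faithfully reproduces the same misleading p-value.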




