Tuesday, February 9, 2010

Should Scientists Release Their Codes?

Slashdot has an article where Professor Ince is calling for scientists to make their codes public.  The idea is, it makes what scientists are doing more transparent.

I can think of various pros and cons. However, I know of a few scientists, like Dr. Eiichiro Komatsu (First author for the WMAP 5 and 7 cosmological interpretation papers) who make many of their codes public.

His codes are well written and set up so that by default you can reproduce many things in the literature.  I'm not saying all scientists should follow, but I will say this:  I have 100% confidence in the results Dr. Komatsu gets as I can reproduce them myself and see exactly what he did at each step.

Anyone have any thoughts?


  1. I am generally in favor of making large codes public; however, in some cases there are very good reasons not to do so. For example, our code ASH is not publicly available for the simple reason that if we released it we would receive quite a few requests for what is essentially tech support. Our code is constantly in development and probably not up to the highest standards of readability and ease of use. We don't make our code available because we don't want to spend our time helping people run it.

  2. Additionally, we have found a number of ways to get very poor results from our code if, for example, a poor stellar structure model is used as input. There is some concern that if we release our code to the public someone will misuse it and get wonky results, which would reflect poorly on us and our code.

  3. Sure Nick, it's the initial input that's the problem. :) (Just kidding.)

    I understand where you are coming from. I'm sure many people's codes are fine-tuned to what they specifically are doing and any alteration messes them up, especially if that alteration is a set of initial conditions the code was not written to handle properly.

    *I would not want to be the one who has to keep updating the Makefile so it works for everyone else on their different machines. :) *

  4. Open source has been very successful in commercial software development. Most of the stuff I use every day is free and of very high quality. Perhaps the propeller head stuff you guys work on is too narrow and specific, but it would be cool wouldn't it?

  5. From my experience I would have a very hard time doing what I do without other scientists releasing their code. The basic hydro code that I use is freely available. There is another grad student here who has been writing code to do essentially the same thing, and she has been working on it for over 4 years. In less than one year I have been able to reach the same level that she is working at, in terms of running simulations.

    The visualization code that I use is also free and open source. In terms of total lines of code that I use, I have written very little of what I now use (like 0.01%).

    In terms of putting code out for others to use, I can definitely see Nick's problem. When I am done with my simulations and have a paper published, I will have no problem putting out all the code that I have. But right now my code changes from day to day so there is really no point in putting it out there yet.

    I think one of the main reasons why scientists don't release their code is because they are always working on it. It's the "I know it works on my computer because I spent the last 3 years getting it to work, but I don't want to spend another 3 years getting it to work on your computer." problem.

    Having said that, I know of a few special cases where scientists won't release their code because they know it is wrong and/or won't hold up to scrutiny. These people rarely make much of an impact in the scientific community. Still there are others who don't release it because it is an ego thing. They like to know that only THEY know how it works, or where everything is. There is one researcher I heard of who knows A LOT about galaxies, and where they are. Someone who once worked with him said that he has an entire galaxy catalog in his brain, but he never shows any interest in preserving it for other people. The sad thing is he is getting old and we will lose that data soon, which means somebody else will simply have to redo his entire life's work.

  6. Stan and Quantumleap42,

    I have to agree that open source is great and has led to much scientific progress.

  7. The code I use, ASH, is different from codes like FLASH, ENZO, or Athena in that it is very highly specialized. It does turbulent convection in rotating spherical shells, instead of general-purpose fluid dynamics or MHD. ASH is also pseudo-spectral so it can only work in spherical geometries, and it solves the full MHD equations, including the Coriolis force and explicit diffusion of heat, momentum, and magnetic fields.

    All of that is essentially saying that because ASH is so specialized there are only a handful of applications for it, unlike codes such as ENZO that can be used for everything from planet formation or galactic mergers to cosmological simulations. Therefore there is much less utility in making ASH publicly available than there is in making FLASH publicly available.

  8. To generalize my argument, I think that when a code is of interest to a wide enough group, then it needs to be made publicly available. ASH isn't there yet but codes like the Global Climate Models (GCM), ENZO, and so forth are.

  9. "Having said that I know of a few special cases where scientist won't release their code because they know it is wrong and/or won't hold up to scrutiny."

    So, anyone have any ideas on the best way to prevent this from happening if people aren't required to show their codes?

  10. "Having said that I know of a few special cases where scientist won't release their code because they know it is wrong and/or won't hold up to scrutiny."

    Almost all codes have dirty tricks that hold things together. The best way to make sure that the dirty tricks aren't giving bogus results is to have your results verified by another code, preferably one that uses a different numerical scheme. If that's not possible and the code isn't public, then it all rides on credibility. A credible numerical work will explicitly state its warts. For example, if you ever see ASH data with no Gibbs ringing, you can know something is wrong because ASH essentially uses FFTs, which are prone to Gibbs phenomena.

    Your own previous work and the previous work of your co-authors also contribute quite a lot. Fool me once and so forth. If you get a bad reputation it becomes very hard to publish your results because they are either doubted or agree exactly with past results and are therefore uninteresting.
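
For readers who have not run into the Gibbs ringing mentioned above, here is a minimal toy sketch. It is not ASH or anything like it: it just sums the textbook Fourier series of a square wave that jumps from -1 to +1 and measures the overshoot next to the jump. The function names and the sampling resolution are made up for this illustration.

```python
import math


def square_wave_partial_sum(x, n_terms):
    """Partial Fourier sum of a square wave that jumps from -1 to +1 at x = 0.

    The series is (4/pi) * sum over odd k of sin(k*x)/k; n_terms is the
    number of odd harmonics kept (a hypothetical parameter name).
    """
    total = 0.0
    for j in range(n_terms):
        k = 2 * j + 1
        total += math.sin(k * x) / k
    return 4.0 / math.pi * total


def peak_value(n_terms, samples=4000):
    """Largest value of the partial sum on a uniform grid over (0, pi)."""
    return max(
        square_wave_partial_sum(math.pi * i / samples, n_terms)
        for i in range(1, samples)
    )


if __name__ == "__main__":
    # The exact square wave never exceeds 1, but the truncated series does.
    for n in (10, 50, 200):
        print(n, round(peak_value(n), 4))
```

Keeping more harmonics squeezes the ringing closer to the jump, but the peak stays near 1.18 instead of settling down to 1, an overshoot of roughly 9% of the jump. That is why a spectral result with a sharp feature and no ringing at all looks suspicious.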

