Tuesday, December 28, 2010

Why Raw Data From Science Experiments Can Scare Me.

Cosmology as a field has become precise enough that we may measure theoretical features at the 1/10 of 1% level or better.  For example, one of Planck's greatest successes could be a detection of what is known as primordial non-Gaussianity, which, if it exists, amounts to a deviation of less than 0.1% from a pure Gaussian spectrum.
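To get a feel for how small a 0.1% effect is, here is a toy Python example (entirely my own illustration, not Planck's actual estimator) using the standard local-type quadratic ansatz for non-Gaussianity, where the field is a Gaussian plus a tiny quadratic correction:

```python
import numpy as np

# Toy sense of scale, not a real CMB analysis: a Gaussian field with
# a 0.1%-level local-type quadratic term, g + eps*(g^2 - 1).
rng = np.random.default_rng(1)
n = 1_000_000
g = rng.standard_normal(n)
x = g + 1e-3 * (g**2 - 1)          # 0.1%-level non-Gaussian correction

def sample_skew(v):
    # Simplest non-Gaussianity statistic: the sample skewness.
    v = v - v.mean()
    return np.mean(v**3) / np.mean(v**2) ** 1.5

print("skewness, pure Gaussian:", sample_skew(g))
print("skewness, with 0.1% term:", sample_skew(x))
print("statistical error bar:   ", np.sqrt(6 / n))   # skewness noise floor
```

Even with a million samples, the skewness shift produced by the 0.1% term is only a few times the statistical error bar, which is exactly why false signals at a comparable level are so dangerous.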

With that in mind, let's look at some raw Planck data.  The image above left (black curve) shows raw data recorded by Planck over time.  It has these features:
  1. You see the dipole of the CMB as a "sine-wave" signal as Planck rotates and scans the sky. (See video above for an illustration of this scanning pattern.)
  2. If you look closely,  you see a sharp peak at the same spot in each sine-pattern.  This is Planck observing the galactic plane.
  3. You also see hundreds of spikes that represent cosmic rays hitting the instrument.  
Now here is the point: all those spikes and other large anomalies are much more significant than a 0.1% deviation from a clean Gaussian signal!  We are trying to find deviations at the 0.1% level, and these false signals are significantly greater than that!

And so, Planck has to somehow remove them.   The graph on the top right shows what the data look like once these known anomalies/systematics are accounted for.  It looks decent and gives what looks like a near-Gaussian spectrum, modulo the peak from the galactic plane.  Hence, naively/hopefully, in such data you can now go searching for 0.1% deviations.
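To make the cleaning step concrete, here is a minimal Python sketch (my own toy, not anything from the actual Planck pipeline): simulate a time-ordered signal with the dipole "sine wave" plus cosmic-ray spikes, subtract a model of the known signal, and sigma-clip the spikes.

```python
import numpy as np

# Toy time-ordered data (NOT the real pipeline): dipole + noise + spikes.
rng = np.random.default_rng(0)
n = 5000
t = np.linspace(0, 10, n)                  # ten scan "rotations"
dipole = np.sin(2 * np.pi * t)             # dipole seen as the sky is scanned
noise = 1e-3 * rng.standard_normal(n)      # detector noise at the ~0.1% level
signal = dipole + noise

# Inject rare, large cosmic-ray hits.
hits = rng.choice(n, size=50, replace=False)
signal[hits] += rng.uniform(0.5, 2.0, size=hits.size)

# Step 1: fit and subtract the known dipole (a least-squares stand-in
# for the real template/model removal).
basis = np.column_stack([np.sin(2 * np.pi * t),
                         np.cos(2 * np.pi * t),
                         np.ones(n)])
coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
residual = signal - basis @ coeffs

# Step 2: flag spikes with a robust 5-sigma clip (MAD-based) and
# replace the flagged samples with the model value.
dev = residual - np.median(residual)
mad = np.median(np.abs(dev))
flagged = np.abs(dev) > 5 * 1.4826 * mad
cleaned = np.where(flagged, basis @ coeffs, signal)

print("spikes injected:", hits.size, "| samples flagged:", int(flagged.sum()))
```

Of course, this toy hides exactly the worry in the post: the fitted template and the clipping threshold are themselves models, and any error in them at the 0.1% level propagates straight into the "clean" data.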

But wait!!!

That clean signal assumes at least the following:
  1. That the simulations of the anomalies, and the templates and models used for the removal of this stuff, are accurate to better than the 0.1% level.
  2. That the removal actually worked... beyond what looks good by eye.
  3. That in the process of removing garbage, Planck didn't inadvertently introduce other false signals.
  4. Etc...
And this "scariness" is not unique to cosmology data.  For example, I have talked with many people working at the LHC who have admitted that the background issues they have to deal with in their data can be frightening in similar ways. 

Conclusion: Now, don't get me wrong, I have a lot of trust in the Planck team/LHC/whoever.  I really do.  But let's just say this still scares me a little.  Some of the most important results in physics hinge on better than 1/10 of 1% accuracy, both in removing false signals and in not introducing fictitious ones during the removal process.  (And everyone in the trenches knows this can be really hard to get right!) So this sometimes seems like a scary business... but at the same time it is also a testament to how far we have come in science. :)


  1. This sort of thing is also a very big deal in exoplanet searches, where the signal of an Earth-like planet passing in front of a sun-like star produces a 0.001% change in the brightness of the star. With things like thermal expansion of the spacecraft, star spots, dust, cosmic rays, etc., it takes a brave science team to turn the raw data into discoveries of exoplanets.

    I much prefer simulations where we know our results are untrustworthy at those levels of precision. It saves a lot of headaches.

  2. Nick,

    Yeah, which is why I like seeing competing teams going after the same final result: bad results due to false signals from team A's systematics should produce different outcomes than bad results due to false signals from team B's systematics. So if their final results agree, a lot of trust can be put in those results.

    As for CMB stuff, fortunately there are many competing experiments looking at a wide variety of things, so if we get consistent results across the board it will be very reassuring indeed.

  3. By the way, I love the panicked face in your post. There just aren't enough cartoons on this blog for my liking.

  4. Nick,

    I was thinking the same thing when I was posting it. Never hurts to use some good cartoons to make for a better experience.

  5. JS,

    Nice post. This is more or less the case in most advanced research in all fields. I deal with numbers that make profits and losses, and I am always faced with difficulties in determining what is real and what is fake, as well as degrees of uncertainty. Ultimately, I have to depend on my simulations (thanks, Nick!) done previously on similar cases. Fairly complex, and not easily explainable.

  6. JS,

    One more thought. The real science is to understand the nature of the noise sources in the raw signals. It would really be cool if one of the Planck guys would do a simplified write up on how they go about cleaning up the signal...

  7. Ancient1,

    Oh they will! :) I'm sure they are going to provide such detail about what they did that there will be whole journal articles just on the low-level data analysis and how they cleaned it up.

    Although, whether it will be simple enough may still be in question. It will be simple enough for CMB experts to understand, but unfortunately maybe no simpler than that for everyone else.
