## The Higgs Boson and Bayesian Statistics

When the discovery of the Higgs Boson was announced, there was controversy over the use of classical Hypothesis Testing and the lack of Bayes in particle physics. Some reactions, with useful links, can be found here, here, here, and here. In this post, I’ll point out what particle physics has to lose by ignoring Bayesians.

One goal of particle physics is to determine the mass of particles such as neutrinos. A couple of centuries back, Laplace faced the similar problem of estimating the mass of Saturn from some error-prone, earthbound measurements of the mutual perturbations of Jupiter and Saturn. A prior can be obtained, in the words of Jaynes, from the “common sense observation that [Saturn’s mass] can’t be so small that Saturn would lose its rings; or so large that Saturn would disrupt the solar system”. Jaynes goes on to describe Laplace’s success:

> Laplace reported that, from the data available up to the end of the 18th Century, Bayes’ theorem estimates [Saturn’s mass] to be (1/3512) of the solar mass, and gives a probability of .99991, or odds of 11,000:1, that [it] lies within 1 percent of that value. Another 150 years’ accumulation of data has raised the estimate 0.63 percent.
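As a quick check on the arithmetic in the quote, a probability p corresponds to odds of p/(1 - p) to 1. This minimal Python sketch uses only the figures Jaynes quotes: .99991 gives roughly 11,100 to 1, consistent with the rounded "11,000:1", and a 0.63 percent revision stays inside the 1 percent band.

```python
def odds(p):
    """Odds in favour corresponding to probability p, i.e. p : (1 - p)."""
    return p / (1.0 - p)

p_laplace = 0.99991          # probability Laplace reported for the 1 percent band
laplace_estimate = 1 / 3512  # Saturn's mass as a fraction of the solar mass
later_estimate = laplace_estimate * 1.0063  # "raised the estimate 0.63 percent"
rel_change = later_estimate / laplace_estimate - 1

print(f"odds: {odds(p_laplace):,.0f} to 1")
print(f"revision: {rel_change:.2%}, inside the 1 percent band: {abs(rel_change) < 0.01}")
```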

Laplace’s success stands in contrast to the use of confidence intervals in physics, which are well known to possess nothing like the coverage properties Frequentists claim are objectively guaranteed.
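Coverage is a long-run property that holds only if the assumed sampling distribution is right. A minimal simulation sketch of one way it can fail, with hypothetical numbers: the analyst reports nominal 95% intervals, x ± 1.96σ, using an assumed σ, while the instrument's actual error spread is twice that. The factor of two is an assumption chosen for illustration, not a claim about any real experiment.

```python
import random

def coverage(true_sigma, assumed_sigma, n_trials=100_000, seed=0):
    """Fraction of nominal 95% intervals, x +/- 1.96 * assumed_sigma, that contain the true value."""
    rng = random.Random(seed)
    true_value = 0.0
    hits = 0
    for _ in range(n_trials):
        x = rng.gauss(true_value, true_sigma)  # one reading with the instrument's actual spread
        if abs(x - true_value) <= 1.96 * assumed_sigma:
            hits += 1
    return hits / n_trials

print(f"error model correct:           {coverage(1.0, 1.0):.3f}")
print(f"true spread twice the assumed: {coverage(2.0, 1.0):.3f}")
```

With the correct σ the simulated coverage sits near 0.95; with the understated σ it drops to about 0.67, which is 2Φ(0.98) - 1 for a standard normal Φ.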

But Laplace’s example raises an interesting point. Prior information like “[Saturn’s mass] can’t be so small that Saturn would lose its rings” is very difficult for a Frequentist to use. The problem, as explained here, is that this information in no way physically affects the errors in the earthbound observatories, or causes us to modify their sampling distribution or model. It only affects the prior distribution!
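A minimal grid sketch of that point, assuming Gaussian measurement errors and purely hypothetical numbers (neither Laplace's data nor real physical bounds). The `log_likelihood` function, i.e. the sampling model, is identical in both runs; the bounds enter only through the prior, which is zero outside an allowed range.

```python
import math

# Hypothetical noisy measurements of a quantity that must lie in a physically allowed range
measurements = [0.9, 1.4, 0.2, 1.1]
sigma = 0.8  # assumed known, Gaussian measurement error

def log_likelihood(m):
    """Sampling model: identical whichever prior is used."""
    return sum(-0.5 * ((x - m) / sigma) ** 2 for x in measurements)

def flat_prior(m):
    return 1.0

def bounded_prior(m):
    # Hypothetical bounds standing in for "not so small ..., not so large ..."
    return 1.0 if 0.5 <= m <= 2.0 else 0.0

grid = [i * 0.01 for i in range(-300, 501)]  # parameter values from -3.00 to 5.00

def posterior(prior):
    """Posterior on the grid, proportional to prior(m) * likelihood(m), normalised to sum to 1."""
    weights = [prior(m) * math.exp(log_likelihood(m)) for m in grid]
    total = sum(weights)
    return [w / total for w in weights]

def summary(post):
    """Posterior mean and standard deviation on the grid."""
    mean = sum(m * p for m, p in zip(grid, post))
    sd = math.sqrt(sum((m - mean) ** 2 * p for m, p in zip(grid, post)))
    return mean, sd

for name, prior in [("flat", flat_prior), ("bounded", bounded_prior)]:
    mean, sd = summary(posterior(prior))
    print(f"{name:8s} prior: posterior mean {mean:.3f}, sd {sd:.3f}")
```

The flat-prior posterior is centred on the sample mean (0.9) with standard deviation 0.8/√4 = 0.4. The bounded prior leaves the likelihood untouched but removes the probability outside [0.5, 2.0], which shifts the mean up to roughly 1.01 and narrows the posterior to a standard deviation of roughly 0.31.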

Moreover, this kind of prior information is available for neutrinos. A discussion can be found here, but a flavor is given by the quote:

> If the total energy of all three types of neutrinos exceeded an average of 50 eV per neutrino, there would be so much mass in the universe that it would collapse.

So that’s what Bayes has to offer particle physics.

This also puts the lie to the belief that sampling distributions are hard facts while priors are subjective, if not outright meaningless. While I’ve never met anyone who’s measured the errors given by any device in a laboratory or observatory to “check” the assumed sampling distribution, and Laplace certainly didn’t do so, almost everyone’s seen images of Saturn’s rings verifying the prior.

UPDATE: The charming Dr. Mayo reposted a discussion on the use of background information to assign prior probabilities here. The choice quote is:

> There is no reason to suppose that the background required in order sensibly to generate, interpret, and draw inferences about H should—or even can—enter through prior probabilities for H itself!

I love that “or even can” line. Yep, there is no reason to suppose. The good Dr. Mayo has searched high and low, but can’t find a reason anywhere.

konrad, July 23, 2013:

That’s a fairly uninformative prior – the approach with the Higgs Boson was to collect enough data that such priors ought to be swamped. I’d hope that Bayes has more to offer than this?

Joseph (post author), July 23, 2013:

The information used in the prior, like the likelihood, is whatever it is. It may be a lot or it may be little; it may be highly relevant or only mildly so.

Did you think there was a theorem in statistics somewhere that obliterated that reality?

konrad, July 23, 2013:

No, my point was that if this is all Bayes has to offer particle physics (and I imagine this is what many physicists think) then it is understandable why they largely ignore Bayesian methods. In the problems they work with, the information in the prior is also available from the data, so they can get the same results outside of the Bayesian framework.

If the aim is to convince physicists to embrace Bayesian methods, the key must lie in the other advantages of Bayesianism (averaging over uncertainties rather than using point estimates, the option of using models with large numbers of parameters, etc.) – the only problem is these may not be of much value in particle physics applications either.

Joseph (post author), July 23, 2013:

Oh I see. I wouldn’t take it as given, though, that particle physicists can’t find tight bounds on neutrino masses. Maybe they can. Maybe they already have.

And please don’t take this as exhausting every benefit. In this post (http://www.entsophy.net/blog/?p=53) I showed in a simple example how having one measurement more accurate than the others can really screw things up for Frequentists. The Bayesian credible interval is obviously correct and happily contains the true value, while the Confidence Interval is 3 billion times wider but still missed the correct value (at the same alpha!).

In that simple example, I would expect a Frequentist to spot the phenomenon and correct for it in some ad hoc fashion. But in many real and complicated physical models, there is a multitude of parameters, with an enormous number of measurements from a variety of sources, and it would be practically impossible to find these kinds of problems.

That Bayesian methods avoid them automatically is therefore essential in practice.
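The linked post's numbers are not reproduced here, but the mechanism can be sketched under assumptions of our own: four Gaussian measurements of one quantity with known standard errors, one of them a hundred times more accurate than the rest (all values hypothetical). With a flat prior the posterior is centred on the inverse-variance weighted mean, while an analysis that ignores the stated errors and uses the sample mean with a Student-t interval is dominated by the sloppy measurements.

```python
import math
import statistics

# Hypothetical data: (measured value, known standard error); one instrument is far more accurate
data = [(10.3, 1.0), (8.9, 1.0), (11.2, 1.0), (10.02, 0.01)]
values = [x for x, _ in data]

# Bayes with a flat prior and known Gaussian errors: posterior is normal,
# centred on the precision-weighted mean with sd 1 / sqrt(sum of precisions)
weights = [1 / s**2 for _, s in data]
post_mean = sum(w * x for w, (x, _) in zip(weights, data)) / sum(weights)
post_sd = 1 / math.sqrt(sum(weights))

# Naive analysis that ignores the stated errors: sample mean and t-based 95% interval
n = len(values)
naive_mean = statistics.mean(values)
t_975 = 3.182  # Student-t 97.5% quantile for n - 1 = 3 degrees of freedom
naive_half_width = t_975 * statistics.stdev(values) / math.sqrt(n)

print(f"Bayes 95%: {post_mean:.4f} +/- {1.96 * post_sd:.4f}")
print(f"naive 95%: {naive_mean:.4f} +/- {naive_half_width:.4f}")
```

Here the naive interval comes out roughly 77 times wider than the posterior interval. As konrad points out in the thread, a frequentist who uses the stated errors would pick the same weighted mean, which is the sufficient statistic for this model; the sketch only shows what dropping the errors costs.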

konrad, July 23, 2013:

Sure, but your example features a Straw Frequentist. The problem statement specified measurement accuracies which you took into account in the Bayesian analysis (instead of selecting a simpler Bayesian analysis such as one assuming constant measurement accuracies). A real frequentist would likewise start with an analysis which takes those accuracies into account – there is no need to first spot a phenomenon. I would hope that a natural aversion against discarding information is also present in most frequentists.

Joseph (post author), July 23, 2013:

Huh? The Bayesian used the exact same distributions as the Frequentist. They both took into account the exact same things.

konrad, July 24, 2013:

I mean the frequentist estimator which does not take the measurement precision into account.

Joseph (post author), July 24, 2013:

That was my point. A Bayesian stumbling into the problem automatically gets the right answer without special notice. A Frequentist stumbling into the problem, with the same distributions, gets it horribly wrong.

They have to first detect the problem, which is easily missed if you’re not paying attention, especially since there aren’t any mistakes per se in the Frequentist’s calculation, and then find an estimator which corrects the problem. Both of these are doable in this example, because it’s very simple and there’s a sufficient statistic.

In a real physics-based statistical model, the complexity guarantees both that it’s difficult, if not impossible, to recognize the problem and that no sufficient statistic is available when you do.

konrad, July 25, 2013:

But in this example (a bunch of measurements, each with known precision and not all precisions equal) it is _not_ true that a frequentist would pick an analysis that ignores half of the available information, then look at the results to decide if a problem crops up.

Let’s say you make a spreadsheet containing all the measurement and precision values and send it off to your favourite frequentist analyst. An unthinking frequentist may start plugging numbers into the naive estimator you proposed, but will realise _before calculating any results_ that the estimator fails to ask for half of the numbers provided in the spreadsheet. Bayesians do not have a monopoly on the “use all available information” guideline, especially when (as here) the information is already encoded numerically.

Alternatively, we can imagine that the precision values are hidden, and only available to analysts who think of asking for them. But then the Bayesian analyst is just as likely as the frequentist to suppose that measurement precisions are unknown, and make the parsimonious assumption that they are equal.

Joseph (post author), July 25, 2013:

Imagine a situation where you’ve got a complicated stat/physics weather model. One of my old thesis adviser’s research projects was just such a massive, physics-based Bayesian weather model.

Now imagine you have an enormous number of inputs from various different sensors. Each comes with its own uncertainty, which leads both the Bayesian and the Frequentist to use the same distributions.

But it’s quite possible that some of those sensors imply certain quantities are known very precisely, while other sensors are highly uninformative about that same quantity. This can easily happen in such a way that you’d never know this fact without doing a specific, lengthy and difficult calculation.

Typically neither the Bayesian nor the Frequentist would have time to perform this calculation, especially since it would have to be done for every quantity, and they would just get on with the problem. The Bayesian won’t run into trouble because Bayes’ theorem will automatically detect the more accurate inputs and effectively use those, but if the Frequentist uses simple averages, which they’re very liable to do, they’ll get a bunch of crap.

To make matters worse, even if they magically did detect the problem, there wouldn’t be any sufficient statistic which could be used to replace the simple average!
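A minimal sketch of the multi-parameter version, assuming a linear-Gaussian model of our own construction (not the weather model mentioned above): each hypothetical sensor reads a different linear combination of two unknown quantities, a and b, with its own known noise level. With a flat prior the posterior mean solves the precision-weighted normal equations, so the precise sensors dominate without anyone first working out which quantity each one pins down. Taking unweighted least squares as the multi-parameter analogue of a "simple average" is also an assumption.

```python
import math

# Hypothetical sensors: each row says which combination of (a, b) the sensor reads.
# The readings were made up around true values a = 2.0, b = 3.0.
rows = [(1, 0), (0, 1), (1, 1), (1, -1), (1, 0)]
sigmas = [1.0, 1.0, 0.01, 0.01, 1.0]   # known noise levels
ys = [2.9, 2.2, 5.004, -1.003, 0.9]    # hypothetical readings

def solve_2x2(m, v):
    """Solve the 2x2 linear system m @ x = v by Cramer's rule."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return ((s * v[0] - q * v[1]) / det, (p * v[1] - r * v[0]) / det)

def fit(weights):
    """Weighted least squares: solve (A^T W A) theta = A^T W y; also return A^T W A."""
    ata = [[sum(w * row[i] * row[j] for row, w in zip(rows, weights)) for j in range(2)]
           for i in range(2)]
    aty = [sum(w * row[i] * y for row, y, w in zip(rows, ys, weights)) for i in range(2)]
    return solve_2x2(ata, aty), ata

# Bayes with a flat prior and Gaussian errors: posterior mean = precision-weighted fit,
# posterior covariance = inverse of the precision matrix A^T W A
bayes, precision = fit([1 / s**2 for s in sigmas])
# Ignoring the stated noise levels: every reading weighted equally
naive, _ = fit([1.0] * len(ys))

det = precision[0][0] * precision[1][1] - precision[0][1] * precision[1][0]
post_sd = (math.sqrt(precision[1][1] / det), math.sqrt(precision[0][0] / det))

print("posterior mean (a, b):", [round(x, 4) for x in bayes], "sd:", [round(s, 4) for s in post_sd])
print("unweighted fit (a, b):", [round(x, 4) for x in naive])
```

The precise readings of a + b and a - b pin both quantities down to a posterior standard deviation of about 0.007, even though no single sensor measures either one precisely, while the equal-weight fit is off by about 0.26 in b. A frequentist using weighted least squares would get the same point estimate; the sketch only shows that the weighting falls out of Bayes' theorem without a separate diagnosis step.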