While working on the definition of probability post, I saw an advertisement for a Gelman talk subtitled “Can we use Bayesian methods to resolve the current crisis of unreplicable research?” No doubt Gelman has reasonable and constructive points to make. So let me be unreasonable: Bayes has within it the capability to destroy Frequentist methods when it comes to reproducible science.
In the usual view, a distribution $P(x|\theta)$ is the shape of a histogram of many $x$’s observed for fixed $\theta$. This restricts us to cases where many $x$’s are possible, but even worse, it presupposes there is a stable shape to the histogram. That is an extraordinarily strong physical assumption, and it is usually wrong.
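To make that assumption concrete, here’s a minimal sketch (Python with numpy; the Gaussian model and the sample sizes are my own illustrative choices). The histogram only settles into a stable shape when the $x$’s are plentiful, and the whole picture presumes the generating mechanism never drifts:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = (0.0, 1.0)  # fixed parameters: mean and standard deviation

# The frequentist picture: P(x|theta) is the stable shape this histogram
# converges to as the number of observed x's grows.
for n in (10, 100, 100_000):
    x = rng.normal(*theta, size=n)
    hist, _ = np.histogram(x, bins=20, range=(-4.0, 4.0), density=True)
    print(n, np.round(hist.max(), 3))  # peak height only stabilizes for large n
```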
Thus, right from the outset, Frequentism works against reproducible science. It’s likely that most statisticians of Gelman’s type think Bayesians can tweak this picture a little, help out here and there, but not fundamentally challenge it. There is an alternative, though.
That same $P(x|\theta)$ can be thought of as merely a way to locate those $x$’s compatible with $\theta$. Loosely speaking, any $x$ in the high probability manifold is consistent with the parameters. If we’ve modeled this right, then the $x_{obs}$ we actually see will be there as well.
This is meaningful even if only one $x$ ever exists, which allows us to create distributions for one-off events. But it allows far more than that. Suppose we have an $f(x)$ which has very small variance over $P(x|\theta)$. Then the expected value of this function,

$$E[f] = \int f(x)\, P(x|\theta)\, dx \quad\quad (1)$$
has an interesting meaning under this interpretation. The small variance means almost all $x$’s compatible with $\theta$ lead to $f(x) \approx E[f]$. Since $x_{obs}$ is one of those $x$’s compatible with the observed parameters, we’ll find,

$$f(x_{obs}) \approx E[f] \quad\quad (2)$$
In other words, there will appear in the laboratory to be a functional relation or law of nature connecting the inputs $\theta$ to the outputs $f(x_{obs})$ of this analysis:

$$\theta \;\longrightarrow\; f(x_{obs}) \approx E[f \mid \theta]$$
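Here’s a toy numerical sketch of (1) and (2), with every modeling choice mine: take $x = (x_1, \dots, x_n)$ iid $N(\mu, 1)$ under $P(x|\theta)$ with $\theta = \mu$, and let $f(x)$ be the sample mean, so $f$ has variance $1/n$ over the model. A single simulated $x_{obs}$ then hits $E[f] = \mu$ almost exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n = 3.7, 100_000           # theta: the fixed input parameter
f = lambda x: x.mean()         # f(x) with tiny variance (1/n) over P(x|theta)

x_obs = rng.normal(mu, 1.0, size=n)  # one realized experiment
print(f(x_obs))                # ~3.7: equation (2), f(x_obs) ≈ E[f] = mu
```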
But there’s more. The small variance is predicting this relationship will be highly reproducible. If we repeat this experiment using the same parameters $\theta$, then we’ll get a different $x_{new}$. But this new value will still be in the high probability manifold, which means we’ll usually get $f(x_{new}) \approx E[f]$ again.
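Continuing the same toy model, here is that reproducibility prediction in action: every repetition with the same $\theta$ produces a different $x$, yet $f$ barely moves:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n = 3.7, 100_000
f = lambda x: x.mean()

# Ten repetitions of the "experiment": the x's differ every time,
# but f(x) reproduces E[f] = mu to within about 1/sqrt(n).
reps = [f(rng.normal(mu, 1.0, size=n)) for _ in range(10)]
print(np.round(reps, 3))       # all ~3.7
```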
But there’s still more. This reproducibility usually holds even when the histogram of all those $x$’s doesn’t resemble $P(x|\theta)$ at all. It can be wildly different and unstable in fact. The only requirement is that the histogram stay inside the high probability manifold. Most such wild deviations of the histogram from the distribution will make (2) more reproducible, not less!
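And a sketch of this stronger claim (again, all choices mine): model the components of $x$ as iid $N(0,1)$, but let Nature hand us $x$’s whose component histogram is uniform, nothing like a Gaussian. Such an $x$ still sits in the high probability manifold of the joint model, since its average log-density per component matches a typical Gaussian draw’s, and (2) holds anyway:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
f = lambda x: x.mean()         # E[f] = 0 under the iid N(0,1) model

# Nature's x's: uniform on (-sqrt(3), sqrt(3)), i.e. mean 0 and variance 1,
# so the histogram of components looks nothing like the modeled Gaussian.
x = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)

# Average log-density per component under the N(0,1) model. A typical
# Gaussian draw gives about -0.5*(1 + log(2*pi)) = -1.419; this x matches,
# so it lies squarely in the high probability manifold.
print(np.round((-0.5 * x**2 - 0.5 * np.log(2 * np.pi)).mean(), 3))

print(np.round(f(x), 4))       # ~0.0: (2) is reproduced anyway
```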
That’s the fundamental technical fact most statisticians just can’t seem to digest, and it has retarded applied statistics more than any other single failing.
Strictly speaking though, (2) is just a prediction. The relation holds for most $x$’s in the high probability manifold, but not quite all. You might even say the law it represents is “highly probable” rather than required. If you check it in the laboratory, you’ll either verify it or you’ll discover something even more important.
Any consistent failure of (2) means those $x_{obs}$’s are being confined by Mother Nature to a very small set of exceptional cases in the high probability manifold. So you’ve just discovered an important new physical effect. By incorporating this new effect you’ll get a still better $P(x|\theta)$, which allows accurate predictions of even more reproducible relationships of the form (2).
Bayes is the most powerful tool we have for the prediction and discovery of reproducible results. All you need do to exploit it is to forget you were ever told probabilities are frequencies.