Statistics is full of old and difficult ideas. It’s time for something new and simple. Well, it’s not actually new, but it will seem that way to most. The story begins with the physicist Max Planck over a century ago.
Planck’s 1912 summary of his researches on Black Body Radiation included a chapter titled “Probability and Entropy”. This chapter had a specific purpose. Previously, statistical ideas had been applied to gases, where the problem was to use known functions of the microstate, like the energy, to predict other functions of the microstate. But the Black Body Radiation problem was physically and mathematically a different beast. Technically, it involved using functions of the amplitudes in the Fourier expansion of the electric field to predict other functions of those same amplitudes.
Nevertheless, Planck wanted to use those statistical ideas from thermodynamics to solve the Black Body Radiation problem. To do so he had to first clarify the universal and extra-physical nature of Statistical Mechanics. That was the explicit aim of the “Probability and Entropy” chapter.
It’s easy to see how Planck succeeded because, despite the differences, they are abstractly the same kind of problem. We don’t observe some $x$ directly, but we know some $f(x)$ and wish to predict some other $g(x)$.
Planck’s solution is simplicity itself. To illustrate, consider the set $V$ of $x$’s compatible with the observed value of $f$:

$$V = \{\, x : f(x) = f_{\text{obs}} \,\}$$
Now within that domain, compute the number of $x$’s compatible with each value of $g$:

$$W(g_0) = \#\{\, x \in V : g(x) = g_0 \,\}$$
Then to predict $g$, just choose the value $\hat{g}$ which maximizes $W$. Graphically the situation is as follows:

[Figure: the region $V$ carved out by $f_{\text{obs}}$, with the large majority of it sitting over a single value $g = \hat{g}$.]
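To make the recipe concrete, here’s a minimal sketch in Python. Everything in it (the coin-flip microstate space, the particular $f$ and $g$, the observed value) is invented purely for illustration; the count-then-maximize logic is the only point.

```python
from itertools import product
from collections import Counter

# Toy microstate space (invented for this demo): x is a sequence of
# 16 coin flips, each flip 0 or 1.
N = 16

def f(x):
    # What we get to observe: the number of heads among the first 4 flips.
    return sum(x[:4])

def g(x):
    # What we want to predict: 1 if the overall fraction of heads is
    # "moderate" (between 1/4 and 3/4 inclusive), else 0.
    return 1 if 4 <= sum(x) <= 12 else 0

f_obs = 2  # the observed value of f (an assumption of the demo)

# V: every microstate compatible with the observation f(x) = f_obs
V = [x for x in product((0, 1), repeat=N) if f(x) == f_obs]

# W(g0): the number of x in V compatible with each value g0 of g
W = Counter(g(x) for x in V)

# Planck's prediction: the value of g that maximizes W
g_hat, W_max = W.most_common(1)[0]
print(f"#V = {len(V)}, predicted g = {g_hat}, W(g_hat) = {W_max}")
```

Here the prediction comes out $\hat{g} = 1$: nearly every flip sequence consistent with the observation has a moderate fraction of heads, so “about half heads” plays the role of the law.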
If $W(\hat{g})$ is close to $\#V$, then there will appear to be a functional relationship connecting $f$ and $g$:

$$g = \hat{g}(f) \tag{1}$$

This relation will seem to the observer like a “law of nature”.
All that’s needed to make this law a reality is for the true $x$ to be among that majority of $V$ associated with $\hat{g}$. That’s it. There’s no requirement that multiple $x$’s occur equally often among $V$. There needn’t even be more than one $x$. Rather, success depends entirely on a simple, concrete criterion involving the one $x$ that actually exists; everything else is irrelevant.
Notice too that the law (1) will appear stable even if $x_t$ is jumping around as a function of time. This is easy to see from the illustration below:

[Figure: $x_t$ wandering through $V$ over time while remaining inside the majority region.]
Again, the only thing we need to make this true is that the $x_t$ stay within that majority region. In no way is it required that they fill up $V$, or occupy each point of $V$ equally often in an ergodic sense.
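This too is easy to check numerically, continuing the toy sketch above (so `V`, `g`, and `g_hat` are as defined there). Below, $x_t$ hops among just ten states of the majority region, nowhere near filling $V$; the deliberately arbitrary dynamics is an assumption of the demo, and the law holds at every step regardless.

```python
import random

# Continuing the sketch above: the majority region of V
majority = [x for x in V if g(x) == g_hat]

# A deliberately crazy, non-ergodic dynamics: x_t teleports at random
# among just ten states in the majority region and never visits the
# rest of V at all.
random.seed(0)
trajectory = [random.choice(majority[:10]) for t in range(1000)]

# Law (1) nevertheless holds at every time step.
assert all(g(x) == g_hat for x in trajectory)
print("g stayed equal to g_hat for all 1000 steps")
```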
Obviously, to make this work it helps to have $W(\hat{g})$ as large as possible. For this reason it’s natural to use the following ratio,

$$p = \frac{W(\hat{g})}{\#V}$$

as a measure of the “strength” of the law expressed in (1).
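In the toy sketch above this ratio is one line; the particular value is of course an artifact of the invented example.

```python
# Continuing the sketch: the strength of law (1)
p = W[g_hat] / len(V)
print(f"p = {p:.4f}")  # comes out near 0.99 in this toy example
```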
If $p$ happens to be close to 1, it will create a “separation” between law (1) and whatever natural process determines $x$. No matter how crazy the physics governing $x$ is, it’ll be hidden from any observer looking only at (1). Indeed, it’s precisely this “separation” phenomenon which creates the appearance of distinct branches of science like physics, chemistry, and biology.
There is an exception to this, though. If the trajectory $x_t$ overlaps with the part of $V$ where $g \ne \hat{g}$, as in the picture below, then the law (1) will appear to fail. If observed, this failure provides us with quite strong information about the nature of $x_t$ and amounts to a kind of statistical learning.

[Figure: $x_t$ straying into the minority of $V$ where $g \ne \hat{g}$.]
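The toy sketch above shows the same thing: if we actually observe a value of $g$ other than $\hat{g}$, the set of candidate microstates collapses from all of $V$ down to its small minority region, a reduction by a factor of roughly $1/(1-p)$.

```python
# Continuing the sketch: suppose the law fails, i.e. we observe g != g_hat.
minority = [x for x in V if g(x) != g_hat]

# The true x must now lie in the minority region; the rest of V is ruled out.
print(f"candidates: {len(V)} -> {len(minority)} "
      f"(a factor-of-{len(V) / len(minority):.0f} reduction)")
```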
Planck doesn’t mention this “learning” directly, but it’s hard to believe he wasn’t aware of it since the greatest scientific example of it was the discovery of Quantum Mechanics, and Planck’s research was the key step in that discovery.
This may not look like the foundation of statistics. It’s both simpler than and very different in outlook from anything you’ll see in a Statistics class. But I assure you a generalization of it easily serves as the foundation for every successful application of Statistics ever made, and it leads to quite a few new ones.