A previous post showed how diffuse distributions on a space can be used to estimate functions which aren’t sensitive to where in the space the true state lies. This is the essence of statistics, as evidenced by the successful predictions found in coin tossing, election prediction, Statistical Mechanics, and error statistics. But while being able to easily predict such functions is handy, sometimes we’d rather observe them and learn something about the underlying state. Unfortunately these goals are in tension, which is why “data science” is inherently limited.
In the coin tossing example, the function is the frequency of heads, which almost every possible sequence of tosses places close to 1/2. So while we can confidently predict a frequency near 1/2 without taking a single measurement, actually observing the frequency to fall within this range tells us basically nothing.
To see this, consider the size of the spaces involved. Using logs for convenience, the full space of n tosses has log-size n ln 2, while the set of sequences whose frequency of heads lies near 1/2 has a log-size only negligibly smaller. The subspace is very nearly all of the full space.
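To make the size comparison concrete, here is a short sketch. The numbers are illustrative assumptions, not taken from the original post: 10,000 flips, and “near 1/2” taken to mean a heads-frequency in [0.45, 0.55].

```python
from math import exp, lgamma, log

def log_binom(n, k):
    # log of the binomial coefficient C(n, k), computed via log-gamma
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

n = 10_000              # number of flips (illustrative choice)
log_total = n * log(2)  # log-size of the whole space of heads/tails sequences

# log-size of the subset with heads frequency in [0.45, 0.55],
# summed stably with log-sum-exp over the binomial counts
terms = [log_binom(n, k) for k in range(int(0.45 * n), int(0.55 * n) + 1)]
m = max(terms)
log_subset = m + log(sum(exp(t - m) for t in terms))

# The subset is very nearly the whole space:
print(log_subset / log_total)  # ratio is essentially 1
```

The ratio is indistinguishable from 1 because the sequences with frequencies outside [0.45, 0.55] carry a vanishing fraction of the space.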
It’s for this reason that believing “an observed frequency near 1/2 proves something about a completely fictitious frequency distribution on the coin” makes about as much sense as saying:
According to the theory of magic fairies, the outcome of a dice throw should be either a 1, 2, 3, 4, 5, or 6. Since I actually got a 5 when I rolled the dice, the prediction is confirmed and magic fairies exist. They’ve been objectively verified!
Fundamentally, then, three observations go together: almost every point of the space gives the same value of the function; that value can therefore be predicted without any measurement; and observing it teaches us nothing about which point actually occurred.
Many things work this way. Gases diffuse because almost any microscopic happenings lead to diffusion. This fact allowed physicists to understand diffusion before they understood the atomic realm, but it also means observing diffusion provides no hint of Quantum Mechanics.
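The point that almost any microscopic rule yields the same macroscopic diffusion can be illustrated with a quick simulation. The two step distributions below are arbitrary choices with equal variance; nothing about them comes from the original post:

```python
import random

def diffuse(step, n_walkers=2000, n_steps=400, seed=0):
    # Run many independent random walkers and return the mean squared
    # displacement after n_steps -- the macroscopic signature of diffusion.
    rng = random.Random(seed)
    positions = [0.0] * n_walkers
    for _ in range(n_steps):
        for i in range(n_walkers):
            positions[i] += step(rng)
    return sum(x * x for x in positions) / n_walkers

# Two very different microscopic rules, both with step variance 1
coin = lambda rng: rng.choice((-1.0, 1.0))          # discrete +/-1 kicks
smooth = lambda rng: rng.uniform(-3**0.5, 3**0.5)   # continuous kicks

msd_coin = diffuse(coin)
msd_smooth = diffuse(smooth)
# Both give MSD ~ n_steps: the macroscopic behavior hides the microscopic rule
```

Watching only the mean squared displacement grow linearly in time, you cannot tell which microscopic rule produced it.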
Power laws are a similar phenomenon. Under mild conditions, almost anything that could happen will appear to satisfy a power law. Actually observing a power law therefore tells you almost nothing about the underlying physical reality. Mandelbrot-style efforts to unlock the secrets of the universe by finding power laws everywhere couldn’t be more misguided. They’re more akin to numerology than science. (The 1/f noise described here, almost mystical in reputation, is a similar example. The giveaway is that this noise occurs in many physically different systems.)
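One way to see how easily an apparent power law arises: samples from a wide log-normal, a distribution with no power law anywhere in its definition, trace a near-straight line on log-log rank-size axes over their upper tail. The sample size and σ = 3 below are arbitrary choices for illustration:

```python
import math
import random

rng = random.Random(1)
# A wide log-normal: nothing about its definition is a power law
samples = sorted((math.exp(rng.gauss(0, 3)) for _ in range(5000)), reverse=True)

# Examine the upper tail on log-log rank-size axes
tail = samples[:500]
xs = [math.log(rank) for rank in range(1, len(tail) + 1)]
ys = [math.log(s) for s in tail]

# Pearson correlation of the log-log points
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r = sxy / math.sqrt(sxx * syy)
# r close to -1: a nearly straight descending line, i.e. an apparent power law
```

A straight-looking log-log plot, the usual evidence offered for a power law, is satisfied here by a distribution that isn’t one.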
Having said all that, you do actually learn a little something from observing a power law: namely, that those “mild conditions” are satisfied. Jaynes famously exploited this to extract information from frequencies obtained by throwing a dice 20,000 times. His procedure was to compute a theoretical entropy based on various physical effects, and to keep adding effects until the theoretical entropy dropped down to the empirical entropy of the observed frequencies.
But once the theoretical entropy reached the empirical one, Jaynes had to stop. The empirical entropy serves as a kind of barrier to learning because of the Entropy Concentration Theorem. Basically, almost anything that could have been true at that point leads to frequencies indistinguishable from the ones observed. Once anything that can happen gives the same result, the result tells you nothing.
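A sketch of that procedure, with hypothetical face frequencies standing in for the real data (Jaynes used several physical effects as constraints; here only a single constraint, the observed mean, is shown): unconstrained maximum entropy gives ln 6, and each added constraint pulls the theoretical entropy down toward the empirical entropy. Once they meet, nothing more can be extracted.

```python
import math

# Hypothetical observed face frequencies (illustrative, not Jaynes's data)
freq = [0.14, 0.15, 0.16, 0.17, 0.18, 0.20]
H_emp = -sum(p * math.log(p) for p in freq)  # empirical entropy
S_max = math.log(6)                          # theoretical entropy, no constraints

def maxent_given_mean(target, lo=-5.0, hi=5.0):
    # Max-entropy distribution on faces 1..6 with a fixed mean, found by
    # bisecting on the Lagrange multiplier g, where p_i is proportional
    # to exp(g * i).
    def dist(g):
        w = [math.exp(g * i) for i in range(1, 7)]
        z = sum(w)
        return [wi / z for wi in w]
    for _ in range(80):
        g = (lo + hi) / 2
        mean = sum(i * p for i, p in zip(range(1, 7), dist(g)))
        if mean < target:
            lo = g
        else:
            hi = g
    p = dist((lo + hi) / 2)
    return p, -sum(pi * math.log(pi) for pi in p)

mean_obs = sum(i * p for i, p in zip(range(1, 7), freq))
p_fit, S_constrained = maxent_given_mean(mean_obs)
# S_max >= S_constrained >= H_emp: each constraint lowers the theoretical
# entropy toward the empirical one, which acts as the floor
```

The ordering is guaranteed: the constrained maximum entropy can never fall below the entropy of the observed frequencies, which themselves satisfy the constraint.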
In the end Jaynes determined which pair of faces on the dice were cut last in the manufacturing process. It’s a justly famous inference, and if this is the one thing you need to know to make millions, then you’re in business. If you need to know anything else about the special circumstances, or laws of nature, which nudged each roll in one direction or another, then you’re out of luck. You can never get it from those frequencies.
What this means for science is that those fields dependent on extracting patterns from frequency-type data, which is the majority of statistical applications today, have a long-term problem. Once they’ve harvested what little information is available, like Jaynes did with the dice, they’ve either got to (1) find a better way of doing science, (2) find novel data, or (3) stagnate. Evidence suggests they mostly do (3), occasionally do (2), and rarely do (1).
Finance is a prime example. What meager information about the universe there was in the histograms and correlations of price movements was learned long ago, and despite innumerable man-hours, vast amounts of computer time, and ever more sophisticated statistical methods, the field hasn’t budged in predictive ability since. You might even say it’s gone backwards, since the foundational theories taught since the ’70s are now admitted to be false.
Such is the nature of Data Science. Fortunately it wasn’t around in Newton’s time, or physicists would still be baffled by coin flipping.