The post “What do we need to Model?” showed what our goal in modeling errors should be. This one shows how it’s achieved. Assigning a distribution $P(\epsilon)$ to the fixed parameters $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ is like finding a prior; it’s successful whenever $P$’s high probability region accurately describes where $\epsilon$ is located in $\mathbb{R}^n$.
Fortunately it’s not our job to model the frequency of measurement errors. Statisticians usually conjure up error distributions without physically checking anything, but if they did check, they’d discover error patterns aren’t stable. Errors depend on factors outside of the measuring instrument itself, and those factors change. Even in processes designed to yield stable patterns, they don’t remain stable – just ask anyone making a living in Quality Control.
Our job is merely to pin down the fixed parameters as much as possible. But this brings us to the crux of the problem: what do we really know about $\epsilon$?
When last I taught physics lab, the students were told wooden rulers like the one above had a known accuracy, call it $\alpha$. Better rulers were reasonably accurate to about 1/5 to 1/10 their smallest division. Digital calipers came with a known accuracy from the factory, which was usually the smallest significant figure on the display. So given the accuracy $\alpha$ and the knowledge that we took careful measurements using calibrated instruments, the one true thing we know is:
$\epsilon$ lies in a hypersphere in $\mathbb{R}^n$ of some reasonable radius $R$ proportional to $\alpha$. (1)

Any distribution whose high probability manifold corresponds to this hypersphere would do in practice, but it’s worth considering a maxent approach. Given the state of knowledge (1), we’ll use the distribution $P(\epsilon)$ which maximizes the entropy subject to the constraint

$$\int P(\epsilon) \sum_{i=1}^{n} \epsilon_i^2 \, d^n\epsilon = R^2. \qquad (2)$$
The result is IID: $P(\epsilon) = \prod_{i=1}^{n} N(\epsilon_i \mid 0, \sigma^2)$ with $\sigma^2 = R^2/n$. The “IID” results from the symmetry evident in (1) and (2). Maximizing the entropy expands the high probability region as much as possible, so even if (2) weren’t symmetric the maxent solution would try to get as close to IID as it could. The form of this distribution in no way implies causal or frequency “independence”; it just provides a big symmetric region needed to locate $\epsilon$.
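For the record, the maxent calculation is the standard Lagrange-multiplier one. A sketch, where $\epsilon$ is the error vector, $R$ the hypersphere radius, and $\sigma^2$ the resulting per-coordinate variance:

```latex
\begin{aligned}
&\text{maximize } -\int P(\epsilon)\ln P(\epsilon)\, d^n\epsilon
\quad\text{subject to}\quad
\int P\, d^n\epsilon = 1,\qquad
\int P \sum_{i=1}^{n}\epsilon_i^2\, d^n\epsilon = R^2,\\[4pt]
&\Longrightarrow\quad P(\epsilon) \propto e^{-\lambda \sum_i \epsilon_i^2}
 = \prod_{i=1}^{n} e^{-\lambda \epsilon_i^2},
\qquad \lambda = \frac{n}{2R^2}
\;\;\text{so that}\;\; \sigma^2 = \frac{1}{2\lambda} = \frac{R^2}{n}.
\end{aligned}
```

The product form drops out purely because the constraint is a symmetric function of the $\epsilon_i$’s, which is exactly the point: the factorization is a statement about symmetry of our knowledge, not about causal independence.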
Since IID $N(0,\sigma^2)$ has nothing to do with the errors being “IID” as commonly conceived, it’s easy to find examples that’ll leave a Frequentist’s head spinning. Take current and future errors provided by a rogue laboratory ruler with initial accuracy $\alpha$, whose errors drift deterministically over time while staying comparable in size to $\alpha$.
Their sampling distribution definitely isn’t IID $N(0,\sigma^2)$, and yet if you use this model you’ll get a 95% Bayesian interval which correctly contains the value being measured. This success shouldn’t be surprising. However horrendous that model is as a frequency distribution, it is a good Bayesian “prior” which accurately locates $\epsilon$. The interval calculation shows that 19 out of every 20 potential $\epsilon$’s in the high probability region lead to an interval containing the true value. Far from being surprising that it worked, it would only be exceptional if it didn’t!
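To see this concretely, here’s a minimal sketch of my own (the particular drift pattern and numbers are illustrative, not from the original example): a deterministic drifting error sequence, nothing like an IID sample, is fed through the IID $N(0,\sigma^2)$ model with a flat prior on the measured quantity $\mu$, giving the posterior $\mu \mid y \sim N(\bar y, \sigma^2/n)$. The 95% interval still covers the truth because the drifting errors sit comfortably inside the model’s high probability region.

```python
import math

mu_true = 10.0   # quantity being measured (hypothetical)
alpha = 1.0      # ruler accuracy; sets the model's scale sigma
n = 50

# Rogue ruler: errors drift deterministically -- definitely not IID draws,
# but they stay within +/- alpha, i.e. inside the hypersphere of knowledge (1).
errors = [alpha * math.sin(i / 3.0) for i in range(n)]
data = [mu_true + e for e in errors]

# IID N(0, sigma^2) error model with a flat prior on mu gives the posterior
# mu | data ~ N(ybar, sigma^2 / n), hence a 95% interval of ybar +/- 1.96*sigma/sqrt(n).
sigma = alpha
ybar = sum(data) / n
half_width = 1.96 * sigma / math.sqrt(n)
lo, hi = ybar - half_width, ybar + half_width

covered = lo <= mu_true <= hi
print(f"95% interval: ({lo:.3f}, {hi:.3f})  covers mu_true: {covered}")
```

The systematic drift biases $\bar y$ a little, but nowhere near enough to push the truth out of the interval; the “wrong” frequency model is still a good locator of $\epsilon$.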
This also puts to rest a mystery noticed by every thoughtful statistician since Laplace. To quote Jaynes (page 198):
In the middle 1950s the writer [Jaynes] heard an after-dinner speech by Professor Willy Feller [Frequentist], in which he roundly denounced the practice of using Gaussian probability distributions for errors, on the grounds that the frequency distributions of real errors are almost never Gaussian. Yet in spite of Feller’s disapproval, we continued to use them, and their ubiquitous success in parameter estimation continued.
NIID assumptions work better than Frequentists expect because the state of knowledge in (1) is far more realistic, modest, knowable, and just plain true than any fantasies about long range frequencies. Anyone who understands that probabilities aren’t frequencies can use (1) to get results inexplicable to those not in the know.