A probability distribution corresponds to an urn with a potentially infinite number of balls inside. When a ball is drawn at random, the “random variable” is what is written on the ball.
Yet data frequently behave exactly as if they were such draws, even in situations where no such “urn” or “population” ever existed. To explain why requires an answer to the question: when will data appear to be drawn from a frequency distribution?
The answer is given by the Entropy Concentration Theorem. Suppose $f^*$ results from maximizing the entropy $H(f) = -\sum_x f(x)\ln f(x)$, subject to constraints $\sum_x f(x)\,g_k(x) = c_k$. Then almost any data set $x_1, \dots, x_n$ which satisfies the matching sample-average constraints

$$\frac{1}{n}\sum_{j=1}^{n} g_k(x_j) = c_k, \qquad k = 1, \dots, m, \qquad (1)$$

will have an empirical frequency distribution which looks approximately like $f^*$.
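A toy case may make this concrete (a numerical sketch of my own, not from the post; the six-sided die and the 4.5 target are arbitrary choices): constrain the average face of a die to be 4.5. The maximum-entropy solution then has the exponential-family form $f^*(i) \propto e^{\lambda i}$, and the multiplier $\lambda$ is pinned down by a one-dimensional root find.

```python
# A toy instance of the theorem (my own sketch, not from the post): a six-sided
# die constrained so that the average face value is 4.5. The maximum-entropy
# distribution has the form f*(i) proportional to exp(lam * i); the Lagrange
# multiplier lam is fixed by the constraint itself.
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
target_mean = 4.5            # the single constraint of form (1): average face = 4.5

def mean_given_lambda(lam):
    """Mean of the distribution f(i) proportional to exp(lam * i) on the six faces."""
    w = np.exp(lam * faces)
    return float((faces * w).sum() / w.sum())

# One-dimensional root find for the multiplier that hits the target mean.
lam = brentq(lambda l: mean_given_lambda(l) - target_mean, -10.0, 10.0)

f_star = np.exp(lam * faces)
f_star /= f_star.sum()

print("lambda =", round(lam, 4))
print("f*     =", np.round(f_star, 4))                             # maximum-entropy frequencies
print("mean under f*:", round(float((faces * f_star).sum()), 4))   # should be 4.5
```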
If a physical system imposes (1) on the data, then each new data set will appear as though it consists of “random draws” from the same “population” $f^*$. Of course, there is no randomness here; it happens because almost every possibility results in that outcome. Almost no matter what special causes produced $x_1, \dots, x_n$, as long as (1) is satisfied, the data will likely fool anyone ignorant of the Entropy Concentration Theorem into thinking it’s a “random draw” from some infinite magical urn.
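The “almost every possibility” claim can be checked by brute-force counting (again my own sketch, continuing the die example above; the 30 rolls and the 0.1-nat tolerance are arbitrary choices): enumerate every way 30 die rolls can average exactly 4.5, weight each frequency pattern by the number of raw sequences producing it, and see what share of those sequences sit near the maximum-entropy frequencies. The theorem says that share tends to one as the number of rolls grows.

```python
# A counting sketch of the concentration effect (my own example). We enumerate
# EVERY count vector (n_1,...,n_6) for 30 die rolls whose faces sum to 135
# (average 4.5, i.e. constraint (1)), weight each by the number of raw sequences
# producing it, and ask what fraction of those sequences have an empirical
# frequency vector whose entropy is within 0.1 nats of the maximum.
from math import lgamma, log, exp

N = 30                   # rolls in one data set
TARGET = int(4.5 * N)    # constraint (1): the 30 faces must sum to 135

def log_num_sequences(counts):
    """log of the number of distinct length-N sequences with these face counts."""
    return lgamma(N + 1) - sum(lgamma(c + 1) for c in counts)

def entropy(counts):
    """Shannon entropy (nats) of the empirical frequencies counts / N."""
    return -sum(c / N * log(c / N) for c in counts if c > 0)

records = []             # (entropy, log #sequences) for every admissible count vector
for n1 in range(N + 1):
  for n2 in range(N + 1 - n1):
    for n3 in range(N + 1 - n1 - n2):
      for n4 in range(N + 1 - n1 - n2 - n3):
        for n5 in range(N + 1 - n1 - n2 - n3 - n4):
          n6 = N - n1 - n2 - n3 - n4 - n5
          counts = (n1, n2, n3, n4, n5, n6)
          if sum(face * c for face, c in zip((1, 2, 3, 4, 5, 6), counts)) == TARGET:
            records.append((entropy(counts), log_num_sequences(counts)))

def log_sum_exp(xs):
    m = max(xs)
    return m + log(sum(exp(x - m) for x in xs))

h_max = max(h for h, _ in records)
log_all = log_sum_exp([lw for _, lw in records])
log_near = log_sum_exp([lw for h, lw in records if h >= h_max - 0.1])

print(f"highest achievable entropy under (1): {h_max:.4f} nats")
print(f"fraction of constrained sequences within 0.1 nats of it: "
      f"{exp(log_near - log_all):.4f}")
```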
This is not a rare phenomenon. Since all the common distributions, Normal, Uniform, Poisson, Gamma, Binomial, and so on, are maximum-entropy distributions subject to just such constraints, this effect covers most of the undergraduate curriculum in statistics. It is also why so much of common statistics is tied to “average”-type estimators like (1).
This also explains how deterministic algorithms can produce “random numbers”. To get a “random” sample from N(0,1), all you need do is create any sequence $x_1, \dots, x_n$ which satisfies $\frac{1}{n}\sum_j x_j \approx 0$ and $\frac{1}{n}\sum_j x_j^2 \approx 1$. Your work is mostly done, since virtually any such sequence will look approximately like an N(0,1) sample, and the fact that the sequence isn’t “randomly” generated, whatever that’s supposed to mean, is completely irrelevant.
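Here is a minimal sketch of that recipe, assuming only numpy and scipy (the quantile construction is my choice, not the post’s): the standard normal quantiles at equally spaced probabilities form a completely deterministic sequence, yet they satisfy both moment constraints and their empirical distribution is essentially indistinguishable from N(0,1).

```python
# A minimal sketch (my construction, not the post's): a fully deterministic
# sequence that nonetheless "looks like" a random N(0,1) sample, because it
# satisfies the two moment constraints named above.
import numpy as np
from scipy.stats import norm

n = 100_000
# Standard normal quantiles at n equally spaced probabilities -- no random
# number generator anywhere in sight.
x = norm.ppf((np.arange(n) + 0.5) / n)

print(f"sample mean     = {x.mean():+.5f}   (constraint: about 0)")
print(f"sample variance = {x.var():.5f}    (constraint: about 1)")

# The empirical distribution matches the N(0,1) CDF to within roughly 1/n:
for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"P(x <= {t:+.0f}): empirical {np.mean(x <= t):.4f}, exact {norm.cdf(t):.4f}")
```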
The original sin of Frequentist statistics is to misinterpret this special case and then insist that all statistics be so misinterpreted. In most real social-science applications, for example, the next observations will not satisfy (1) again; the future doesn’t resemble the past in that way very much. That’s the chief reason p-values and CIs get things so horribly wrong in the social sciences.
From a Bayesian perspective, frequency distributions like the empirical frequencies of $x_1, \dots, x_n$ are just a fact, a one-off event, to be observed, predicted, or inferred as desired, no different from any other datum. That we can sometimes predict these well is explained in essence in the posts “Noninformative Priors” and “Data Science is inherently limited”. Basically, the Entropy Concentration Theorem shows the mapping from the data to their frequency distribution is surprisingly insensitive to the inputs whenever they satisfy (1). Being otherwise ignorant about those inputs thus doesn’t stop you from accurately estimating $f^*$.
It would be a kindness if Statisticians stopped fantasizing about mystical forces called “randomness” governing phantom “populations” drawn from fabled “urns”, and just got back to the real world.