Most statisticians think of sampling distributions as a kind of physical model for an infinite sequence of events. Priors, on the other hand, they view as something different, since priors can be assigned to one-off hypotheses like “Republicans will win the 2016 election” or “A meteor wiped out the dinosaurs”. This conventional wisdom gets it wrong, however: all distributions describe the probability of one-off events.
We always wish to know some $x_{true}$, which is a fact about our universe, unique to a specific time and place, never to be repeated. Our model $P(x)$, and the size of its high probability region, is really a reflection of how well we can pin down $x_{true}$.
To make money trading stocks in September 2013, for example, you don’t need a model describing the infinite sequence of more or less “random” stock prices. You just need to know the prices of stocks in September 2013. The better you know them (i.e. the smaller the high probability region for that month), the more money you’ll make.
If you’re sailing a boat across the Atlantic from 1 Aug 2013 to 10 Aug 2013, you don’t need to model the “random variable” that is the weather as though you were describing an abstract sequence of weather data, on an abstract planet, over an arbitrary time period. You just need to know the weather in the Atlantic from 1 Aug 2013 to 10 Aug 2013, and the less uncertainty you have the safer you’ll be.
If you’re trying to measure the length $l_{true}$ of a table and obtained data $d_1, \dots, d_n$ using your laboratory’s ruler on 1 August 2013, then you don’t need to model the propensity of the ruler to generate errors out to infinity. You just need to know the actual errors $\epsilon_i = d_i - l_{true}$ contained in the data you actually collected. Knowing even one of those errors well tells you $l_{true} = d_i - \epsilon_i$ to a high degree of accuracy, making all other knowledge about the ruler and its errors completely irrelevant.
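To make that concrete, here’s a minimal Python sketch (all numbers are hypothetical, chosen only for illustration): once even a single error in the collected data is known, the length follows by subtraction, and the ruler’s long-run error “propensity” never enters.

```python
import numpy as np

rng = np.random.default_rng(0)

l_true = 153.2                          # cm; the one-off fact we want (hypothetical)
errors = rng.normal(0.0, 0.5, size=5)   # the actual errors in THIS dataset, no others
data = l_true + errors                  # d_i = l_true + e_i: the numbers we recorded

# If we somehow learn even one error exactly, the length is pinned down
# without any model of the ruler's behavior "out to infinity":
print(data[0] - errors[0])              # recovers 153.2 exactly
```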
If you’re trying to predict the outcome of the 2012 election, you don’t need a model of how many times Obama would have won in multiple copies of our Universe. You just need to know the actual vote on 6 November 2012 among the space of all possible votes: $v_{true} \in V$. The more you can use polls or other information to shrink $P(v)$ around the true $v_{true}$, the better you’ll be at predicting the outcome.
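One way such shrinking might look in code is a textbook Beta-Binomial update; this is a hedged sketch with invented poll numbers, not real 2012 data:

```python
from scipy import stats

# Sketch (numbers hypothetical): uncertainty about the one-off two-party
# vote share v_true, before and after a poll of 1,000 voters in which
# 540 favor Obama. Flat Beta(1, 1) prior, purely for illustration.
prior = stats.beta(1, 1)
posterior = stats.beta(1 + 540, 1 + 460)

# The 95% high-probability region for v_true shrinks from essentially
# (0.025, 0.975) to roughly (0.51, 0.57): a much better fix on one number.
print(prior.interval(0.95))
print(posterior.interval(0.95))
```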
If I wish to predict the percentage of heads in the 100 coin flips I’m about to make at noon on 8 Aug 2013, then I use $P(h)$ to describe what I know about the future $h_{true}$. Using $P(h)$ to predict $h_{true}$ is no different in principle than making inferences using Pr(“A meteor wiped out the dinosaurs”). Whether that distribution could be used to predict other $h$’s observed on a different day is beside the point. The outcome I’m trying to predict is not one element of some mystical “population”. It’s a physical fact, like the current temperature of the room I’m sitting in or whether the dinosaurs were wiped out by a meteor.
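As one possible concrete reading of $P(h)$ (the particular assignment is my assumption, not dictated by the argument): with nothing in my information distinguishing heads from tails, a Binomial(100, 1/2) describes what I know about that single afternoon’s count.

```python
from scipy import stats

# P(h) for the single run of 100 flips at noon on 8 Aug 2013: with no
# information favoring heads over tails, assign Binomial(100, 0.5).
# This describes my knowledge of that one future h_true, not the
# long-run behavior of some "population" of flip sessions.
h = stats.binom(100, 0.5)
print(h.interval(0.95))   # roughly (40, 60) heads out of 100, i.e. 40-60%
```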
In only one instance can the “random variables” mythology be maintained. If we’re concerned about two sequences $x$ and $y$, which are described using $P_x(x)$ and $P_y(y)$, then sometimes it makes sense to use a common distribution $P(\cdot)$ for both. Such models allow us to imagine that $x$ and $y$ might have been “drawn” from the “population” $P$. These models tend to be low information/high entropy special cases, since the high probability region of $P$ must be big enough to include both $x_{true}$ and $y_{true}$.
This is just a special case though. The great sin of Frequentist statistics is to force-fit all of statistics into this example. But even here, we are really only interested in $x_{true}$ and $y_{true}$, each viewed as a one-off event. And if there’s any asymmetry in our information about them, we can get better results using separate models $P_x$ and $P_y$, thereby breaking the “random variables” illusion (see the sketch below).
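Here is a minimal sketch of the entropy claim, with notation and numbers of my own choosing: a single “population” normal broad enough to cover both $x_{true}$ and $y_{true}$ must have a far larger spread, and hence higher entropy, than either separate model.

```python
import numpy as np

# Hypothetical setup: we know x_true is near 10 and y_true near 20,
# each to within about one unit, so the separate models are
# N(10, 1) and N(20, 1).
mu_x, mu_y, sd = 10.0, 20.0, 1.0

# A common "population" P covering both must match the spread of the
# equal-weight mixture: variance = mean of variances + variance of means.
mu_pool = (mu_x + mu_y) / 2
var_pool = sd**2 + ((mu_x - mu_pool)**2 + (mu_y - mu_pool)**2) / 2

def normal_entropy(var):
    """Differential entropy of a normal with variance var (in nats)."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

print(np.sqrt(var_pool))                                  # ~5.1, vs 1.0 for each separate model
print(normal_entropy(var_pool) - normal_entropy(sd**2))   # ~1.63 nats given up by pooling
```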
In reality, all probabilities have the status of those devilish priors which Frequentists want excised from statistics. All probabilities are for one-off events.
UPDATE: I changed “singular” to “one-off” based on a suggestion by Corey since it really captures what I meant better.