The Amelioration of Uncertainty

The Law of Trading Edges

Andrew Gelman recently ran a post titled “Why waste time philosophizing?” My answer is that different philosophies dramatically affect how, and how well, we get answers from probabilities. Here is an example from finance.

Suppose a Frequentist aims to trade the market next week. With a crystal ball they’d know the actual market behavior is:


Not knowing this, they collected up-down data going back a thousand weeks. Historically the market is up 50% of the time regardless of the weekday, so they create the model
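
$$P(x_1, \dots, x_5) = \prod_{i=1}^{5} P(x_i), \qquad P(x_i = \text{up}) = P(x_i = \text{down}) = \tfrac{1}{2}$$
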
A more perceptive Frequentist conditions their historical research on the previous week’s behavior and determines the market will only be up two days next week. So they use an improved model


Both are objective and well-calibrated models of the “data generating mechanism”. No one would argue the Frequentists failed to get this right, and indeed much quantitative market analysis is a version of their approach. But now a Bayesian comes along, brazenly using the following:


Clearly the Bayesian will make a lot more money next week than either Frequentist, but the Bayesian’s probabilities aren’t the frequency of anything and would be dismissed as subjective by Frequentists. Eager to resolve this conflict, Frequentists could equate the Bayesian’s probabilities to the fraction of alternate universes in which the market goes up. For many, this is enough to make the Bayesian’s distribution respectable. Respectability, though, is in the eye of the beholder, and skeptics are mindful of the shocking lack of data from other universes.

There is a better way to understand the objective content of the Bayesian’s distribution. Although next week’s market moves seem like a “random variable”, the truth is they are a singular event, unique to a given time and place and never to be repeated. Frequentists are flummoxed by the probabilities of singular events, but Bayesians deal with fixed and non-repeatable events all the time. A Bayesian happily considers the probability distribution for the speed of light or the probability Obama will win the 2012 election. All we need do is understand the Bayesian’s distribution in the same way.

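The Bayesian’s distribution works because it satisfies two objective conditions. These are expressed using the joint distribution $P(x)$, where $x = (x_1, \dots, x_5)$ is the 5-tuple describing next week’s market movements and each $x_i$ is either up or down. For example, $x = (\text{up}, \text{down}, \text{down}, \text{up}, \text{down})$ means the market rises on Monday and Thursday and falls on the other three days.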
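To make the 5-tuple concrete, here is a minimal Python sketch that enumerates all $2^5 = 32$ possible weeks and encodes the two Frequentist models as joint distributions over them. The post does not give the models’ exact formulas, so two things are assumptions. The first model is taken to treat each day as an independent fair coin. The second is taken to spread its probability evenly over the weeks with exactly two up days. All names (`OUTCOMES`, `first_model`, `second_model`) are hypothetical.

```python
import math
from itertools import product

# All 2**5 = 32 possible weeks x = (x_1, ..., x_5), Monday through Friday.
OUTCOMES = list(product(("up", "down"), repeat=5))

# Assumed first Frequentist model: each day is independently up with
# probability 1/2, so P(x) is a product of per-day marginals.
P_UP = 0.5

def first_model(x):
    return math.prod(P_UP if xi == "up" else 1 - P_UP for xi in x)

# Assumed second Frequentist model: only weeks with exactly two up days
# are possible, and each of them is equally likely.
TWO_UP_WEEKS = [x for x in OUTCOMES if x.count("up") == 2]

def second_model(x):
    return 1 / len(TWO_UP_WEEKS) if x.count("up") == 2 else 0.0

print(len(OUTCOMES), len(TWO_UP_WEEKS))  # 32 10
```

Under these assumptions the first model gives every week probability 1/32, and the second gives probability 1/10 to each of the $\binom{5}{2} = 10$ two-up weeks.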

The Truthfulness Condition says the distribution should only imply true things about next week’s actual movements $x_{\text{true}}$:

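(1)   $x_{\text{true}} \in W$, where $x_{\text{true}}$ is the actual sequence of moves and $W$ is the high-probability region of $P(x)$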

Not every distribution is truthful in this sense. Both Frequentist models and the Bayesian’s distribution satisfy this condition, but a distribution concentrated on the market falling every day does not. It wrongly implies the market is down all next week.
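One way to make the Truthfulness Condition checkable is sketched below in Python. The sketch defines the high-probability region as the most probable weeks, taken until they hold at least 95% of the probability. Weeks tied with the last one admitted are also included, so the result does not depend on how ties are broken. The 95% level, the three example distributions and the “true” week are all hypothetical choices, not the post’s. The all-down distribution stands in for the untruthful one described above.

```python
from itertools import product

OUTCOMES = list(product(("up", "down"), repeat=5))

def high_probability_region(dist, mass=0.95):
    """Most probable weeks whose total probability reaches `mass`.

    Weeks tied with the last one admitted are kept as well, so the region
    does not depend on an arbitrary tie-break.
    """
    ranked = sorted(dist.items(), key=lambda item: item[1], reverse=True)
    region, total, cutoff = set(), 0.0, None
    for x, p in ranked:
        if total >= mass and p < cutoff:
            break
        region.add(x)
        total += p
        cutoff = p
    return region

# Hypothetical models (assumptions, not the post's exact equations).
fair_coin = {x: 1 / 32 for x in OUTCOMES}
two_up = {x: (0.1 if x.count("up") == 2 else 0.0) for x in OUTCOMES}
ALL_DOWN = ("down",) * 5
all_down = {x: (0.96 if x == ALL_DOWN else 0.04 / 31) for x in OUTCOMES}

true_week = ("up", "down", "down", "up", "down")  # hypothetical actual week

for name, dist in [("fair coin", fair_coin), ("two up days", two_up), ("all down", all_down)]:
    region = high_probability_region(dist)
    print(f"{name:12s} |W| = {len(region):2d}  truthful: {true_week in region}")
```

Under these assumptions the fair-coin region is all 32 weeks and the two-up region is 10 weeks, and both contain the hypothetical true week. The all-down region is the single week of five down days, so that distribution fails the condition.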

The Information Condition requires the distribution to be informative about next week’s movements $x$:

(2)   $|W|$, the number of sequences in the high-probability region $W$, is small

Here is where the Bayesian’s distribution improves on the others. Using Boltzmann’s insight that the size $|W|$ of the high-probability region is related to the entropy $H = -\sum_x P(x) \ln P(x)$, we can determine which distribution is most informative using $|W| \approx e^{H}$:


This explains why the Bayesian model was so much more useful. The Frequentist models weren’t wrong; it’s just that the Bayesian, freed from the ideological constraint of making all probabilities equal to some frequency, found a distribution so informative that its high-probability region was 30 times smaller than that of the first Frequentist model.
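The entropy comparison can be reproduced with a short Python sketch. It assumes the same hypothetical forms of the Frequentist models as elsewhere: independent fair coins, and a uniform spread over the two-up weeks. It uses $|W| \approx e^{H}$ with $H$ in nats. It also shows why “30 times smaller” can only be measured against a region of about 32 weeks. Since $H \ge 0$, every distribution has $e^{H} \ge 1$, so nothing can be 30 times smaller than a region of 10 weeks.

```python
import math
from itertools import product

OUTCOMES = list(product(("up", "down"), repeat=5))

def entropy(dist):
    """Shannon entropy in nats, H = -sum_x P(x) ln P(x); zero terms contribute nothing."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def effective_size(dist):
    """Boltzmann-style estimate |W| ~ e^H of how many weeks the distribution effectively covers."""
    return math.exp(entropy(dist))

# Hypothetical versions of the two Frequentist models (assumptions).
fair_coin = {x: 1 / 32 for x in OUTCOMES}
two_up = {x: (0.1 if x.count("up") == 2 else 0.0) for x in OUTCOMES}

w1, w2 = effective_size(fair_coin), effective_size(two_up)
print(round(w1, 6), round(w2, 6))  # 32.0 10.0

# e^H >= 1 always, so w2 / 30 = 1/3 is unreachable. Relative to w1, a region
# 30 times smaller has |W| = 32/30, i.e. an entropy of only ln(32/30) nats.
w3 = w1 / 30
print(round(w3, 3), round(math.log(w3), 4))  # 1.067 0.0645
```

On this reading, the Bayesian’s distribution must have put almost all of its probability on essentially a single week.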

These considerations lead to the Law of Trading Edges:

The trader has an edge to the extent they can find a low-entropy distribution covering the trading period whose high-probability region contains the true price sequence. As long as these conditions are satisfied, it’s irrelevant to their performance over that period whether the distribution equals the frequency of anything.

One strategy for satisfying the Truthfulness Condition is to find the region $W$ where past weekly sequences $x$ have fallen and create a probability distribution so that $P(x \in W) \approx 1$. If the future resembles the past, then next week’s $x$ will again be in this region. This is how both Frequentist models were found. Indeed, it’s pretty much the only strategy Frequentists use, since they bizarrely believe anything else is subjective.
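The past-data strategy can be sketched in Python as follows. Matching exact past sequences would be pointless, because over a thousand weeks nearly all 32 sequences would turn up. So this hypothetical version builds $W$ from a coarser feature: it keeps every week whose number of up days was seen in the history. The history below is invented for illustration, and the helper names are not from the post.

```python
from itertools import product

OUTCOMES = list(product(("up", "down"), repeat=5))

def up_days(week):
    return week.count("up")

def region_from_history(history, feature):
    """W = every possible week whose feature value was seen in some past week."""
    seen = {feature(week) for week in history}
    return {x for x in OUTCOMES if feature(x) in seen}

# Hypothetical history in which every past week had exactly two up days.
history = [
    ("up", "up", "down", "down", "down"),
    ("down", "up", "down", "up", "down"),
    ("down", "down", "up", "down", "up"),
]

W = region_from_history(history, up_days)
P = {x: 1 / len(W) for x in W}  # uniform on W, so P(x in W) = 1 by construction
print(len(W))  # 10

# Truthfulness now hinges entirely on the future resembling the past.
print(("up", "down", "down", "up", "down") in W)  # True: two up days again
print(("up", "up", "up", "up", "down") in W)      # False: the pattern broke
```

With this made-up history the construction recovers a region like the one assumed for the second Frequentist model.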

An even simpler but more robust strategy is to enlarge $W$ so much that it can’t help but contain the true $x$, regardless of how the past relates to the future. Maximizing the entropy $H$ often works this way when the question at hand doesn’t need a highly informative distribution. Applying this simple strategy will be the subject of the next post.
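A small Python sketch illustrates the trade-off. When the only constraint is which weeks are allowed, the maximum-entropy distribution is uniform over the allowed weeks. Loosening the constraint enlarges $W$, which makes containing the truth more robust but less informative. The constraints and the “true” week are hypothetical choices. The true week has three up days, so a two-up-day pattern learned from the past fails to hold.

```python
import math
from itertools import product

OUTCOMES = list(product(("up", "down"), repeat=5))

def max_entropy_on(allowed):
    """With only a support constraint, the maximum-entropy distribution is uniform on the allowed weeks."""
    support = [x for x in OUTCOMES if allowed(x)]
    return {x: 1 / len(support) for x in support}

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Hypothetical true week with three up days, breaking a "two up days" pattern.
true_week = ("up", "up", "up", "down", "down")

constraints = [
    ("exactly two up days", lambda x: x.count("up") == 2),
    ("at most three up days", lambda x: x.count("up") <= 3),
    ("no constraint", lambda x: True),
]

for name, allowed in constraints:
    dist = max_entropy_on(allowed)
    print(f"{name:22s} e^H = {math.exp(entropy(dist)):4.1f}  contains truth: {true_week in dist}")
```

The effective region grows from 10 to 26 to 32 weeks. Only the narrowest region misses the true week, and the widest cannot miss any week at all.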

February 16, 2013
  • Andrew Gelman, February 16, 2013

    I don’t know enough about this example to really follow it, but I will say that savvy frequentists can be as Bayesian as they want in any problem by defining it as a prediction problem. In the frequentist world, “predictive quantities” have distributions, but “parameters” do not (hence, an expression such as “p(y|theta)” is, strictly speaking, meaningless for them in the Kolmogorov world in which p(a|b)=p(ab)/p(b)).

  • Joseph, February 17, 2013

    Indeed Frequentists can view it as a prediction problem, but they are still interpreting every distribution in terms of frequencies. My point was that this “frequency interpretation” had precisely nothing to do with the success or failure of the predictions. The real driver of success was whether the distribution satisfied the Truthfulness and Information Conditions.

    Once that point is seen, the statistician is free to work with distributions which have no plausible connection to the frequency of anything, but which do satisfy these conditions and hence are predictively successful.

  • Daniel Lakeland, March 16, 2013

    The Truthfulness Condition *is* a statement about frequency when the experiments are reproducible, though: we have Truthfulness in a model when the conditional probability is close to the conditional frequency. In other words, when the Bayesian posteriors are pretty well calibrated.

    F(A ; P(A|Knowledge)) ~= P(A|Knowledge) when A is reproducible.

    If you understand my bizarre notational meaning after all the p-value hoo-ha over at Gelman’s blog. ;-)

  • Daniel Lakeland, March 18, 2013

    I should point out that by “reproducible” I mean in the general sense, not in the sense of resetting every atom in the experiment. Also, I think this makes “Truthfulness” a property of the model, not of an individual prediction.

    Suppose I predict a coin flip as p(head | info) = 0.5 for pretty much any info, except when info includes some specific important information (say, a series of high-speed photographs of the first few seconds of the flip), in which case p(head | info*) is something else that depends on the specifics of “info”. Then I need to wait until I’ve collected that kind of info many times, and look at something a bit more complicated, to determine whether I have truthfulness.

    Namely, I want to see F(H | p(H|info*)) = p ± dp for every p bin that I build my F histogram from.

    Maybe I’ll write a blog post on this issue. It’s the kind of thing people don’t quite talk about enough.

  • Joseph, April 1, 2013

    “The Truthfulness Condition *is* a statement about frequency when the experiments are reproducible though”

    I understood you to mean macroscopic reproducibility, as you explained, and not microscopic reproducibility.

    In general, though, it is not a statement about frequency when the experiments are reproducible. Suppose we have P(A | knowledge), but unknown to us A is a constant of the motion of the universe and hence never changes. If the first measurement of A lies in the high-probability manifold of P, then trivially it will do so for all measurements after that. So P always satisfies the Truthfulness Condition. Yet P(A | knowledge) will not resemble the frequency distribution of A.

    A realistic example of A might be “the speed of light”.

    Fundamentally, P(A_1 | knowledge) is not a statement about what would happen if the experiment is reproducible. To answer that question requires a new analysis and a joint probability distribution P(A_1,…,A_n | knowledge). The joint distribution is not uniquely determined by the marginal distribution P(A_1| knowledge ). Moreover, the success of the joint distribution P(A_1,…,A_n|knowledge) will be determined, just as in the single case, by whether the true sequence A_1,…,A_n satisfies the “truthfulness” condition with respect to the joint distribution.

    The real point of the post though, was that having well calibrated probability distributions isn’t even close to being necessary for success. The real driver of success is the “Truthfulness Condition” and “Information Condition”. Once that point is seen, we are free to use whatever creative or clever method we can dream up to ensure these two conditions are satisfied – even if those methods lie well outside current statistical practice and understanding.
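Joseph’s constant-of-motion counterexample can be simulated in a few lines of Python. The belief P(A | knowledge) used here is a hypothetical choice: a normal distribution with mean 299,800 km/s and standard deviation 50 km/s. Each measurement is idealized as returning the exact value with no noise. The central 95% region contains every measurement, so the Truthfulness Condition holds every time. Yet the observed frequency distribution of A has zero spread. In the binned terms of Daniel Lakeland’s comments, a nominal 95% prediction is borne out 100% of the time.

```python
from statistics import NormalDist, pstdev

C_KM_S = 299_792.458  # speed of light in km/s (exact by the definition of the metre)

# Hypothetical state of knowledge P(A | knowledge): normal, mean 299_800, sd 50.
belief = NormalDist(mu=299_800, sigma=50)
lo, hi = belief.inv_cdf(0.025), belief.inv_cdf(0.975)  # central 95% region

# Idealization: A is a constant of the motion, so every measurement is identical.
measurements = [C_KM_S] * 1000

hits = sum(lo <= a <= hi for a in measurements)
print(hits / len(measurements))            # 1.0: truthful every time, not 95% of the time
print(pstdev(measurements), belief.stdev)  # 0.0 50.0
```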
