The Amelioration of Uncertainty

## The odds of choking from atmospheric fluctuations

This is the first of a two-part series about Bayes' Theorem and the Second Law of Thermodynamics. It begins with the question:

What are the odds that one hour from now, the diffuse air in the sealed and insulated room I’m in will position itself in the half where I’m not, thereby causing me to choke?

The air's microstate x is a point in phase space, formed from the position and momentum of each particle. The microstate's details are unknown, but we do know the gas's volume V and energy E. This confines the possibilities for x to some region W. Within that region, there is a subset B of "bad" states that will evolve into states in which the air takes up only half the room. The probability we seek is the chance that the air is currently in B.

Using Liouville's Theorem and the entropy of an ideal gas,

S(E, V) = N k [ ln V + (3/2) ln E + c ],

where N is the number of particles and k, c depend on the gas, we get, assuming a room at 1 atmosphere and T = 25 degrees Celsius,

P(x in B) ≈ 2^(−N), with N on the order of 10^27, i.e. a probability around 10^(−10^26).
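To give a feel for the magnitudes, here is a back-of-the-envelope sketch of the count behind that probability. The 50 m³ room size is my own illustrative assumption; the rest follows from the ideal gas law, and the 2^(−N) factor comes from halving each molecule's accessible position volume:

```python
import math

# Illustrative room and conditions (my own numbers, not from the post):
# a 50 m^3 room at 1 atm and 25 C, treated as an ideal gas.
P = 101_325.0       # pressure, Pa
V = 50.0            # room volume, m^3
T = 298.15          # temperature, K
k_B = 1.380649e-23  # Boltzmann constant, J/K

# Number of gas molecules from the ideal gas law P V = N k_B T.
N = P * V / (k_B * T)

# If the "bad" states confine every molecule's position to half the room,
# the accessible phase volume shrinks by a factor of 2 per molecule,
# so P(bad) ~ 2^(-N).  Report log10 of that probability.
log10_P_bad = -N * math.log10(2)

print(f"N ≈ {N:.2e} molecules")
print(f"log10 P(bad) ≈ {log10_P_bad:.2e}")
```

The exponent is itself astronomically large, which is why "never happens" is not an exaggeration.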

It's clear why this never happens: the real gas's state is almost certainly one of that vast majority and not one of the exceptions. In this sense physical systems don't make "transitions" from W to a region W′ whenever S(W′) < S(W), which is just the Second Law of Thermodynamics. Although a better way to express it is "the odds are against a state in W winding up in W′ whenever S(W′) < S(W)".

Probability distributions are introduced solely to define regions and to count states. In this case the true state x has to be in a region W consistent with the measured energy E0, where |E(x) − E0| ≤ σ. This implicit definition of W can be made explicit by maximizing the entropy functional −∫ p(x) ln p(x) dx subject to the constraints:

∫ E(x) p(x) dx = E0,   ∫ (E(x) − E0)² p(x) dx ≤ σ².

The second constraint can usually be dropped for reasons given in the next post. The region of interest W is then the high-probability region of the resulting distribution p.
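As a toy illustration of this maxent construction (not the post's own calculation), the following sketch maximizes −Σ p_i ln p_i over a small discrete "phase space" subject only to a mean-energy constraint. The energy levels and target E0 are invented for the example; the Lagrange-multiplier solution p_i ∝ exp(−β E_i) is found by bisection on β:

```python
import math

# Toy discrete "phase space" with energy levels E_i.  Maximizing
# S[p] = -sum p_i ln p_i subject to sum p_i = 1 and sum E_i p_i = E0
# gives p_i proportional to exp(-beta * E_i), with beta chosen to hit E0.
E = [0.0, 1.0, 2.0, 3.0, 4.0]
E0 = 1.2  # target mean energy (illustrative)

def mean_energy(beta):
    w = [math.exp(-beta * e) for e in E]
    Z = sum(w)
    return sum(e * wi for e, wi in zip(E, w)) / Z

# Solve mean_energy(beta) = E0 by bisection; the mean is decreasing in beta.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > E0:
        lo = mid
    else:
        hi = mid
beta = 0.5 * (lo + hi)

w = [math.exp(-beta * e) for e in E]
Z = sum(w)
p = [wi / Z for wi in w]
S = -sum(pi * math.log(pi) for pi in p)
print("canonical p:", [round(pi, 4) for pi in p])
print("entropy:", round(S, 4))
```

The result has the canonical (Boltzmann) form, which is exactly the point Brendon raises in the comments below: a mean-energy constraint alone yields the canonical distribution.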

These probabilities are not the frequency of anything. They describe our uncertainty about x and are similar to prior probabilities for fixed parameters. The frequency of states under repeated preparation, or of a single system over many lifetimes, are interesting quantities, but they're separate from reasoning about the air currently in my room. It's likely that effects not considered would confine the states to a small subset of W contained entirely in that majority, so we'd never observe an exception even if we could repeat the preparation an astronomical number of times, which we can't. The frequency interpretation is not known or likely to be true, not needed, and impossible to verify.

This understanding is associated with Jaynes (see references), but is quite old. Planck expressed the same views in a classic 1913 work. Planck saw entropy as a general tool, not limited to physics, which allows us to take partial knowledge (functions) of the true state and find conclusions implied by almost every state consistent with that knowledge. The only thing needed to make those conclusions a reality is for the true state to be one of that great majority. There is no mention of frequency/ergodic interpretations, and he even carries out an explicit maxent construction like the one above (section 139). As proof of the generality of these ideas, I used Planck's reasoning in an earlier post to show why IID Normal assumptions work much better than Frequentists think they should.

So now imagine there is a Bayesian who takes p(x) as a prior probability and observes aspects of the air as time progresses. With each measurement the Bayesian updates the distribution, producing a sequence of regions W = W0 ⊃ W1 ⊃ W2 ⊃ … which shrink around the true state x. The Bayesian thus creates a decreasing sequence of entropies S0 > S1 > S2 > … and concludes the regions won't overlap with B at all, so there is no chance of asphyxiation.
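A minimal sketch of this shrinking-uncertainty picture, using a one-dimensional Gaussian stand-in for the full phase-space problem (all numbers invented for illustration): each noisy measurement tightens the posterior, and the posterior entropy falls monotonically.

```python
import math
import random

# A broad Gaussian prior over one scalar feature of the true state,
# updated with noisy measurements.  The posterior entropy
# 0.5*ln(2*pi*e*var) falls with every observation.
random.seed(0)
truth = 3.0                  # the unknown true value
mu, var = 0.0, 100.0         # prior mean and variance
noise_var = 4.0              # measurement noise variance

entropies = []
for _ in range(10):
    y = truth + random.gauss(0.0, math.sqrt(noise_var))
    # conjugate Gaussian update: precisions add, means combine by precision
    new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    mu = new_var * (mu / var + y / noise_var)
    var = new_var
    entropies.append(0.5 * math.log(2 * math.pi * math.e * var))

print("posterior mean:", round(mu, 3), " posterior var:", round(var, 3))
print("entropies:", [round(s, 3) for s in entropies])
```

Note the variance sequence is deterministic here; only the posterior mean depends on the data, mirroring how the Bayesian's regions shrink regardless of which particular measurements arrive.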

The Bayesian's results do not contradict the Second Law calculation, either as an inference or in its physical implications.

As inferences they are both perfectly valid. The Bayesian's additional knowledge allows them to place x in a "funnel" in time + phase space smaller than the "cylinder" used in the Second Law calculation, but both accurately locate the true path in that space. So while they both reach the same no-choking conclusion, the Bayesian does so with less uncertainty, Sn rather than S0, but this merely reflects the truism that greater knowledge leads to less uncertainty. If you lack the Bayesian's additional measurements, the Second Law inference is still valid.

Nor are the various entropies in contradiction. Although both calculations use the form S = −∫ p ln p, they reference different distributions p, which are functions of different variables and have different numerical values, and are being used in slightly different ways.

They agree physically as well. Either analysis predicts the mass density ρ, which is proportional to the frequency of particles in each small volume, will maximize the empirical entropy −∫ ρ ln ρ and diffuse out to a uniform density. Just as in the last post, the decreasing Sn in no way contradicts the increasing empirical entropy or our intuitions about increasing disorder.
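This diffusion toward uniform density can be sketched with a toy simulation: particles start in the left half of a one-dimensional "room", take random steps, and the empirical entropy of the binned density climbs toward its uniform maximum ln(n_bins). The dynamics are purely illustrative, not the gas's real mechanics:

```python
import math
import random

# 5000 particles start uniformly in [0, 0.5] (the left half of a unit
# "room"), then diffuse by small Gaussian steps, reflecting at the walls.
random.seed(1)
n_particles, n_bins, n_steps = 5000, 10, 200
xs = [random.uniform(0.0, 0.5) for _ in range(n_particles)]

def empirical_entropy(xs):
    # entropy of the binned (empirical) density over n_bins equal cells
    counts = [0] * n_bins
    for x in xs:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    ps = [c / len(xs) for c in counts if c > 0]
    return -sum(p * math.log(p) for p in ps)

S_start = empirical_entropy(xs)
for _ in range(n_steps):
    for i in range(n_particles):
        x = xs[i] + random.gauss(0.0, 0.02)
        x = abs(x)                       # reflect at the left wall
        x = 2.0 - x if x > 1.0 else x    # reflect at the right wall
        xs[i] = x
S_end = empirical_entropy(xs)

print("start:", round(S_start, 3), " end:", round(S_end, 3),
      " max:", round(math.log(n_bins), 3))
```

The empirical entropy rises toward ln(10) even as the Bayesian's Sn falls, illustrating that the two entropies track different things.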

So the Bayesian's decreasing entropy is compatible with the Second Law. The implications of this for thermodynamics and statistical inference will be sketched in the next post.

REFERENCES:
Jaynes (1963) Information Theory and Statistical Mechanics
Jaynes (1965) Gibbs vs Boltzmann Entropies
Jaynes (1967) Foundations of Probability Theory and Statistical Mechanics
Jaynes (1988) The Evolution of Carnot's Principle
Jaynes (1998) The Second Law as Physical Fact and Human Inference

September 10, 2013
• September 10, 2013 · Joseph

All,

This is my sincere attempt to answer Cosma Shalizi’s questions from the last post, without poking Frequentists in the eye.

This understanding radically changes the nature of non-equilibrium Statistical Mechanics and much else besides. I hope to give some sense of that in the next post.

Regards,
Joseph

• September 10, 2013 · Daniel Lakeland

So, I think you've shown that you can use the maximum entropy formalism to construct measures on the phase space, constrained by the energy and its uncertainty (which you say we can usually ignore), that give the right answers. You've also shown that if you obtain information you can construct Bayesian measures that give the right answers with less uncertainty. I think the only question left is: in what specific sense does the statement "entropy tends to increase in time" make sense? I.e., for what general procedure for calculating a measure and computing its entropy does that statement come out true?

Hopefully that’s what you’re going for next.

• September 10, 2013 · Joseph

Daniel,

I did show that in the post. Sometimes it makes sense to say one entropy is increasing in time. Other times it makes sense to say a different entropy is increasing in time. But these are just special cases, and trying to generalize them too much is misleading and often wrong.

A far more general way to look at the problem, in line with Planck, Jaynes and a few others, is to simply ask "what conclusions are implied by the vast majority of states compatible with what's known?".

Sometimes in answering this question you’ll get an “entropy” which increases, and sometimes you won’t. Sometimes you’ll get multiple “entropies” doing different things in the same problem. So the key is to stop taking “entropy increasing in time” as the foundation of the subject. The real foundation is the question in the previous paragraph.

Doing so opens up a whole world of opportunities which can never be reached by trying to generalize “entropy increases in time”.

• September 10, 2013 · Daniel Lakeland

Got it. I think that clarifies your point.

• September 11, 2013 · Brendon J. Brewer

Nice explanation, very clear.

One thing I’m not totally comfortable with on the technical side is why knowing the energy constrains the expected value of your probability distribution, leading to the canonical distribution. The microcanonical distribution seems a much more reasonable description of “knowing the energy”, although it leads to harder mathematics and pretty much the same predictions.

• September 11, 2013 · Joseph

Brendon,

Suppose our knowledge of the actual energy can be expressed by saying

E(x) is somewhere in the high-probability manifold of a normal distribution N(E0, σ²).

Given this state of knowledge and the form of the function E(x), what distribution does this induce or imply on phase space?

If you think about it long and hard, you'll get the answer in the post, with the measured energy replaced by E0 and the allowed spread replaced by σ. That we can drop the latter constraint is due to the fact that the first constraint is already enough to get a decent answer to questions of interest, and including the second constraint wouldn't improve those answers appreciably.

• September 11, 2013 · Brendon J. Brewer

"Given this state of knowledge and the form of the function, what distribution does this induce or imply on phase space?"

A normal mixture of microcanonical distributions. That's the MaxEnt result with the constraint on the marginal distribution for the energy, not just the expected value of the energy.

This might be very well approximated by the canonical distribution if sigma is exactly right (it will need to be very small). If sigma is large then no single canonical distribution will work, but a mixture could work very well.
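Brendon's construction can be sketched on a toy discrete phase space (states, energies, E0, and sigma all invented for the example): weight each energy shell by a normal density over energy, then spread that weight uniformly, i.e. microcanonically, within the shell.

```python
import math
from collections import Counter

# Toy discrete phase space: microstate name -> energy.
states = {"a": 0.0, "b": 0.0, "c": 1.0, "d": 1.0, "e": 1.0, "f": 2.0}
E0, sigma = 1.0, 0.5  # illustrative normal over the energy

def normal_pdf(e):
    return math.exp(-0.5 * ((e - E0) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

shell_size = Counter(states.values())               # microstates per energy shell
shell_weight = {e: normal_pdf(e) for e in shell_size}
Z = sum(shell_weight.values())

# Microstate probability = (normalized shell weight) / (shell size):
# uniform ("microcanonical") within each shell, normal across shells.
p = {s: shell_weight[e] / (Z * shell_size[e]) for s, e in states.items()}

print({s: round(pi, 4) for s, pi in p.items()})
```

States in the same shell get equal probability, and the energy marginal reproduces the (discretized) normal, which is what distinguishes this from any single canonical distribution when sigma is large.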

• September 12, 2013 · Corey

Jaynes has a very nice and clear explanation for why Liouville's theorem is compatible with the Second Law. From the macrostate at time t1, we get a maxent distribution on phase space. We transform it according to the microstate time-evolution rules (Newtonian or quantum) to time t2 and make a prediction of the macrostate. Because we included all of the physically relevant info in the definition of "macrostate", the prediction is accurate. Liouville's theorem then implies that if we run maxent again at the t2 macrostate, the associated entropy must be at least as large as that of the t1 macrostate.

A curious feature of the above reasoning is that it holds for both t1 < t2 and t2 < t1. I have a fuzzy, incompletely thought-out notion about why it is nevertheless correct…

Pedantic semantics: you mean "suffocation" rather than "choking"; the latter means obstruction of the airways, while the former is what happens when one lacks access to fresh air — perhaps due to choking.

• September 12, 2013 · Corey

Rather than saying, “why it is nevertheless correct,” I should have said, “how it can be repaired.”

• September 15, 2013 · Joseph

Corey,

It’s never a problem when Canadians defend the English language from the ruffians and cowboys to the south.

Since I used Liouville's theorem to derive the second law, I obviously don't see an incompatibility. The key is to drop the idea of a physical second law altogether and work with a purely inferential one. Probably a slight majority of the great physicists saw it that way, while the notion that Liouville's theorem is incompatible with the second law, because it holds the fine-grained entropy constant, is more common among rank and file physicists. I think people like John von Neumann, for example, understood exactly what was going on.

A more interesting question is to turn things around. Since empirically we don't see air accidentally bunch up on one side of the room and choke people, how much evidence does that provide for Liouville's theorem or its quantum equivalent at the microscopic level? Answer: basically none.

Liouville's theorem could be hugely wrong, and you would still get a second law as described in the post. For example, if phase-space volume were only approximately conserved, it wouldn't change the result. Which is a good thing, because it's probably wrong in reality. No matter how much we try to make a closed system, the particles will be subject to constantly changing stray influences and forces of various kinds which make the true potential they experience non-conservative and very weird.

So if we ask the question "given that we start out in W, where might we reasonably end up at a later time?", the answer is not going to be the evolved region implied by Liouville's theorem; it's going to be a set big enough to take into consideration all those unknowns in the potential.

You'll also get the same effect if you try to retrodict the past state rather than predict the future state. If you want to know what set the state could have been in at an earlier time in order to wind up in W now, you'll need a set bigger than what Liouville's theorem implies. So knowing that the state is in W now tells us that future and past states were somewhere in a kind of double cone which expands both into the future and into the past.

Again, my answer to Daniel above is key to all this. "Entropy increasing in time" is not the foundation of the subject. Fundamentally, this subject is the same as the rest of statistics: identify sets which contain the true state by any method you can, and then count possibilities over those sets to identify (as Planck stated) conclusions which are true for almost every possibility.