The Amelioration of Uncertainty

Classical Statistics really is screwed.

It’s widely believed the crises in science will abate if only we educate everyone on the correct interpretation of p-values and confidence intervals. I explained before in this long post why this isn’t true. Below is a summary.

Two technical points help explain the issue. First,

Data $x_1, \dots, x_n$ will always appear to be an IID draw from some distribution $P(x)$

The maxent construction in the last post shows why. We can just keep adding constraint functions $G_i(x)$ until the maxent distribution $P(x)$ matches the empirical histogram of $x_1, \dots, x_n$.

If I secretly constructed such a $P(x)$ for the data and handed it to you, it would appear to be a good model for $x_1, \dots, x_n$ by whatever Frequentist tests you care to dream up.
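
As a toy illustration of the game (a sketch only: made-up data on $[-1,1]$, moment constraint functions $G_k(x) = x^k$, and a KS test standing in for “whatever tests you dream up”):

```python
# Sketch: keep adding moment constraints G_k(x) = x^k until the maxent density
# exp(sum_k lam_k * x^k) / Z stops being rejected by a goodness-of-fit test.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kstest

rng = np.random.default_rng(0)
data = 2 * rng.beta(2, 5, size=500) - 1        # arbitrary data living on [-1, 1]
grid = np.linspace(-1, 1, 2001)
dx = grid[1] - grid[0]

def maxent_cdf(data, K):
    """CDF of the maxent density on [-1,1] matching the first K sample moments."""
    ghat = np.array([np.mean(data**k) for k in range(1, K + 1)])
    G = np.vstack([grid**k for k in range(1, K + 1)])

    def dual(lam):                             # convex dual: log Z(lam) - lam . ghat
        logp = lam @ G
        m = logp.max()
        return m + np.log(np.exp(logp - m).sum() * dx) - lam @ ghat

    lam = minimize(dual, np.zeros(K)).x
    p = np.exp(lam @ G - (lam @ G).max())
    cdf = np.cumsum(p) / p.sum()               # numerical CDF on the grid
    return lambda x: np.interp(x, grid, cdf)

for K in range(1, 7):
    pval = kstest(data, maxent_cdf(data, K)).pvalue
    print(f"{K} constraint(s): KS p-value = {pval:.3f}")
    if pval > 0.05:                            # the "secret" P(x) now fits the data
        break
```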

Which brings me to the second point.

Future data rarely satisfies the same constraints as the test data.

Suppose the model $P(x)$ created from $x_1, \dots, x_n$ had been built using the constraint

(1)   $\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} G(x_i) = g$

Then, by the Entropy Concentration Theorem, the condition for the next data $x_{n+1}, \dots, x_{n+m}$ to appear to be an IID draw from $P(x)$ is (basically) that it satisfies (1) with the same $G(x)$ and approximately the same $g$.
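
For reference, the rough content of the Entropy Concentration Theorem being invoked (my paraphrase): among data sets of size $n$, binned into $m$ classes, that satisfy the $k$ constraints, the overwhelming majority have empirical distributions whose entropy $H$ falls within $\Delta H$ of the maximum $H_{\max}$, where asymptotically

$$2n\,\Delta H \sim \chi^2_{m-k-1}.$$

So “looks like an IID draw from $P(x)$” and “satisfies the constraints that produced $P(x)$” are essentially the same condition.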

There are cases where you can expect such a thing. But in general those $x_i$’s are partial snapshots of our Universe at different points in time, and when you put them together they don’t consistently satisfy any teleological constraint like (1). Most physical, biological, economic, or psychological systems evolve in their own complicated way without regard to the overarching scheme in (1) and only accidentally satisfied it the first time.
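
A toy illustration of the failure mode (made-up numbers, with $G(x) = x^2$): the constraint value measured on the original data simply isn’t reproduced by later data once the system drifts, and the old maxent model gets rejected by the same tests it used to pass:

```python
# Sketch: the constraint value g = mean(G(x)) from the original data is not a
# law of nature; future data from the (drifted) system need not reproduce it.
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(1)
G = lambda x: x**2                          # the constraint function in (1)

old = rng.normal(0.0, 1.0, 1000)            # data used to build P(x)
new = rng.normal(0.0, 1.6, 1000)            # the system has since drifted

print("g on old data:", G(old).mean())      # ~1.0
print("g on new data:", G(new).mean())      # ~2.6, so (1) no longer holds

# The maxent P(x) built from the old constraints (a Normal here) fits the old
# data fine and is flatly rejected on the new data.
P = norm(loc=old.mean(), scale=old.std())
print("KS p-value on old data:", kstest(old, P.cdf).pvalue)
print("KS p-value on new data:", kstest(new, P.cdf).pvalue)
```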

So we can always fool ourselves into thinking we’ve modeled some “data generation mechanism” and we’re usually wrong.

The original sin of Frequentism was using a special type of problem as the template for all of statistics. The tragedy is they picked a special case that’s barely useful and easily deceives.

So you can fiddle-fart around with the nuances of p-values and null hypotheses all you want. Statistically ‘sound’, but non-replicable, results will remain the norm.

November 12, 2013
37 comments
  • November 12, 2013: Brendon J. Brewer

    Once you understand the correct interpretation of p-values and confidence intervals, you will clearly see why they are not quantities that anybody should be calculating.

  • November 12, 2013: konrad

    Brendon: that wasn’t really the point of the post. The point of the post was to claim that real systems often create the illusion of having generated data under strict constraints, when those constraints are in fact only satisfied accidentally. In particular, the commonly made assumption that constraints persist over time can be problematic in many cases.

  • November 12, 2013: Brendon J. Brewer

    You’re right, I was only responding to the first paragraph, sorry. I have a very short attention span.

  • November 13, 2013: Joseph

    Konrad,

    I considered replacing the post with your comment since it says the same thing and only takes a paragraph. However I’ve found that including the tiniest smidgen of mathematics in a post which isn’t typically found in stat courses keeps the knuckleheads away. Kind of like building closets out of cedar as a natural way to protect against moths.

  • November 13, 2013: Joseph

    Incidentally, Konrad, Brendon, Daniel, and Corey,

    If you were interested in some research paper ideas, there’s plenty here. Actually, it amazes me that every statistician doesn’t have 300 papers to their name, since I do believe it’s possible to come up with 25 paper ideas per day (I.J. Good style) and not run out in a year’s time. It’s got to be the easiest field in the world to publish in. Doing that would be boring as hell, but some people seem to like it. I’m soooo glad I joined the Marines rather than becoming a prof!

    Anyway, I’ve deliberately suppressed all mathematical details in these posts, but there’s plenty going on under the hood. Although lots of this is already likely in print, much of it may not be, since it relies on stuff which isn’t in the normal bag of tricks for statisticians, even for many of the pure math/measure theory types.

    For example, how do we really choose the $G_i(x)$? Say for instance the domain of $x$ is the interval $[-1, 1]$ rather than the real line. How would you choose the $G_i(x)$?

    Well, a natural idea is to use the Legendre Polynomials, which form a complete orthogonal basis for an appropriate Hilbert Space of functions on $[-1, 1]$. So while they may not be a good choice for the $G_i(x)$ in the sense that many polynomials are needed, at least we know that if we include enough of them we’ll get close enough to $\ln P(x)$ (under some conditions).

    This is a direct analog of the fact that the most natural choice for the $G_i(x)$’s on the real line is the polynomials in the Taylor series expansion of $\ln P(x)$, the first two of which lead to the Normal Distribution. And as long as $P(x)$ is unimodal, we won’t have to go too far in this series to get a decent approximation.
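
    For illustration, here’s what such a density looks like numerically (the $\lambda$’s are made up, just to show the form):

    ```python
    # Sketch: a maxent-style density on [-1,1] whose log is a short Legendre expansion.
    import numpy as np
    from numpy.polynomial import legendre

    lam = np.array([0.0, 0.8, -1.5, 0.3])    # lambda_0..lambda_3, made up for illustration
    x = np.linspace(-1, 1, 2001)
    dx = x[1] - x[0]

    logp = legendre.legval(x, lam)           # sum_i lambda_i * P_i(x)
    p = np.exp(logp - logp.max())
    p /= p.sum() * dx                        # normalize numerically

    print("integrates to:", p.sum() * dx)                                  # ~1.0
    print("E[P_1] under p:", (legendre.legval(x, [0, 1]) * p).sum() * dx)  # constraint value it encodes
    ```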

  • November 13, 2013: Joseph

    Let me just summarize that last comment like this:

    There’s a strong interplay between sets of basis functions for Hilbert Spaces, Maximum Entropy, and Classical Statistics.

    A creative statistician can take that fact and run for miles with it.

  • November 13, 2013: konrad

    I can’t comment on theoretical fields, but my experience in applied work is that the bottleneck is execution rather than ideas – generally we give up on many ideas just because they never rise high enough in the priority list so we don’t get around to them. Also, once one reaches a certain threshold quantity-wise, the emphasis shifts to quality.

    Before discussing how to choose G, we need to talk about why one should choose it. Especially since you’ve been arguing that such constraints tend to be satisfied accidentally (in which case inferring them tells us nothing about the system being studied – I’m guessing that the point of choosing a basis for G is that you want to infer G?).

  • November 13, 2013: Joseph

    Well, there are at least two reasons immediately. Maybe there are reasons to think such a constraint is real, or maybe we just want a function which summarizes the data and we don’t care whether it’s physically real (as in the data compression stuff from the last post).

    But there are definitely other scenarios. Suppose we have an $x$ which changes with time; an example might be stock prices or something. Then at each moment in time we want a distribution $P(x|t)$ which puts the true value $x(t)$ in its high probability manifold.

    One way to do that would be to choose the $G_i(x)$ well and then consider the maxent distribution. Then if all goes well you might be able to get $P(x|t) \propto \exp\!\big(-\sum_i \lambda_i(t)\, G_i(x)\big)$, where the Lagrange multipliers change as a function of time. Basically the “bubble” of high probability follows the true value around, sometimes increasing or decreasing in size depending on whether we know $x(t)$ more or less precisely.

    If you’re good and chose the $G_i(x)$’s with good theoretical properties, then this can lead to nice “equations of motion” for the $\lambda_i(t)$. Perhaps they satisfy differential equations which are possible to derive somehow.

    See this paper by Jaynes (especially section 6)

    http://bayes.wustl.edu/etj/articles/macroscopic.prediction.pdf

    A closely related one is this:

    http://bayes.wustl.edu/etj/articles/cinfscat.pdf

    They are probably Jaynes’s two most important papers, not for what they contain so much as for what they point to. As far as I can tell they’ve been completely ignored.

    Also, on a different note, consider what the log of such a maxent distribution looks like:

    $\ln P(x) = -\lambda_0 - \sum_i \lambda_i G_i(x)$

    This should remind you a bit of Generalized Linear Modeling. But that’s a subject for another day.
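
    Just to give the flavor for now (my shorthand, nothing rigorous):

    $$\ln P(x) = -\lambda_0 - \sum_i \lambda_i G_i(x) \qquad \text{vs.} \qquad g\big(\mathrm{E}[y]\big) = \beta_0 + \sum_i \beta_i x_i,$$

    i.e. in both cases the unknowns enter linearly through a fixed set of known functions.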

    All of this stuff, though, is likely to bring you right back to considering sets of basis functions for Hilbert Spaces and their close interplay with Maxent distributions. Only in some cases does any of this have relevance to classical statistics.

  • November 13, 2013: Daniel Lakeland

    My advisor Roger Ghanem is one of the big names behind Polynomial Chaos Expansions (you can google around pretty easily) for representing the results of complex physical models with uncertainty by transforming random variables through basis functions. They have their interesting bits associated with them, but typically involve serious problems with the curse of dimensionality. I had a blog post years back with an argument about how we only ever really need around O(6) independent random variables in any given problem, provided we could construct functions of those variables. But many times a model is easier to construct as a simple function of a lot of variables than as a complex function of a few. It seems clear to me that the flexibility that MCMC methods give you to be relatively dimension-independent is huge if you care about building good models.
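
    For anyone curious, a toy 1-D version of the idea (nothing like the real machinery; the function and truncation order are made up): expand $Y = f(Z)$, $Z \sim N(0,1)$, in probabilists’ Hermite polynomials.

    ```python
    # Toy 1-D polynomial chaos sketch: project f(Z), Z ~ N(0,1), onto Hermite polynomials.
    import numpy as np
    from numpy.polynomial import hermite_e
    from math import factorial, sqrt, pi

    f = lambda z: np.exp(0.5 * z)            # stand-in for a model output as a function of Z
    K = 8                                    # truncation order

    # Gauss-HermiteE quadrature: integral of h(x) exp(-x^2/2) dx ~= sum w_i h(x_i)
    xq, wq = hermite_e.hermegauss(40)

    coeffs = []
    for k in range(K + 1):
        He_k = hermite_e.hermeval(xq, [0] * k + [1])                   # He_k at the nodes
        ck = (wq * f(xq) * He_k).sum() / sqrt(2 * pi) / factorial(k)   # E[f He_k] / E[He_k^2]
        coeffs.append(ck)

    z = np.linspace(-3, 3, 7)
    print("f(z):   ", f(z))
    print("PCE(z): ", hermite_e.hermeval(z, coeffs))                   # close for modest z
    ```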

    Also, I think the classical sampling theory stuff is a bit misguided in the same way you do. It’s really not so important to be given a histogram of inputs and nearly exactly match some observed histogram of outputs.

    There are several reasons for this. The first is that the histogram of inputs is maybe not all that stable or well modeled to begin with. Second, the frequency interpretation limits the type of model you are willing to construct, and typically this limits your ability to creatively solve problems by being smart about them (as opposed to just crunching numbers in supercomputers in a particular prescribed way). And third, there are typically some quantities which are really *of interest* and other quantities (nuisance parameters) for which our model just has to be mildly adequate, and a strict frequency interpretation of probability also limits your ability to exploit this.

  • November 13, 2013: Daniel Lakeland

    I guess it’s pretty clear, my preference is to think hard about a problem, come up with some insight that simplifies our understanding of it, and then construct a solution based on insight. My dissertation is more or less doing that with two different fundamental problems: soil liquefaction, and wave energy dissipation. The Polynomial Chaos stuff is more often applied in the context of “we have an enormous FEM mesh and a crapload of processor power, but we don’t know what numbers we should put in the front end for loads and materials properties and things and we have almost no real world data to compare the output to anyway”

  • November 13, 2013: Daniel Lakeland

    Examples are things like CO2 sequestration by injection into rock formations, storing nuclear waste in Nevada deserts, and climate models built from weather models. All of these have a certain feel to them that turns me off.

  • November 13, 2013: Joseph

    Daniel, just building on what you talked about:

    “It’s really not so important to be given a histogram of inputs and nearly exactly match some observed histogram of output”

    Classical Statistics makes sense when we know the histogram of the data ahead of time and nothing else relevant about it. The confusing thing is that there are other very different situations which lead to the same mathematics. That makes it nearly impossible to explain what’s really going on, but I’m going to try over the next few posts.

    “I guess it’s pretty clear, my preference is to think hard about a problem, come up with some insight that simplifies our understanding of it, and then construct a solution based on insight.”

    We’re lucky because we tend to work in areas with high domain knowledge where we’re far more likely to know whether something being modeled is real or not. Unfortunately, the temptation to follow the classical approach – fit the histogram and assume future histograms will be the same – is greatest precisely when there’s little real underlying knowledge (psychology, financial data mining, and so on), which is why those fields have so many irreproducible results.

    “All of these have a certain feel to them that turns me off.”

    Yeah I had the same reaction (one of my advisers was a big Bayesian Weather model guy). Also with geophysics earthquake type stuff which I love to play around with in an amateurish way. If I had to put the feeling into words it would be “they’re not accounting for the real uncertainties right”.

    Your reasons may have been of a different nature, but one thing I’m sure of: anytime a field feels wrong, it almost certainly is wrong.

  • November 13, 2013: Daniel Lakeland

    I have a particular interest in the near-fault or on-fault dynamics in earthquakes. For the most part seismologists can treat the earth as a linear elastic substance. Many of them work with frequencies in the range 0-2 Hz or so; if they’re near-fault, maybe up to 10 or 20 Hz. At 3000 m/s a 2 Hz wave has a wavelength of 1500 m, so the fault zone itself, which might as well be somewhere between 1 mm and 1 m thick, looks like a point. The so-called “representation theorem” says that the deformations on a fault can be simulated by just applying stresses; at a distance the effect will be the same as the actual displacements. But this is basically because a 3 m displacement (which is huge) is more or less “strain” when your resolving power is 1500 m. If you’re actually interested in the physics of what goes on at the fault, however, it’s not enough to solve the “outer problem”. You have to get into the nitty gritty details of how materials behave when they shatter, slide, heat, flow, fuse together, etc.

    So often my complaint is that people get some tool which is hugely successful in some regime and then after a while it has had so many successes that sometimes people only learn that tool, or they mistake that tool for reality and can’t make progress on some other problem where the tool isn’t appropriate.

  • November 13, 2013: Corey

    “So often my complaint is that people get some tool which is hugely successful in some regime and then after a while it has had so many successes that sometimes people only learn that tool, or they mistake that tool for reality and can’t make progress on some other problem where the tool isn’t appropriate.”

    It strikes me that this is exactly Jaynes’s complaint about Fisher’s cookbooks on statistical methods.

  • November 14, 2013: Joseph

    “So often my complaint is that people get some tool which is hugely successful in some regime and then after a while it has had so many successes that sometimes people only learn that tool, or they mistake that tool for reality and can’t make progress on some other problem where the tool isn’t appropriate”

    It’s funny to hear someone better informed express the same feeling. For shits and giggles, I spent a fair amount of time with this work:

    http://www.amazon.com/Theoretical-Global-Seismology-F-Dahlen/dp/0691001243/ref=sr_1_1?ie=UTF8&qid=1384432747&sr=8-1&keywords=dahlen+tromp

    My overwhelming impression was that the questions they were answering were being answered well, but they were only asking questions that could be well answered by late 19th century physics.

    Sometime later I may put out some posts on the subject of “finding the right variables”. The thesis is that finding the right variables for a problem is far harder than finding a good theory that relates them. Science exposition skims right over the former and concentrates all its effort on the latter.

    The immediate concern will be macroeconomics, which hasn’t even come close to discovering the right macro-variables. Physics has done a far better job with its Stress Tensors and so on. But when it comes to something like earthquakes, it’s very likely we haven’t discovered all the relevant (macro)variables we’d need. In other words, a good physical theory for earthquakes likely contains variables we haven’t defined or measured yet.

    This may seem unrelated to statistics (it is completely unrelated to frequentist statistics) but it’s actually very closely tied to Bayesian Statistics. The post “Max Planck and the Foundations of Statistics” hints at the connection.

  • November 14, 2013: konrad

    “This may seem unrelated to statistics (it is completely unrelated to frequentist statistics)”

    er, isn’t “statistic” just another name for a macro-variable?

  • November 14, 2013: Joseph

    Konrad,

    Yes! At least that was the original meaning of “statistics” (long before anyone thought of p-values). I was thinking heavily about that last weekend. It seems like the word has lost that connotation somewhat, possibly because of the use of “test statistic” in classical stat, which seems to give it a different vibe.

    Here’s what I believe:

    -That original meaning of statistic as a kind of macro-variable isn’t incidental to statistics; it’s absolutely fundamental.

    -When people think they’re modeling or exploiting “randomness” what they’re usually exploiting is the highly non one-to-one’ness of some “statistic” (i.e. a mapping from some large space into a “macrovariable”)

    -It’s a hell of a lot easier to see the connection between Statistical Mechanics and ordinary statistics if you think in terms of “statistics as macrovariables”. There’s an inherent similarity between

        a “statistic” $T(x_1, \dots, x_n)$ and a “macrovariable” $E(q_1, \dots, q_N)$.

  • November 14, 2013: Daniel Lakeland

    I have a whole chapter in my dissertation on developing new types of continuum models by explicitly thinking about actual averages over real boxes in space to produce new variables, comparing the size of the box to certain other length scales in the problem, and using the non-one-to-one nature of averages to get theories that do not depend on certain details.

    Most of that is not really new, but I placed it in the context of calculus using infinitesimal numbers, which I think was a nice way to connect physical reasoning about a box full of stuff to mathematical reasoning about the equations that govern continuum models.
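
    Schematically, the kind of averaging I mean is something like

    $$\bar{\rho}(\mathbf{x},t) \;=\; \frac{1}{\ell^3}\int_{B_\ell(\mathbf{x})} \rho_{\mathrm{micro}}(\mathbf{y},t)\,d^3y,$$

    where $B_\ell(\mathbf{x})$ is a box of side $\ell$ centered at $\mathbf{x}$, with $\ell$ large compared to molecular scales but small compared to the scales on which $\bar{\rho}$ varies. Wildly different microstates give the same $\bar{\rho}$, which is exactly the non-one-to-one-ness being exploited.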

  • November 14, 2013: Joseph

    No, it’s definitely not new. The thousand Kinetic Theories of Heat from 150-200 years ago, most of which are forgotten today, testify to that. It’s very underexploited though. I claim that’s specifically a result of the failure to understand Bayesian Statistics.

    So when do we get to see this infamous thesis?

  • November 14, 2013: Brendon J. Brewer

    Two unconnected comments.

    “I have a whole chapter in my dissertation on developing new types of continuum models by explicitly thinking about actual averages over real boxes in space to produce new variables”

    That’s cool. Lots of introductory badly-taught statmech implies such concepts but then they don’t enter the formalism at all when you actually get down to the mathematics of it. e.g. “here’s an example of the 2nd law. When I remove the barrier, the gas expands to fill the whole container”. Yet the macrovariables are never things like “the fraction of the gas that is in this half of the container”.

    “http://bayes.wustl.edu”

    I wonder if anyone has downloaded all the papers from this site. If it ever went down we would lose a great resource.

  • November 14, 2013: Daniel Lakeland

    It needs to be finalized by the thesis formatting gods, and then I am embargoing it until after a paper which is currently under review is actually published, but I’ll put it up on my blog or my company website after that (company website not yet available, your mileage may vary, void where prohibited by applicable law, all rights reserved). It will ultimately also be available via USC library download and google indexed and all that.

  • November 14, 2013: Brendon J. Brewer

    A slight correction.

    “Yet the macrovariables are never things like “the fraction of the gas that is in this half of the container”.”

    This is considered in “kinetic theory” which tries to predict the time evolution of the frequency distribution of particles, using horrendously complicated ideas to do with collisions. I’ve never seen it done by using proper Gibbs/Jaynes reasoning, but then again, I’m not in touch with the statmech literature.

  • November 14, 2013: Daniel Lakeland

    Brendon: thanks, I think it’s a very nice unifying way of considering how continuum mechanics really works. Like, for example, the Cosserat continuum: what the heck does it mean physically to have a torque per unit area *at a point*? The general theory I develop the beginnings of gives you both a mathematical and a physical interpretation.

    As for that site, that’s a perfect reason to continue donating to archive.org

    https://web.archive.org/web/*/http://bayes.wustl.edu

  • November 14, 2013: Joseph

    “I wonder if anyone has downloaded all the papers from this site. If it ever went down we would lose a great resource.”

    I got all of Jaynes’s papers from that website.

    “I’ve never seen it done by using proper Gibbs/Jaynes reasoning, but then again, I’m not in touch with the statmech literature.”

    Some has but only some. See here for example:

    http://www.amazon.com/Entropy-Evolution-Macroscopic-International-Monographs/dp/019965543X/ref=sr_1_2?ie=UTF8&qid=1384475329&sr=8-2&keywords=walter+grandy

    Really there’s so much more that could be done I can’t help but think of everything up to this point as “just scratching the surface”.

  • November 14, 2013: Daniel Lakeland

    As for horrendously complicated ideas to do with collisions: ultimately physical evolution occurs because of forces (and hence collisions). The question is how to incorporate this knowledge that collisions matter together with your ignorance of the specific state to determine a time-evolution of macro-variables (such as the mean momentum in a region), as well as some measure of the uncertainty. Furthermore, what does the uncertainty in the state imply about the long-time evolution of the macro-state?

    I explicitly model the dissipation of wave energy in a tiny molecular bar using these ideas, and come up with a scaling law for the dissipation of wave energy due entirely to the observational size of the wave-energy measuring device together with the uncertainty in the molecular state (as influenced by temperature). It’s compared to detailed molecular dynamics simulations, and unknown constants are inferred by Bayesian MCMC, comparing observed wave energy to that predicted via a multi-dimensional ODE.

    It’s really great to hear you guys think this is interesting, because most of the people around here were sort of like “so what’s the point of this, and what does it have to do with Civil Engineering?” ;-)

  • November 14, 2013: Daniel Lakeland

    Also, note that the Lattice Boltzmann method is extremely successful for certain fluid flow problems, especially those involving multiple phases or interacting fluids; it more or less models the distribution of molecules directly via collision rules and relaxation rules.

  • November 14, 2013: Daniel Lakeland

    I had a colleague who was interested in the problem of heat transfer and fluid flow through nuclear fuel rod packs, I wanted, but didn’t have the time, to build a lattice boltzmann model and calculate the startup transient in such a case. Suppose for example you lost pump power and pressurization so that the fluid sat there and started to boil. Then you re-pressurized and started up the pumps to try to recover before you had a meltdown, what would go on in the fluid? How could you model it using what’s known about pipe flow and heat transfer, and the thermodynamics of water? It’s a really interesting problem begging to be studied. Anyone want to give me some grant funding?

  • November 14, 2013: Joseph

    “so what’s the point of this, and what does it have to do with Civil Engineering?”

    Planck’s thesis committee supposedly didn’t understand his thesis and passed him because they thought he was a smart guy. From what I’ve read of his writing I find this very easy to believe.

    “It’s a really interesting problem begging to be studied. Anyone want to give me some grant funding?”

    What are the plans after graduation?

  • November 15, 2013: Daniel Lakeland

    Plans for after graduation are TBA. There will in fact be a consulting company involved. Life is complicated when you have 2 kids and a wife who’s successfully on the tenure track already. Full bore post-graduation plans will begin most likely in the new year.

  • November 15, 2013: konrad

    “so what’s the point of this, and what does it have to do with Civil Engineering?”

    A valid question from the engineering point of view – engineering provides a broad background with a healthy emphasis on real-world questions (something sorely missing in the more theoretical disciplines), but at bottom it is all about how to make stuff that people will be willing to spend money on. That’s why I left engineering for science after my PhD.

  • November 16, 2013: Daniel Lakeland

    I actually am very interested in real world applications of scientific principles, but I’m not that interested in what you might call “formulaic” applications, i.e. building codes and beam design procedures and so forth.

    The motivation for the microscopic wave dissipation work in my dissertation was ultimately determining the lower bound on the size of fracture events that can in principle be detected by a detector of a certain physical size at a certain distance from the event at a certain temperature.

    Still, I can see why you’d want to go from eng -> sci. What are you actually doing at this point? I have the vague sense that you’re here in California at some university doing something sciency, but we haven’t been actually introduced ;-)

  • November 17, 2013: Joseph

    You could always do like Eric Floehr did. See the interview by John Cook:

    http://architects.dzone.com/articles/evaluating-weather-forecast

    Basically, he took a small home project using Python to look at historical weather data, and turned it into a small online business reselling the data. He didn’t strike it rich, but he made more than enough to quit his day job.

  • November 17, 2013: konrad

    Daniel: yes, I’m at UCSD now, at the Dept of Medicine (a subject I don’t actually know anything about). Here’s all you want to know and more: http://id.ucsd.edu/faculty/KonradSchefflerPhD.shtml

  • November 18, 2013: Joseph

    Konrad, you look like you had just witnessed someone kicking a puppy in that picture.

  • November 18, 2013: Daniel Lakeland

    That Floehr article is interesting. Your site Entsophy.com with the quantitative consulting stuff, is that what you’re currently pursuing as career path, or is it a side-project to some other “day job”?

  • November 18, 2013: Joseph

    Daniel,

    There are no good options these days, there really aren’t. Academia is a nightmare, as is consulting. Only a tiny fraction of the population is really cut out for cubicle work, and I’m sure as hell not one of them.

    By sheer dumb luck, I’ve been very fortunate. The Marines and Physics background are both individually rare, but putting them together meant I could do some things it’s very hard to find people for. Having the right security clearances helps a great deal as well. But most people’s experiences with consulting are far, far worse than mine.

    So no, I don’t have a career per se and just do whatever I want, constrained only by the need to keep the kids fed. The only advice I’d have for anyone is to be very entrepreneurial. I would view any positions in Academia and consulting as stepping stones at best. They’re just a way to keep the kids fed until you can figure out a better way to make a living.

  • November 18, 2013: Daniel Lakeland

    “Constrained only by the need to keep the kids fed” is not exactly a small constraint. :-) I think we just recently barely passed the point where daycare is now cheaper than our mortgage by a few dollars a month.

    I’ll let you know how my entrepreneurial streak works out in practice. So many of these stories are sort of like yours or the Eric Floehr one, where things kind of work out well for someone when they accidentally find a latent niche demand full of people willing to actually listen to and believe in their product/service.
