The Amelioration of Uncertainty

Data Compression and the Nobel in Economics

Consider the following data compression problem. Suppose we have a large data set equation we wish to transmit. They’re too many to send directly but luckily the precise values aren’t important. Slightly different values would work as long as the data retained it’s overall character. So how do we avoid transmitting n raw data points?

Here’s one method for compressing the data before sending. First define an empirical distribution


which is best thought of as a “histogram”. The easiest way to see it’s nature would be to divide x-space into small boxes and graph the histogram. Next find a distribution based on a few parameters equation such that


Instead of transmitting n data points, just send the name of this new distribution with the k parameter values. The receiver can then simulate a new set of equation‘s for themselves.

But how in practice can we find a suitable equation?

The answer lies with Entropy. First pick a function equation which in some sense captures the nature of equation and equate expected values:

(1)   equation

Then find equation by maximizing the entropy equation subject to the constraint in (1). The result will be a distribution equation where lambda is chosen to ensure (1) holds.

If this equation is close to the empirical distribution then we’re done. If not, then choose an additional function and re-maximize subject to two constraints. Each time you introduce a new function you’ll bring equation closer to equation, so this process will always work eventually. In practice though, we’d carefully choose equation to get equation. The fewer constraints it takes to capture the essence of the data the better.

What does this have to do with the Nobel Prize in Economics? To see the connection plug the definition for equation into the constraint equation (1) and note that equation satisfies:


Which is the just the Method of Moments. A generalization of this procedure helped win the Nobel Prize in Economics this year. The usual “Method of Moments” type procedures don’t require a maxent distribution, but ideally you’d use a distribution which does a good job of “modeling” the data, which amounts to the same thing in this case.

Notice that if equation then equation would be a sufficient statistic for equation. This has an intuitive meaning in terms of data compression. Only certain functions of the data are required to get the equation‘s, and they are all the receiver needs to reconstruct the data. In other words, those functions are “sufficient” to capture the essence of the data.

November 11, 2013
  • November 11, 2013Joseph

    The “data compression” stuff was a bit of a subterfuge to introduce some ideas, but it is a real problem nevertheless. Suppose the receiver had a need to compute various functions:


    And the sender didn’t know what equation will be chosen. Perhaps the receiver may have new functions they’d need to compute years later or something. At any rate, just assume the equation‘s aren’t known or predictable to the sender.

    The brute force solution would be to send all equation, then the receiver could compute whatever they wanted whenever they want. If that’s too much data to send, then just send the name for equation with the values for equation.

    With just that little bit of information the receiver could compute


    for any function F they wanted. As long as the approximation was good enough, the whole thing’s a success and requires a good deal less bandwidth.

  • November 13, 2013Joseph

    Also, the scenario in the previous comment is far from fanciful. This is exactly what often happens in journal articles. If a researcher fits a distribution and reports it in the article, then this serves as a way to communicate the data without providing the entire data set. The journal article reader can then compute any expectation value they want without having the raw data.

Leave a Reply or trackback