The Amelioration of Uncertainty

## Data Compression and the Nobel in Economics

Consider the following data compression problem. Suppose we have a large data set we wish to transmit. They’re too many to send directly but luckily the precise values aren’t important. Slightly different values would work as long as the data retained it’s overall character. So how do we avoid transmitting n raw data points?

Here’s one method for compressing the data before sending. First define an empirical distribution

which is best thought of as a “histogram”. The easiest way to see it’s nature would be to divide x-space into small boxes and graph the histogram. Next find a distribution based on a few parameters such that

Instead of transmitting n data points, just send the name of this new distribution with the k parameter values. The receiver can then simulate a new set of ‘s for themselves.

But how in practice can we find a suitable ?

The answer lies with Entropy. First pick a function which in some sense captures the nature of and equate expected values:

(1)

Then find by maximizing the entropy subject to the constraint in (1). The result will be a distribution where lambda is chosen to ensure (1) holds.

If this is close to the empirical distribution then we’re done. If not, then choose an additional function and re-maximize subject to two constraints. Each time you introduce a new function you’ll bring closer to , so this process will always work eventually. In practice though, we’d carefully choose to get . The fewer constraints it takes to capture the essence of the data the better.

What does this have to do with the Nobel Prize in Economics? To see the connection plug the definition for into the constraint equation (1) and note that satisfies:

Which is the just the Method of Moments. A generalization of this procedure helped win the Nobel Prize in Economics this year. The usual “Method of Moments” type procedures don’t require a maxent distribution, but ideally you’d use a distribution which does a good job of “modeling” the data, which amounts to the same thing in this case.

Notice that if then would be a sufficient statistic for . This has an intuitive meaning in terms of data compression. Only certain functions of the data are required to get the ‘s, and they are all the receiver needs to reconstruct the data. In other words, those functions are “sufficient” to capture the essence of the data.

November 11, 2013
• November 11, 2013Joseph

The “data compression” stuff was a bit of a subterfuge to introduce some ideas, but it is a real problem nevertheless. Suppose the receiver had a need to compute various functions:

And the sender didn’t know what will be chosen. Perhaps the receiver may have new functions they’d need to compute years later or something. At any rate, just assume the ‘s aren’t known or predictable to the sender.

The brute force solution would be to send all , then the receiver could compute whatever they wanted whenever they want. If that’s too much data to send, then just send the name for with the values for .

With just that little bit of information the receiver could compute

for any function F they wanted. As long as the approximation was good enough, the whole thing’s a success and requires a good deal less bandwidth.

• November 13, 2013Joseph

Also, the scenario in the previous comment is far from fanciful. This is exactly what often happens in journal articles. If a researcher fits a distribution and reports it in the article, then this serves as a way to communicate the data without providing the entire data set. The journal article reader can then compute any expectation value they want without having the raw data.