## Non-random Metropolis-Hastings Algorithms

> It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a non-randomized way that delivers better performance but requires more thought.

This remark by E. T. Jaynes will be illustrated with the Metropolis-Hastings algorithm.

First some background. The expected value of a function $f$ under a density $p(x)$ can be estimated using the Riemann sum

$$E_p[f] = \int f(x)\,p(x)\,dx \approx \sum_{i=1}^{N} f(x_i)\,p(x_i)\,(x_{i+1}-x_i)$$

If the $x_i$ are independent draws from the distribution $p(x)$ (which are then ordered), then $p(x_i)(x_{i+1}-x_i) \approx 1/N$ and the Riemann sum simplifies to:

$$E_p[f] \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i)$$

So to find the integral, just average $f(x_i)$. As a bonus, the points $x_i$ once found can be reused for the expected value of any reasonable function $f$.
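A quick Python sanity check of that recipe (the function names and test integrands are my choices): draw points, sort them, and average $f$ over them.

```python
import math
import random

random.seed(1)

# Draw N points from the standard normal, then order them. Sorting does
# not change the average, but it is what makes the Riemann-sum view work.
N = 100_000
xs = sorted(random.gauss(0.0, 1.0) for _ in range(N))

# E_p[f] is then estimated by a plain average of f at the draws.
f = lambda x: x * x
estimate = sum(f(x) for x in xs) / N  # true value: E[x^2] = 1

# The same points can be reused for any other reasonable function.
g = lambda x: math.cos(x)
estimate_g = sum(g(x) for x in xs) / N  # true value: exp(-1/2)
```

Both averages land within a fraction of a percent of the exact integrals, with no quadrature weights needed.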

Finding “independent random draws from $p(x)$” – whatever that means – is tricky though. It can be avoided by noting that what’s needed are points which make

$$\frac{1}{N}\sum_{i=1}^{N} f(x_i) \approx \int f(x)\,p(x)\,dx$$

Any such points will do no matter how they were obtained. This can be achieved independent of $f$ by finding points such that,

$$p(x_i)\,(x_{i+1}-x_i) = \frac{1}{N} \qquad (1)$$

Note the histogram of $\{x_i\}$ will resemble $p(x)$, since the points will be close together where $p(x)$ is large.

The condition (1) can sometimes be turned into a recursion relation for generating $x_{i+1}$ from $x_i$ without randomization or rejection:

$$x_{i+1} = x_i + \frac{1}{N\,p(x_i)}$$

To illustrate, I’ll use this to generate a thousand points from the Standard Normal $N(0,1)$. Take $x_1$ well out in the left tail (say $x_1 = -3$) and set $N = 1000$, with $p = \varphi$ the standard normal density.

The reader can verify in minutes that the first 1000 points pass any normality test you care to use with a p-value greater than .99.
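A minimal Python sketch of the deterministic recursion applied to the standard normal (the starting point, the underflow guard, and all names are my choices):

```python
import math

def phi(x):
    # Standard normal density.
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def deterministic_points(n, start):
    # x_{i+1} = x_i + 1/(n * p(x_i)): each step is sized to cover roughly
    # 1/n of the probability mass, with no randomness or rejection.
    xs = [start]
    while len(xs) < n:
        d = phi(xs[-1])
        if d < 1e-12:  # the walk has run off the right tail; stop early
            break
        xs.append(xs[-1] + 1.0 / (n * d))
    return xs

# Start deep in the left tail so the walk sweeps the whole distribution.
xs = deterministic_points(1000, -3.0)
```

The points cluster where $\varphi$ is large, so their histogram traces out the bell curve; the last point or two can overshoot far into the right tail, since the step size blows up as $\varphi$ shrinks.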

This would seem to be a handy way of generating points from a distribution, but unfortunately programming a general purpose version is hard, especially when $x$ is multidimensional and the shape of the distribution is unknown and complicated. The recursion relation is liable to miss some high probability regions of $p(x)$.

One workaround is randomizing the jump from $x_i$ to $x_{i+1}$ to increase the chance of fully exploring $p(x)$. For example, one might randomly pick a candidate $x_{i+1}$ close to $x_i$, then force (1) to hold in the long run by accepting or rejecting candidates in such a way that the expected magnitude of $p(x_i)(x_{i+1}-x_i)$ is constant from jump to jump. This is precisely the Metropolis-Hastings algorithm.
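The accept/reject step can be sketched in a few lines of Python. This is a generic random-walk Metropolis sampler targeting the standard normal (the step size, sample count, and names are my choices):

```python
import math
import random

random.seed(0)

def target(x):
    # Unnormalized target density: the normalization constant cancels
    # in the accept/reject ratio, so it never needs to be computed.
    return math.exp(-x * x / 2.0)

def metropolis(n_samples, step=1.0, x0=0.0):
    # Random-walk Metropolis: propose a random jump near the current
    # point, then accept or reject it so the chain's long-run histogram
    # matches the target density.
    x, samples = x0, []
    for _ in range(n_samples):
        candidate = x + random.uniform(-step, step)
        if random.random() < target(candidate) / target(x):
            x = candidate  # accept; otherwise stay put (reject)
        samples.append(x)
    return samples

samples = metropolis(50_000)
```

Unlike the deterministic recursion, the random proposals eventually visit every region where the target has appreciable mass, at the cost of rejected (wasted) jumps and correlated samples.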

Having seen this though, one can imagine general purpose recursion-type algorithms at least as good, which achieve (1) without randomization or rejection but require more code to ensure the distribution is fully explored. All of which leads to a corollary:

Whenever there’s a randomized way of programming something, there is a non-randomized way that delivers better performance but requires more coding.

**Corey** (April 22, 2012):


Couple of points:

Jaynes is more-or-less making a conjecture in computational complexity theory: that the polynomial complexity class equals the bounded-error probabilistic polynomial complexity class (P = BPP). This is plausible but not proven.

The main advantage of MCMC is that you don’t need to compute the normalization constant of the target distribution.

**David Rohde** (August 29, 2012):


I find this issue fascinating… and thought I would post a few links to interesting, somewhat related things (stuff I want to reread anyway, by some of my favourite Bayesians):

Bayesian Monte Carlo

http://mlg.eng.cam.ac.uk/zoubin/papers/RasGha03.pdf

Non-random Monte Carlo

http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/

The MCMC revolution

http://math.uchicago.edu/~shmuel/Network-course-readings/MCMCRev.pdf