Non-random Metropolis-Hastings Algorithms
It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a non-randomized way that delivers better performance but requires more thought.
This remark by E. T. Jaynes will be illustrated with the Metropolis-Hastings algorithm.
First some background. The expected value of a function can be estimated using the Riemann sum

If the
are independent draws from the distribution
(which are then ordered) the Riemann sum simplifies to:

So to find the integral, just average
. As a bonus, the points once found can be reused for the expected value of any reasonable function.
Finding “independent random draws from P(x)” – whatever that means – is tricky though. It can be avoided by noting that what’s needed are points which make

Any such points
will do no matter how they were obtained. This can be achieved independent of
by finding points such that,
(1) 
Note the histogram of
will resemble
since the points will be close when
is large.
The condition (1) can sometimes be turned into a recursion relation for generating
without randomization or rejection:

To illustrate, I’ll use this to generate a thousand points from the Standard Normal
. Take

The reader can verify in minutes that the first 1000 points satisfy any normality test you care to use with a p-value greater than .99.
This would seem to be a handy way of generating a distribution, but unfortunately programming a general purpose version is hard, especially when
is multidimensional and the shape of the distribution is unknown and complicated. The recursion relation is liable to miss some high probability regions of
.
One workaround is randomizing the jump from
to
to increase the chance of fully exploring
. So for example, one might randomly pick a candidate
close to
, then force (1) to hold in the long run by accepting or rejecting
in such a way the expected magnitude of
is constant from jump to jump. This is precisely the Metropolis-Hastings Algorithm.
Having seen this though, one can imagine general purpose recursion type algorithms at least as good, that achieve (1) without randomization or rejection, but requiring more code to insure the distribution is fully explored. All of which leads to a corollary:
Whenever there’s a randomized way of programming something, there is a non-randomized way that delivers better performance but requires more coding.


April 22, 2012Corey
link
Couple of points:
Jaynes is more-or-less making a conjecture in computational complexity theory that the polynomial complexity class is equal to the bounded-error probabilistic polynomial complexity class. This is plausible but not proven.
The main advantage of MCMC is that you don’t need to compute the normalization constant of the target distribution.
August 29, 2012David Rohde
link
I find this issue fascinating… and thought I would post a few links to interesting things somewhat related (stuff – I want to reread anyway, by some of my favourite Bayesians)
Bayesian Monte Carlo
http://mlg.eng.cam.ac.uk/zoubin/papers/RasGha03.pdf
Non-random Monte Carlo
http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/
The MCMC revolution
http://math.uchicago.edu/~shmuel/Network-course-readings/MCMCRev.pdf