It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a non-randomized way that delivers better performance but requires more thought.
First, some background. The expected value of a function $f(x)$ under a distribution $P(x)$ can be estimated using the Riemann sum

\[
E_P[f] = \int f(x)\,P(x)\,dx \;\approx\; \sum_{i=1}^{N} f(x_i)\,P(x_i)\,(x_{i+1}-x_i).
\]
If the $x_i$ are independent draws from the distribution $P(x)$ (which are then sorted into increasing order), then $P(x_i)(x_{i+1}-x_i) \approx 1/N$ and the Riemann sum simplifies to:

\[
E_P[f] \;\approx\; \frac{1}{N}\sum_{i=1}^{N} f(x_i).
\]
So to find the integral, just average the $f(x_i)$. As a bonus, the points $x_i$, once found, can be reused for the expected value of any reasonable function $f$.
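As a concrete sketch of this averaging (the choices of $f$ here are mine, for illustration: $E[x^2]=1$ and $E[\cos x]=e^{-1/2}$ under the standard normal):

```python
import math
import random

# Sketch: estimate expected values by averaging f over random draws.
# Under the standard normal, E[x^2] = 1 and E[cos x] = exp(-1/2).
random.seed(0)
N = 100_000
draws = [random.gauss(0.0, 1.0) for _ in range(N)]

# The same draws are reused for two different functions f.
est_x2 = sum(x * x for x in draws) / N
est_cos = sum(math.cos(x) for x in draws) / N
print(f"E[x^2] ~ {est_x2:.3f}, E[cos x] ~ {est_cos:.3f}")
```

Note that nothing about the draws depends on $f$; both estimates come from the same thousand-fold reuse of one set of points.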
Finding “independent random draws from P(x)” – whatever that means – is tricky though. It can be avoided by noting that all that’s really needed are points which make

\[
\frac{1}{N}\sum_{i=1}^{N} f(x_i) \;\approx\; \int f(x)\,P(x)\,dx.
\]
Any such points will do, no matter how they were obtained. This can be achieved independent of $f$ by finding points $x_1 < x_2 < \dots < x_N$ such that

\[
\int_{x_i}^{x_{i+1}} P(x)\,dx = \frac{1}{N}. \tag{1}
\]
Note the histogram of the $x_i$ will resemble $P(x)$, since the points will be close together wherever $P(x)$ is large.
The condition (1) can sometimes be turned into a recursion relation for generating the $x_i$ without randomization or rejection. Approximating the integral in (1) by $P(x_i)(x_{i+1}-x_i)$ gives

\[
x_{i+1} = x_i + \frac{1}{N\,P(x_i)}.
\]
To illustrate, I’ll use this to generate a thousand points from the Standard Normal $N(0,1)$. Take

\[
P(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad N = 1000,
\]

and run the recursion from a starting point deep in the left tail, near the $1/(2N)$ quantile.
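Here is a sketch of the construction (the starting point at the $1/(2N)$ quantile is my assumption; any point deep enough in the left tail behaves similarly). Instead of the approximate forward step, each step solves condition (1) exactly by bisection on the normal CDF, via `math.erf`, which avoids numerical instability where the density gets tiny in the far tail:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def step(x, mass, lo=0.0, hi=10.0):
    """Bisect for s so the interval [x, x+s] holds `mass` probability."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if Phi(x + mid) - Phi(x) < mass:
            lo = mid
        else:
            hi = mid
    return x + 0.5 * (lo + hi)

N = 1000
# Start at the 1/(2N) quantile (an assumed choice).
x = step(-10.0, 0.5 / N, 0.0, 20.0)
points = [x]
for _ in range(N - 1):
    x = step(x, 1.0 / N)  # enforce condition (1) for each interval
    points.append(x)

mean = sum(points) / N
var = sum(p * p for p in points) / N - mean ** 2
print(f"mean={mean:.3f}  var={var:.3f}")
```

The resulting points are just the equal-mass quantiles of $N(0,1)$; their sample mean is essentially 0 and their sample variance just under 1, with no randomness anywhere.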
The reader can verify in minutes that these first 1000 points pass any normality test you care to apply, with a p-value greater than .99.
This would seem to be a handy way of generating a distribution, but unfortunately programming a general-purpose version is hard, especially when $x$ is multidimensional and the shape of the distribution is unknown and complicated. The recursion relation is liable to miss some high-probability regions of $P(x)$.
One workaround is randomizing the jump from $x_i$ to $x_{i+1}$ to increase the chance of fully exploring $P(x)$. So, for example, one might randomly pick a candidate $x_{i+1}$ close to $x_i$, then force (1) to hold in the long run by accepting or rejecting the candidate in such a way that the expected probability mass $\int_{x_i}^{x_{i+1}} P(x)\,dx$ is constant from jump to jump. This is precisely the Metropolis–Hastings algorithm.
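A minimal sketch of this, in the random-walk Metropolis form (the symmetric-proposal special case of Metropolis–Hastings), again targeting the standard normal; the proposal width of 1.0, the chain length, and the burn-in are all arbitrary choices of mine:

```python
import math
import random

random.seed(1)

def p(x):
    """Target density, up to a normalizing constant."""
    return math.exp(-0.5 * x * x)

x = 0.0
samples = []
for _ in range(50_000):
    cand = x + random.uniform(-1.0, 1.0)   # random jump near x
    # Accept with probability min(1, p(cand)/p(x)); else keep x.
    if random.random() < p(cand) / p(x):
        x = cand
    samples.append(x)

burn = samples[5_000:]  # discard an initial transient
mean = sum(burn) / len(burn)
var = sum(s * s for s in burn) / len(burn) - mean ** 2
print(f"mean={mean:.2f}  var={var:.2f}")
```

Compare this with the recursion above: the same equal-mass goal, but achieved only in expectation, at the cost of rejections and correlated samples.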
Having seen this, though, one can imagine general-purpose recursion-type algorithms at least as good, which achieve (1) without randomization or rejection but require more code to ensure the distribution is fully explored. All of which leads to a corollary:
Whenever there’s a randomized way of programming something, there is a non-randomized way that delivers better performance but requires more coding.