A Common Derivation for Markov Chain Monte Carlo Algorithms [pdf]

joe_the_user · on Nov 3, 2018

Oh, I'd actually found this approach searching a while back.

It frames all Markov Chain Monte Carlo as a variation of slice sample. I like it because it makes the process fairly simple but I have tried to implement it yet.

https://en.wikipedia.org/wiki/Slice_sampling

smallnamespace · on Nov 4, 2018

Another way to look at this: slice sampling is analogous to taking the Lebesgue integral rather than the Riemann integral (we sum small horizontal strips rather than vertical bars).

timdellinger · on Nov 3, 2018

It's interesting seeing how a community codifies the nomenclature and the formalization of the various things that it has stolen/borrowed/reinvented from other communities.

smallnamespace · on Nov 3, 2018

Here's a very simple example of what MCMC does and how it can be useful:

- Let's say I offer to play a game with you, where you take an unbiased coin, flip it a thousand times, then we take the number of heads (H) and subtract it from the number of tails (T), and I pay you $|(H-T)|. So if there are 520 heads and 480 tails, I'll owe you $40, but if it were reversed, you'd owe me nothing. How much would you be willing to pay to play this game with me, i.e. what’s the expected value of this game?

- One simple approach would be to simulate the entire game many times to estimate the payoff, which is 'vanilla' Monte Carlo simulation. We can do it like this:

  import random
  def flip():
    # 1 or -1
    return random.randint(0, 1) * 2 - 1

  numDraws = 1000
  numFlips = 1000
  totalPayoff = 0;
  for i in range(numDraws):
    totalPayoff += abs(sum(flip() for _ in range(numFlips)))

  print('$' + str(round(totalPayoff / numDraws, 2))) # around $24

The more simulations we run, the more accurate our result will be, but we need to run a full thousand flips for every attempt, so the time complexity is O(flips * tries). What if the game had 10k or 100k flips instead of 1k?

This is the problem that MCMC solves: finding an efficient way to draw more samples of a process, where each independent draw is hard to calculate. For this problem, note that:

1. If you start with a set of 1000 coins that are randomly initialized, and you just keep picking up at random and flipping it from its current configuration, after you flip it enough times, the equilibrium distribution of the result will match the expected distribution of the coins after 1000 flips in our original game.

2. This means if you just keep randomly picking up a coin and flipping it, then recording the payoff that you would have gotten had that been the thousandth flip, once you average all those payoffs, you'll also have an estimate of payoff.

In code:

  numSteps = 1000 * 30
  numFlips = 1000
  totalPayoff = 0;

  # initialize our coins
  coins = [0] * numFlips
  for i in range(numFlips):
    coins[i] = flip() # 1 or -1

  currentTotal = sum(coins)
  totalPayoff = 0;
  for i in range(numSteps):
    coinToFlip = random.randrange(numFlips)
    coins[coinToFlip] *= -1
    currentTotal += 2 if (coins[coinToFlip] is 1) else -2
    totalPayoff += abs(currentTotal)

  print('$' + str(round(totalPayoff / numSteps, 2))) # Also ~$24

Now we've gotten the time complexity down to O(numSteps), regardless of the number of coins that we're flipping. Now the question is: how does O(steps) relate to O(numTries * numFlips)? It turns out with a bit of math, that we can show that for a similar amount of accuracy, steps ~ tries * sqrt(flips), so here we save a sqrt(numFlips) factor (one can also find tighter bounds). For very long games, this can be a substantial speed improvement.

This is the essence of MCMC: when it's expensive to take independent draws of a space, we can instead take a random walk around once instance of the space and still get a reasonable answer. A lot of the details of MCMC revolve around 1) how do I walk around the space and 2) how many steps must I take so I know the answer is reasonable?

* For this toy problem, there is actually a simple closed-form solution for the probability so there’s no need to simulate, but for more complicated processes numerical simulation is often the best option available for getting a number

credit_guy · on Nov 4, 2018

> This is the essence of MCMC: when it's expensive to take independent draws of a space, we can instead ...

I disagree. The essence of MCMC is that it's a clever MC scheme that defeats the curse of dimensionality by using an adaptive importance sampling method. In this method, you use information from the current sample to help draw a better new sample, or at least not a much worse one. This way, even if you start form a place far away from the region of interest, you end up in that region; the trade-off is that the samples are obviously not independent anymore. In very high dimensional spaces the region of interest is always very tiny compared to the vastness of the space (that's the "curse of dimensionality"), so even when it's very cheap to draw samples, you would be hopeless without an importance sampling scheme.

smallnamespace · on Nov 4, 2018

> by using an adaptive importance sampling method

Although not all MCMC schemes actually do this, e.g. naive Gibbs (re)sampling just sweeps through the domain spaces iteratively.

You're right though MCMC enables you to do this relatively easily. Still requires some work if to get this right, especially if your integrand has multiple 'spikes'.

timdellinger · on Nov 3, 2018

> * For this toy problem, there is actually a simple closed-form solution for the probability so there’s no need to simulate

I got about 1/3 of the way through your comment, when my knee jerked and I said "Bernoulli trials, Binomial distribution".

malshe · on Nov 4, 2018

Thanks for this example!

ranc1d · on Nov 4, 2018

This was posted on here 2 years ago but didn't get any comments.

https://news.ycombinator.com/item?id=12286521

bob1029 · on Nov 3, 2018

Incomprehensible LaTeX soup IMO. What value does this paper afford one of us "hackers"? It would take me weeks to digest this to only find it was a bunch of bullshit talking in circles over exceptionally pedantic topics that have no real world applications.

boxfire · on Nov 3, 2018

For many researchers MCMC is described in very specific terms very differently in many places. This paper offers a single view of how to understand the idea and how you can use it to derive specializations for your field. Of course if you have no interest in mathematics, just move on. Nothing about being a good hacker means being abrupt and unwilling to dive deep in the necessary tools for a job. I for one am glad to see a paper like this and know I can start from here to build my own sampler for a complex system for which there is otherwise no software to efficiently sample. The hacker spirit is curious and open minded, but moves fast. If it's not a tool for you that doesn't mean it can't help thousands of others gain real insight into a difficult subject.

tanderson92 · on Nov 3, 2018

There is lots of specialized knowledge posted on HN. As a factual matter, MCMC is definitely not without real-world applications.