Monte Carlo methods

Overview

  • Monte Carlo methods attempt to solve two types of problems:
    1. Generate samples $\{ x^{(r)} \}_{r=1}^{R}$ from a given probability distribution $p(x)$
    2. Estimate expectations of functions under this distribution, e.g.

      $$\Phi = \mathbb{E}_p\big[ \phi(x) \big] = \int \phi(x)\, p(x) \, dx$$

  • Class of algorithms for sampling from a probability distribution based on constructing a Markov chain that has the desired distribution as its equilibrium distribution.
  • State of the chain after a number of steps is then used as a sample from the desired distribution
  • Really good stuff about MCMC methods can be found in Chapters 29 and 30 of mackay2003information.

Notation

  • $\hat{\mathbb{E}}[f]$ denotes the empirical expectation of $f$
  • $Z$ denotes the partition function or normalization factor of a distribution $p$, i.e. we assume that $p(x) = \tilde{p}(x) / Z$ defines a probability distribution.

Importance Sampling (IS)

Consider the following

$$\Phi = \mathbb{E}_p\big[ \phi(x) \big] = \int \phi(x)\, p(x) \, dx = \int \phi(x)\, \frac{p(x)}{q(x)}\, q(x) \, dx = \mathbb{E}_q\big[ w(x)\, \phi(x) \big]$$

where we've let $w(x) = \frac{p(x)}{q(x)}$. Using Monte Carlo methods, we can approximate this expectation

$$\hat{\Phi} = \frac{1}{N} \sum_{n=1}^{N} w\big(x^{(n)}\big)\, \phi\big(x^{(n)}\big), \qquad x^{(n)} \sim q$$

or equivalently,

$$\hat{\Phi} = \frac{1}{N} \sum_{n=1}^{N} \frac{p\big(x^{(n)}\big)}{q\big(x^{(n)}\big)}\, \phi\big(x^{(n)}\big), \qquad x^{(n)} \sim q$$

This is the importance sampling estimator of $\mathbb{E}_p[\phi(x)]$ and is unbiased.

The importance sampling problem is then to find some biasing distribution $q(x)$ such that the variance of the importance sampling estimator is lower than the variance of the plain Monte Carlo estimator.

We can replace $p(x)$ and $q(x)$ with their unnormalized versions $\tilde{p}(x)$ and $\tilde{q}(x)$, assuming

$$p(x) = \frac{\tilde{p}(x)}{Z_p}, \qquad q(x) = \frac{\tilde{q}(x)}{Z_q}$$

Then

$$\hat{\Phi} = \frac{\sum_{n=1}^{N} \tilde{w}_n\, \phi\big(x^{(n)}\big)}{\sum_{n=1}^{N} \tilde{w}_n}$$

where

$$\tilde{w}_n = \frac{\tilde{p}\big(x^{(n)}\big)}{\tilde{q}\big(x^{(n)}\big)}$$

Since

$$\frac{1}{N} \sum_{n=1}^{N} \tilde{w}_n \overset{N \to \infty}{\longrightarrow} \frac{Z_p}{Z_q}$$

We have

$$\hat{\Phi} \overset{N \to \infty}{\longrightarrow} \Phi$$

with probability 1. That is, as $N \to \infty$, taking the unnormalized weights $\tilde{w}_n$ (normalized by their sum) we arrive at the same result as if we used the proper weights $w_n = w\big(x^{(n)}\big)$.
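
To make this concrete, here is a minimal sketch of the self-normalized importance sampling estimator in Python (the standard-normal target, the wider Gaussian proposal, and $\phi(x) = x^2$ are illustrative choices, not from the notes above):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # Unnormalized target: standard normal without its 1/sqrt(2*pi) factor.
    return np.exp(-0.5 * x**2)

def q_tilde(x):
    # Unnormalized proposal: a wider Gaussian (sigma = 2).
    return np.exp(-0.5 * (x / 2.0)**2)

def phi(x):
    # Function whose expectation E_p[phi(x)] we want; here phi(x) = x^2.
    return x**2

N = 100_000
x = rng.normal(0.0, 2.0, size=N)          # samples from the proposal q
w = p_tilde(x) / q_tilde(x)               # unnormalized importance weights
phi_hat = np.sum(w * phi(x)) / np.sum(w)  # self-normalized estimator

print(phi_hat)  # should be close to E_p[x^2] = 1
```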

Issues

Large number of dimensions

In a large number of dimensions, one will often have the problem that a small number of large weights $\tilde{w}_n$ dominate the sampling process. In the case where the large values of $w(x)\, \phi(x)$ lie in regions given low probability by $q$, this becomes a huge issue.

Rejection Sampling

  • Have $p(x) = \tilde{p}(x) / Z$ which is too complex to sample from
  • We have a simpler density $q(x)$ which we can evaluate (up to a factor $Z_q$) and draw samples from
  • Assume that we know the value of a constant $c$ s.t.

    $$c\, \tilde{q}(x) \ge \tilde{p}(x), \qquad \forall x$$

  • For $n = 1, \dots, N$:
    1. Generate $x \sim q$
    2. Generate $u \sim \mathrm{Uniform}\big[ 0, c\, \tilde{q}(x) \big]$
    3. Accept $x$ if $u \le \tilde{p}(x)$, reject $x$ if $u > \tilde{p}(x)$
  • Then the accepted samples satisfy

    $$x \sim p(x)$$

We're simply sampling from a distribution we can compute, and then computing the ratio of samples under $c\, \tilde{q}$ which also fell under $\tilde{p}$. Since we then know the "area" under $c\, \tilde{q}$, we know that the area under $\tilde{p}$ is simply given by the fraction of points accepted.

This is basically a more general approach to the Box method of sampling, where instead of having some distribution $q(x)$ to sample from, we simply use an n-dimensional box (or rather, we let $q(x)$ be a multi-dimensional $\mathrm{Uniform}$ distribution).
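
A minimal sketch of the procedure in Python, assuming a toy bimodal target and a Gaussian proposal (the densities and the constant c = 25 are illustrative choices, with c picked conservatively so that $c\, q(x) \ge \tilde{p}(x)$ everywhere):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # Unnormalized target: a bimodal density (two Gaussian bumps).
    return np.exp(-0.5 * (x - 2)**2) + np.exp(-0.5 * (x + 2)**2)

def q_pdf(x):
    # Proposal we can sample from: N(0, 3^2), normalized.
    return np.exp(-0.5 * (x / 3.0)**2) / (3.0 * np.sqrt(2 * np.pi))

c = 25.0  # conservative envelope constant: c * q(x) >= p_tilde(x) for all x

samples = []
for _ in range(50_000):
    x = rng.normal(0.0, 3.0)            # step 1: x ~ q
    u = rng.uniform(0.0, c * q_pdf(x))  # step 2: u ~ Uniform[0, c * q(x)]
    if u <= p_tilde(x):                 # step 3: accept if u falls under p_tilde
        samples.append(x)

print(len(samples) / 50_000)  # acceptance rate, roughly Z_p / c
```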

Issues

Large number of dimensions

In a large number of dimensions it's very likely that the requirement

$$c\, \tilde{q}(x) \ge \tilde{p}(x), \qquad \forall x$$

forces $c$ to be so huge that acceptances become very rare.

Even finding such a $c$ in a high-dimensional space might be incredibly difficult.

Markov Chain Monte Carlo (MCMC)

Markov chains

A Markov chain is said to be reversible if there is a probability distribution $\pi$ over its states such that

$$\pi(x)\, P\big(x' \mid x\big) = \pi\big(x'\big)\, P\big(x \mid x'\big)$$

for all times $n$ and all states $x$ and $x'$. This condition is known as the detailed balance condition (some books call it the local balance equation).
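
As a quick sanity check, consider a two-state chain (an illustrative example) with transition probabilities $P(1 \to 2) = a$ and $P(2 \to 1) = b$. The distribution $\pi = \big( \frac{b}{a + b}, \frac{a}{a + b} \big)$ satisfies

$$\pi_1\, P(1 \to 2) = \frac{ab}{a + b} = \pi_2\, P(2 \to 1)$$

so this chain is reversible with stationary distribution $\pi$.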

Methods

Metropolis-Hastings

The family of Metropolis-Hastings MCMC methods all involve designing a Markov process using two sub-steps:

  1. Propose. Use some proposal distribution $g\big(x' \mid x\big)$ to propose the next state $x'$ given the current state $x$.
  2. Acceptance-rejection. Use some acceptance distribution $A\big(x' \mid x\big)$ and accept / reject depending on this conditional distribution.

The transition probability is then

$$P\big(x' \mid x\big) = g\big(x' \mid x\big)\, A\big(x' \mid x\big)$$

Inserting this into the detailed balance equation for a Markov chain, we have

$$\frac{A\big(x' \mid x\big)}{A\big(x \mid x'\big)} = \frac{p\big(x'\big)\, g\big(x \mid x'\big)}{p(x)\, g\big(x' \mid x\big)}$$

This is satisfied by (for example) the Metropolis choice:

$$A\big(x' \mid x\big) = \min \left( 1, \frac{p\big(x'\big)\, g\big(x \mid x'\big)}{p(x)\, g\big(x' \mid x\big)} \right)$$

The Metropolis-Hastings algorithm in its full glory is then:

  1. Initialize
    1. Pick an initial state $x_0$
    2. Set $t = 0$
  2. Iterate
    1. Generate: randomly generate a candidate state $x'$ according to $g\big(x' \mid x_t\big)$
    2. Compute: compute the acceptance probability $A\big(x' \mid x_t\big) = \min \left( 1, \frac{p(x')\, g(x_t \mid x')}{p(x_t)\, g(x' \mid x_t)} \right)$
    3. Accept or Reject:
      1. Generate a uniform random number $u \sim \mathrm{Uniform}[0, 1]$
      2. If $u \le A\big(x' \mid x_t\big)$, accept the new state and set $x_{t + 1} = x'$
      3. If $u > A\big(x' \mid x_t\big)$, reject the new state and set $x_{t + 1} = x_t$
    4. Increment: set $t = t + 1$

The empirical distribution of the states $x_0, x_1, \dots, x_T$ will approach $p(x)$.
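
A minimal sketch of the algorithm in Python, using a symmetric Gaussian random-walk proposal so that the $g$-ratio cancels in the acceptance probability (the bimodal target and step size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p_tilde(x):
    # Unnormalized log target density; here a 1D mixture of two Gaussians.
    return np.logaddexp(-0.5 * (x - 3)**2, -0.5 * (x + 3)**2)

def metropolis_hastings(log_p_tilde, x0, n_steps, step_size=1.0):
    x = x0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        x_prop = x + step_size * rng.normal()  # propose: g(x'|x) = N(x, step^2)
        # Log acceptance ratio; the symmetric proposal g cancels out.
        log_alpha = log_p_tilde(x_prop) - log_p_tilde(x)
        if np.log(rng.uniform()) < log_alpha:  # accept with prob min(1, ratio)
            x = x_prop
        samples[t] = x                         # on rejection we repeat x
    return samples

samples = metropolis_hastings(log_p_tilde, x0=0.0, n_steps=20_000)
print(samples.mean(), samples.std())  # mean ~ 0, std ~ sqrt(10) for this target
```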

When using MCMC methods for inference, it's important to remember that the target $p$ is usually a posterior which we can only evaluate through the log-likelihood of the data (plus the log-prior), up to an unknown normalization constant.

This can also be seen by considering the posterior distribution of the parameters given the data, and considering the ratio between the posteriors of two different parameter values, say $\theta_1$ and $\theta_2$. Then

$$\frac{p\big(\theta_1 \mid \mathcal{D}\big)}{p\big(\theta_2 \mid \mathcal{D}\big)} = \frac{p\big(\mathcal{D} \mid \theta_1\big)\, p\big(\theta_1\big)}{p\big(\mathcal{D} \mid \theta_2\big)\, p\big(\theta_2\big)}$$

since the evidence $p(\mathcal{D})$ cancels.

Letting the prior distribution on $\theta$ be the proposal distribution $g\big(\theta' \mid \theta\big)$ in the Metropolis-Hastings algorithm, we see that in this case we're simply taking steps towards the parameter $\theta$ which maximizes the log-likelihood under some prior.

Further, one could make the ratio $\frac{g(\theta \mid \theta')}{g(\theta' \mid \theta)} = 1$ by using a uniform distribution (if we know that $\theta \in [a, b]$ for some $a, b$).

The choice of $g\big(x' \mid x\big)$ is often referred to as a (transition) kernel when talking about different Metropolis-Hastings methods.

Gibbs

Gibbs sampling or a Gibbs sampler is a special case of the Metropolis-Hastings algorithm. The point of a Gibbs sampler is that, given a multivariate distribution, it is simpler to sample from a conditional distribution than to marginalize by integrating over a joint distribution.

Suppose we want to sample $\mathbf{x} = (x_1, \dots, x_n)$ from a joint distribution $p(x_1, \dots, x_n)$. Denote the k-th sample by $\mathbf{x}^{(k)} = \big(x_1^{(k)}, \dots, x_n^{(k)}\big)$. We proceed as follows (a sketch of the resulting sweep is given after this list):

  1. Initial value: pick some $\mathbf{x}^{(0)}$
  2. Sample $\mathbf{x}^{(k+1)}$. For each $j = 1, \dots, n$:
    1. Sample $x_j^{(k+1)} \sim p\big(x_j \mid x_1^{(k+1)}, \dots, x_{j-1}^{(k+1)}, x_{j+1}^{(k)}, \dots, x_n^{(k)}\big)$. If sampling from this conditional distribution is not tractable, then see Step 2 in the Metropolis-Hastings algorithm for how to sample.
  3. Repeat the above step until we have the number of samples we want.
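
A minimal sketch of the sequential version in Python, for a bivariate Gaussian whose full conditionals are available in closed form (the correlation $\rho = 0.9$ is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_normal(rho, n_samples, x_init=(0.0, 0.0)):
    """Gibbs sampler for a 2D standard normal with correlation rho.

    The full conditionals are available in closed form here:
        x1 | x2 ~ N(rho * x2, 1 - rho^2),   and symmetrically for x2 | x1.
    """
    x1, x2 = x_init
    sd = np.sqrt(1.0 - rho**2)
    samples = np.empty((n_samples, 2))
    for k in range(n_samples):
        x1 = rng.normal(rho * x2, sd)  # sample x1 ~ p(x1 | x2)
        x2 = rng.normal(rho * x1, sd)  # sample x2 ~ p(x2 | x1), using new x1
        samples[k] = (x1, x2)
    return samples

samples = gibbs_bivariate_normal(rho=0.9, n_samples=20_000)
print(np.corrcoef(samples.T)[0, 1])  # should be close to 0.9
```
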
Proof of convergence

Suppose we want to estimate $p(\mathbf{x})$ for a state $\mathbf{x}$.

Let $S_j(\mathbf{x})$ denote the values which $\mathbf{x}$ can take on when only the j-th component is resampled, i.e. $S_j(\mathbf{x}) = \{ \mathbf{y} : y_i = x_i \ \forall i \ne j \}$ is all the possible states, keeping $\mathbf{x}_{-j}$ constant.

Assuming $\mathbf{y} \in S_j(\mathbf{x})$, the Gibbs procedure gives us

$$P(\mathbf{x} \to \mathbf{y}) = \rho(j)\, p\big(y_j \mid \mathbf{x}_{-j}\big)$$

where $\rho(j)$ is the probability of choosing to resample the j-th component. Which we can rewrite as

$$P(\mathbf{x} \to \mathbf{y}) = \rho(j)\, \frac{p(\mathbf{y})}{p\big(\mathbf{x}_{-j}\big)}$$

by using the identity

$$p\big(y_j \mid \mathbf{x}_{-j}\big) = \frac{p(\mathbf{y})}{p\big(\mathbf{x}_{-j}\big)}, \qquad \text{since } \mathbf{y}_{-j} = \mathbf{x}_{-j}$$

We note that

$$p\big(\mathbf{x}_{-j}\big) = \sum_{\mathbf{z} \in S_j(\mathbf{x})} p(\mathbf{z})$$

and get

$$P(\mathbf{x} \to \mathbf{y}) = \rho(j)\, \frac{p(\mathbf{y})}{\sum_{\mathbf{z} \in S_j(\mathbf{x})} p(\mathbf{z})}$$

Recall that $S_j(\mathbf{x}) = S_j(\mathbf{y})$, since $\mathbf{x}$ and $\mathbf{y}$ only differ in the j-th component, so we have

$$\frac{P(\mathbf{x} \to \mathbf{y})}{P(\mathbf{y} \to \mathbf{x})} = \frac{p(\mathbf{y})\, \big/ \sum_{\mathbf{z} \in S_j(\mathbf{x})} p(\mathbf{z})}{p(\mathbf{x})\, \big/ \sum_{\mathbf{z} \in S_j(\mathbf{y})} p(\mathbf{z})}$$

which is simply

$$\frac{P(\mathbf{x} \to \mathbf{y})}{P(\mathbf{y} \to \mathbf{x})} = \frac{p(\mathbf{y})}{p(\mathbf{x})}$$

That is,

$$p(\mathbf{x})\, P(\mathbf{x} \to \mathbf{y}) = p(\mathbf{y})\, P(\mathbf{y} \to \mathbf{x})$$

which implies the detailed balance equation is satisfied, hence $p(\mathbf{x})$ is the stationary distribution of the resulting Markov chain.

Further, to ensure that the chain actually converges to this stationary distribution $p(\mathbf{x})$, observe that (assuming all the conditional densities are strictly positive) we can reach any state $\mathbf{y}$ from any state $\mathbf{x}$ in some number of steps, and that there are no "isolated" states; hence the chain constructed by the Gibbs transition is an aperiodic and irreducible Markov chain, which is guaranteed to converge to its stationary distribution $p(\mathbf{x})$.

In our definition of Gibbs sampling, we sample the different dimensions sequentially, but here we have not assumed anything about how we pick which dimension to sample, instead choosing the j-th dimension according to some arbitrary distribution $\rho(j)$. Hence Gibbs sampling also works when not performing the sampling sequentially, but the sequential version is the standard approach and usually what people refer to when talking about Gibbs sampling.

The notation $\pi$ instead of $p(\mathbf{x})$ is often used when talking about Markov chains. In this case we're mainly interested in sampling from a distribution $p(\mathbf{x})$, therefore we instead let $p(\mathbf{x})$ denote the stationary distribution of the chain.

In a Markov Random Field (MRF) the conditional of $x_j$ given everything else only depends on the immediate neighbours $\mathrm{ne}(x_j)$, so the conditional distribution can be written

$$p\big(x_j \mid \mathbf{x}_{-j}\big) = p\big(x_j \mid \mathrm{ne}(x_j)\big)$$

Convergence bound

If $P$ is the transition matrix of the Gibbs chain, then the convergence rate of the periodic Gibbs sampler for the stationary distribution $\pi$ of an MRF is bounded by

$$d_V\big(\mu P^n, \pi\big) \le \big(1 - e^{-N \Delta}\big)^n$$

where

  • $N$ is the number of sites of the MRF
  • $\Delta$ is the maximal local oscillation of the energy, $\Delta = \max_k \delta_k$ with $\delta_k = \max \big\{ |\mathcal{E}(\mathbf{x}) - \mathcal{E}(\mathbf{y})| : \mathbf{y} \in S_k(\mathbf{x}) \big\}$
  • $\mu$ is the arbitrary initial distribution
  • $d_V$ is the distance in variation (a small worked example follows the list)

    $$d_V(\mu, \nu) = \frac{1}{2} \sum_{x \in S} \big| \mu(x) - \nu(x) \big|$$

    for distributions $\mu$ and $\nu$ on a finite state space $S$
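
For example (with illustrative numbers): for $\mu = (0.5, 0.5)$ and $\nu = (0.8, 0.2)$ on $S = \{1, 2\}$ we get

$$d_V(\mu, \nu) = \frac{1}{2} \big( |0.5 - 0.8| + |0.5 - 0.2| \big) = 0.3$$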

Finding a starting point

Burn-In

  • Throw away some iterations at the beginning of an MCMC run
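
In code this is just discarding a prefix of the chain; reusing the Metropolis-Hastings sketch from above (the burn-in length of 1,000 is an arbitrary illustrative choice):

```python
samples = metropolis_hastings(log_p_tilde, x0=0.0, n_steps=20_000)
burn_in = 1_000           # discard the first 1000 iterations
kept = samples[burn_in:]  # keep only the post-burn-in states
```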

Issues

Dependent samples

The samples of a Markov Chain are in general dependent, since we're conditioning on the current state when sampling the proposal state.

Partition function estimation

Notation

  • $\tilde{p}(x; \theta)$ is an unnormalized probability distribution
  • Probability distribution

    $$p(x; \theta) = \frac{1}{Z(\theta)}\, \tilde{p}(x; \theta)$$

    with

    $$Z(\theta) = \int \tilde{p}(x; \theta) \, dx$$

Overview

The log-likelihood is given by

$$\ell(\theta) = \sum_{i=1}^{N} \log p\big(x_i; \theta\big) = \sum_{i=1}^{N} \Big[ \log \tilde{p}\big(x_i; \theta\big) - \log Z(\theta) \Big]$$

The gradient of $\ell(\theta)$ is then

$$\nabla_\theta\, \ell(\theta) = \sum_{i=1}^{N} \Big[ \nabla_\theta \log \tilde{p}\big(x_i; \theta\big) - \nabla_\theta \log Z(\theta) \Big]$$

These two terms are often referred to as:

  1. positive phase
  2. negative phase

The 2nd term in the summation can be written:

$$\nabla_\theta \log Z(\theta) = \frac{\nabla_\theta\, Z(\theta)}{Z(\theta)} = \frac{1}{Z(\theta)}\, \nabla_\theta \int \tilde{p}(x; \theta) \, dx$$

Interchanging integration and gradient, which we can do under suitable conditions:

$$\nabla_\theta\, Z(\theta) = \int \nabla_\theta\, \tilde{p}(x; \theta) \, dx$$

Hence, using the identity $\nabla_\theta\, \tilde{p} = \tilde{p}\, \nabla_\theta \log \tilde{p}$,

$$\nabla_\theta \log Z(\theta) = \frac{1}{Z(\theta)} \int \nabla_\theta\, \tilde{p}(x; \theta) \, dx = \int \frac{\tilde{p}(x; \theta)}{Z(\theta)}\, \nabla_\theta \log \tilde{p}(x; \theta) \, dx$$

Since $\frac{\tilde{p}(x; \theta)}{Z(\theta)} = p(x; \theta)$, we observe that the integral in the above expression is an expectation under the model distribution. That is,

$$\nabla_\theta \log Z(\theta) = \mathbb{E}_{x \sim p(x; \theta)} \Big[ \nabla_\theta \log \tilde{p}(x; \theta) \Big]$$

And, employing MCMC algorithms, we can estimate this expectation as:

$$\nabla_\theta \log Z(\theta) \approx \frac{1}{M} \sum_{m=1}^{M} \nabla_\theta \log \tilde{p}\big(x^{(m)}; \theta\big)$$

i.e. $x^{(m)}$ are $M$ MCMC samples obtained from the unnormalized $\tilde{p}(x; \theta)$.

Finally, this gives us a more tractable gradient estimate of $\ell(\theta)$:

$$\nabla_\theta\, \ell(\theta) \approx \sum_{i=1}^{N} \nabla_\theta \log \tilde{p}\big(x_i; \theta\big) - \frac{N}{M} \sum_{m=1}^{M} \nabla_\theta \log \tilde{p}\big(x^{(m)}; \theta\big)$$

The exact same argument holds for discrete random variables too, simply exchanging the integral for a summation.
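
A minimal sketch of this gradient estimator in Python, for a toy model $\tilde{p}(x; \theta) = \exp(\theta x - x^2 / 2)$ whose partition function happens to be known, so the estimate can be checked (the model, data, and MCMC settings are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unnormalized model: p_tilde(x; theta) = exp(theta * x - x^2 / 2), for
# which p(x; theta) = N(theta, 1) and grad_theta log p_tilde(x; theta) = x.
# Here grad log Z(theta) = theta, which lets us sanity-check the estimator.

def grad_log_p_tilde(x):
    return x  # d/dtheta [theta * x - x^2 / 2] = x

def mcmc_model_samples(theta, n_samples, n_burn=500, step=1.0):
    """Negative-phase samples from p(x; theta) via random-walk Metropolis,
    using only the unnormalized density (no knowledge of Z(theta))."""
    log_p_tilde = lambda x: theta * x - 0.5 * x**2
    x = 0.0
    out = np.empty(n_samples)
    for t in range(n_burn + n_samples):
        x_prop = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p_tilde(x_prop) - log_p_tilde(x):
            x = x_prop
        if t >= n_burn:
            out[t - n_burn] = x
    return out

data = rng.normal(1.5, 1.0, size=200)  # observed data
theta = 0.0
x_model = mcmc_model_samples(theta, n_samples=2000)

positive = np.sum(grad_log_p_tilde(data))                  # positive phase
negative = len(data) * np.mean(grad_log_p_tilde(x_model))  # ~ N * grad log Z
grad_ll = positive - negative
print(grad_ll)  # ~ sum(data) - N * theta, i.e. ~ 300 for theta = 0
```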

Problems

  • Using standard MCMC methods, we would have to perform a burn-in before drawing the $M$ samples in the negative phase. This can be very expensive.

Simulated Annealing (SA)

Simulated annealing is a method of moving from a tractable distribution to a distribution of interest via a sequence of intermediate distributions, used as an inexact method of handling isolated or spurious modes in Markov chain samplers. Isolated modes are a problem for MCMC methods since we're making small changes between each step, hence moving from one isolated mode to another is "difficult".

It employs a sequence of distributions, with probability densities given by $p_0(x)$ to $p_n(x)$, in which each $p_j(x)$ differs only slightly from $p_{j-1}(x)$. The distribution $p_0(x)$ is the one of interest. The distribution $p_n(x)$ is designed so that the Markov chain used to sample from it allows movement between all regions of the state space.

A traditional scheme is to set

$$p_j(x) \propto p_0(x)^{\beta_j}$$

or the geometric average:

$$p_j(x) \propto p_0(x)^{\beta_j}\, p_n(x)^{1 - \beta_j}$$

with $1 = \beta_0 > \beta_1 > \dots > \beta_n = 0$.

An annealing run is started from some initial state, from which we first simulate a Markov chain designed to converge to $p_n$. Next we simulate some number of iterations of a Markov chain designed to converge to $p_{n-1}$, starting from the final state of the simulation for $p_n$. We continue in this fashion, using the final state of the simulation for $p_j$ as the initial state of the simulation for $p_{j-1}$, until we finally simulate the chain designed to converge to $p_0$.
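
A minimal sketch of such an annealing run in Python, using the geometric-average scheme between a broad "easy" Gaussian and a well-separated bimodal target (all densities, schedules, and step sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p0_tilde(x):
    # Distribution of interest: well-separated bimodal density.
    return np.logaddexp(-0.5 * (x - 10)**2, -0.5 * (x + 10)**2)

def log_pn_tilde(x):
    # Easy-to-explore distribution: one broad Gaussian covering both modes.
    return -0.5 * (x / 10.0)**2

def metropolis_steps(log_p, x, n_steps, step=2.0):
    for _ in range(n_steps):
        x_prop = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x = x_prop
    return x

betas = np.linspace(0.0, 1.0, 50)  # beta_n = 0 (easy) ... beta_0 = 1 (target)
x = rng.normal(0.0, 10.0)          # initial state, roughly from p_n
for beta in betas:
    # Geometric-average intermediate distribution p_j.
    log_pj = lambda y, b=beta: b * log_p0_tilde(y) + (1 - b) * log_pn_tilde(y)
    x = metropolis_steps(log_pj, x, n_steps=20)

print(x)  # final state should sit near one of the modes at +/- 10
```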

Unfortunately, there is no reason to think that annealing will give the precisely correct result, in which each mode of $p_0$ is found with exactly the right probability.

This is of little consequence in an optimization context, where the final distribution is degenerate (at the maximum), but it is a serious flaw for the many applications in statistics and statistical physics that require a sample from a non-degenerate distribution. neal98_anneal_impor_sampl

TODO Example: SA applied to RBMs

Annealed Importance Sampling (AIS)

Notation

  • $\{ p_j \}_{j=0}^{n}$ is a set of Gibbs distributions defined:

    $$p_j(x) = \frac{1}{Z_j}\, e^{-E_j(x)}, \qquad Z_j = \int e^{-E_j(x)} \, dx$$

  • $\bar{\mathbf{x}} = (x_0, x_1, \dots, x_n)$, which we call the extended state space
  • The use of square brackets, e.g. $p[\bar{\mathbf{x}}]$, denotes a distribution over the extended state space
  • We define

    $$\tilde{p}_j(x) = e^{-E_j(x)} = Z_j\, p_j(x)$$

  • $p_f$ (forward distribution) creates samples from $p_n$ given a sample $x_0$ from $p_0$:

    $$p_f[\bar{\mathbf{x}}] = p_0(x_0) \prod_{j=1}^{n} T_j\big(x_j \mid x_{j-1}\big)$$

  • $p_r$ (reverse distribution) creates samples from $p_0$ given a sample $x_n$ from $p_n$:

    $$p_r[\bar{\mathbf{x}}] = p_n(x_n) \prod_{j=1}^{n} \tilde{T}_j\big(x_{j-1} \mid x_j\big)$$

Derivation

Assume that we use a set of Gibbs distributions $\{ p_j \}_{j=0}^{n}$ to construct a pair of distributions $p_f$ and $p_r$ on the extended state space with

$$\frac{p_r[\bar{\mathbf{x}}]}{p_f[\bar{\mathbf{x}}]} = \frac{Z_0}{Z_n}\, w(\bar{\mathbf{x}})$$

  • $p_f$ (forward distribution) creates samples from $p_n$ given a sample $x_0$ from $p_0$
  • $p_r$ (reverse distribution) creates samples from $p_0$ given a sample $x_n$ from $p_n$

Then

$$\frac{p_r[\bar{\mathbf{x}}]}{p_f[\bar{\mathbf{x}}]} = \frac{Z_0}{Z_n} \prod_{j=1}^{n} \frac{\tilde{p}_j\big(x_{j-1}\big)}{\tilde{p}_{j-1}\big(x_{j-1}\big)}$$

where $w(\bar{\mathbf{x}}) = \prod_{j=1}^{n} \frac{\tilde{p}_j(x_{j-1})}{\tilde{p}_{j-1}(x_{j-1})}$. Then, for a function $f(\bar{\mathbf{x}})$, we have

$$\mathbb{E}_{p_r}\big[ f(\bar{\mathbf{x}}) \big] = \mathbb{E}_{p_f} \left[ f(\bar{\mathbf{x}})\, \frac{p_r[\bar{\mathbf{x}}]}{p_f[\bar{\mathbf{x}}]} \right]$$

thus the expectation of $f$ satisfies

$$\mathbb{E}_{p_r}\big[ f(\bar{\mathbf{x}}) \big] = \frac{Z_0}{Z_n}\, \mathbb{E}_{p_f}\big[ f(\bar{\mathbf{x}})\, w(\bar{\mathbf{x}}) \big]$$

Then, letting $f \equiv 1$, i.e. the constant function, we arrive at (Generalized) Annealed Importance Sampling (AIS): a method of estimating the partition function using the following relation:

$$Z_n = Z_0\; \mathbb{E}_{p_f}\big[ w(\bar{\mathbf{x}}) \big] \approx \frac{Z_0}{M} \sum_{m=1}^{M} w\big(\bar{\mathbf{x}}^{(m)}\big), \qquad \bar{\mathbf{x}}^{(m)} \sim p_f$$

where $Z_0$ is the partition function of the known "proxy" distribution $p_0$ from which the forward distribution $p_f$ starts.

The "standard" AIS method is obtained by a specific choice of markov_chain_monte_carlo_219a433cc55e08eae18332da3ed7f268a1075cdd.png.

Bennett's Acceptance Ratio (BAR) method

Bibliography