Approximate Bayesian Computation (ABC)

Notation

  • $\theta \in \Theta$ parameters
  • $p(y \mid \theta)$ the model, from which samples are generated given parameters $\theta$
  • $y^*$ denotes observed data
  • $\mathcal{Y}$ is the domain of the observations
  • $\rho$ is a metric on $\mathcal{Y}$
  • $B_{\varepsilon}(y^*) = \{ y : \rho(y, y^*) < \varepsilon \}$

Motivation


  • Cases where computing the likelihood of the observed data $y^*$ is intractable
  • ABC replaces the intractable likelihood with an approximation obtained by simulating from the model

Rejection ABC

Let $\varepsilon > 0$ be a similarity threshold, and let $\rho$ be a notion of distance, e.g. a premetric, on the domain $\mathcal{Y}$ of observations.

Rejection ABC proceeds as follows:

  1. Sample model parameters $\theta \sim \pi$.
  2. For each $\theta$, generate a pseudo-dataset $y$ from $p(y \mid \theta)$.
  3. For each pseudo-dataset $y$, if $\rho(y, y^*) < \varepsilon$, accept the corresponding $\theta$; otherwise reject it.

Result: An exact sample $\{ \theta_i \}_{i=1}^M$ from the approximate posterior $\tilde{p}_{\varepsilon} \big( \theta \mid y^* \big) \propto \pi(\theta) \tilde{p}_{\varepsilon}(y^* \mid \theta)$, where

\tilde{p}_{\varepsilon} \big( y^* \mid \theta \big) = \int_{B_{\varepsilon}(y^*)} p \big( y \mid \theta \big) \ dy, \qquad B_{\varepsilon}(y^*) = \{ y : \rho(y, y^*) < \varepsilon \}

The choice of $\rho$ is crucial in the design of an accurate ABC algorithm.
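
The procedure above can be sketched in Python for a toy conjugate-Gaussian model. All modelling choices here (the $\mathcal{N}(0, 1)$ prior, $\mathcal{N}(\theta, 1)$ data, and the difference of sample means as $\rho$) are illustrative assumptions, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): unknown mean theta with a N(0, 1) prior,
# data generated as n_obs iid draws from N(theta, 1).
n_obs = 50
theta_true = 1.0
y_star = rng.normal(theta_true, 1.0, size=n_obs)  # "observed" data

def rho(y, y_prime):
    # Illustrative premetric: absolute difference of sample means.
    return abs(y.mean() - y_prime.mean())

def rejection_abc(y_star, eps, n_sims=20_000):
    accepted = []
    for _ in range(n_sims):
        theta = rng.normal(0.0, 1.0)                 # 1. theta ~ pi
        y = rng.normal(theta, 1.0, size=n_obs)       # 2. pseudo-dataset y ~ p(y | theta)
        if rho(y, y_star) < eps:                     # 3. accept theta if y lands in the eps-ball
            accepted.append(theta)
    return np.array(accepted)

samples = rejection_abc(y_star, eps=0.1)
print(len(samples), samples.mean())
```

The accepted $\theta$'s concentrate around the sample mean of $y^*$, as the conjugate posterior would predict; shrinking `eps` improves accuracy at the cost of a lower acceptance rate.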

Soft ABC

One can interpret the approximate likelihood $\tilde{p}_{\varepsilon}(y^* \mid \theta)$ in rejection ABC as the convolution of the true likelihood $p(y \mid \theta)$ and the "similarity" kernel $k_{\varepsilon}$:

k_{\varepsilon}(y, y^*) = \mathbb{1} \big( y \in B_{\varepsilon}(y^*) \big)

In fact, one can use any similarity kernel parametrised by $\varepsilon$ satisfying

k_{\varepsilon}(y, y^*) \to \delta_{y^*}(y) \quad \text{as} \quad \varepsilon \to 0

which gives rise to the Soft ABC methods:

Soft ABC is an extension of rejection ABC that weights the parameter samples rather than accepting or rejecting them.

An example is the following kernel (Gaussian when $q = 2$):

k_{\varepsilon}(y, y') := \exp \Bigg( - \frac{\rho^q(y, y')}{\varepsilon} \Bigg), \quad q > 0

which results in the weighted sample

\Big\{ (\theta_j ,w_j) \Big\}_{j = 1}^M \quad \text{with} \quad w_j = \frac{k_{\varepsilon}(y_j, y^*)}{\sum_{i=1}^{M} k_{\varepsilon}(y_i, y^*)}

which can be used directly to estimate posterior expectations, i.e. for a test function $f$

\hat{\mathbb{E}} \big[ f(\theta) \big] = \sum_{j=1}^{M} w_j f(\theta_j)
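
Soft ABC can be sketched on the same toy Gaussian model as before (again, the prior, the data-generating model, and the mean-difference distance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (illustrative): theta ~ N(0, 1) prior, y | theta is n_obs iid N(theta, 1) draws.
n_obs = 50
y_star = rng.normal(1.0, 1.0, size=n_obs)  # "observed" data

def soft_abc(y_star, eps, M=5_000, q=2.0):
    thetas = rng.normal(0.0, 1.0, size=M)                   # theta_j ~ pi
    ys = rng.normal(thetas[:, None], 1.0, size=(M, n_obs))  # pseudo-dataset per theta_j
    dists = np.abs(ys.mean(axis=1) - y_star.mean())         # rho(y_j, y*)
    k = np.exp(-dists**q / eps)                             # similarity kernel k_eps(y_j, y*)
    w = k / k.sum()                                         # normalised weights w_j
    return thetas, w

thetas, w = soft_abc(y_star, eps=0.01)

# Weighted estimate of a posterior expectation, here with f(theta) = theta.
post_mean = float(np.sum(w * thetas))
print(post_mean)
```

Unlike rejection ABC, no simulation is discarded: every $\theta_j$ contributes with weight $w_j$, and $\varepsilon$ now acts as a kernel bandwidth rather than a hard cutoff.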