Notes on: Fischer, A. (2015): Training Restricted Boltzmann Machines

1 Notation

  • \(G = (V, E)\) denotes a graph with \(V\) the set of nodes and \(E\) the set of undirected edges
  • \(\mathcal{C}\) denotes the set of all maximal cliques of \(G\)
  • \(\mathcal{V} \subset V\) separates two nodes \(v \notin \mathcal{V}\) and \(w \notin \mathcal{V}\) if every path from \(v\) to \(w\) contains a node from \(\mathcal{V}\)
  • \(X_v\) denotes the random variable associated with node \(v \in V\), taking values in the state space \(\Lambda_v\)
  • We assume \(\Lambda_v = \Lambda\) for all \(v \in V\), i.e. same state space for all vertices
  • \(\mathbf{X} = \big( X_v \big)_{v \in V}\) is called a Markov Random Field if each \(X_v\) is conditionally independent of all other variables given the variables in its neighborhood (the local Markov property)
  • \(\boldsymbol{\theta}\) denotes the parameters of the distribution
  • \(S = \left\{ \mathbf{x}_1, \dots, \mathbf{x}_{\ell} \right\}\) denotes samples (assumed to be i.i.d.)
  • \(\mathbf{H} = (H_1, \dots, H_n)\) denotes the latent variables
  • \(m = |\mathbf{X}|\) and \(n = |\mathbf{H}|\) denote the numbers of observed and latent variables, so that \(m + n = |V|\)
  • AIS refers to Annealed Importance Sampling
  • BAR refers to Bennett's Acceptance Ratio
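
The separation property from the notation list can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as an adjacency dictionary; the function name `separates` and the example graph `chain` are hypothetical and not taken from the paper. It runs a breadth-first search from \(v\) that is not allowed to enter the separating set, and reports separation when \(w\) is unreachable.

```python
from collections import deque


def separates(adjacency, separator, v, w):
    """Return True if every path from v to w contains a node of `separator`.

    `adjacency` maps each node to an iterable of its neighbours
    (undirected graph, so edges appear in both directions).
    """
    if v in separator or w in separator:
        raise ValueError("v and w must lie outside the separating set")
    blocked = set(separator)
    visited = {v}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        if node == w:
            return False  # found a path that avoids the separating set
        for neighbour in adjacency[node]:
            if neighbour not in blocked and neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return True  # w is unreachable without crossing the separating set


# Hypothetical example: the chain graph 0 - 1 - 2
chain = {0: [1], 1: [0, 2], 2: [1]}
```

In the chain example, the set containing only node 1 separates nodes 0 and 2, while the empty set does not.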

2 Definitions

3 Graphical models

3.1 MCMC

  • Sampling in Ising models uses the acceptance probability

    \begin{equation*} \min \Bigg(1 , \frac{\pi(\mathbf{x}')}{\pi(\mathbf{x})} \Bigg) \end{equation*}

    where \(\mathbf{x}\) denotes the current state and \(\mathbf{x}'\) denotes the proposal state

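    • This is the Metropolis-Hastings acceptance rule for a symmetric proposal distribution, in which case the proposal probabilities cancel and only the ratio of target probabilities remains. Since we're working with binary observations, the proposal would just be to "flip" the state, e.g. of a single variable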
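
The acceptance rule above can be sketched in a few lines. This is only an illustration under stated assumptions: states are binary vectors in \(\{0, 1\}\), the proposal flips one uniformly chosen variable (a symmetric proposal), and \(\pi\) is supplied as an unnormalised log-probability. The names `metropolis_flip_step`, `run_chain`, and the toy target `ising_chain_log_pi` are hypothetical and not from the paper.

```python
import numpy as np


def metropolis_flip_step(x, log_pi, rng):
    """One Metropolis step: propose flipping a single random binary variable."""
    proposal = x.copy()
    k = rng.integers(len(x))
    proposal[k] = 1 - proposal[k]
    # log of pi(x') / pi(x); the normalisation constant cancels in the ratio
    log_ratio = log_pi(proposal) - log_pi(x)
    # accept with probability min(1, pi(x') / pi(x))
    if log_ratio >= 0 or rng.random() < np.exp(log_ratio):
        return proposal
    return x


def ising_chain_log_pi(x, coupling=1.0):
    """Assumed toy target: 1-D chain where neighbouring spins prefer to agree."""
    s = 2 * x - 1  # map {0, 1} to spins {-1, +1}
    return coupling * np.sum(s[:-1] * s[1:])


def run_chain(x0, log_pi, n_steps, seed=0):
    """Run the sampler for n_steps and return the final state."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0).copy()
    for _ in range(n_steps):
        x = metropolis_flip_step(x, log_pi, rng)
    return x
```

Working with the log-ratio avoids overflow when the two probabilities differ by many orders of magnitude, and the normalisation constant of \(\pi\) is never needed.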

3.2 RBMs

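  • Energy function of an RBM with \(m\) visible units \(\mathbf{v}\) and \(n\) hidden units \(\mathbf{h}\), weights \(w_{ij}\), visible bias terms \(b_j\) and hidden bias terms \(c_i\)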

    \begin{equation*} E(\mathbf{v}, \mathbf{h}) = - \sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} h_i v_j - \sum_{j=1}^{m} b_j v_j - \sum_{i=1}^{n} c_i h_i \end{equation*}
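
As a quick check of the index conventions, the energy can be written in matrix form as \(E(\mathbf{v}, \mathbf{h}) = -\mathbf{h}^T W \mathbf{v} - \mathbf{b}^T \mathbf{v} - \mathbf{c}^T \mathbf{h}\), with \(W\) of shape \(n \times m\). The sketch below assumes NumPy arrays; the function names `rbm_energy` and `rbm_energy_loops` are hypothetical. The second function spells out the sums exactly as in the equation so the two can be compared.

```python
import numpy as np


def rbm_energy(v, h, W, b, c):
    """Vectorised RBM energy; W has shape (n, m), v shape (m,), h shape (n,)."""
    return -(h @ W @ v) - b @ v - c @ h


def rbm_energy_loops(v, h, W, b, c):
    """Same energy written term by term, mirroring the sums in the equation."""
    n, m = W.shape
    energy = 0.0
    for i in range(n):
        for j in range(m):
            energy -= W[i, j] * h[i] * v[j]  # interaction terms w_ij h_i v_j
    for j in range(m):
        energy -= b[j] * v[j]  # visible bias terms
    for i in range(n):
        energy -= c[i] * h[i]  # hidden bias terms
    return energy
```

Both functions should agree up to floating-point error for any inputs of compatible shape; the vectorised form is the one usually preferred in practice.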