# Notes on: Fischer, A. (2015): Training Restricted Boltzmann Machines

## 1 Notation

- \(G = (V, E)\) denotes a *graph* with \(V\) the set of nodes and \(E\) the set of undirected edges
- \(\mathcal{C}\) denotes the set of all maximal cliques
- \(\mathcal{V} \subset V\) *separates* two nodes \(v \notin \mathcal{V}\) and \(w \notin \mathcal{V}\) if every path from \(v\) to \(w\) contains a node from \(\mathcal{V}\)
- \(X_v\) denotes a random variable taking values in the state space \(\Lambda_v\), associated with node \(v \in V\)
- We assume \(\Lambda_v = \Lambda\) for all \(v \in V\), i.e. the same state space for all vertices
- \(\mathbf{X} = \big( X_v \big)_{v \in V}\) is called a *Markov Random Field* if each \(X_v\) is *conditionally independent of all other variables given its neighborhood*
- \(\boldsymbol{\theta}\) denotes the parameters of the distribution
- \(S = \left\{ \mathbf{x}_1, \dots, \mathbf{x}_{\ell} \right\}\) denotes *samples* (assumed to be i.i.d.)
- \(\mathbf{H} = (H_1, \dots, H_n)\) denotes the *latent variables*
- \(m = |\mathbf{X}|\) and \(n = |\mathbf{H}|\) such that \(m + n = |V|\)
- **AIS** refers to *Annealed Importance Sampling*
- **BAR** refers to *Bennett's Acceptance Ratio*

## 2 Definitions

## 3 Graphical models

### 3.1 MCMC

Sampling in Ising models uses the **acceptance probability**

\begin{equation*} \min \Bigg(1 , \frac{\pi(\mathbf{x}')}{\pi(\mathbf{x})} \Bigg) \end{equation*}

where \(\mathbf{x}\) denotes the *current state* and \(\mathbf{x}'\) denotes the *proposal state*.

- This is just the Metropolis-Hastings approach with a symmetric proposal, and since we're working with binary observations, the proposal is simply to "flip" the state of a variable
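
To make the acceptance rule concrete, here is a minimal sketch of a single-flip Metropolis sampler. Everything specific in it is an assumption made for illustration and is not taken from the paper: the spins live in \(\{-1, +1\}\), the graph is a ring, the target is \(\pi(\mathbf{x}) \propto e^{-E(\mathbf{x})}\) with a hypothetical `coupling` constant, and the names `energy`, `metropolis_step` and `run_chain` are made up.

```python
import math
import random


def energy(x, coupling=1.0):
    """Energy of a ring-shaped Ising chain with spins in {-1, +1} (illustrative choice)."""
    n = len(x)
    return -coupling * sum(x[i] * x[(i + 1) % n] for i in range(n))


def metropolis_step(x, rng, coupling=1.0):
    """Propose flipping one randomly chosen spin; accept with min(1, pi(x') / pi(x))."""
    i = rng.randrange(len(x))
    proposal = list(x)
    proposal[i] = -proposal[i]  # the "flip" proposal, symmetric by construction
    # pi(x) is proportional to exp(-E(x)), so the normalization constant cancels in the ratio.
    ratio = math.exp(energy(x, coupling) - energy(proposal, coupling))
    accept_prob = min(1.0, ratio)
    if rng.random() < accept_prob:
        return proposal
    return x


def run_chain(n_spins=8, n_steps=1000, seed=0):
    """Run the chain from a random initial state and return the final state."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(n_spins)]
    for _ in range(n_steps):
        x = metropolis_step(x, rng)
    return x
```

Recomputing the full energy for every proposal keeps the sketch short; in practice only the terms involving the flipped spin change, so the ratio can be computed locally.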

### 3.2 RBMs

Energy function of an RBM with \(m\) visible units \(\mathbf{v}\) and \(n\) hidden units \(\mathbf{h}\):

\begin{equation*} E(\mathbf{v}, \mathbf{h}) = - \sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} h_i v_j - \sum_{j=1}^{m} b_j v_j - \sum_{i=1}^{n} c_i h_i \end{equation*}
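
As a quick check on the index conventions, the energy can be evaluated term by term. The sketch below is only an illustration in plain Python: the function name `rbm_energy` and the list-of-lists layout `w[i][j]` (hidden unit \(i\), visible unit \(j\)) are assumptions chosen to mirror the formula, not something prescribed by the paper.

```python
def rbm_energy(v, h, w, b, c):
    """E(v, h) = -sum_ij w_ij h_i v_j - sum_j b_j v_j - sum_i c_i h_i.

    v: visible states (length m), h: hidden states (length n),
    w: n x m weights with w[i][j] linking hidden i and visible j,
    b: visible biases (length m), c: hidden biases (length n).
    """
    n, m = len(h), len(v)
    interaction = sum(w[i][j] * h[i] * v[j] for i in range(n) for j in range(m))
    visible_bias = sum(b[j] * v[j] for j in range(m))
    hidden_bias = sum(c[i] * h[i] for i in range(n))
    return -interaction - visible_bias - hidden_bias


# Small made-up example with m = 3 visible and n = 2 hidden units.
v = [1, 0, 1]
h = [1, 1]
w = [[0.5, -1.0, 0.25],
     [0.0, 2.0, -0.5]]
b = [0.5, 0.25, 1.0]
c = [-0.5, 1.0]
example_energy = rbm_energy(v, h, w, b, c)
```

Lower energy corresponds to higher probability under \(p(\mathbf{v}, \mathbf{h}) \propto e^{-E(\mathbf{v}, \mathbf{h})}\), so a visible unit that is on and has a positive bias lowers the energy.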