# Notes on: Fischer, A. (2015): Training Restricted Boltzmann Machines

## 1 Notation

• $$G = (V, E)$$ denotes a graph with $$V$$ the set of nodes and $$E$$ the set of undirected edges
• $$\mathcal{C}$$ denotes the set of all maximal cliques of $$G$$
• $$\mathcal{V} \subset V$$ separates two nodes $$v \notin \mathcal{V}$$ and $$w \notin \mathcal{V}$$ if every path from $$v$$ to $$w$$ contains a node from $$\mathcal{V}$$
• $$X_v$$ denotes the random variable associated with node $$v \in V$$, taking values in the state space $$\Lambda_v$$
• We assume $$\Lambda_v = \Lambda$$ for all $$v \in V$$, i.e. same state space for all vertices
• $$\mathbf{X} = \big( X_v \big)_{v \in V}$$ is called a Markov Random Field if every $$X_v$$ is conditionally independent of all other variables given the variables in its neighborhood (the local Markov property)
• $$\boldsymbol{\theta}$$ denotes the parameters of the distribution
• $$S = \left\{ \mathbf{x}_1, \dots, \mathbf{x}_{\ell} \right\}$$ denotes samples (assumed to be i.i.d.)
• $$\mathbf{H} = (H_1, \dots, H_n)$$ denotes the latent variables
• $$m = |\mathbf{X}|$$ and $$n = |\mathbf{H}|$$ count the observed and latent variables respectively, such that $$m + n = |V|$$
• AIS refers to Annealed Importance Sampling
• BAR refers to Bennett's Acceptance Ratio

## 3 Graphical models

### 3.1 MCMC

• Sampling in Ising models uses the acceptance probability

\begin{equation*} \min \Bigg(1 , \frac{\pi(\mathbf{x}')}{\pi(\mathbf{x})} \Bigg) \end{equation*}

where $$\mathbf{x}$$ denotes the current state and $$\mathbf{x}'$$ denotes the proposal state

• This is just the Metropolis-Hastings acceptance rule for a symmetric proposal distribution, and since we're working with binary variables, the proposal is simply to "flip" the state of a single variable
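
To make the accept/reject step concrete, here is a minimal sketch in plain Python. The toy target `ising_log_weight` (a 1D chain with states in {0, 1} and a single `coupling` parameter) and the helper names `metropolis_step` and `run_chain` are assumptions made for illustration; they are not taken from the paper. The ratio $$\pi(\mathbf{x}') / \pi(\mathbf{x})$$ is evaluated in log space, so the normalization constant never has to be computed.

```python
import math
import random


def ising_log_weight(x, coupling=1.0):
    """Unnormalized log-probability of a toy 1D chain with states in {0, 1}.

    Hypothetical model: each pair of agreeing neighbours adds `coupling`.
    """
    return coupling * sum(1.0 for a, b in zip(x, x[1:]) if a == b)


def metropolis_step(x, log_weight, rng):
    """Propose flipping one randomly chosen site, then accept or reject."""
    i = rng.randrange(len(x))
    proposal = list(x)
    proposal[i] = 1 - proposal[i]  # the "flip" proposal

    # log of pi(x') / pi(x); the partition function cancels in the ratio.
    log_ratio = log_weight(proposal) - log_weight(x)
    accept_prob = 1.0 if log_ratio >= 0 else math.exp(log_ratio)

    if rng.random() < accept_prob:
        return proposal, True
    return list(x), False


def run_chain(n_sites=8, n_steps=1000, seed=0):
    """Run the chain and return the final state and the acceptance rate."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_sites)]
    accepted = 0
    for _ in range(n_steps):
        x, ok = metropolis_step(x, ising_log_weight, rng)
        accepted += ok
    return x, accepted / n_steps
```

Calling `run_chain(n_sites=8, n_steps=1000, seed=0)` returns the final configuration together with the fraction of accepted proposals. Because the proposal is symmetric, no correction term for the proposal distribution appears in `accept_prob`.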

### 3.2 RBMs

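• Energy function of an RBM with $$m$$ visible units $$\mathbf{v}$$, $$n$$ hidden units $$\mathbf{h}$$, weights $$w_{ij}$$, and bias terms $$b_j$$ and $$c_i$$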

\begin{equation*} E(\mathbf{v}, \mathbf{h}) = - \sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} h_i v_j - \sum_{j=1}^{m} b_j v_j - \sum_{i=1}^{n} c_i h_i \end{equation*}
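
The energy can be evaluated term by term, as in the following small sketch. The function name `rbm_energy`, the nested-list layout of `W` (rows index hidden units, columns index visible units), and the toy parameter values are assumptions chosen for illustration.

```python
def rbm_energy(v, h, W, b, c):
    """E(v, h) = -sum_ij w_ij h_i v_j - sum_j b_j v_j - sum_i c_i h_i.

    W is an n x m nested list: W[i][j] couples hidden unit i to visible unit j.
    """
    n, m = len(h), len(v)
    interaction = sum(W[i][j] * h[i] * v[j] for i in range(n) for j in range(m))
    visible_bias = sum(b[j] * v[j] for j in range(m))
    hidden_bias = sum(c[i] * h[i] for i in range(n))
    return -interaction - visible_bias - hidden_bias


# Toy example with m = 3 visible and n = 2 hidden units (made-up values).
W = [[1.0, -2.0, 0.5],
     [0.0, 1.5, -1.0]]
b = [0.5, 0.25, 0.25]
c = [-0.5, 0.5]
energy = rbm_energy(v=[1, 0, 1], h=[1, 1], W=W, b=b, c=c)
print(energy)  # -1.25
```

Here the interaction term contributes 0.5, the visible biases 0.75, and the hidden biases cancel to 0, giving an energy of -1.25. Lower energy corresponds to higher unnormalized probability $$e^{-E(\mathbf{v}, \mathbf{h})}$$.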