RNN: Recurrent Neural Network

Notation

  • $t$ - time step
  • $x_t$ - input feature vector at time step $t$
  • $h_t$ - hidden state at time step $t$

Recurrent Neural Networks (RNN)

Notation

  • $W_{hh}$ - weight matrix for hidden-to-hidden
  • $W_{hx}$ - weight matrix for input-to-hidden
  • $W_{hy}$ - weight matrix for hidden-to-output

Forward

The major points are:

  • Create a time dependency by encoding the input and some previous state into the new state:
\begin{equation*}
\begin{split}
h_t &= \tanh \Big( W_{hh} \cdot h_{t-1} + W_{hx} \cdot x_t \Big) \\
y_t &= W_{hy} \cdot h_t
\end{split}
\end{equation*}

We can of course add an activation function at the end here, e.g. a sigmoid, if one would like such a thing.
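
To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass. The dimension sizes, the random weights, and the name rnn_forward are placeholder assumptions for illustration, not anything prescribed above.

  import numpy as np

  def rnn_forward(xs, h0, W_hh, W_hx, W_hy):
      # Run the vanilla RNN recurrence over a sequence of input vectors.
      h = h0
      hs, ys = [], []
      for x in xs:
          # h_t = tanh(W_hh . h_{t-1} + W_hx . x_t)
          h = np.tanh(W_hh @ h + W_hx @ x)
          hs.append(h)
          # y_t = W_hy . h_t
          ys.append(W_hy @ h)
      return hs, ys

  # Placeholder sizes: 3-dim input, 4-dim hidden state, 2-dim output.
  rng = np.random.default_rng(0)
  W_hh = rng.standard_normal((4, 4))
  W_hx = rng.standard_normal((4, 3))
  W_hy = rng.standard_normal((2, 4))
  xs = [rng.standard_normal(3) for _ in range(5)]  # a sequence of 5 steps
  hs, ys = rnn_forward(xs, np.zeros(4), W_hh, W_hx, W_hy)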

Backward

Whenever you hear backpropagation through time (BPTT), don't give it too much thought. It's simply backprop through the unrolled network, summing the gradient contributions from each of the previous steps.
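
As a sketch of what that summing looks like, the following continues the rnn_forward example above; dys, a list of loss gradients with respect to each output $y_t$, is a hypothetical input here.

  def rnn_backward(xs, hs, dys, h0, W_hh, W_hx, W_hy):
      # BPTT is ordinary backprop through the unrolled graph:
      # each weight's gradient is summed over all time steps.
      dW_hh = np.zeros_like(W_hh)
      dW_hx = np.zeros_like(W_hx)
      dW_hy = np.zeros_like(W_hy)
      dh_next = np.zeros_like(h0)  # gradient arriving from step t+1
      for t in reversed(range(len(xs))):
          dW_hy += np.outer(dys[t], hs[t])
          dh = W_hy.T @ dys[t] + dh_next   # from y_t and from h_{t+1}
          dz = (1.0 - hs[t] ** 2) * dh     # through tanh: 1 - tanh(z)^2
          h_prev = hs[t - 1] if t > 0 else h0
          dW_hh += np.outer(dz, h_prev)
          dW_hx += np.outer(dz, xs[t])
          dh_next = W_hh.T @ dz
      return dW_hh, dW_hx, dW_hy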

Long Short-Term Memory networks (LSTM)

Gated Recurrent Unit (GRU)

Resources