RNN: Recurrent Neural Network

Notation

  • $t$ - time
  • $x_t$ - feature vector at step $t$
  • $h_t$ - hidden state at time $t$

Recurrent Neural Networks (RNN)

Notation

  • $W_{hh}$ - weight matrix for hidden-to-hidden
  • $W_{xh}$ - weight matrix for input-to-hidden
  • $W_{hy}$ - weight matrix for hidden-to-output

Forward

The major points are:

  • Create a time-dependency by encoding the input and some previous state into the new state

$$h_t = W_{hh} h_{t-1} + W_{xh} x_t, \qquad y_t = W_{hy} h_t$$

We can of course add any activation function at the end here, e.g. sigmoid, if one would like such a thing.
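
For concreteness, here is a minimal NumPy sketch of this forward pass. The dimensions, the tanh non-linearity, and the rnn_forward helper are illustrative assumptions; the weight names follow the notation above.

    import numpy as np

    # Hypothetical dimensions, chosen only for illustration.
    input_size, hidden_size, output_size = 10, 32, 5

    rng = np.random.default_rng(0)
    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden
    W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output

    def rnn_forward(xs, h):
        """Step through a sequence of feature vectors xs, carrying the hidden state h."""
        ys = []
        for x in xs:
            # The new state encodes the current input and the previous state.
            # tanh is one possible choice of activation; it could be dropped or swapped.
            h = np.tanh(W_hh @ h + W_xh @ x)
            ys.append(W_hy @ h)
        return ys, h

    xs = [rng.normal(size=input_size) for _ in range(4)]  # a toy sequence of length 4
    h0 = np.zeros(hidden_size)
    ys, h_final = rnn_forward(xs, h0)

Note that each step only needs the previous hidden state; that single vector is what carries information across time.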

Backward

Whenever you hear backpropagation through time (BPTT), don't give it too much thought. It's simply backprop, but summing the gradient contributions from each of the previous time steps.
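
Written out, and assuming a per-step loss $L_t$ with total loss $L = \sum_t L_t$ (the loss is not defined above, so this is just one common setup), the gradient with respect to the shared weights is a sum over time steps, with the hidden-state gradient flowing back through the recurrence:

$$\frac{\partial L}{\partial W_{hh}} = \sum_t \frac{\partial L}{\partial h_t}\,\frac{\partial h_t}{\partial W_{hh}}, \qquad \frac{\partial L}{\partial h_t} = \frac{\partial L_t}{\partial h_t} + \frac{\partial L}{\partial h_{t+1}}\,\frac{\partial h_{t+1}}{\partial h_t}$$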

Long Short-Term Memory networks (LSTM)

Gated Recurrent Unit (GRU)

Resources