RNN: Recurrent Neural Network

Notation

  • $t$ - time
  • $x_t$ - feature vector at step $t$
  • $h_t$ - hidden state at time $t$

Recurrent Neural Networks (RNN)

Notation

  • $W_{hh}$ - weight matrix for hidden-to-hidden
  • $W_{xh}$ - weight matrix for input-to-hidden
  • $W_{hy}$ - weight matrix for hidden-to-output

Forward

The major points are:

  • Create a time-dependency by encoding the input and some previous state into the new state

$$h_t = W_{hh} h_{t-1} + W_{xh} x_t, \qquad y_t = W_{hy} h_t$$

We can of course add any activation function at the end here, e.g. sigmoid, if one would like such a thing.
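
For concreteness, here is a minimal NumPy sketch of this forward pass. The dimensions, the tanh non-linearity, and the rnn_forward helper are illustrative assumptions; the weight names follow the notation above.

    import numpy as np

    # Hypothetical dimensions, chosen only for illustration.
    input_size, hidden_size, output_size = 10, 32, 5

    rng = np.random.default_rng(0)
    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden
    W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output

    def rnn_forward(xs, h):
        """Step through a sequence of feature vectors xs, carrying the hidden state h."""
        ys = []
        for x in xs:
            # The new state encodes the current input and the previous state.
            # tanh is one possible choice of activation; it could be dropped or swapped.
            h = np.tanh(W_hh @ h + W_xh @ x)
            ys.append(W_hy @ h)
        return ys, h

    xs = [rng.normal(size=input_size) for _ in range(4)]  # a toy sequence of length 4
    h0 = np.zeros(hidden_size)
    ys, h_final = rnn_forward(xs, h0)

Note that each step only needs the previous hidden state; that single vector is what carries information across time.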

Backward

Whenever you hear backpropagation through time (BPTT), don't give it too much thought. It's simply backprop, but summing the gradient contributions from each of the previous time steps.
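
Written out, and assuming a per-step loss $L_t$ with total loss $L = \sum_t L_t$ (the loss is not defined above, so this is just one common setup), the gradient with respect to the shared weights is a sum over time steps, with the hidden-state gradient flowing back through the recurrence:

$$\frac{\partial L}{\partial W_{hh}} = \sum_t \frac{\partial L}{\partial h_t}\,\frac{\partial h_t}{\partial W_{hh}}, \qquad \frac{\partial L}{\partial h_t} = \frac{\partial L_t}{\partial h_t} + \frac{\partial L}{\partial h_{t+1}}\,\frac{\partial h_{t+1}}{\partial h_t}$$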

Long Short-Term Memory networks (LSTM)

Gated Recurrent Unit (GRU)

Resources