Convolutional Neural Networks

Overview

Hello my fellow two-legged creatures! Today we'll have a look at convolutional networks, and more specifically, how they really work.

I will start out with the very simplest case, and then generalize at the end.

Motivation

When I was trying to wrap my head around this topic, I found some great lectures / tutorials, such as Hugo Larochelle's video series (his entire series on Neural Networks is amazing by the way, so do have a look!) and Andrew Gibiansky's blog post on the topic.

Now, both of these were really well done and provided me with a lot of insight, but at the time it had been a couple of months since I had done anything involving Neural Networks, and I wasn't really familiar with the mathematical concept of "convolution". Therefore, after watching / reading the above, I felt as if I understood what this was all about, but only at a very high level; too high for my taste. There were small things that made it hard for me to really understand what was going on:

  • Hugo does this pretty cool thing where he uses the same notation as the first "major" paper describing the technique (Jarrett et al., 2009). He does this in basically every video, and in general I think it's great! But despite his efforts to make it clear whenever he was redefining some notation (following the paper), I felt this made it slightly harder to follow.
  • Hugo explains the intuition behind the convolution operation and states that we can view the operation we're interested in (I'll get to this) as taking the convolution between the input channel and the weight matrix with its axes flipped. That's cool and all, but I would really like to know why. By the way, for what Hugo is trying to do, I think he is absolutely right not to dig into the convolution part. I also believe Hugo encouraged people to attempt to derive the full expression for the backward pass in the forward-backward algorithm themselves. Again, I believe you ought to try that first, but I figured I would provide my view on things in case you get stuck or want to confirm (I hope..) your own deduction.
  • Andrew's post went a bit more into the details of the forward-backward algorithm for a convolutional layer, but didn't really show why we can view this as a convolution. Also, going from the notation used in Hugo's lectures to Andrew's blog post was a bit difficult.
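To make that flipped-kernel claim concrete, here is a minimal NumPy sketch (the arrays are made up for illustration): sliding a kernel over an input without flipping it, which is the cross-correlation a convolutional layer actually computes, gives the same result as a true convolution with the kernel's axis flipped.

```python
import numpy as np

# Hypothetical 1D input and kernel, just to illustrate the claim.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, 1.0, -1.0])

# Cross-correlation over the "valid" region: what the layer computes.
cross_corr = np.array([np.dot(x[i:i + len(w)], w)
                       for i in range(len(x) - len(w) + 1)])

# A true convolution with the flipped kernel gives the identical result
# (np.convolve itself flips its second argument, so flipping w here
# cancels that out and recovers the cross-correlation above).
conv_flipped = np.convolve(x, w[::-1], mode="valid")

assert np.allclose(cross_corr, conv_flipped)
```

So "convolution with the axes flipped" and "slide the kernel as-is" are the same operation; the question of why the convolution view is useful is what the rest of this post is about.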

In the end I was left with this nagging question:

Why the "convolution" in a convolutional network?!

Notation

One thing I found quite confusing when trying to understand convolutional networks myself was the discrepancy in notation across different sources. Granted, one reason for this is that in a convolutional network there are a lot of different symbols to keep track of.

Because of this, I will now impose on you yet another notation! This might seem a bit weird after what I just said, but it is because I want to introduce this topic in slightly more detail than the other resources I found, and instead of trying to merge their notations, it's easier to simply create my own.

Firstly, the following schema will always be applicable unless specified otherwise:

If we are looking at the integers from some arbitrary number up to n, we will use the notation [n], i.e. [n] = {1, 2, ..., n}. If multiple integers from this set are required, we will use subscripts to separate them, e.g. i_1, i_2 ∈ [n].

More specifically, we will use the following notation:

  • l denotes the l-th layer in the network
  • x is the entire input vector or matrix to the network itself, and we use x^l for the entire input vector or matrix to the l-th layer
  • W is the weight-vector or -matrix, with W^l being the one acting on the input x^l
  • z^l is the pre-activation, i.e. the input to the activation / non-linear function f
  • y is the entire output vector or matrix for the network itself, and we use a^l for the entire activation vector or matrix of the l-th layer. That is, a^l = f(z^l)

1D Convolutional Network

Notation

  • w^l_i denotes the i-th entry in the weight-vector W^l
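Before the derivation, it may help to see the forward pass of this simplest case in code. The following is a rough sketch only (my own helper, assuming stride 1, no padding, and a single channel), where each pre-activation z[i] is the dot product of the weight-vector with the window of the input starting at position i:

```python
import numpy as np

def conv1d_forward(x, w, b=0.0):
    """Pre-activations z of a 1D convolutional layer: a sketch assuming
    stride 1, no padding, and a single channel. Entry z[i] is the dot
    product of the weight-vector w with the window of x starting at i."""
    k = len(w)
    n_out = len(x) - k + 1
    z = np.empty(n_out)
    for i in range(n_out):
        z[i] = np.dot(w, x[i:i + k]) + b
    return z

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, -1.0])
z = conv1d_forward(x, w)  # windows: (1,2), (2,3), (3,4)
```

Note that the same weight-vector w is reused at every position i; this weight sharing is what distinguishes the layer from a fully connected one.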

2D Convolutional Network

Notation

  • w^l_{i,j} denotes the (i, j)-th entry in the weight-matrix W^l
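The 2D case works the same way, just with the window sliding in both directions. Again a sketch only (my own helper, same assumptions: stride 1, no padding, a single channel), where z[i, j] sums the elementwise product of the weight-matrix with the (i, j)-th window of the input:

```python
import numpy as np

def conv2d_forward(x, w, b=0.0):
    """Pre-activations z of a 2D convolutional layer: a sketch assuming
    stride 1, no padding, and a single channel. Entry z[i, j] is the sum
    of the elementwise product of w with the (i, j)-th window of x."""
    kh, kw = w.shape
    h_out = x.shape[0] - kh + 1
    w_out = x.shape[1] - kw + 1
    z = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            z[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return z

x = np.arange(16.0).reshape(4, 4)
w = np.array([[1.0, 0.0],
              [0.0, -1.0]])
z = conv2d_forward(x, w)  # 3x3 output for a 4x4 input and 2x2 kernel
```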

Algorithm

Forward pass

Backward pass
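As a preview of where this is heading, here is a sketch of the backward pass for the simplest case: a stride-1, unpadded, single-channel 1D layer whose forward pass is z[i] = sum_j w[j] * x[i + j]. The helper below is my own illustration, not a definitive implementation; the interesting part is that both gradients turn out to be convolution-like operations themselves.

```python
import numpy as np

def conv1d_backward(x, w, delta):
    """Gradients for a 1D layer with forward pass z[i] = sum_j w[j] * x[i + j]
    (stride 1, no padding, single channel), given delta = dL/dz.

    dL/dw[j] = sum_i delta[i] * x[i + j]  (a cross-correlation of x with delta)
    dL/dx[t] = sum_i delta[i] * w[t - i]  (a "full" convolution of delta with w)
    """
    n_out = len(delta)
    dw = np.array([np.dot(delta, x[j:j + n_out]) for j in range(len(w))])
    # np.convolve flips its second argument internally, so this computes
    # exactly sum_i delta[i] * w[t - i], evaluated wherever the two
    # sequences overlap ("full" mode), giving len(x) entries.
    dx = np.convolve(delta, w, mode="full")
    return dw, dx

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, -1.0])
delta = np.ones(3)  # pretend gradient flowing in from the layer above
dw, dx = conv1d_backward(x, w, delta)
```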