Convolutional Neural Networks (CNNs)

Types of image recognition tasks:

  • Classification: Is this image a cat or a dog?
  • Object detection: Find and label each object in the image (where is it, and what is it?)
  • Semantic segmentation: Which pixels are cats or dogs?

Images have structure: a white pixel will probably have whitish pixels around it

  • Older neural networks instead flattened images into 1D arrays, ignoring this structure

Channels: a color image has three channels: red, green, and blue
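
For example, a color image loads as a 3D array with one 2D grid per channel (a minimal sketch; the image size is made up):

    import numpy as np

    # height x width x channels: a 224x224 RGB image
    rgb_image = np.zeros((224, 224, 3))
    red_channel = rgb_image[:, :, 0]  # one 2D grid per color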

Hidden layers

At each step, there are three hidden layers working in parallel (one for each color channel)

Each hidden layer can find different features (patterns/shapes) in the image (e.g., one might find lines)

The hidden layers are convolutional layers

Convolutional layers

Every node in the hidden layer will have a local receptive field (the small patch of input neurons that feed into activating it)

  • It will look at only a small square from the input (say, 4x4)
  • Its neighbor’s square can overlap with it, but it won’t be the exact same square: it’s shifted over by the stride
    • Stride: How much you shift the window/kernel each time
  • To get a single number out of that small square:
    • Element-wise multiply that small square with a kernel (which would be a 4x4 matrix of weights)
    • Fardina calls this a “dot product” (it is one, if you flatten the square and the kernel into vectors)
    • Sum up all the elements of that product
  • Then the outputs for all the channels are summed together (see the sketch after the Kernel section)

Every node in the entire layer uses the same weights (the same kernel), so the layer looks for the same feature at every position in the image

Kernel

  • The weights of the kernel are learned parameters
  • The size of the kernel is a hyperparameter
  • The kernel is as deep as your input has channels (e.g., depth 3 for RGB)
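
Putting the pieces together, a minimal NumPy sketch of one convolutional feature map (the shapes, stride, and function name are illustrative, not from the slides):

    import numpy as np

    def conv2d_single_filter(image, kernel, stride=1):
        # image:  (H, W, C) array, e.g. C = 3 for RGB
        # kernel: (k, k, C) array of learned weights (as deep as the channels)
        H, W, C = image.shape
        k = kernel.shape[0]
        out_h = (H - k) // stride + 1
        out_w = (W - k) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                # Local receptive field: a small k x k window, shifted by the stride
                window = image[i*stride:i*stride+k, j*stride:j*stride+k, :]
                # Element-wise multiply with the shared kernel, then sum everything;
                # the sum also adds the per-channel results together
                out[i, j] = np.sum(window * kernel)
        return out

    image = np.random.rand(32, 32, 3)
    kernel = np.random.rand(4, 4, 3)  # in a real CNN these weights are learned
    print(conv2d_single_filter(image, kernel, stride=2).shape)  # (15, 15)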

Activation layer

After the convolutional layer comes an activation function

For images, use ReLU rather than sigmoid as the activation function

  • Introduces non-linearity to the network (necessary for learning complex patterns)
  • ReLU itself is just the piecewise-linear function f(x) = max(0, x), but stacking it between layers is what makes the network non-linear
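
A quick sketch of ReLU applied element-wise:

    import numpy as np

    def relu(x):
        # Zero out negatives, pass positives through unchanged: max(0, x)
        return np.maximum(0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]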

Pooling layers

Between the convolutional layers are pooling layers

Instead of using weights, each pooling node just aggregates the pixels in its window

Max pooling layer: Just finds the max value in each small grid
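
A minimal sketch of max pooling with a 2x2 window (the window size is illustrative):

    import numpy as np

    def max_pool(feature_map, size=2):
        # No weights: just take the max over each non-overlapping size x size grid
        H, W = feature_map.shape
        trimmed = feature_map[:H - H % size, :W - W % size]
        blocks = trimmed.reshape(H // size, size, W // size, size)
        return blocks.max(axis=(1, 3))

    x = np.arange(16).reshape(4, 4)
    print(max_pool(x))
    # [[ 5  7]
    #  [13 15]]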

Final layer

Fully connected layer and an output layer

  • Flatten the grid of feature maps into a 1D vector
  • Feed that vector through the fully connected layer(s) to do the classification
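
A minimal sketch of this final stage (the feature-map shape and class count are made up for illustration):

    import numpy as np

    def softmax(scores):
        # Turn raw class scores into probabilities that sum to 1
        e = np.exp(scores - scores.max())
        return e / e.sum()

    pooled = np.random.rand(15, 15, 8)  # e.g. 8 pooled feature maps
    flat = pooled.flatten()             # flatten the grid into a 1D vector

    W = np.random.rand(3, flat.size)    # fully connected weights, 3 classes
    b = np.random.rand(3)
    print(softmax(W @ flat + b))        # class probabilities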

Training Process

  • Show the network N pictures, where N is your batch size
    • Slides say that a higher batch size will underfit and a lower batch size will overfit (yes, that direction, not the other way around)
  • Use the backpropagation algorithm, but on each pass randomly drop some nodes (set their outputs to 0), based on your dropout rate (see the sketch after this list)
    • A higher dropout rate does more to prevent overfitting
    • Too high and it underfits
  • Repeat until network loss stops decreasing
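
A minimal sketch of dropout on a layer's activations (this is the standard "inverted" dropout; the rescaling detail wasn't in the notes):

    import numpy as np

    def dropout(activations, rate=0.5):
        # Randomly zero out a fraction `rate` of the nodes; scale the
        # survivors by 1/(1 - rate) so the expected output is unchanged
        mask = (np.random.rand(*activations.shape) >= rate) / (1.0 - rate)
        return activations * mask

    layer_out = np.random.rand(6)
    print(dropout(layer_out, rate=0.5))  # roughly half the entries are 0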

Transfer Learning

Can use transfer learning: start from a network already trained on another image task instead of training from scratch

One common approach is to adjust the final output layer to match the number of classes in the new task

  • No need to change the convolutional layers (they have already learned general features like edges and shapes)
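
One common way to do this in PyTorch/torchvision (a sketch; the model choice and class count are illustrative):

    import torch.nn as nn
    from torchvision import models

    num_classes = 5  # classes in the new task (made up for the example)

    model = models.resnet18(weights="IMAGENET1K_V1")  # pretrained CNN

    # Keep the convolutional layers as-is by freezing their weights
    for param in model.parameters():
        param.requires_grad = False

    # Swap the final output layer to match the new number of classes
    model.fc = nn.Linear(model.fc.in_features, num_classes)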