Before we begin, let’s prime your brain for learning by taking a look at this image of a Convolutional Neural Network:
Now, what the heck is a Convolutional Neural Network?
A deep learning algorithm typically used to analyze images.
But what’s deep learning?
It’s a subset of machine learning that uses multiple layers to extract higher-level features from raw input. For images, this might mean identifying edges in the initial layers and, in later layers, things more meaningful to us humans, like faces or numbers.
Back to Convolutional Neural Networks (or CNNs for short)
One of the principal concepts behind CNNs is the neural network, an attempt at mimicking the way neurons in our brain work together and allow our senses to make sense of the world.
CNNs try to understand images the way our visual cortex does. Each neuron responds to stimuli in limited but overlapping fields of vision over the entirety of an image.
Convolutions
When dealing with CNNs, the “neurons” are filters (or kernels).
A filter is a matrix of values (or, more precisely, weights) trained to detect particular features.
An image is nothing more than a matrix of pixel values. A filter essentially tells you how confident it is that a feature is present in an image. It does this by carrying out a convolution operation, an element-wise product and sum between two matrices.
The higher the resulting value, the more confident it is that the feature is present.
With CNNs, a filter runs across an image to detect whether that feature is present.
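To make that concrete, here’s a minimal sketch of a convolution in NumPy. The tiny 5x5 image and the 3x3 vertical-edge filter are made-up values for illustration, not weights from a trained network:

```python
import numpy as np

# A toy "image": dark on the left, bright on the right (a vertical edge).
image = np.array([
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
])

# A filter whose weights respond strongly to a dark-to-bright vertical edge.
vertical_edge_filter = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])

def convolve(image, kernel):
    """Slide the kernel over the image; at each position take the
    element-wise product and sum it into a single value."""
    k = kernel.shape[0]
    out_size = image.shape[0] - k + 1
    output = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i:i + k, j:j + k]
            output[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return output

print(convolve(image, vertical_edge_filter))
# The highest values show up where the vertical edge sits in the image.
```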
Padding and Stride Length
One of the things you’ll notice in the example above is that the edges of the image don’t get as much love as the inner parts of the image.
An easy fix for this is padding.
What padding does is add pixels around the border of an image, extending the area a CNN processes (each added pixel is given a value of zero).
No padding:
Padding:
You can also adjust the stride length (the number of pixels the filter moves each step as it slides over the image).
No strides:
Strides:
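Here’s a rough sketch of how padding and stride change the size of the output, building on the NumPy convolution from earlier. The 6x6 image, the padding and stride values, and the formula comment are illustrative assumptions, not the only way to set things up:

```python
import numpy as np

# Output size follows the standard formula: out = (n + 2p - k) // s + 1,
# where n = image size, p = padding, k = filter size, s = stride.
def convolve(image, kernel, padding=0, stride=1):
    if padding > 0:
        # Zero-padding: surround the image with rings of zero-valued pixels.
        image = np.pad(image, padding, mode="constant", constant_values=0)
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    output = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            output[i, j] = np.sum(patch * kernel)
    return output

image = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)

print(convolve(image, kernel).shape)             # (4, 4) -- no padding, stride 1
print(convolve(image, kernel, padding=1).shape)  # (6, 6) -- padding keeps the edges in play
print(convolve(image, kernel, stride=2).shape)   # (2, 2) -- bigger strides shrink the output
```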
Pooling
Pooling reduces the dimensions of feature maps, which reduces the number of parameters the network needs to learn.
Pooling summarizes the features present in a part of the feature map created by the convolutional layer.
There are two types of pooling: max pooling and average pooling.
Max pooling takes the maximum value within each window and outputs it into a new, smaller feature map.
Average pooling takes the mean of the values within each window and outputs that instead.
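Here’s a minimal sketch of both, using a made-up 4x4 feature map and 2x2 windows:

```python
import numpy as np

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 8, 5],
    [1, 1, 3, 7],
])

def pool(feature_map, size=2, mode="max"):
    """Summarize each non-overlapping size x size window with one value."""
    out = feature_map.shape[0] // size
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            window = feature_map[i * size:(i + 1) * size, j * size:(j + 1) * size]
            pooled[i, j] = window.max() if mode == "max" else window.mean()
    return pooled

print(pool(feature_map, mode="max"))
# [[6. 2.]
#  [2. 8.]]
print(pool(feature_map, mode="average"))
# [[3.5  1.  ]
#  [1.   5.75]]
```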
The Flattening
After our convolutional and pooling layers, we flatten our data to convert it into a 1-dimensional array.
We do this to create a single long feature vector.
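A quick sketch of what that looks like, assuming the earlier layers left us with 8 pooled feature maps of size 5x5 (an arbitrary example shape):

```python
import numpy as np

# Pretend output of the convolution + pooling layers: 8 feature maps, each 5x5.
pooled_maps = np.random.rand(8, 5, 5)

# Flattening turns them into one long 1-D feature vector.
feature_vector = pooled_maps.flatten()
print(feature_vector.shape)  # (200,) -- 8 * 5 * 5 values in a single row
```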
And then come the Fully Connected Layers
The fully-connected layers are the last step before final classification.
Fully connected layers are just feedforward neural networks.
They perform the task of classification based on the information learned in the previous layers.
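As a rough sketch, a single fully connected layer is just a weighted sum of every input feature plus a bias, passed through an activation. The sizes and random weights below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

feature_vector = rng.random(200)          # the flattened feature vector from earlier
weights = rng.standard_normal((10, 200))  # 10 output neurons, each connected to all 200 inputs
biases = np.zeros(10)

def relu(x):
    return np.maximum(0, x)

# One dense layer: weighted sum of every input, plus a bias, through an activation.
hidden = relu(weights @ feature_vector + biases)
print(hidden.shape)  # (10,)
```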
At the end of it all we have our final classification
After running through the fully connected layers, our CNN hands things off to a softmax (for multi-class classification) or sigmoid (for binary classification) activation function, which turns the raw scores into probabilities between 0 and 1 and gives us our predicted class.
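Here’s a small sketch of those two activation functions; the raw class scores are made-up numbers standing in for the output of the last fully connected layer:

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])  # one raw score per class

def softmax(x):
    # Subtracting the max keeps the exponentials numerically stable.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

probs = softmax(scores)
print(probs)             # [0.659 0.242 0.099] -- sums to 1
print(np.argmax(probs))  # index 0 is our predicted class

# For binary classification, a single score goes through a sigmoid instead:
print(sigmoid(1.5))      # ~0.82 -- probability of the positive class
```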
Remember that image from the beginning? Let’s look at it again. It should make more sense to you now:
Note: If this took you more than 6 minutes to read, blame Read-o-meter. It told me it’d take you less than three and a half minutes to read this! I added some extra time so you could look at the pretty pictures.