The 6 minutes or less convolutional neural network intro

Before we begin, let’s prime your brain for learning by taking a look at this image of a Convolutional Neural Network:

Image courtesy of Sumit Saha

Now, what the heck is a Convolutional Neural Network?

A deep learning algorithm typically used to analyze images.

But what’s deep learning?

It’s a subset of machine learning that uses multiple layers to extract higher-level features from raw input. For images, this might mean identifying edges in initial layers and things more relevant to us humans in later layers like faces or numbers.

Back to Convolutional Neural Networks (or CNNs for short)

One of the principal concepts behind CNNs is the neural network, an attempt at mimicking the way neurons in our brain work together and allow our senses to make sense of the world. 

CNNs try to understand images the way our visual cortex does. Each neuron responds to stimuli in limited but overlapping fields of vision over the entirety of an image.

Convolutions

When dealing with CNNs, the “neurons” are filters (or kernels).

A filter is a matrix of values (or, more precisely, weights) trained to detect particular features.

An image is nothing more than a matrix of pixel values. A filter essentially tells you how confident it is that a feature is present in an image. It does this by carrying out a convolution operation, an element-wise product and sum between two matrices.

The higher the resulting value, the more confident it is that the feature is present.

With CNNs, a filter runs across an image to detect whether that feature is present.

Image courtesy of Rubikscode
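To make the operation concrete, here's a minimal NumPy sketch of a convolution (stride 1, no padding). The `convolve2d` helper, the vertical-edge kernel, and the tiny image are just illustrative, not how real frameworks implement it:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding),
    taking the element-wise product and sum at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter applied to a tiny image with an edge
# between its dark (0) and bright (1) columns.
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
])
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])
print(convolve2d(image, kernel))
# Each row is [-3., -3., 0.]: a strong response wherever the
# window straddles the edge, and zero in the flat region.
```

Note how the 3x3 filter turns a 5x5 image into a 3x3 output, which is exactly the edge-neglect problem padding fixes below.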

Padding and Stride Length

One of the things you'll notice in the example above is that the edges of the image don't get as much love as the inner parts of the image: a filter passes over border pixels far fewer times than it passes over the pixels in the middle.

An easy fix to this is padding.

Padding adds pixels around an image to extend the area over which a CNN processes it (each added pixel is given a value of zero).

Image courtesy of Ayeshmantha Perera

No padding:

Image courtesy of Vincent Dumoulin

Padding:

Image courtesy of Vincent Dumoulin
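A quick NumPy sketch of zero padding (the sizes here are just illustrative): padding a 5x5 image by one pixel on each side gives a 7x7 input, so a 3x3 filter now produces a 5x5 output, the same size as the original image.

```python
import numpy as np

image = np.arange(25).reshape(5, 5)

# Zero-pad one pixel on every side so a 3x3 filter can also be
# centered on the original border pixels.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(image.shape)   # (5, 5)
print(padded.shape)  # (7, 7)
```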

You can also adjust the stride length (the number of pixels a filter moves at each step as it passes over an image).

No strides:

Image courtesy of Vincent Dumoulin

Strides:

Image courtesy of Vincent Dumoulin
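Padding and stride together determine the size of the output: for an n x n image, an f x f filter, padding p, and stride s, each output dimension is floor((n + 2p - f) / s) + 1. A tiny helper to compute it (the function name is just for illustration):

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output width/height of a convolution:
    floor((n + 2*padding - f) / stride) + 1."""
    return (n + 2 * padding - f) // stride + 1

# 5x5 image, 3x3 filter:
print(conv_output_size(5, 3))                       # no padding, stride 1 -> 3
print(conv_output_size(5, 3, padding=1))            # "same" padding -> 5
print(conv_output_size(5, 3, padding=1, stride=2))  # stride 2 shrinks it -> 3
```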

Pooling:

Pooling reduces the dimensions of feature maps to reduce the number of parameters needed to learn.

Pooling summarizes the features present in a part of the feature map created by the convolutional layer.

There are two types of pooling: max pooling and average pooling.

Max pooling takes the maximum value within each window of the feature map and outputs it to a new, smaller feature map.

Average pooling outputs the mean of the values within each window instead.
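Here's a minimal NumPy sketch of both pooling types over non-overlapping 2x2 windows (the `pool2d` helper and the example values are made up for illustration):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Summarize each non-overlapping size x size window of the
    feature map with its max (or mean)."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            window = feature_map[i:i + size, j:j + size]
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

fmap = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [2, 2, 9, 8],
    [1, 0, 7, 4],
])
print(pool2d(fmap, mode="max"))  # [[6, 5], [2, 9]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 2.0], [1.25, 7.0]]
```

Either way, a 4x4 feature map shrinks to 2x2, which is exactly the parameter reduction described above.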


The Flattening

After our convolutional and pooling layers, we flatten our data to convert it into a 1-dimensional array.

We do this to create a single long feature vector.
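In NumPy terms, flattening is just a reshape (the feature-map sizes here are made up for illustration):

```python
import numpy as np

# Suppose the last pooling layer produced 8 feature maps of size 4x4.
pooled = np.random.rand(8, 4, 4)

# Flattening unrolls them into one long feature vector
# for the fully connected layers: 8 * 4 * 4 = 128 values.
flat = pooled.flatten()
print(flat.shape)  # (128,)
```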

And then come the Fully Connected Layers

The fully-connected layers are the last step before final classification.

Fully connected layers are just feedforward neural networks.

They perform the task of classification based on the information learned in the previous layers.
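Under the hood, a fully connected layer is just a matrix multiply plus a bias, usually followed by an activation. A minimal NumPy sketch (the shapes and random weights are purely illustrative; in a real network the weights are learned):

```python
import numpy as np

def dense(x, weights, bias):
    """One fully connected layer: every input feeds every output,
    followed by a ReLU activation."""
    return np.maximum(0, weights @ x + bias)

rng = np.random.default_rng(0)
x = rng.random(128)          # flattened feature vector
w = rng.random((64, 128))    # 64 outputs, each connected to all 128 inputs
b = rng.random(64)
print(dense(x, w, b).shape)  # (64,)
```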

Image courtesy of Jiwon Jeong

At the end of it all we have our final classification

After running through the fully connected layers, our CNN ends with a softmax (for multi-class classification) or sigmoid (for binary classification) activation function, which produces probabilities between 0 and 1 and gives us our class.
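Softmax itself is simple enough to sketch in a few lines of NumPy (the example scores are made up):

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw scores for 3 classes
probs = softmax(logits)
print(probs)          # ~[0.659, 0.242, 0.099]
print(probs.argmax()) # predicted class: 0
```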

Remember that image from the beginning? Let’s look at it again. It should make more sense to you now:

Image courtesy of Sumit Saha

Note: If this took you more than 6 minutes to read, blame Read-o-meter. It told me it’d take you less than three and a half minutes to read this! I added some extra time so you could look at the pretty pictures.