Build Your Own Model with Convolutional Neural Networks

What is a neural network

Ayesha Jayasankha
Published in Analytics Vidhya
4 min read · Aug 25, 2020


A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates.

What is a convolutional neural network

A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data. CNNs use deep learning to perform both generative and descriptive tasks, often in machine vision applications such as image and video recognition, and they are also used in recommender systems and natural language processing.

Why we need convolution

  1. Parameter sharing: a feature detector (filter) that is useful in one part of the image can be reused across the whole image.
  2. Sparsity of connections: each output value depends on only a small number of input values.
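
To make the effect of parameter sharing concrete, here is a rough parameter count, assuming a 32*32 RGB input mapped to a 28*28*6 output. The layer sizes and the helper names `dense_params` and `conv_params` are illustrative assumptions, not taken from the figures in this article.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(f, in_channels, n_filters):
    """Each filter has f*f*in_channels weights plus one bias."""
    return (f * f * in_channels + 1) * n_filters

n_in = 32 * 32 * 3    # flattened input volume (assumed size)
n_out = 28 * 28 * 6   # flattened output volume (assumed size)

dense_count = dense_params(n_in, n_out)                      # about 14.5 million
conv_count = conv_params(f=5, in_channels=3, n_filters=6)    # 456

print(dense_count, conv_count)
```

The convolutional layer needs only a few hundred parameters because the same six filters are reused at every position of the image.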

How to do convolution

fig 1 — Convolution operation

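Convolution overlays the filter on a region of the input, multiplies the overlapping values element by element, and sums the products to get one output value. The filter is then moved across the input to fill in the rest of the output.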
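
The operation can be sketched in NumPy as follows. This is a minimal version, assuming a single-channel input, no padding, and a stride of 1. The function name `conv2d` and the example values are hypothetical. As in most deep learning code, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; each output is a sum of products."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + f_h, j:j + f_w]   # region under the filter
            out[i, j] = np.sum(window * kernel)    # multiply and sum
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)

result = conv2d(image, kernel)
print(result)  # [[ 6.  8.] [12. 14.]]
```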

Stride

fig 2 — stride

The stride is the number of cells the filter shifts (to the right, and likewise downward at the end of a row) to produce the next output value.
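
A small sketch of how the stride changes the output size, assuming a single-channel input and no padding. The function name `conv2d_strided` and the 7*7 input are illustrative assumptions.

```python
import numpy as np

def conv2d_strided(image, kernel, stride=1):
    """Convolution where the filter moves `stride` cells per step."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out_h = (n_h - f_h) // stride + 1
    out_w = (n_w - f_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride   # filter position in the input
            window = image[top:top + f_h, left:left + f_w]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))

shape_s1 = conv2d_strided(image, kernel, stride=1).shape  # (5, 5)
shape_s2 = conv2d_strided(image, kernel, stride=2).shape  # (3, 3)
print(shape_s1, shape_s2)
```

A larger stride skips positions, so the output gets smaller.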

Padding

fig 3 — padding

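Padding has two main benefits:

  1. It preserves information at the border of the image, where pixels would otherwise be covered by the filter far less often than pixels in the center.
  2. It lets us use convolution without necessarily shrinking the height and width of the volumes.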
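
The effect of padding on the output size can be sketched with the usual formula floor((n + 2p - f) / s) + 1, where n is the input size, f the filter size, p the padding, and s the stride. The helper name `output_size` and the 6*6 example are assumptions made for illustration.

```python
import numpy as np

def output_size(n, f, p=0, s=1):
    """Output width/height of a convolution along one dimension."""
    return (n + 2 * p - f) // s + 1

image = np.ones((6, 6))

# Zero padding of one cell on every side turns the 6*6 image into 8*8.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

valid_size = output_size(6, 3)        # no padding: output shrinks to 4
same_size = output_size(6, 3, p=1)    # padding of 1: output stays at 6
print(padded.shape, valid_size, same_size)
```

For an odd filter size f and a stride of 1, a padding of (f - 1) / 2 keeps the height and width unchanged.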

Convolution over volume

fig 4 — convolution over volume

The number of filters determines how many channels the output will have.
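
A minimal sketch of convolution over a volume, assuming a height * width * channels layout, no padding, and a stride of 1. The function name `conv_volume` and the random example data are hypothetical. Each filter spans all input channels and produces one output channel.

```python
import numpy as np

def conv_volume(volume, filters):
    """volume: (H, W, C); filters: (n_filters, f, f, C)."""
    h, w, c = volume.shape
    n_filters, f, _, filter_c = filters.shape
    assert c == filter_c, "filter depth must match the input channels"
    out = np.zeros((h - f + 1, w - f + 1, n_filters))
    for k in range(n_filters):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                window = volume[i:i + f, j:j + f, :]       # f * f * C block
                out[i, j, k] = np.sum(window * filters[k])
    return out

rng = np.random.default_rng(0)
volume = rng.standard_normal((6, 6, 3))      # e.g. a small RGB patch
filters = rng.standard_normal((2, 3, 3, 3))  # two 3*3*3 filters

out_shape = conv_volume(volume, filters).shape
print(out_shape)  # (4, 4, 2)
```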

Besides convolutional layers, a CNN has pooling layers and activation layers.

Pooling layer

A pooling layer is used to reduce the size of the representation, which speeds up computation, and to make some of the detected features a bit more robust.

There are two main types of pooling layers.

  1. Max pooling: get the maximum value contained in the window
  2. Average pooling: get the average value from the window

fig 5 — pooling layers

When you do the pooling, it doesn’t change the number of channels. It only reduces the width and the height.
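
Both pooling types can be sketched with one function, assuming a height * width * channels layout and a 2*2 window with a stride of 2. The function name `pool2d` and the example values are illustrative assumptions.

```python
import numpy as np

def pool2d(volume, size=2, stride=2, mode="max"):
    """Pool each channel independently; the channel count is unchanged."""
    h, w, c = volume.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w, c))
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = volume[top:top + size, left:left + size, :]
            out[i, j, :] = reduce_fn(window, axis=(0, 1))
    return out

# A single-channel 4*4 input, stored as a 4*4*1 volume.
x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)[:, :, np.newaxis]

max_out = pool2d(x, mode="max")[:, :, 0]   # [[9, 2], [6, 3]]
avg_out = pool2d(x, mode="avg")[:, :, 0]   # [[3.75, 1.25], [3.75, 2.0]]
print(max_out, avg_out, sep="\n")
```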

Activation function layer

The purpose of the activation function is to introduce non-linearity into the output of a neuron.

fig 6 — activation functions
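
As a sketch, here are three commonly used activation functions in NumPy. The choice of ReLU, sigmoid, and tanh is an assumption about which functions are of interest here.

```python
import numpy as np

def relu(z):
    """Keep positive values, replace negative values with 0."""
    return np.maximum(0, z)

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

def tanh(z):
    """Squash values into the range (-1, 1)."""
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))
print(sigmoid(z))
print(tanh(z))
```

Without such a non-linear function between them, stacked layers would collapse into a single linear transformation.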

Training a simple convolutional neural network consists of 3 main steps.

  1. Forward pass
  2. Final layer calculation
  3. Backward pass

The forward pass is the process of calculating the output values from the first layer to the last layer. In the final layer, the loss function is calculated from the output values. The backward pass is the process of calculating the derivatives of the loss function and using them to update the bias and weight values.

fig 7 — forward and backward pass
fig 8 — sample network
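
The three steps can be sketched on a deliberately tiny model. To keep the example short, it assumes a single sigmoid neuron instead of a full CNN, synthetic data, and a cross-entropy loss. All names and values are illustrative. The loop structure is the same for larger networks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))              # synthetic inputs (assumed)
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # synthetic labels (assumed)

w = np.zeros(2)   # weights
b = 0.0           # bias
learning_rate = 0.5
losses = []

for step in range(200):
    # 1. Forward pass: compute the output from the inputs.
    z = X @ w + b
    a = 1 / (1 + np.exp(-z))

    # 2. Final layer calculation: cross-entropy loss on the outputs.
    eps = 1e-12   # avoids log(0)
    loss = -np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))
    losses.append(loss)

    # 3. Backward pass: derivatives of the loss, then the update.
    dz = (a - y) / len(y)
    dw = X.T @ dz
    db = dz.sum()
    w -= learning_rate * dw
    b -= learning_rate * db

print(losses[0], losses[-1])
```

The loss should fall as the weights and bias are updated over the iterations.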

Well-known CNN architectures

Classic Network: LeNet-5

LeNet-5 takes a 32*32 grayscale image as its input. It consists of two convolution layers, each followed by average pooling, and then 2 fully connected layers. Finally, a softmax layer determines the output.

fig 9 — LeNet-5
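
The layer sizes can be traced with a few lines of Python. The filter sizes, strides, and filter counts below (5*5 filters, 6 and 16 filters, 2*2 pooling with a stride of 2) come from the commonly cited description of LeNet-5 rather than from this article, so treat them as assumptions.

```python
def out_size(n, f, p=0, s=1):
    """Output width/height along one dimension."""
    return (n + 2 * p - f) // s + 1

# (kind, filter size, stride, number of filters): assumed values
LENET_LAYERS = [
    ("conv", 5, 1, 6),
    ("pool", 2, 2, None),
    ("conv", 5, 1, 16),
    ("pool", 2, 2, None),
]

size, channels = 32, 1                      # 32*32 grayscale input
lenet_shapes = [(size, size, channels)]
for kind, f, s, n_filters in LENET_LAYERS:
    size = out_size(size, f, s=s)
    if kind == "conv":
        channels = n_filters                # pooling keeps the channel count
    lenet_shapes.append((size, size, channels))

lenet_flat = size * size * channels         # input to the fully connected layers
print(lenet_shapes, lenet_flat)
```

Under these assumptions the volume shrinks from 32*32*1 to 5*5*16, which flattens to 400 values before the fully connected layers.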

Classic Network: AlexNet

AlexNet takes 227*227 RGB images as its input; a single RGB image consists of 3 channels. It has 5 convolution layers and 3 max pooling layers, followed by 3 fully connected layers. Finally, a softmax layer determines the output.

fig 10 — AlexNet
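
The same kind of shape trace works for AlexNet. The per-layer filter sizes, strides, padding, and filter counts below follow the commonly cited description of AlexNet and are not stated in this article, so they are assumptions.

```python
def out_size(n, f, p=0, s=1):
    """Output width/height along one dimension."""
    return (n + 2 * p - f) // s + 1

# (kind, filter size, stride, padding, number of filters): assumed values
ALEXNET_LAYERS = [
    ("conv", 11, 4, 0, 96),
    ("pool", 3, 2, 0, None),
    ("conv", 5, 1, 2, 256),
    ("pool", 3, 2, 0, None),
    ("conv", 3, 1, 1, 384),
    ("conv", 3, 1, 1, 384),
    ("conv", 3, 1, 1, 256),
    ("pool", 3, 2, 0, None),
]

size, channels = 227, 3                     # 227*227 RGB input
alexnet_shapes = [(size, size, channels)]
for kind, f, s, p, n_filters in ALEXNET_LAYERS:
    size = out_size(size, f, p, s)
    if kind == "conv":
        channels = n_filters                # pooling keeps the channel count
    alexnet_shapes.append((size, size, channels))

alexnet_flat = size * size * channels       # input to the fully connected layers
print(alexnet_shapes[-1], alexnet_flat)
```

With these values the last pooling layer outputs a 6*6*256 volume, which flattens to 9216 values for the fully connected layers.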

Classic Network: YOLO

YOLO stands for "you only look once". The original YOLO takes 448*448 RGB images as its input. The YOLO architecture has many versions; YOLO, YOLOv2, and tiny-YOLO are a few of them. The size of the neural network depends on the version you use.

fig 11 — YOLO
