Build Your Own Model with Convolutional Neural Networks

Published in Analytics Vidhya · 4 min read · Aug 25, 2020

What is a neural network

A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates.

What is a convolutional neural network

A convolutional neural network (CNN) is a type of artificial neural network designed specifically to process pixel data, and it is widely used in image recognition and processing. CNNs use deep learning to perform both generative and descriptive tasks, often in machine vision applications such as image and video recognition, and they are also applied in recommender systems and natural language processing.

Why we need convolution

Parameter sharing: a feature detector that is useful in one part of the image can be reused across the whole image
Sparsity of connections: each output value depends on only a small number of input values
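To get a feel for these two properties, it helps to count parameters. The sketch below compares one convolutional layer with a fully connected layer that produces the same number of outputs. The sizes (a 32×32×3 input and six 5×5 filters) are assumptions picked only for illustration; they do not come from this article.

```python
# Illustrative sizes (assumptions, chosen only for this example).
in_h, in_w, in_c = 32, 32, 3      # input volume: height, width, channels
f, n_filters = 5, 6               # filter size and number of filters

# Output size with no padding and stride 1.
out_h, out_w = in_h - f + 1, in_w - f + 1

# Convolution: each filter has f*f*in_c weights plus one bias, and the
# same filter is reused at every position (parameter sharing).
conv_params = (f * f * in_c + 1) * n_filters

# Fully connected: every output value has its own weight for every input value.
n_inputs = in_h * in_w * in_c
n_outputs = out_h * out_w * n_filters
dense_params = n_inputs * n_outputs + n_outputs

# Sparsity of connections: one conv output depends on only f*f*in_c inputs.
inputs_per_output = f * f * in_c

print(conv_params, dense_params, inputs_per_output)
```

With these sizes the convolutional layer needs a few hundred parameters, while the fully connected layer needs millions.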

How to do convolution

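To convolve, overlay the filter on a patch of the input, multiply the overlapping values element-wise, and sum the products to get one output value. Repeating this at every filter position produces the full output.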
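The overlay-and-sum operation can be sketched in a few lines of NumPy. This is a minimal, unoptimized version that assumes a single-channel image, a square filter, stride 1, and no padding; the function name conv2d and the sample values are made up for this example. As is usual in deep learning, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    h, w = image.shape
    f = kernel.shape[0]                      # assumes a square f x f kernel
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + f, j:j + f]  # part of the input under the filter
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 4x4 input and 2x2 filter.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
result = conv2d(image, kernel)
print(result.shape)
print(result)
```

A 4×4 input and a 2×2 filter give a 3×3 output, because the filter fits in three positions along each axis.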

Stride

The stride is the number of cells the filter shifts to the right (and, at the end of a row, down) to produce the next output value.

Padding

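Padding has two main benefits:

It keeps more of the information at the border of the image, since border pixels are otherwise covered by fewer filter positions.
It lets you use convolution without necessarily shrinking the height and width of the volumes.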
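Stride and padding together determine the output size. A common way to write it is floor((n + 2p - f) / s) + 1 for an input of width n, filter width f, padding p, and stride s. The helper below is a small sketch of that formula; the name conv_output_size is an assumption made for this example.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output width (or height) for an n-wide input and an f-wide filter."""
    return (n + 2 * pad - f) // stride + 1

no_padding = conv_output_size(6, 3)             # the output shrinks
same_padding = conv_output_size(6, 3, pad=1)    # padding keeps the input size
strided = conv_output_size(7, 3, stride=2)      # a larger stride shrinks faster

print(no_padding, same_padding, strided)
```

With a 3×3 filter, padding of 1 keeps a 6×6 input at 6×6, while no padding shrinks it to 4×4.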

Convolution over volume

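When the input has several channels, each filter has the same number of channels as the input and produces one output channel. The number of filters therefore determines the number of channels in the output.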
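The relationship between filters and output channels can be checked with a short NumPy sketch. It assumes stride 1, no padding, an input laid out as (height, width, channels), and filters laid out as (f, f, channels, number of filters); the function name conv_volume and the sizes are hypothetical.

```python
import numpy as np

def conv_volume(volume, filters):
    """volume: (H, W, C); filters: (f, f, C, K). Returns (H-f+1, W-f+1, K)."""
    h, w, c = volume.shape
    f, _, fc, k = filters.shape
    assert fc == c, "each filter must be as deep as the input"
    out = np.zeros((h - f + 1, w - f + 1, k))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = volume[i:i + f, j:j + f, :]          # shape (f, f, C)
            # Sum over height, width and channels for all K filters at once.
            out[i, j, :] = np.tensordot(patch, filters, axes=3)
    return out

volume = np.ones((6, 6, 3))        # e.g. a small RGB patch
filters = np.ones((3, 3, 3, 2))    # two 3x3 filters, each 3 channels deep
out = conv_volume(volume, filters)
print(out.shape)
```

Two filters give an output with two channels, regardless of how many channels the input has.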

Besides convolutional layers, a CNN also has pooling layers and activation layers.

Pooling layer

A pooling layer reduces the size of the representation, which speeds up computation, and it makes some of the detected features a bit more robust.

There are two main types of pooling layers.

Max pooling: takes the maximum value in the window
Average pooling: takes the average of the values in the window

When you do the pooling, it doesn’t change the number of channels. It only reduces the width and the height.
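Both pooling types can be sketched with one NumPy function. This version assumes non-overlapping windows (stride equal to the window size) and an input laid out as (height, width, channels); the name pool2d is made up for this example.

```python
import numpy as np

def pool2d(volume, size=2, mode="max"):
    """Non-overlapping pooling over an (H, W, C) volume; stride equals size."""
    h, w, c = volume.shape
    out_h, out_w = h // size, w // size
    # Crop leftover rows/columns, then split into size x size windows.
    windows = volume[:out_h * size, :out_w * size, :].reshape(
        out_h, size, out_w, size, c)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))

x = np.arange(32, dtype=float).reshape(4, 4, 2)   # hypothetical 4x4 input, 2 channels
pooled_max = pool2d(x, mode="max")
pooled_avg = pool2d(x, mode="avg")
print(x.shape, pooled_max.shape, pooled_avg.shape)
```

The width and height are halved, and the number of channels stays at 2.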

Activation function layer

The purpose of the activation function is to introduce non-linearity into the output of a neuron.
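One way to see why non-linearity matters: without an activation, two stacked layers collapse into a single linear layer. The sketch below uses ReLU, a common choice that this article does not name, and small hand-picked weights that are purely illustrative.

```python
import numpy as np

def relu(z):
    """ReLU: keep positive values, replace negative values with zero."""
    return np.maximum(0, z)

# Hypothetical weights for two tiny layers (biases omitted for brevity).
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([2.0, -3.0])

# Without an activation, two layers are equivalent to one linear layer.
linear_out = W2 @ (W1 @ x)
collapsed_out = (W2 @ W1) @ x

# With ReLU in between, the result differs from any single linear layer built
# from W2 @ W1, because negative intermediate values are zeroed out.
nonlinear_out = W2 @ relu(W1 @ x)

print(linear_out, collapsed_out, nonlinear_out)
```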

Training a simple convolutional neural network consists of three main steps.

Forward pass
Final layer calculation
Backward pass

The forward pass calculates the output values layer by layer, from the first layer to the last. At the final layer, the loss function is calculated from the output values. The backward pass calculates the derivatives of the loss function with respect to the weights and biases and uses them to update those values.
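The three steps can be seen in a very small training loop. To keep the sketch short it uses a single sigmoid unit instead of a full CNN, with a toy dataset, a learning rate, and a step count that are all assumptions made for illustration; the same forward, loss, and backward structure applies to larger networks.

```python
import numpy as np

# Toy data (hypothetical): 4 samples, 2 features, binary labels.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.5          # learning rate (assumed value)
losses = []

for step in range(200):
    # Forward pass: compute the output from the inputs.
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))

    # Final layer calculation: binary cross-entropy loss on the outputs.
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(loss)

    # Backward pass: derivatives of the loss, then the weight and bias update.
    grad_z = (p - y) / len(y)
    w -= lr * (X.T @ grad_z)
    b -= lr * grad_z.sum()

print(losses[0], losses[-1])
```

The loss at the last step is lower than at the first, which is the effect of repeating the three steps.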

Well-known CNN architectures

Classic Network: LeNet-5

LeNet-5 takes a 32×32 grayscale image as its input. It consists of two convolutional layers, each followed by average pooling, and then two fully connected layers. Finally, a softmax layer determines the output.
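To make the layer sequence concrete, the sketch below traces how the volume shrinks through the convolution and pooling layers. The filter sizes and channel counts (5×5 filters with 6 and then 16 channels, 2×2 pooling with stride 2) are not stated in this article; they are taken from the commonly cited LeNet-5 description and should be read as assumptions.

```python
def conv_out(n, f, stride=1, pad=0):
    """Output width for an n-wide input and an f-wide filter or pooling window."""
    return (n + 2 * pad - f) // stride + 1

size, channels = 32, 1   # 32x32 grayscale input

# (name, window size, stride, output channels; None means pooling keeps channels)
layers = [
    ("conv1", 5, 1, 6),
    ("pool1", 2, 2, None),
    ("conv2", 5, 1, 16),
    ("pool2", 2, 2, None),
]

shapes = []
for name, f, stride, out_channels in layers:
    size = conv_out(size, f, stride)
    if out_channels is not None:
        channels = out_channels
    shapes.append((name, size, size, channels))
    print(name, (size, size, channels))

# Number of values passed on to the fully connected layers.
flattened = size * size * channels
print("flattened:", flattened)
```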

Classic Network: AlexNet

AlexNet takes 227×227 RGB images as its input; each RGB image has 3 channels. It has 5 convolutional layers and 3 max pooling layers, followed by 3 fully connected layers. Finally, a softmax layer determines the output.
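The way AlexNet uses stride and padding can be traced with a short script. The per-layer filter sizes, strides, padding, and channel counts below are not given in this article; they follow the commonly cited AlexNet configuration and are assumptions for illustration.

```python
def conv_out(n, f, stride=1, pad=0):
    """Output width for an n-wide input and an f-wide filter or pooling window."""
    return (n + 2 * pad - f) // stride + 1

size, channels = 227, 3   # 227x227 RGB input

# (name, window size, stride, padding, output channels; None means pooling)
layers = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]

trace = {}
for name, f, stride, pad, out_channels in layers:
    size = conv_out(size, f, stride, pad)
    if out_channels is not None:
        channels = out_channels
    trace[name] = (size, size, channels)
    print(name, trace[name])

# Number of values passed on to the first fully connected layer.
flattened = size * size * channels
print("flattened:", flattened)
```

The large stride in the first layer does most of the early downsampling, while the padded 3×3 and 5×5 convolutions keep the width and height unchanged.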

Classic Network: YOLO

YOLO stands for "you only look once". The original YOLO takes 448×448 RGB images as its input. The architecture has many versions, including YOLO, YOLOv2, and tiny-YOLO, and the size of the neural network depends on the version you use.
