Simple CNN using NumPy: Part I (Introduction & Data Processing)

Pradeep Adhokshaja
Analytics Vidhya
5 min read · Jun 20, 2021


Introduction

Convolutional Neural Networks (CNNs) are a class of neural networks that work well with grid-like data, such as images. They learn to extract useful features from images, which makes image recognition more robust. These networks were inspired by experiments conducted by David Hunter Hubel and Torsten Nils Wiesel, who observed that neurons in a cat's visual cortex responded differently to straight lines of different orientations.

First Convolutional Networks

The first convolutional neural network was the Neocognitron, implemented by Dr Kunihiko Fukushima in 1980. This system used a hierarchical structure to learn simple & complex features of an image. An unsupervised learning procedure was used to recognize handwritten characters.

Dr Yann LeCun and his team improved on these ideas in the late 1980s and 1990s by training convolutional networks with backpropagation to recognize handwritten postal codes, work that led to LeNet-5. Implementations like these drove drastic improvements in image recognition tasks.

CNNs can detect useful features in an image (edges, horizontal lines, vertical lines, curvature, etc.), much as different neurons in the brain fire in response to lines of different orientations. This capability comes from the convolutional layer.

The experiments by David Hunter Hubel and Torsten Nils Wiesel can be seen in the following YouTube video (credits: Ali Moeeny).

In this series of articles, I will implement a rudimentary Convolutional Neural Network using NumPy.

Input Data

The input data are Kannada digits from the Kannada-MNIST dataset on Kaggle. Kannada is a Dravidian language spoken by over 45 million people. The Kannada digits are as follows:

Kannada Digits from 0 to 9

Input Data Processing

The input comes from a CSV file that contains flattened versions of the images: each row holds a label followed by 28 × 28 = 784 pixel values. Preprocessing converts each of these rows back into a 28 × 28 array of pixel values.

Each entry (row) is converted to a 28 × 28 array
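To make the row-to-image mapping concrete, here is a minimal sketch that uses a synthetic 784-value row (the numbers 0 to 783, not real pixel data) in place of a CSV row. NumPy's default row-major order means the first 28 values become the first image row, so flat index k lands at row k // 28 and column k % 28.

```python
import numpy as np

# Synthetic stand-in for one flattened CSV row (784 values, not real pixels)
flat_row = np.arange(784)

# Row-major reshape: values 0-27 form image row 0, values 28-55 form row 1, ...
image = flat_row.reshape(28, 28)

# Flat index k ends up at (k // 28, k % 28)
k = 100
row, col = divmod(k, 28)
print(image.shape, row, col, image[row, col])
```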

The code below creates the training and test datasets:

import pandas as pd
import numpy as np


np.random.seed(42)
## Import the data set (one row per image: a label plus the flattened pixel values)
data = pd.read_csv('../input/Kannada-MNIST/train.csv')
data['row_number'] = range(0,data.shape[0])
## Shuffling the data
data = data.sample(frac=1,random_state=42)
## Getting a balanced training set with 600 entries per class
data_train = pd.concat([data[data['label']==label].head(600) for label in range(10)])
row_numbers_in_train_set = data_train['row_number'].values
## Rows not used for training are candidates for the test set
test_set = data.loc[~data['row_number'].isin(row_numbers_in_train_set)]
## Create one hot encoding: row i (and column i) corresponds to label i
## dtype=int keeps 0/1 integers (recent pandas versions return booleans by default)
one_hot = pd.get_dummies(np.arange(10), dtype=int)
one_hot['label'] = one_hot.index

## Attach the one-hot columns to each row by matching on 'label'
data_train = pd.merge(data_train,one_hot)
data_test = test_set.sample(frac=1)
## Getting a balanced test set with 120 entries per class
data_test = pd.concat([data_test[data_test['label']==label].head(120) for label in range(10)])
data_test = pd.merge(data_test,one_hot)
## The label is now encoded in columns 0-9, so drop the original column
data_train.drop('label',axis=1,inplace=True)
data_test.drop('label',axis=1,inplace=True)

## Create the train and test set and normalize the inputs to [0, 1]
X_train = np.array(data_train.drop([0,1,2,3,4,5,6,7,8,9,'row_number'],axis=1).values)/255
y_train = np.array(data_train[[0,1,2,3,4,5,6,7,8,9]].values)
X_test = np.array(data_test.drop([0,1,2,3,4,5,6,7,8,9,'row_number'],axis=1).values)/255
y_test = np.array(data_test[[0,1,2,3,4,5,6,7,8,9]].values)

The flattened data is used to create a balanced training set of 6,000 entries (600 per class) and a balanced test set of 1,200 entries (120 per class) drawn from the rows not used for training. Pixel values range from 0 to 255 (greyscale) and are normalized to [0, 1] by dividing by the maximum value, 255.
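The balancing and normalization steps can be checked on a toy table. This sketch assumes a synthetic DataFrame (3 classes, 100 rows each, 4 pixel columns named `pixel0` to `pixel3`) instead of the real 784-column CSV; `toy`, `n_per_class` and `balanced` are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for train.csv: 4 pixel columns with values 0-255
toy = pd.DataFrame(rng.integers(0, 256, size=(300, 4)),
                   columns=[f"pixel{i}" for i in range(4)])
toy["label"] = np.repeat(np.arange(3), 100)  # 100 rows per class

# Shuffle, then keep the first n rows of each class -> balanced subset
n_per_class = 20
shuffled = toy.sample(frac=1, random_state=42)
balanced = pd.concat([shuffled[shuffled["label"] == c].head(n_per_class)
                      for c in range(3)])

# Scale raw 0-255 intensities into [0, 1]
X = balanced.drop(columns="label").to_numpy() / 255

print(balanced["label"].value_counts().sort_index().tolist())
print(X.shape, X.min() >= 0, X.max() <= 1)
```

Shuffling before taking `head(n)` is what makes each class's subset a random sample rather than simply the first rows of the file.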

The class labels are converted to one-hot encoded vectors.
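The article builds the one-hot vectors with `pd.get_dummies` and a merge on `label`. An equivalent, more compact NumPy approach (an alternative shown for illustration, not the author's method) indexes rows of a 10 × 10 identity matrix; the `labels` array here is made up.

```python
import numpy as np

labels = np.array([3, 0, 9, 3])  # example class labels

# Row i of the 10x10 identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype=int)[labels]

print(one_hot.shape)
print(one_hot[0])
print(one_hot.argmax(axis=1))  # argmax recovers the original labels
```

Each row contains a single 1 at the position of its class, and `argmax` inverts the encoding, which is handy when comparing predictions with targets.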

The following code transposes the arrays and reshapes each flattened input vector into a 28 × 28 NumPy array:

import numpy as np
from matplotlib import pyplot as plt

## Transpose so that each column is one image (and each column of y one label)
X_train = X_train.T
y_train = y_train.T

X_test = X_test.T
y_test = y_test.T

## Reshape each 784-value column into a (1, 28, 28) image: (images, channels, height, width)
X_train_reshape = np.zeros((X_train.shape[1],1,28,28))

for i in range(X_train.shape[1]):
    temp = X_train[:,i]
    temp = temp.reshape(28,28)
    X_train_reshape[i,0,:,:] = temp

X_train = X_train_reshape

X_test_reshape = np.zeros((X_test.shape[1],1,28,28))

for i in range(X_test.shape[1]):
    temp = X_test[:,i]
    temp = temp.reshape(28,28)
    X_test_reshape[i,0,:,:] = temp

X_test = X_test_reshape
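The per-image loops above can also be written as a single vectorized reshape. This sketch uses a small synthetic matrix `X_flat` (5 random "images" of 784 values, a hypothetical stand-in for `X_train` before transposing) and checks that both approaches produce identical arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for X_train before transposing: (num_images, 784)
X_flat = rng.random((5, 784))

# Loop version, mirroring the code above: transpose, then one column per image
X_t = X_flat.T
looped = np.zeros((X_t.shape[1], 1, 28, 28))
for i in range(X_t.shape[1]):
    looped[i, 0, :, :] = X_t[:, i].reshape(28, 28)

# Vectorized version: reshape all rows at once; -1 infers the image count
vectorized = X_flat.reshape(-1, 1, 28, 28)

print(vectorized.shape, np.array_equal(looped, vectorized))
```

The vectorized form avoids a Python-level loop over thousands of images, which matters more as the dataset grows.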

Some of the re-shaped arrays are shown below.

Sample re-shaped arrays (panels labelled 0, 1, 8 and 9)

After processing, the training inputs have the dimensions (6000, 1, 28, 28) and the test inputs have the dimensions (1200, 1, 28, 28).

Dimensions & what they mean
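The four axes are, in order, the number of images, the number of channels (1 for greyscale), the height and the width, a layout often called NCHW. This sketch uses a small all-zeros batch of 8 images as a stand-in for the 6,000-image training array; slicing with `0:1` keeps all four axes, giving the (1, 1, 28, 28) single-image input used in the architecture below.

```python
import numpy as np

# Synthetic batch with the same layout as X_train: (N, C, H, W), 8 images instead of 6000
batch = np.zeros((8, 1, 28, 28))
n_images, n_channels, height, width = batch.shape

first_image = batch[0, 0]  # one 28x28 greyscale image (batch and channel axes dropped)
single = batch[0:1]        # one image, all four axes kept: (1, 1, 28, 28)

print(n_images, n_channels, height, width)
print(first_image.shape, single.shape)
```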

CNN Architecture

After data processing, the images (now NumPy arrays) are passed through the following series of layers.

Let’s assume that we pass a single image of dimension (1, 1, 28, 28). The structure of the neural network is then as follows:

  • Input Layer (1,1,28,28)
  • Convolutional Filters (2,1,5,5)
  • Max Pool Layer (2x2)
  • Fully Connected Layer (1,288)
  • Second Fully Connected Layer (1,60)
  • Output Layer (1,10)
In the diagram, the dimensions in purple are the resulting dimensions of the image after each processing stage.

The diagram above shows a rough “blueprint” of the network. I will explain convolutional filters and the convolution operation in the next post.
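The layer sizes listed above can be traced with a little arithmetic. This sketch assumes a stride of 1 with no padding for the 5 × 5 convolution and non-overlapping 2 × 2 max pooling; these settings are consistent with the (1, 288) size of the first fully connected layer, but the exact details are covered in later posts. The helper names `conv_output_size` and `pool_output_size` are hypothetical.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution along one spatial axis
    return (n + 2 * padding - kernel) // stride + 1

def pool_output_size(n, pool, stride=None):
    # Non-overlapping pooling by default (stride equals the pool size)
    stride = stride or pool
    return (n - pool) // stride + 1

num_filters = 2
conv = conv_output_size(28, 5)          # 28 - 5 + 1 = 24
pooled = pool_output_size(conv, 2)      # (24 - 2) // 2 + 1 = 12
flattened = num_filters * pooled * pooled  # 2 * 12 * 12 = 288

shapes = [(1, 1, 28, 28), (1, num_filters, conv, conv),
          (1, num_filters, pooled, pooled), (1, flattened), (1, 60), (1, 10)]
for s in shapes:
    print(s)
```

The two 12 × 12 pooled feature maps are flattened into a single vector of 288 values, which is why the first fully connected layer has that width.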

Thanks for reading! Please feel free to e-mail me at padhokshaja@gmail.com with feedback or queries. I will do my best to respond.


Next Post

Convolution Operation


Data Scientist @Philips. Passionate about ML, Statistics & hiking. If you'd like to buy me a coffee, you can use this link https://ko-fi.com/pradeepadhokshaja