Simple CNN using NumPy: Part I (Introduction & Data Processing)
Introduction
Convolutional Neural Networks (CNNs) are a class of neural networks that work well with grid-like data, such as images. They learn to extract useful features from images, which makes image recognition more robust. These networks were inspired by the experiments of David Hunter Hubel & Torsten Nils Wiesel, who observed that neurons in the cat’s visual cortex respond differently to different orientations of a straight line.
First Convolutional Networks
The Neocognitron, introduced by Dr Kunihiko Fukushima in 1980, is widely regarded as the first convolutional neural network. This system used a hierarchical structure to learn simple & complex features of an image. It was trained with an unsupervised learning procedure and applied to recognizing handwritten characters.
Dr Yann LeCun and his team improved upon these ideas in the 1990s by training such networks with backpropagation to recognize handwritten postal codes (LeNet-5). Implementations of this kind have led to drastic improvements in image recognition tasks.
CNNs can detect useful features (edges, horizontal lines, vertical lines, curvature, etc.) in an image, which is akin to different neurons firing in the brain for different orientations of an image. This ability comes from the convolutional layer.
The experiments by David Hunter Hubel & Torsten Nils Wiesel can be seen in the following YouTube video (Credits: Ali Moeeny).
In this series of articles, I will try to implement a rudimentary Convolutional Neural Network using NumPy.
Input Data
The input used here is the set of Kannada digits sourced from the Kannada Digits MNIST data repository on Kaggle. Kannada is a Dravidian language spoken by over 45 million people. The Kannada digits are as follows:
Input Data Processing
The input is sourced from a CSV file that contains the flattened version of the images, one image per row. The data preprocessing involves converting each of these rows to a 28x28 array of pixel values.
The code below creates the training and test datasets:
import pandas as pd
import numpy as np
np.random.seed(42)
## import data set
data = pd.read_csv('../input/Kannada-MNIST/train.csv')
data['row_number'] = range(0,data.shape[0])
## Shuffling the data
data = data.sample(frac=1,random_state=42)
tmp = pd.DataFrame()
## Getting a balanced dataset with 600 entries per class
for label in range(10):
    if label==0:
        tmp = data[data['label']==label].head(600)
    else:
        temp = data[data['label']==label].head(600)
        tmp = pd.concat([tmp,temp])
data_train = tmp
row_numbers_in_train_set = tmp['row_number'].values
test_set = data.loc[~data['row_number'].isin(row_numbers_in_train_set)]
## Create one hot encoding
## dtype=int keeps the encoding numeric (recent pandas versions default to booleans)
one_hot = pd.get_dummies(data_train['label'].unique(), dtype=int)
one_hot['label'] = one_hot.index
data_train = pd.merge(data_train,one_hot)
data_test = test_set.sample(frac=1)
tmp = pd.DataFrame()
## Getting a balanced test set with 120 entries per class
for label in range(10):
    if label==0:
        tmp = data_test[data_test['label']==label].head(120)
    else:
        temp = data_test[data_test['label']==label].head(120)
        tmp = pd.concat([tmp,temp])
data_test = tmp
data_test = pd.merge(data_test,one_hot)
data_train.drop('label',axis=1,inplace=True)
data_test.drop('label',axis=1,inplace=True)
## Create the train and test set and normalize the inputs
X_train = np.array(data_train.drop([0,1,2,3,4,5,6,7,8,9,'row_number'],axis=1).values)/255
y_train = np.array(data_train[[0,1,2,3,4,5,6,7,8,9]].values)
X_test = np.array(data_test.drop([0,1,2,3,4,5,6,7,8,9,'row_number'],axis=1).values)/255
y_test = np.array(data_test[[0,1,2,3,4,5,6,7,8,9]].values)
The flattened data is imported to create a balanced training dataset of 6000 entries (600 per class) and a balanced test dataset of 1200 entries (120 per class). The pixel values range from 0 to 255 (greyscale) and are normalized to the range 0 to 1 by dividing by the maximum value (255).
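The per-class loop used above can also be written as a group-by. The following is a minimal sketch on a small synthetic frame, not the Kaggle data; the helper name `balanced_head` and the toy column `pixel0` are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd


def balanced_head(df, per_class, label_col="label"):
    """Return the first `per_class` rows of every class, grouped by label.

    Hypothetical helper: it mirrors looping over the labels and
    concatenating `head(per_class)` for each one.
    """
    # groupby(...).head(n) keeps the first n rows of each group in their
    # original order; a stable sort then groups the rows class by class.
    picked = df.groupby(label_col).head(per_class)
    return picked.sort_values(label_col, kind="mergesort")


# Toy stand-in for the shuffled training frame (3 classes, 5 rows each).
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "label": np.repeat([0, 1, 2], 5),
    "pixel0": rng.integers(0, 256, size=15),
}).sample(frac=1, random_state=42)

toy_train = balanced_head(toy, per_class=2)
print(toy_train["label"].value_counts().sort_index())
```

Because every class contributes the same number of rows, the resulting set is balanced by construction, provided each class has at least `per_class` rows available.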
The output class labels are converted to one-hot encoded vectors.
The following code reshapes the flattened input vectors to 28x28 NumPy arrays:
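One-hot encoding maps a label k to a length-10 vector with a 1 at position k and 0 everywhere else. As a minimal sketch of the same idea in plain NumPy (the helper `one_hot_encode` is a hypothetical name, and this is an alternative to the merge-based approach used in this post, not a copy of it):

```python
import numpy as np


def one_hot_encode(labels, num_classes=10):
    """Turn integer labels into one-hot rows (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k,
    # so indexing the identity matrix with the labels encodes them all.
    return np.eye(num_classes, dtype=int)[labels]


encoded = one_hot_encode([3, 0, 9])
print(encoded)
```

Taking `encoded.argmax(axis=1)` recovers the original labels, which is handy later when comparing predictions against the targets.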
X_train = X_train.T
y_train = y_train.T
X_test = X_test.T
y_test = y_test.T
X_train_reshape = np.zeros((X_train.shape[1],1,28,28))
for i in range(X_train.shape[1]):
    temp = X_train[:,i]
    temp = np.ravel(temp)
    temp = temp.reshape(28,28)
    X_train_reshape[i,0,:,:] = temp
X_train = X_train_reshape
X_test_reshape = np.zeros((X_test.shape[1],1,28,28))
from matplotlib import pyplot as plt
for i in range(X_test.shape[1]):
    temp = X_test[:,i]
    temp = np.ravel(temp)
    temp = temp.reshape(28,28)
    X_test_reshape[i,0,:,:] = temp
X_test = X_test_reshape
Some of the re-shaped arrays are as follows
After the processing, the training data has the dimensions (6000,1,28,28) and the test data has the dimensions (1200,1,28,28), i.e. (number of images, channels, height, width).
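The reshaping loop can also be expressed as a single vectorized call. The sketch below uses random data with the same layout as the transposed `X_train` (one flattened image per column) and checks that both approaches agree; the helper name `to_image_batch` is an assumption introduced here for illustration.

```python
import numpy as np


def to_image_batch(flat, side=28):
    """Convert a (side*side, n) array into an (n, 1, side, side) batch.

    Hypothetical helper: each column of `flat` is one flattened image.
    """
    # After the transpose, each row is one image; reshape then splits
    # every row into a single-channel side x side grid in row-major order.
    return flat.T.reshape(-1, 1, side, side)


rng = np.random.default_rng(0)
flat = rng.random((784, 5))  # 5 fake images, one per column

# Loop-based version, written the same way as the code in this post.
looped = np.zeros((flat.shape[1], 1, 28, 28))
for i in range(flat.shape[1]):
    looped[i, 0, :, :] = flat[:, i].reshape(28, 28)

batch = to_image_batch(flat)
print(batch.shape, np.array_equal(batch, looped))  # (5, 1, 28, 28) True
```

The loop is easier to follow step by step, while the vectorized form avoids Python-level iteration over thousands of images.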
CNN Architecture
After data processing, the images, now in the form of NumPy arrays, are passed through a series of layers.
Let’s assume that we are passing a single image of dimension (1,1,28,28). The structure of the neural network is then as follows:
- Input Layer (1,1,28,28)
- Convolutional Filters (2,1,5,5)
- Max Pool Layer (2x2)
- Fully Connected Layer (1,288)
- Second Fully Connected Layer (1,60)
- Output Layer (1,10)
The above diagram shows the rough “blueprint” of the network. I will explain convolutional filters and the convolutional operation in the next post.
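The layer sizes listed above can be sanity-checked with a little arithmetic. The sketch below assumes a stride of 1 and no padding for the convolution, and a stride of 2 for the 2x2 max pool; these settings are not stated in this post, but they are consistent with the 288 inputs of the first fully connected layer. The helper names are hypothetical.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial size after sliding an f x f filter over an n x n input."""
    return (n + 2 * padding - f) // stride + 1


def pool_output_size(n, f=2, stride=2):
    """Spatial size after f x f max pooling with the given stride."""
    return (n - f) // stride + 1


n_filters, filter_size = 2, 5  # Convolutional Filters (2,1,5,5)

after_conv = conv_output_size(28, filter_size)   # 28 - 5 + 1 = 24
after_pool = pool_output_size(after_conv)        # 24 / 2 = 12
flattened = n_filters * after_pool * after_pool  # 2 * 12 * 12 = 288

print(after_conv, after_pool, flattened)  # 24 12 288
```

Under these assumptions, each of the 2 filters yields a 24x24 feature map, pooling halves it to 12x12, and flattening both maps gives the (1,288) vector fed to the first fully connected layer.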
Thanks for reading! Please feel free to e-mail me at padhokshaja@gmail.com if you have any feedback or queries. I will do my best to respond.