Building A Convolutional Neural Network in Python; Predict Digits from Gray-Scale Images of Hand-Drawn Digits from 0 Through 9

Dec 30, 2021

“A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data.”

Motivation

To predict digits from 0 through 9, I chose a dataset based on Kaggle’s well-known MNIST dataset. The dataset contains gray-scale images of hand-drawn digits, from zero through nine. In this article I will predict the digits from the pixel data (i.e., numerical data) using a convolutional neural network.

Convolutional Neural Network

A convolutional neural network, also known as a CNN or ConvNet, is an artificial neural network that has so far been most popularly used for analyzing images for computer vision tasks.

Although image analysis has been the most widespread use of CNNs, they can also be used for other data analysis and classification problems. Let’s get started!

Most generally, we can think of a CNN as an artificial neural network that has some type of specialization for being able to pick out or detect patterns. This pattern detection is what makes CNNs so useful for image analysis.

If a CNN is just an artificial neural network, though, then what differentiates it from a standard multilayer perceptron or MLP?

CNNs have hidden layers called convolutional layers, and these layers are what make a CNN, well… a CNN!

CNNs can, and usually do, have other, non-convolutional layers as well, but the basis of a CNN is the convolutional layers.

Alright, so what do these convolutional layers do?

Just like any other layer, a convolutional layer receives input, transforms the input in some way, and then outputs the transformed input to the next layer. The inputs to convolutional layers are called input channels, and the outputs are called output channels.

With a convolutional layer, the transformation that occurs is called a convolution operation. This is the term that’s used by the deep learning community anyway. Mathematically, the convolution operations performed by convolutional layers are actually called cross-correlations.
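To make the distinction concrete, here is a minimal NumPy sketch of both operations. The helper names `cross_correlate2d` and `convolve2d` are my own for illustration (they are not Keras functions), and the sketch assumes a single channel, no padding, and a stride of 1.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Mathematical convolution: flip the kernel on both axes, then cross-correlate."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

xcorr = cross_correlate2d(image, kernel)  # what a convolutional layer computes
conv = convolve2d(image, kernel)          # the textbook convolution

print(xcorr)
print(conv)
```

For this kernel the two results differ only in sign. Since the kernel weights are learned during training anyway, the flip makes no practical difference to what the network can represent.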

As mentioned earlier, convolutional neural networks are able to detect patterns in images.

Let’s expand on precisely what we mean when we say that a CNN is able to detect patterns. Each convolutional layer has a number of filters, and these filters are what actually detect the patterns. Think about how much may be going on in any single image: multiple edges, shapes, textures, objects, and so on. These are what we mean by patterns.

edges
shapes
textures
curves
objects
colors

One type of pattern that a filter can detect in an image is edges, so this filter would be called an edge detector.

Aside from edges, some filters may detect corners, some circles, and others squares. These simple, geometric filters are what we’d see at the start of a convolutional neural network.
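As a rough illustration of an edge detector, the sketch below slides a hand-written 3x3 vertical-edge filter over a tiny image whose left half is dark and whose right half is bright. The filter values and the helper `apply_filter` are assumptions chosen for the example; in a real CNN the filter values are learned during training rather than set by hand.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    rows = image.shape[0] - kh + 1
    cols = image.shape[1] - kw + 1
    return np.array([[np.sum(image[r:r + kh, c:c + kw] * kernel)
                      for c in range(cols)]
                     for r in range(rows)])

# 5x6 image: three dark columns followed by three bright columns
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Responds when the right side of the window is brighter than the left side
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)

response = apply_filter(image, vertical_edge)
print(response)
```

The response is zero over the flat regions and positive only in the columns where the window straddles the dark-to-bright boundary, which is exactly where the edge is.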

The deeper the network goes, the more sophisticated the filters become. In later layers, rather than edges and simple shapes, our filters may be able to detect specific objects like eyes, ears, hair or fur, feathers, scales, and beaks.

In even deeper layers, the filters are able to detect even more sophisticated objects like full dogs, cats, lizards, and birds.

1. Data Understanding

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
import itertools
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
import os

df = pd.read_csv("MNIST_ROI.csv")

Exploratory Analysis

df.shape

(59999, 785)

The dataset includes 59,999 records and 785 fields. Each record represents a gray-scale image of a hand-drawn digit from 0 to 9.

The first column, called “Result”, is the digit that was drawn by the user.

The rest of the columns contain the pixel-values of the associated image. Each gray-scale image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels.
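The sketch below shows how a flat row of 784 values maps onto the 28 x 28 grid. It assumes the pixel columns are stored row by row (row-major order), which is what a plain `reshape(28, 28)` implies; the array `flat` is a stand-in for one record, not real data.

```python
import numpy as np

flat = np.arange(784)          # stand-in for the 784 pixel columns of one record
grid = flat.reshape(28, 28)    # one row of the grid per 28 consecutive values

k = 100
row, col = divmod(k, 28)       # row-major layout: k = row * 28 + col
print(row, col, grid[row, col])
```

So the 101st pixel column (index 100) ends up in row 3, column 16 of the image.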

df.head()

Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning lighter. This pixel-value is an integer between 0 (for black) and 255 (for white), inclusive.

df.tail()

df.info()

df.describe()

Data Analysis

Let’s check how many images of each digit we have in the dataset

dig = [0,1,2,3,4,5,6,7,8,9]
num = []
for i in range(0,10):
    num.append(len(df[df['Result']==i]))
    
d = {'Digit': dig, 'Count': num}
df1 = pd.DataFrame(data=d)
df1
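The same kind of table can also be built without an explicit loop using pandas’ `value_counts`. The sketch below uses a tiny made-up frame called `toy` in place of the real dataset, so the counts are purely illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the "Result" column of the real dataset
toy = pd.DataFrame({"Result": [3, 1, 3, 0, 1, 3]})

counts = toy["Result"].value_counts().sort_index()   # occurrences per digit
summary = counts.rename_axis("Digit").reset_index(name="Count")
print(summary)
```

Note that digits that never occur are simply absent from the result, whereas the loop above would list them with a count of 0.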

import matplotlib.pyplot as plt
import seaborn as sns

sns.barplot(x="Count", y="Digit", data=df1, orient='h')
plt.show()

Let’s see which rows in the dataset contain images of the digit “3”

df[df['Result']==3].head()

Let’s print the image from row number 6

pic = df[6:7].values.reshape(785)[1:].reshape(28,28)
plt.imshow(pic, cmap='gray')

Let’s see which rows in the dataset contain images of the digit “5”

df[df['Result']==5].head()

Let’s print the image from row number 10

pic = df[10:11].values.reshape(785)[1:].reshape(28,28)
plt.imshow(pic, cmap='gray')

2. Data Preparation

X = df.drop(['Result'], axis=1)
X.head()

y = df.Result
y.head()

import sklearn.model_selection as skmodel

X_train, X_test, y_train, y_test = skmodel.train_test_split(X, y, test_size=0.33, random_state=42)

print("length of all data is ","{:,}".format(len(X)))
print("length of training set is","{:,}".format(len(X_train)))
print("length of test set is","{:,}".format(len(X_test)))
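The split sizes can be checked by hand. As far as I know, when `test_size` is a fraction scikit-learn rounds the test share up and gives the remainder to the training set; the short sketch below reproduces that arithmetic under this assumption.

```python
import math

n_samples = 59_999
test_size = 0.33

n_test = math.ceil(test_size * n_samples)   # test share, rounded up
n_train = n_samples - n_test                # everything else goes to training

print(n_train, n_test)
```

This gives 40,199 training rows and 19,800 test rows, which matches the lengths reported in this article.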

X_train.head()

y_train.head()

Let’s cast our training set and test set from pandas.core.frame.DataFrame to numpy.ndarray

x_train = np.array(X_train)
y_train = np.array(y_train)
x_test = np.array(X_test)
y_test = np.array(y_test)

len(X_train)

40199

Let’s draw a random row index between 0 and 40198

i = random.randint(0, len(X_train) - 1)  # randint includes both endpoints
i

34944

Now, let’s print the result of the image from row number 34944 in the training set

print(y_train[i])

Let’s print the image from row number 34944 in the training set

pic = X_train.iloc[i].values.reshape(28,28)
plt.imshow(pic, cmap='Greys')

x_train.shape

(40199, 784)

Let’s reshape the arrays to 4 dimensions so that they can work with the Keras API

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

Let's make sure that the values are float so that we can get decimal points after division

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Now, let's normalize the gray-scale pixel values by dividing them by the maximum pixel value, 255

x_train /= 255
x_test /= 255

print('x_train shape:', x_train.shape)
print('Number of images in x_train', x_train.shape[0])
print('Number of images in x_test', x_test.shape[0])

3. Modeling

Let’s build a CNN using a Sequential model and adding the layers:

model = Sequential()
model.add(Conv2D(28, kernel_size=(3,3), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dropout(0.2))
model.add(Dense(10,activation=tf.nn.softmax))
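To get a feel for the size of this network, the arithmetic below traces the output shape and the number of trainable parameters layer by layer. It is a back-of-the-envelope sketch that assumes the Keras defaults for everything not set above (no padding and a stride of 1 in `Conv2D`, a bias term in every layer); `model.summary()` is the authoritative way to check.

```python
side, in_channels = 28, 1      # input image: 28 x 28 x 1
filters, kernel = 28, 3        # Conv2D(28, kernel_size=(3, 3))

# Conv2D: each filter has kernel*kernel*in_channels weights plus one bias
conv_side = side - kernel + 1                                  # 26
conv_params = (kernel * kernel * in_channels + 1) * filters

# MaxPooling2D(2, 2) halves each spatial dimension and has no parameters
pool_side = conv_side // 2                                     # 13

# Flatten: 13 * 13 * 28 values per image, no parameters
flat_units = pool_side * pool_side * filters

# Dense layers: (inputs + 1 bias) * units; Dropout has no parameters
dense1_params = (flat_units + 1) * 128
dense2_params = (128 + 1) * 10

total_params = conv_params + dense1_params + dense2_params
print(conv_params, flat_units, dense1_params, dense2_params, total_params)
```

Almost all of the parameters sit in the first dense layer, which is typical for a small network with a single convolutional layer.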

Let’s compile our CNN

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
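The sparse categorical cross-entropy loss takes integer labels (0 through 9 here) rather than one-hot vectors, and for each image it is the negative log of the probability the model assigned to the true class. The NumPy sketch below illustrates the idea on made-up probabilities for a three-class problem; the function is my own simplified version, and Keras additionally handles numerical edge cases that this sketch ignores.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Per-sample loss: -log of the probability given to the true class."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    picked = probs[np.arange(len(y_true)), y_true]
    return -np.log(picked)

# Made-up softmax outputs for two samples and three classes
probs = [[0.1, 0.8, 0.1],
         [0.7, 0.2, 0.1]]
labels = [1, 2]   # integer class labels, not one-hot vectors

losses = sparse_categorical_crossentropy(labels, probs)
print(losses, losses.mean())
```

The first sample is predicted confidently and correctly, so its loss is small; the second puts only 0.1 on the true class, so its loss is much larger.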

Now, let’s train our CNN

model.fit(x=x_train,y=y_train, epochs=10)

Accuracy on training set: 99.37%

model.evaluate(x_test, y_test)

Accuracy on test set: 98.22%

The accuracy on the training set is 99.37%, while the accuracy on the test set is 98.22%. The gap of just over one percentage point indicates that the convolutional neural network (CNN) generalizes well to new data and is not overfitting substantially.

4. Evaluation

len(X_test)

19800

Let’s draw a random row index between 0 and 19799

j = random.randint(0, len(X_test) - 1)  # randint includes both endpoints
j

11092

Now, let’s make a prediction for the image from row number 11092 in the test set

pred = model.predict(x_test[j].reshape(1, 28, 28, 1))
print(pred.argmax())

Let’s print the image from row number 11092 in the test set

pic1 = X_test.iloc[j].values.reshape(28,28)
plt.imshow(pic1, cmap='Greys')

y_pred = model.predict(x_test)
y_pred = np.argmax(y_pred, axis=1)
y_pred.shape

(19800, )

Confusion Matrix

import sklearn.metrics as skmet

cm = skmet.confusion_matrix(y_true=y_test, y_pred=y_pred)

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        # Convert counts to fractions of each true label (each row sums to 1)
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print('Normalized confusion matrix')
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each cell's value, in white on dark cells and black on light ones
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

cm_plot_labels = ['0','1','2','3','4','5','6','7','8','9']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')

correct = np.trace(cm)  # the diagonal holds the correctly classified images
total = cm.sum()

print("\033[1m The result is telling us that we have: ", correct, "correct predictions.")
print("\033[1m The result is telling us that we have: ", total - correct, "incorrect predictions.")
print("\033[1m We have total predictions of: ", total)

Compute precision, recall, f-score and support

To quote from Scikit Learn:

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier to not label a sample as positive if it is negative.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The f1-score can be interpreted as a weighted harmonic mean of the precision and recall, where an f1-score reaches its best value at 1 and worst score at 0.

The F-beta score weights recall more than precision by a factor of beta. With beta = 1.0, as in the f1-score, recall and precision are equally important.

The support is the number of occurrences of each class in y_test.
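These definitions can be applied directly to a confusion matrix. The sketch below does so for a small made-up three-class matrix (`toy_cm` is hypothetical, not the matrix from this article), using the scikit-learn convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true labels, columns = predicted labels
toy_cm = np.array([[5, 1, 0],
                   [2, 6, 1],
                   [0, 1, 4]])

tp = np.diag(toy_cm)                 # correctly predicted, per class
fp = toy_cm.sum(axis=0) - tp         # predicted as the class but actually another
fn = toy_cm.sum(axis=1) - tp         # actually the class but predicted as another

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
support = toy_cm.sum(axis=1)         # number of true samples per class
accuracy = tp.sum() / toy_cm.sum()

print(precision, recall, f1, support, accuracy)
```

The classification report computes these same per-class quantities, plus their averages, from `y_test` and `y_pred`.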

print(skmet.classification_report(y_test, y_pred))

5. Deployment

So, our convolutional neural network (CNN) model is a pretty good model for predicting the digit from a gray-scale image of a hand-drawn digit from 0 through 9. Now, how do we predict the digit from a new gray-scale image?

len(X_test)

19800

Let’s draw a random row index between 0 and 19799

k = random.randint(0, len(X_test) - 1)  # randint includes both endpoints
k

766

Let’s use our model to predict the digit for the image in row k of the test set, which the model has not seen during training

pred1 = model.predict(x_test[k].reshape(1, 28, 28, 1))
print(pred1.argmax())

Our model says that we drew an image of the digit “7”. So, let’s print this image to see whether or not our model was right

pic2 = X_test.iloc[k].values.reshape(28,28)
plt.imshow(pic2, cmap='Greys')

Yes indeed! Our model was right.
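A genuinely new image has to go through the same preparation as the training data before it reaches the model: reshape to 4 dimensions, cast to float, and divide by 255. The helper below is a minimal sketch of that step; `prepare_image` is a name I made up, and it assumes the image arrives as 784 raw pixel values between 0 and 255 in the same order as the dataset columns.

```python
import numpy as np

def prepare_image(pixels):
    """Turn 784 raw pixel values (0-255) into a (1, 28, 28, 1) float32 batch in [0, 1]."""
    arr = np.asarray(pixels, dtype="float32")
    if arr.size != 28 * 28:
        raise ValueError("expected 784 pixel values, got {}".format(arr.size))
    return arr.reshape(1, 28, 28, 1) / 255.0

# Made-up image: a bright vertical stroke on a dark background
raw = np.zeros((28, 28), dtype=int)
raw[4:24, 13:15] = 255

batch = prepare_image(raw.ravel())
print(batch.shape, batch.dtype, batch.min(), batch.max())

# With the trained model from this article, the prediction would then be:
# digit = model.predict(batch).argmax()
```

Keeping the preparation in one function makes it harder to forget the scaling step, which would otherwise feed the model values 255 times larger than anything it saw in training.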

Summary

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other.

The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the Visual Cortex.

Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field.

A collection of such fields overlaps to cover the entire visual area.

About the Author

Roi Polanitzer, PDS, ADL, MLS, PDA, CPD, F.IL.A.V.F.A., FRM, is a data scientist with extensive experience in solving machine learning problems, such as regression, classification, clustering, recommender systems, anomaly detection, text analytics & NLP, and image processing. Mr. Polanitzer is the Owner and Chief Data Scientist of Prediction Consultants — Advanced Analysis and Model Development, a data science firm headquartered in Rishon LeZion, Israel. He is also the Owner and Chief Appraiser of Intrinsic Value — Independent Business Appraisers, a business valuation firm that specializes in corporates, intangible assets and complex financial instruments valuation.

Over more than 16 years, he has performed data science projects such as: regression (e.g., house prices, CLV (customer lifetime value), and time-to-failure), classification (e.g., market targeting, customer churn), probability (e.g., spam filters, employee churn, fraud detection, loan default, and disease diagnostics), clustering (e.g., customer segmentation, and topic modeling), dimensionality reduction (e.g., p-values, itertools Combinations, principal components analysis, and autoencoders), recommender systems (e.g., products for a customer, and advertisements for a surfer), anomaly detection (e.g., supermarkets’ revenue and profits), text analytics (e.g., identifying market trends, web searches), NLP (e.g., sentiment analysis, cosine similarity, and text classification), image processing (e.g., image binary classification of dogs vs. cats, and image multiclass classification of digits in sign language), and signal processing (e.g., audio binary classification of males vs. females, and audio multiclass classification of urban sounds).

Mr. Polanitzer holds various professional designations, such as a global designation called “Financial Risk Manager” (FRM, which indicates that its holder is proficient in developing, implementing and validating statistical models and mathematical algorithms such as K-Means, SVM and KNN for credit risk measurement and management) from the Global Association of Risk Professionals (GARP), a designation called “Fellow Actuary” (F.IL.A.V.F.A., which indicates that its holder is proficient in developing, implementing and validating statistical models and mathematical algorithms such as GLM, RF and NN for determining premiums in general insurance) from the Israel Association of Valuators and Financial Actuaries (IAVFA), and a designation called “Certified Risk Manager” (CRM, which indicates that its holder is proficient in developing, implementing and validating statistical models and mathematical algorithms such as DT, NB and PCA for operational risk management) from the Israeli Association of Risk Managers (IARM).

Mr. Polanitzer studied actuarial science (i.e., implementation of statistical and data mining techniques for solving time-series analysis, dimensionality reduction, optimization and simulation problems) at the prestigious 250-hour training program of the University of Haifa, financial risk management (i.e., building statistical predictive and probabilistic models for solving regression, classification, clustering and anomaly detection) at the prestigious 250-hour training program of Ariel University, and machine learning and deep learning (i.e., building recommender systems and training neural networks for image processing and NLP) at the prestigious 500-hour training program of John Bryce College.

He has completed various professional training courses at John Bryce College, such as: “Introduction to Machine Learning, AI & Data Visualization for Managers and Architects”, “Professional training in Practical Machine Learning, AI & Deep Learning with Python for Algorithm Developers & Data Scientists”, “Azure Data Fundamentals: Relational Data, Non-Relational Data and Modern Data Warehouse Analytics in Azure”, and “Azure AI Fundamentals: Azure Tools for ML, Automated ML & Visual Tools for ML and Deep Learning”.

Mr. Polanitzer has also completed various professional training courses at the Professional Data Scientists’ Israel Association, such as: “Neural Networks and Deep Learning”, “Big Data and Cloud Services”, and “Natural Language Processing and Text Mining”.