Recognizing Handwritten Digits with TensorFlow

Sanjit Jain
4 min read · Dec 30, 2017


Deep learning is a subfield of machine learning: a set of algorithms and functions inspired by the structure and functioning of the brain.

TensorFlow is a machine learning framework created by Google and used to design, build, and train deep learning models.

This tutorial takes on the MNIST dataset from this Kaggle competition while also explaining the basics of writing TensorFlow code.

Getting Started

  • Enter the challenge
  • Download the dataset from the competition

Import Libraries and Load MNIST Data

import numpy as np
from numpy import array
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
mnist = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

Setup Constants and Hyperparameters

LABELS = 10           # Number of labels (digits 0-9)
IMAGE_WIDTH = 28      # Width/height of the image
COLOR_CHANNELS = 1    # Number of color channels
VALID_SIZE = 1000     # Size of the validation set
STEPS = 20000         # Number of training steps (mini-batches) to run
BATCH_SIZE = 32       # SGD batch size
FILTER_SIZE = 5       # Filter size for the convolution kernel
DEPTH = 32            # Number of filters/templates
FC_NEURONS = 1024     # Number of neurons in the fully connected layer
LR = 0.001            # Learning rate alpha for SGD

Preparing Data

  • One hot encoding of the labels
  • Reshaping into image shape: (num_images, IMAGE_WIDTH, IMAGE_WIDTH, COLOR_CHANNELS)
  • Splitting into train and validation sets

labels = np.array(mnist.pop('label'))
labels = LabelEncoder().fit_transform(labels)[:, None]
labels = OneHotEncoder().fit_transform(labels).todense()
mnist = StandardScaler().fit_transform(np.float32(mnist.values))
mnist = mnist.reshape(-1, IMAGE_WIDTH, IMAGE_WIDTH, COLOR_CHANNELS)
train_data, valid_data = mnist[:-VALID_SIZE], mnist[-VALID_SIZE:]
train_labels, valid_labels = labels[:-VALID_SIZE], labels[-VALID_SIZE:]
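
As a quick sanity check (a small addition, not part of the original post), we can print the shapes after preprocessing. Assuming the Kaggle train.csv with 42,000 labelled rows and VALID_SIZE = 1000, we expect 41,000 training examples and 1,000 validation examples, each 28x28x1 with a 10-dimensional one-hot label.

# Sanity check (illustrative): confirm the preprocessed arrays have the
# shapes the rest of the code assumes.
print(train_data.shape, train_labels.shape)   # expected: (41000, 28, 28, 1) (41000, 10)
print(valid_data.shape, valid_labels.shape)   # expected: (1000, 28, 28, 1) (1000, 10)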

Architecture of our model

Let’s now build a network with two convolutional layers, followed by one fully connected layer.

Fig. 2-conv layer + 1-fully connected layer neural network

Initialize the data with placeholders:

We start building by creating nodes for the input images and target output classes.

tf_data = tf.placeholder(tf.float32, shape=(None, IMAGE_WIDTH, IMAGE_WIDTH, COLOR_CHANNELS))
tf_labels = tf.placeholder(tf.float32, shape=(None, LABELS))

Weight Initialization

To build this model, we are going to need a lot of weights and biases. We'll write two handy functions to create them for us.

# Generates a weight variable of a given shape.
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

# Generates a bias variable of a given shape.
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
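
The helpers can be exercised on their own (an illustrative snippet, not in the original post) to confirm they produce variables of the requested shape:

# Illustrative: 5x5 filters with 1 input channel and 32 output feature maps,
# plus one bias per feature map.
w_demo = weight_variable([5, 5, 1, 32])
b_demo = bias_variable([32])
print(w_demo.get_shape(), b_demo.get_shape())   # (5, 5, 1, 32) (32,)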

Convolution and Pooling

TensorFlow gives us a lot of flexibility in setting up convolution and pooling operations. Our convolutions use a stride of one and are zero-padded so that the spatial dimensions are preserved in the output. Our pooling is plain max pooling over 2x2 blocks with a stride of 2. We'll build helper functions for these as well.

# Returns a 2D convolution layer with stride 1 and SAME padding.
def conv_2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# Downsamples a feature map by 2x with 2x2 max pooling.
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
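
To make the shape bookkeeping concrete (a short illustrative sketch, not in the original post), here is a dummy image traced through the two helpers using the constants defined earlier: the 'SAME'-padded, stride-1 convolution keeps the 28x28 spatial size, and the 2x2 max pool halves it.

# Illustrative shape walkthrough for a single dummy image.
demo_image = tf.zeros([1, IMAGE_WIDTH, IMAGE_WIDTH, COLOR_CHANNELS])          # 1x28x28x1
demo_filters = weight_variable([FILTER_SIZE, FILTER_SIZE, COLOR_CHANNELS, DEPTH])
demo_conv = conv_2d(demo_image, demo_filters)    # SAME padding, stride 1 -> 1x28x28x32
demo_pool = max_pool_2x2(demo_conv)              # 2x2 max pool, stride 2 -> 1x14x14x32
print(demo_conv.get_shape(), demo_pool.get_shape())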

The Model

First Convolutional Layer:

We can now implement our first layer. It consists of a convolution that computes 32 features for each 5x5 patch.

# First convolutional layer - maps one grayscale image to 32 feature maps.
w1 = weight_variable([5, 5, 1, 32])
b1 = bias_variable([32])
layer_conv1 = tf.nn.relu(conv_2d(tf_data, w1) + b1)
# Pooling layer - downsamples by 2X.
layer_pool1 = max_pool_2x2(layer_conv1)

Second Convolutional Layer:

Our second convolutional layer computes 64 features for each 5x5 patch.

# Second convolutional layer -- maps 32 feature maps to 64.
w2 = weight_variable([5, 5, 32, 64])
b2 = bias_variable([64])
layer_conv2 = tf.nn.relu(conv_2d(layer_pool1, w2) + b2)
# Second pooling layer.
layer_pool2 = max_pool_2x2(layer_conv2)

Fully Connected Layer:

After two rounds of 2x2 max pooling, the image has been reduced from 28x28 to 14x14 and then to 7x7, with 64 feature maps, so the flattened representation has 7*7*64 values. We now add a fully connected layer with 1024 neurons on top of it.

wfc1 = weight_variable([7*7*64, 1024])
bfc1 = bias_variable([1024])
flatten_pool2 = tf.reshape(layer_pool2, [-1,7*7*64])
layer_fc1 = tf.nn.relu(tf.matmul(flatten_pool2, wfc1) + bfc1)

Readout Layer:

Finally, we add a readout layer that maps the 1024 features to the 10 digit classes; the softmax itself is applied later as part of the loss.

wfc2 = weight_variable([1024, 10])
bfc2 = bias_variable([10])
y_conv = tf.matmul(layer_fc1, wfc2) + bfc2

Train and Evaluate the model

How well does this model do?

We define the loss, accuracy, and optimizer, then use tf.Session to create a session that runs the model, logging the validation accuracy after every 500 training steps.

# Loss, accuracy, and optimizer (RMSProp).
tf_pred = tf.nn.softmax(y_conv)
tf_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=tf_labels))
tf_acc = 100 * tf.reduce_mean(tf.to_float(
    tf.equal(tf.argmax(tf_pred, 1), tf.argmax(tf_labels, 1))))
tf_opt = tf.train.RMSPropOptimizer(LR)
tf_step = tf_opt.minimize(tf_loss)
# Initialize variables only after the optimizer has created its own.
init = tf.global_variables_initializer()
session = tf.Session()
session.run(init)
# Each split of ShuffleSplit yields the indices of one training mini-batch.
history = []
ss = ShuffleSplit(n_splits=STEPS, train_size=BATCH_SIZE)
ss.get_n_splits(train_data, train_labels)
for step, (idx, _) in enumerate(ss.split(train_data, train_labels), start=1):
    fd = {tf_data: train_data[idx], tf_labels: train_labels[idx]}
    session.run(tf_step, feed_dict=fd)
    if step % 500 == 0:
        fd = {tf_data: valid_data, tf_labels: valid_labels}
        valid_loss, valid_accuracy = session.run([tf_loss, tf_acc], feed_dict=fd)
        history.append((step, valid_loss, valid_accuracy))
        print('Step %i \t Valid. Acc. = %f \n' % (step, valid_accuracy))

The final test accuracy after running the entire code comes out to approximately 98.7%.
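
To actually score on the leaderboard, the trained network has to be run on test.csv, which we loaded at the start but never used. Below is a minimal sketch (not part of the original post) that applies the same preprocessing as the training data and writes a submission file in the Kaggle Digit Recognizer format (ImageId, Label columns). Note that, mirroring the post's preprocessing, it refits the StandardScaler on the test set; in practice you would reuse the scaler fitted on the training data.

# Sketch: preprocess the test images the same way, predict with the trained
# network, and write a Kaggle-style submission file.
test_data = StandardScaler().fit_transform(np.float32(test.values))
test_data = test_data.reshape(-1, IMAGE_WIDTH, IMAGE_WIDTH, COLOR_CHANNELS)
test_pred = session.run(tf_pred, feed_dict={tf_data: test_data})
test_digits = np.argmax(test_pred, axis=1)
submission = pd.DataFrame({'ImageId': np.arange(1, len(test_digits) + 1),
                           'Label': test_digits})
submission.to_csv('submission.csv', index=False)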

For the complete script and the Jupyter Notebook, head over to my GitHub repo.
