Handwritten Digit Recognition using Logistic Regression (Python + sklearn)

Narayan Jha
Nov 4 · 3 min read

Humans are very good at recognizing handwritten digits, but have you ever wondered how the brain does it so efficiently? Can we give machines a similar ability to recognize handwriting? Yes! In this blog, we are going to write a program that recognizes handwritten digits, so let's get started.

  • Loading the Dataset: We will use a built-in dataset that scikit-learn provides for classifying handwritten digits.
from sklearn import datasets
digits = datasets.load_digits()
dir(digits)
# Output of dir(digits) (newer scikit-learn versions may list additional attributes):
# ['DESCR', 'data', 'images', 'target', 'target_names']

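The digits variable contains [‘DESCR’, ‘data’, ‘images’, ‘target’, ‘target_names’]. To train a handwritten digit recognizer we need features and labels. Here the images are the features and the targets are the labels. digits.data holds the same images flattened to 64 values each, which is the form we will pass to the model later.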

print(type(digits.images))
print(type(digits.target))
# Output: both are NumPy arrays
# <class 'numpy.ndarray'>
# <class 'numpy.ndarray'>
digits.images.shape  # shape of the image array
# Out: (1797, 8, 8)

digits.images is an array with 3 dimensions. The first dimension indexes the images, and we see that we have 1797 images in total. The next two dimensions correspond to the row and column positions of the pixels in each image. Each image has 8x8 = 64 pixels. In other words, this array can be pictured as a stack of 1797 images of 8x8 pixels each. Let’s look at the data of the first 8x8 image. Each slot in the array corresponds to a pixel, and the value in the slot (an integer from 0 to 16) is the amount of black in the pixel.

print(digits.images[0])
# Out:
[[ 0. 0. 5. 13. 9. 1. 0. 0.]
[ 0. 0. 13. 15. 10. 15. 5. 0.]
[ 0. 3. 15. 2. 0. 11. 8. 0.]
[ 0. 4. 12. 0. 0. 8. 8. 0.]
[ 0. 5. 8. 0. 0. 9. 8. 0.]
[ 0. 4. 11. 0. 1. 12. 7. 0.]
[ 0. 2. 14. 5. 10. 12. 0. 0.]
[ 0. 0. 6. 13. 10. 0. 0. 0.]]
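
The link between the 3D images array and the flattened data array described above can be checked directly. This is only a small sketch: the variable names n_samples and flattened are introduced here for illustration and are not part of the original walkthrough.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# digits.images has shape (n_samples, 8, 8); flatten each 8x8 image into 64 values
n_samples = digits.images.shape[0]
flattened = digits.images.reshape(n_samples, -1)

print(flattened.shape)                          # one row of 64 pixel values per image
print(np.array_equal(flattened, digits.data))   # True if data is the flattened images
print(digits.images.min(), digits.images.max()) # range of the pixel values
```

Because a classifier expects one row of features per sample, the flattened form is the one that gets passed to the model.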

Let's visualize the image:

import matplotlib.pyplot as plt
plt.imshow(digits.images[0], cmap='binary')  # 'binary' draws higher values darker
plt.show()

Now let’s investigate the target attribute:

print(digits.target.shape)
print(digits.target)
# Out:
# (1797,)
# [0 1 2 ... 8 9 8]
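
Before training, it can help to see how many examples of each digit the target array holds. The following is a minimal sketch using NumPy's unique function; the names labels and counts are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Count how many times each label (0 through 9) occurs in the target array
labels, counts = np.unique(digits.target, return_counts=True)

for label, count in zip(labels, counts):
    print(label, count)
```

The ten classes should come out roughly balanced, so plain accuracy is a reasonable first measure of how well a classifier does on this dataset.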

We now have the features and the target. Let's give this data to the logistic regression algorithm to classify the handwritten digits.

Let's start making the model.

Splitting Data into Training and Test Sets (Digits Dataset)

We make training and test sets to make sure that after we train our classification algorithm, it is able to generalize well to new data.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)
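
To see what the split produced, the shapes of the four arrays can be printed. This sketch simply repeats the call above and inspects the results.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# test_size=0.25 holds out a quarter of the samples; random_state=0 makes the split repeatable
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

print(x_train.shape, y_train.shape)  # training features and labels
print(x_test.shape, y_test.shape)    # held-out features and labels
```

The training and test row counts add up to the 1797 images in the dataset, and every row has 64 pixel features.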

Create a Model for Machine Learning

from sklearn.linear_model import LogisticRegression

Create an instance of the model

# all parameters not specified are set to their defaults
logisticRegr = LogisticRegression()

Train the model on the data, storing the information learned from the data

The model learns the relationship between the digits (x_train) and the labels (y_train)

logisticRegr.fit(x_train, y_train)
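
Depending on the scikit-learn version, fitting with the default settings may print a ConvergenceWarning because the solver reaches its iteration limit. One possible way to handle this, which is not part of the original walkthrough, is to rescale the pixel values and allow more iterations. The scaling factor and the max_iter value below are assumptions chosen for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Assumption: pixel values lie in [0, 16], so dividing by 16 rescales them to [0, 1]
x_train_scaled = x_train / 16.0

# Assumption: 1000 is an arbitrary, generous iteration budget
model = LogisticRegression(max_iter=1000)
model.fit(x_train_scaled, y_train)

print(model.n_iter_)  # number of iterations the solver actually used
```

If the training data is scaled this way, the same scaling has to be applied to any data passed to predict.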

Predict labels for new data (new images)

This uses the information the model learned during training

# Predict for one observation (image)
# reshape(1, -1) turns the single 64-value row into a 2D array with one row
# Returns a NumPy array
logisticRegr.predict(x_test[0].reshape(1,-1))
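
The reshape in the call above is needed because predict expects a 2D array with one row per sample. A short sketch of what the reshape does to a single image (the names single and row are illustrative):

```python
from sklearn import datasets

digits = datasets.load_digits()

single = digits.data[0]       # one image as a flat 1D array
print(single.shape)           # (64,)

row = single.reshape(1, -1)   # one sample with 64 features
print(row.shape)              # (1, 64)

# Slicing keeps the 2D shape, so no reshape is needed
print(digits.data[0:1].shape) # (1, 64)
```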

Predict for Multiple Observations (images) at Once

logisticRegr.predict(x_test[0:10])
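
To check how well the model generalizes to the held-out test set, one option is to compute the accuracy and a confusion matrix. This is a sketch rather than part of the original walkthrough: the max_iter value is an assumption added to give the solver more iterations, and the names predictions, accuracy and cm are illustrative.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Assumption: max_iter=10000 is an arbitrary, generous iteration budget
logisticRegr = LogisticRegression(max_iter=10000)
logisticRegr.fit(x_train, y_train)

# Predict every test image, then compare the predictions with the true labels
predictions = logisticRegr.predict(x_test)
accuracy = logisticRegr.score(x_test, y_test)  # fraction of correct predictions
cm = confusion_matrix(y_test, predictions)     # rows: true digit, columns: predicted digit

print(accuracy)
print(cm)
```

Off-diagonal entries in the confusion matrix show which digits the model mixes up.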

Congratulations! Our model is ready to recognize handwritten digits.

Analytics Vidhya

