Handwritten Digit Recognition using Logistic Regression (Python + sklearn)

Humans are very good at recognizing handwritten digits, but have you ever wondered how the human brain does it so efficiently? Can we give machines a similar ability to recognize handwriting? Yes! In this blog, we are going to write a program that recognizes handwritten digits, so let's get started.
- Creating Dataset: We will use a built-in dataset that scikit-learn provides for classifying handwritten digits.
from sklearn import datasets
digits = datasets.load_digits()
dir(digits)
# Output of dir(digits)
# (newer scikit-learn versions may also list 'feature_names' and 'frame')
['DESCR', 'data', 'images', 'target', 'target_names']
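The digits variable contains ['DESCR', 'data', 'images', 'target', 'target_names']. To train a handwritten digit recognizer we need features and labels. Here the images are the features and the targets are the labels. Note that digits.data holds the same pixels as digits.images, with each image flattened into a row of 64 values, and that flattened form is what the model is trained on later.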
print(type(digits.images))
print(type(digits.target))
# Output: both are NumPy arrays
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
digits.images.shape  # shape of the image array
#Out: (1797, 8, 8)
digits.images is an array with 3 dimensions. The first dimension indexes the images, and we see that we have 1797 images in total. The next two dimensions correspond to the row and column of each pixel in the image. Each image has 8x8 = 64 pixels. In other words, this array can be pictured as a stack of 1797 images of 8x8 pixels each. Let's look at the data of the first 8x8 image. Each slot in the array corresponds to a pixel, and the value in the slot (an integer from 0 to 16 in this dataset) is the amount of black in the pixel.
print(digits.images[0])
#Out:
[[ 0. 0. 5. 13. 9. 1. 0. 0.]
[ 0. 0. 13. 15. 10. 15. 5. 0.]
[ 0. 3. 15. 2. 0. 11. 8. 0.]
[ 0. 4. 12. 0. 0. 8. 8. 0.]
[ 0. 5. 8. 0. 0. 9. 8. 0.]
[ 0. 4. 11. 0. 1. 12. 7. 0.]
[ 0. 2. 14. 5. 10. 12. 0. 0.]
[ 0. 0. 6. 13. 10. 0. 0. 0.]]
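The model later in this post is trained on digits.data rather than digits.images, so it helps to see how the two relate. The snippet below is a small sketch (the names n_samples and flattened are just illustrative) that flattens every 8x8 image into a row of 64 values and checks that the result matches digits.data.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Flatten each 8x8 image into a single row of 64 pixel values.
n_samples = len(digits.images)
flattened = digits.images.reshape(n_samples, -1)

print(flattened.shape)
# digits.data holds the same pixel values, already flattened.
print(np.array_equal(flattened, digits.data))
```

This is why we can explore the data with digits.images and still hand digits.data to the classifier: both hold the same pixels, only the shape differs.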
Let's visualize the image:
import matplotlib.pyplot as plt
plt.imshow(digits.images[0], cmap='binary')
plt.show()
Now let's investigate the target attribute:
print(digits.target.shape)
print(digits.target)
#Out:
(1797,)
[0 1 2 ... 8 9 8]
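The printed target array only shows its first and last few labels, so it can be useful to count how often each digit appears. Here is a minimal sketch of that check using NumPy (the variable names classes and counts are illustrative, not from the dataset API).

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Count how many images carry each label.
classes, counts = np.unique(digits.target, return_counts=True)
for label, count in zip(classes, counts):
    print(label, count)
```

If the counts come out roughly equal, plain accuracy is a reasonable way to judge the classifier later on.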
The features and targets are ready, so let's give this data to the logistic regression algorithm to classify the handwritten digits.
Let's start building the model.
Splitting Data into Training and Test Sets (Digits Dataset)
We split the data into training and test sets so that, after training the classification algorithm, we can check that it generalizes well to data it has not seen. Here 25% of the samples are held out for testing.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)
Create a Model for Machine Learning
from sklearn.linear_model import LogisticRegression
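Create an instance of the model
# all parameters not specified are set to their defaults
# (on recent scikit-learn versions the defaults may emit a ConvergenceWarning
# on this data; passing a larger max_iter is a common remedy)
logisticRegr = LogisticRegression()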
Train the model on the data, storing the information learned from the data
The model learns the relationship between digits (x_train) and labels (y_train)
logisticRegr.fit(x_train, y_train)
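To get a feel for what "information learned from the data" means here, we can peek at the fitted attributes. The sketch below rebuilds the same split so that it runs on its own. The name model and the setting max_iter=5000 are assumptions made for this example (the post itself uses the defaults); the larger iteration limit is only there to give the solver room to converge.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# max_iter=5000 is an assumption for this sketch, not the post's setting.
model = LogisticRegression(max_iter=5000)
model.fit(x_train, y_train)

print(model.classes_)          # the digit labels the model knows about
print(model.coef_.shape)       # one row of 64 pixel weights per digit
print(model.intercept_.shape)  # one bias term per digit
```

Each of the 10 digits gets its own set of 64 weights, one per pixel, so the learned "relationship" is essentially a weighted view of which pixels matter for which digit.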
Predict labels for new data (new images)
Uses the information the model learned during the model training process
# Returns a NumPy Array
# Predict for One Observation (image)
logisticRegr.predict(x_test[0].reshape(1, -1))
Predict for Multiple Observations (images) at Once
logisticRegr.predict(x_test[0:10])
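We set aside a test set earlier to see how well the model generalizes, so one natural next step is to measure that. The following is a hedged sketch rather than part of the original walkthrough: it repeats the same split, trains a model (again with an assumed max_iter=5000 and the illustrative name model), and reports the accuracy and a confusion matrix on the test set.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# max_iter=5000 is an assumption for this sketch, not the post's setting.
model = LogisticRegression(max_iter=5000)
model.fit(x_train, y_train)

# Fraction of test images whose predicted label matches the true label.
accuracy = model.score(x_test, y_test)
print(accuracy)

# Rows are true digits, columns are predicted digits.
predictions = model.predict(x_test)
cm = confusion_matrix(y_test, predictions)
print(cm)
```

Large numbers on the diagonal of the confusion matrix mean the digit was recognized correctly, while off-diagonal entries show which digits get mistaken for one another.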
Congratulations! Our model is ready to recognize handwritten digits.
