Handwritten Digit Recognition using Machine Learning and Deep Learning in Python
MNIST (“Modified National Institute of Standards and Technology”) is the de facto “hello world” dataset of computer vision, and this dataset of handwritten digit images is used as the basis for benchmarking classification algorithms. In this tutorial, we will use Kaggle’s Digit Recognizer dataset to demonstrate different approaches to solving the image recognition problem.
About dataset
Each image is a 28 by 28 pixel square (784 pixels total). The training dataset contains 42,000 labelled entries and the test dataset contains 28,000 entries. Since it is a digit recognition task, there are 10 classes to predict. State-of-the-art solutions reach an error rate of around 0.2%, which can be achieved using a Convolutional Neural Network.
Load dataset
import pandas as pd
import numpy as np
def get_dataset():
    # Read the Kaggle train and test CSV files
    train = pd.read_csv('data/train.csv')
    test = pd.read_csv('data/test.csv')
    # The first column is the label, the remaining 784 columns are pixel values
    train_features = train.iloc[:, 1:]
    train_labels = train.iloc[:, 0]
    train_features = np.array(train_features).astype(np.uint8)
    test_features = np.array(test).astype(np.uint8)
    return train_features, train_labels, test_features
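As a quick sanity check, we can confirm the shapes described above and look at one of the digits. This is a minimal sketch; it assumes matplotlib is installed and that the CSV files sit in the data/ folder used by get_dataset.
import matplotlib.pyplot as plt
train_features, train_labels, test_features = get_dataset()
print(train_features.shape)  # (42000, 784)
print(test_features.shape)   # (28000, 784)
# Reshape one flat 784-pixel row back to a 28x28 image and display it
plt.imshow(train_features[0].reshape(28, 28), cmap='gray')
plt.title('Label: %d' % train_labels[0])
plt.show()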
1. Using Random Forest
Random forests are an ensemble learning method for classification that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes predicted by the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set. Read more on wiki
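Before the full example, here is a toy sketch of the voting idea, added purely for illustration: every fitted tree makes its own prediction and the forest reports the most common one. Note that scikit-learn actually averages the trees’ predicted class probabilities rather than taking a hard vote, so this only shows the intuition.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
train_features, train_labels, test_features = get_dataset()
# Fit a small forest on a slice of the data, purely for illustration
forest = RandomForestClassifier(n_estimators=10).fit(train_features[:1000], train_labels[:1000])
sample = train_features[1000:1001]  # one digit the toy forest has not seen
# Ask every individual tree for its class prediction
tree_votes = [forest.classes_[int(tree.predict(sample)[0])] for tree in forest.estimators_]
print(tree_votes)                         # one vote per tree
print(np.bincount(tree_votes).argmax())   # majority vote
print(forest.predict(sample)[0])          # the forest's own prediction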
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

train_features, train_labels, test_features = get_dataset()

model = RandomForestClassifier(n_estimators=100)
# Estimate accuracy with 5-fold cross-validation on the training set (the scores shown in the output below)
scores = cross_val_score(model, train_features, train_labels, cv=5)

model.fit(train_features, train_labels)
predictions = model.predict(test_features)
'''
Output
array([ 0.96276026, 0.96644056, 0.96023336, 0.96546386, 0.96462601])
'''
Within just 10 lines of code we can predict digits with about 96% accuracy.
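To turn these predictions into a Kaggle submission, we can write them to a CSV file. This is a minimal sketch, assuming the ImageId/Label columns and 1-based image ids expected by the Digit Recognizer competition; the file name is just an example.
# Build a submission file from the predictions above
submission = pd.DataFrame({'ImageId': np.arange(1, len(predictions) + 1), 'Label': predictions})
submission.to_csv('submission.csv', index=False)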
2. Using Multi-Layer Perceptrons
A multilayer perceptron (MLP) is a class of artificial neural network. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Read more on wiki
from keras.models import Sequential
from keras.utils import np_utils
from keras.layers import Dense

train_features, train_labels, test_features = get_dataset()
# Convert labels to one hot encoding
train_labels = np_utils.to_categorical(train_labels)
# Normalize inputs from 0-255 pixel to 0-1
train_features = train_features / 255.0
test_features = test_features / 255.0

def model(num_pixels, num_classes):
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

num_pixels = train_features.shape[1]
num_classes = train_labels.shape[1]
m = model(num_pixels, num_classes)
m.fit(train_features, train_labels, validation_split=0.33, epochs=10, batch_size=100, verbose=2)
'''
Output
Train on 28139 samples, validate on 13861 samples
Epoch 1/10
4s - loss: 0.3232 - acc: 0.9069 - val_loss: 0.1751 - val_acc: 0.9486
Epoch 2/10
4s - loss: 0.1291 - acc: 0.9618 - val_loss: 0.1207 - val_acc: 0.9639
Epoch 3/10
4s - loss: 0.0827 - acc: 0.9773 - val_loss: 0.1102 - val_acc: 0.9657
Epoch 4/10
4s - loss: 0.0574 - acc: 0.9832 - val_loss: 0.0965 - val_acc: 0.9698
Epoch 5/10
5s - loss: 0.0371 - acc: 0.9901 - val_loss: 0.0931 - val_acc: 0.9708
Epoch 6/10
5s - loss: 0.0257 - acc: 0.9939 - val_loss: 0.0897 - val_acc: 0.9717
Epoch 7/10
4s - loss: 0.0198 - acc: 0.9954 - val_loss: 0.0894 - val_acc: 0.9745
Epoch 8/10
4s - loss: 0.0126 - acc: 0.9979 - val_loss: 0.0858 - val_acc: 0.9748
Epoch 9/10
4s - loss: 0.0076 - acc: 0.9993 - val_loss: 0.0883 - val_acc: 0.9735
Epoch 10/10
4s - loss: 0.0054 - acc: 0.9995 - val_loss: 0.0873 - val_acc: 0.9763
'''
The MLP predicts digits with about 97% accuracy.
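The network outputs a 10-value softmax vector for each image, so the predicted digit is the index of the largest probability. A minimal sketch using the trained model m from above:
# Predict class probabilities for the Kaggle test images
probabilities = m.predict(test_features)
# The predicted digit is the index of the highest probability in each row
mlp_predictions = np.argmax(probabilities, axis=1)
print(mlp_predictions[:10])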
3. Using Convolutional Neural Network
A Convolutional Neural Network (CNN) is a class of deep, feed-forward artificial neural networks that is widely applied to image recognition. Read more on wiki
from keras.layers import Activation, Dropout, Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D

train_features, train_labels, test_features = get_dataset()
# Convert labels to one hot encoding
train_labels = np_utils.to_categorical(train_labels)
# Reshape the flat 784-pixel vectors to 1x28x28 images
# (input_shape=(1, 28, 28) below assumes a channels-first Keras image ordering)
train_features = train_features.reshape(train_features.shape[0], 1, 28, 28)
test_features = test_features.reshape(test_features.shape[0], 1, 28, 28)
# Normalize inputs from 0-255 pixel to 0-1
train_features = train_features / 255.0
test_features = test_features / 255.0

def model(num_classes):
    # create model
    model = Sequential()
    model.add(Conv2D(30, (3, 3), input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

num_classes = train_labels.shape[1]
m = model(num_classes)
m.fit(train_features, train_labels, validation_split=0.33, epochs=10, batch_size=100, verbose=2)
'''
Output
Train on 28139 samples, validate on 13861 samples
Epoch 1/10
28s - loss: 0.0218 - acc: 0.9924 - val_loss: 0.0414 - val_acc: 0.9887
Epoch 2/10
28s - loss: 0.0203 - acc: 0.9932 - val_loss: 0.0395 - val_acc: 0.9889
Epoch 3/10
28s - loss: 0.0175 - acc: 0.9938 - val_loss: 0.0355 - val_acc: 0.9900
Epoch 4/10
28s - loss: 0.0191 - acc: 0.9935 - val_loss: 0.0354 - val_acc: 0.9908
Epoch 5/10
28s - loss: 0.0175 - acc: 0.9937 - val_loss: 0.0342 - val_acc: 0.9896
Epoch 6/10
28s - loss: 0.0187 - acc: 0.9939 - val_loss: 0.0359 - val_acc: 0.9905
Epoch 7/10
29s - loss: 0.0147 - acc: 0.9950 - val_loss: 0.0397 - val_acc: 0.9902
Epoch 8/10
28s - loss: 0.0171 - acc: 0.9940 - val_loss: 0.0394 - val_acc: 0.9900
Epoch 9/10
28s - loss: 0.0132 - acc: 0.9952 - val_loss: 0.0408 - val_acc: 0.9896
Epoch 10/10
30s - loss: 0.0136 - acc: 0.9954 - val_loss: 0.0373 - val_acc: 0.9904
'''
This simple CNN predicts digits with about 99% accuracy.
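Training the CNN takes a few minutes, so it is worth saving the trained model instead of refitting it every time. A minimal sketch using Keras’ built-in save/load, assuming h5py is installed; the file name is just an example.
from keras.models import load_model
# Save the trained model (architecture, weights and optimizer state) to disk
m.save('mnist_cnn.h5')
# Later, reload it and predict the test digits without retraining
restored = load_model('mnist_cnn.h5')
cnn_predictions = np.argmax(restored.predict(test_features), axis=1)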
Originally published at rajeshpedia.wordpress.com on August 8, 2017.
