How to Build a Multiclass Image Classification Model without CNNs in Python
A beginner's guide to building a simple artificial neural network model
Convolutional neural networks (CNNs) are arguably the best machine learning models for computer vision tasks, but before we start learning CNNs it's better to start with a simpler model.
In this article, we will build a simple artificial neural network trained with backpropagation to classify MNIST handwritten digits, and we will use TensorFlow as our machine learning library.
About the data
I think the MNIST handwritten digits dataset is the most popular dataset to experiment with in deep learning. As written on Yann LeCun's website:
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
The data contains 60,000 training images and 10,000 test images. Each image is in grayscale format with a dimension of 28 x 28 pixels.
Loading data
The MNIST handwritten digits dataset is already available as part of the TensorFlow library, so we can load it easily by calling the function load_data().
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_val, y_val) = mnist.load_data()
After the data is loaded, we first want to verify that it was loaded correctly. We can do that by plotting a sample image.
import matplotlib.pyplot as plt
plt.imshow(x_train[0], cmap="gray")
plt.show()
print("Label : ", y_train[0])
Data normalization
The pixel values of the image data are in the range 0 to 255: 0 for black, 255 for white, and shades of gray in between. We want to normalize these values to the range 0 to 1. The data labels are encoded into one-hot format since we will be using softmax activation in the output layer.
train_norm = x_train.astype('float32')
val_norm = x_val.astype('float32')
x_train = train_norm / 255.0
x_val = val_norm / 255.0
# convert labels to one-hot encoding
y_train_enc = tf.keras.utils.to_categorical(y_train)
y_val_enc = tf.keras.utils.to_categorical(y_val)
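To see what the one-hot encoding looks like, we can print the encoded label of a training sample. This is just an optional check:
print("Original label :", y_train[0])
print("One-hot encoded:", y_train_enc[0])
# a label of 5, for example, becomes [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]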
Build the model
Now we build the model. This is the architecture we will be using:
It only has three layers with one hidden layer in the middle of the network. No convolution or feature extraction here. We can add more hidden layers if necessary.
We have 784 input neurons, corresponding to the number of pixels in each image: we feed all 784 pixels to the input layer. The ten neurons in the output layer correspond to the number of classes.
model = tf.keras.models.Sequential()
# flatten each 28 x 28 image into a vector of 784 pixels
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
# hidden layer with 128 ReLU neurons, output layer with 10 softmax neurons (one per class)
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
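Since the input shape is specified on the first layer, we can optionally call model.summary() to confirm the layer shapes and parameter counts:
model.summary()
The hidden layer has 784 * 128 + 128 = 100,480 parameters and the output layer has 128 * 10 + 10 = 1,290 parameters, for a total of 101,770 trainable parameters.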
Train the model
There are some hyperparameters to define before training the model: the optimizer, learning rate, loss function, and number of epochs. An optimizer is an algorithm that changes the values of the model parameters to reach a minimum loss. The learning rate determines how much the values change on each update step. Since we have a multi-class problem, we use categorical_crossentropy as our loss function. And epochs is simply the number of complete passes over the training data.
sgd = tf.keras.optimizers.SGD(learning_rate=0.004)
model.compile(optimizer=sgd, loss="categorical_crossentropy", metrics=["accuracy"])
EP = 25
history = model.fit(x=x_train, y=y_train_enc, epochs=EP, validation_data=(x_val, y_val_enc))
The training process takes less than two minutes. Here we can visualize the training history to check the performance of our model.
import numpy as np
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, EP), history.history["loss"], label="train_loss")
plt.plot(np.arange(0, EP), history.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, EP), history.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, EP), history.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.show()
As we can see, our model is doing pretty well. The loss is approaching its minimum and the accuracy is gradually increasing.
Evaluation and prediction
val_loss, val_acc = model.evaluate(x=x_val, y=y_val_enc)
313/313 [======] - 1s 2ms/step - loss: 0.1642 - accuracy: 0.9530
This model gives us about 95.3% accuracy on the validation data, which means it was able to recognize handwritten digits it never saw during training about 95% of the time. Quite impressive for a simple neural network.
Here we can show some sample prediction results:
predictions = model.predict(x_val)
x_val__ = x_val.reshape(x_val.shape[0], 28, 28)
fig, axis = plt.subplots(2, 5, figsize=(12, 6))
for i, ax in enumerate(axis.flat):
    ax.imshow(x_val__[i], cmap='gray')
    ax.set(title=f"Label : {y_val[i]}\nPrediction : {predictions[i].argmax()}")
The model was able to recognize most of the handwritten digits, but it still fails on poor handwriting. As we can see, it failed to recognize the digit labeled 5 because the handwriting is not very clear: it is actually a five, but it also looks like a six.
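If you want to look at the failure cases more systematically, here is a minimal sketch that collects and plots the misclassified validation images. It reuses predictions, x_val__, np, and plt from the snippets above; the variable names pred_labels and wrong are just illustrative:
# indices where the predicted class differs from the true label
pred_labels = predictions.argmax(axis=1)
wrong = np.where(pred_labels != y_val)[0]
print("Misclassified samples:", len(wrong))
# plot a few of the misclassified digits
fig, axis = plt.subplots(1, 5, figsize=(12, 3))
for i, ax in zip(wrong[:5], axis.flat):
    ax.imshow(x_val__[i], cmap='gray')
    ax.set(title=f"Label : {y_val[i]}\nPrediction : {pred_labels[i]}")
plt.show()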
Conclusion
The simple model that we built was able to achieve 95% accuracy in less than two minutes. It's simple and fast, and it doesn't need a GPU. It is suitable for beginners who want to learn neural networks. However, this model is not suitable for complex image data.
What’s next?
- Try to play with the hyperparameters. Add more hidden layers, change the learning rate, epochs, number of hidden neurons, etc., and see how it affects the accuracy and performance (see the sketch after this list).
- Challenge the model with external, unseen handwritten digit data.
- Try to use feature extraction.
- Try this model on other datasets.
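As an example of the first point, a deeper variant of the same model could look like the sketch below. The extra hidden layer and its size are arbitrary choices for illustration, not tuned values:
# a deeper variant: two hidden layers instead of one
deeper_model = tf.keras.models.Sequential()
deeper_model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
deeper_model.add(tf.keras.layers.Dense(256, activation=tf.nn.relu))  # extra hidden layer
deeper_model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
deeper_model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
deeper_model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.004),
                     loss="categorical_crossentropy", metrics=["accuracy"])
It can then be trained and evaluated exactly like the original model with fit() and evaluate().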