This is how a neural network learns to add, multiply and compare handwritten digits

Mounir Kara Zaitri · Analytics Vidhya · Sep 20, 2021

WITHOUT knowing their values!

Courtesy of dlpng [https://dlpng.com/png/6906777]

In a previous post, I described how useful autoencoders are for automated labeling. The main property of these networks is their ability to learn features and patterns in the data. This is in fact not specific to autoencoders and can also be achieved with other unsupervised techniques, most notably PCA.
The ability to detect and learn features in data can be used in other areas.

In this post, I will present one application of autoencoders. This will be realized through two steps:

  • First, a convolutional autoencoder will be trained on MNIST data.
  • After the training of the encoder and decoder, we will freeze their weights and use them with additional dense layers to “learn” arithmetic operations, namely addition, multiplication, and comparison.

The trick is to never explicitly associate the handwritten digits in the MNIST dataset with their respective labels. We will see that the neural networks will nevertheless be able to reach 97+% accuracy in all cases on unseen data.

The first step of the design is described in the following diagram:

MNIST autoencoder [Image by author]

In the second step, we will use the encoder in series with dense layers to perform arithmetic operations. We will train only the dense layer weights, and supply the results of the operations as labels. Note that we will not supply the values of the digits (labels).

Training the network to learn arithmetic operations [Image by author]

Training an autoencoder on MNIST data

Similar to the previous article, we will use MNIST data in this experiment. The autoencoder will learn the features of handwritten digits using 60000 training samples. We import MNIST using the Keras library.
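A minimal loading sketch (the full code is in the notebook linked in the references):

```python
from tensorflow.keras.datasets import mnist

# 60,000 training images and 10,000 test images of 28x28 grayscale digits
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)
```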

We have to scale the data into the range [0,1] and reshape it to the Keras image format (nbr_samples x width x height x channels).
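A sketch of this preprocessing (the exact code may differ from the original gist):

```python
import numpy as np

# Scale pixel values to [0, 1] and add the channel dimension expected by Keras
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = np.expand_dims(x_train, axis=-1)  # (60000, 28, 28, 1)
x_test = np.expand_dims(x_test, axis=-1)    # (10000, 28, 28, 1)
```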

The autoencoder architecture is based on a series of convolutional layers that gradually encode the 28x28x1 image (784 pixels) into a 100-element array, and then decode that representation back to the original format. The resulting image, after training, will hopefully resemble the original one.
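Here is a hedged sketch of such an encoder/decoder pair. The filter counts, strides, and activations are my assumptions; the exact architecture is in the notebook linked in the references:

```python
from tensorflow.keras import layers, Model

latent_dim = 100  # size of the learned representation

# Encoder: stacked convolutions compress the 28x28x1 image into a 100-element vector
enc_in = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(enc_in)  # 14x14
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)       # 7x7
x = layers.Flatten()(x)
enc_out = layers.Dense(latent_dim, activation="relu")(x)
encoder = Model(enc_in, enc_out, name="encoder")

# Decoder: expands the 100-element vector back into a 28x28x1 image
dec_in = layers.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(dec_in)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)  # 14x14
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)  # 28x28
dec_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
decoder = Model(dec_in, dec_out, name="decoder")
```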

The autoencoder is then created using the encoder and the decoder:
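Continuing the sketch above:

```python
# Chain the encoder and decoder into a single trainable model
ae_in = layers.Input(shape=(28, 28, 1))
autoencoder = Model(ae_in, decoder(encoder(ae_in)), name="autoencoder")
```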

Each output pixel of the autoencoder will be trained as a binary classifier.

The early stopping will prevent the autoencoder from overfitting the training data. There are two ways to check the network performance. First, we can evaluate the loss function on test data. We expect it to be close to the loss value on the training data.
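A sketch of the compile/fit step under these assumptions: binary cross-entropy as the pixel-wise loss, Adam as the optimizer, and illustrative values for the patience, epochs, and batch size:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Binary cross-entropy treats every output pixel as a binary classification target
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)

autoencoder.fit(
    x_train, x_train,        # the input image is also the training target
    validation_split=0.1,
    epochs=50,
    batch_size=128,
    callbacks=[early_stop],
)

# Compare the loss on training data and on unseen test data
print(autoencoder.evaluate(x_train, x_train, verbose=0))
print(autoencoder.evaluate(x_test, x_test, verbose=0))
```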

Values are very close for both data sets. Another way to verify the autoencoder is to check the reconstruction obtained for a random sample from the test data.
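For instance, with matplotlib (a minimal sketch):

```python
import matplotlib.pyplot as plt

# Pick a random test image and compare it to its reconstruction
idx = np.random.randint(len(x_test))
sample = x_test[idx:idx + 1]
reconstruction = autoencoder.predict(sample)

fig, axes = plt.subplots(1, 2)
axes[0].imshow(sample[0, :, :, 0], cmap="gray")
axes[0].set_title("Original")
axes[1].imshow(reconstruction[0, :, :, 0], cmap="gray")
axes[1].set_title("Reconstruction")
plt.show()
```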

A picture is worth a thousand words! It is clear that the reconstruction is very close to the original image.

Now that we have a trained encoder and decoder, let’s focus on the encoder. For each image, the encoder generates a representation that captures the most “interesting” or “important” features. This representation should be sufficient to reconstruct the image using the decoder. Here is the representation of the sample image we used earlier:
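Continuing with the sample from above:

```python
# The 100-element representation the encoder produces for the sample image
representation = encoder.predict(sample)
print(representation.shape)  # (1, 100)
print(representation[0])
```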

The decoder, using these 100 numbers, will generate a 28x28 image (784 pixels).
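As a quick check, continuing the sketch:

```python
# Feeding the representation to the decoder reproduces the 28x28 image
decoded = decoder.predict(representation)
print(decoded.shape)  # (1, 28, 28, 1)
```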

And here is where the fun part begins! Using the lower-dimensional representation, let’s do some math.

Learning arithmetic operations on handwritten digits

The idea is simple. Using the representation of two images, we train a neural network to compute their sum, their product and to compare them. We will not provide the value of each digit, but we will provide the results during the training step.
We will be performing addition and multiplication on digits in the range [0–9]. The results will be in the ranges [0–18] and [0–81] respectively, so they will be encoded using five separate outputs (a code sketch of this encoding follows the list):
1- Sum units, multiclass output [0,1,2,3,4,5,6,7,8,9]
2- Sum tens, binary output [0,1]
3- Multiplication units, multiclass output [0,1,2,3,4,5,6,7,8,9]
4- Multiplication tens, multiclass output [0,1,2,3,4,5,6,7,8]
5- Comparison result, binary output [0,1]
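Here is a hedged sketch of this encoding as a small helper. The name encode_targets is mine, and I assume the comparison output is 1 when the first digit is greater than the second (the article only states that it is binary):

```python
def encode_targets(a, b):
    """Encode the five outputs for a pair of digit values a and b in [0, 9]."""
    s, p = a + b, a * b
    return {
        "sum_units": s % 10,    # 0-9
        "sum_tens": s // 10,    # 0 or 1
        "prod_units": p % 10,   # 0-9
        "prod_tens": p // 10,   # 0-8
        "greater": int(a > b),  # 0 or 1
    }
```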

The model representation [Image by author]

Using the functional API in Keras, we define the network architecture. First, we import the encoder twice and freeze its weights:
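In the sketch below, I reuse the single trained encoder (frozen) on both inputs, which is equivalent to importing two frozen copies:

```python
# Freeze the trained encoder so only the new dense layers will be updated
encoder.trainable = False

# Two image inputs, one for each operand
input_a = layers.Input(shape=(28, 28, 1))
input_b = layers.Input(shape=(28, 28, 1))

code_a = encoder(input_a)  # 100-element representation of the first digit
code_b = encoder(input_b)  # 100-element representation of the second digit
```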

Using the outputs of the encoders, we build the model:
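A hedged sketch of the model, matching the description below (a shared 1000-unit hidden layer followed by five heads); the activation choices are my assumptions:

```python
# Concatenate both representations and pass them through a shared hidden layer
merged = layers.Concatenate()([code_a, code_b])
hidden = layers.Dense(1000, activation="relu")(merged)

# Five heads: units/tens of the sum, units/tens of the product, and the comparison
sum_units = layers.Dense(10, activation="softmax", name="sum_units")(hidden)
sum_tens = layers.Dense(1, activation="sigmoid", name="sum_tens")(hidden)
prod_units = layers.Dense(10, activation="softmax", name="prod_units")(hidden)
prod_tens = layers.Dense(9, activation="softmax", name="prod_tens")(hidden)
greater = layers.Dense(1, activation="sigmoid", name="greater")(hidden)

math_model = Model(
    inputs=[input_a, input_b],
    outputs=[sum_units, sum_tens, prod_units, prod_tens, greater],
)
```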

This model has two inputs (the two handwritten digits images) and five outputs (units and tens of the sum and product plus the comparison result). We will use two different losses due to the nature of the outputs. Note that there is a common hidden layer of 1000 units, and then five branches (one for each output).
We need to create datasets to train and test our model. Inputs will be random combinations of handwritten digits. Outputs will be the expected results for each combination.
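A sketch of the dataset generation; train_size appears in the original notebook, but its value here is illustrative. Note that the digit labels are used only to compute the expected results, never given to the network:

```python
train_size = 100_000  # number of random digit pairs (illustrative value)

def make_dataset(images, labels, n_pairs):
    # Draw random pairs of images and derive the five targets from their digit values,
    # mirroring the encode_targets helper above in vectorized form
    idx_a = np.random.randint(len(images), size=n_pairs)
    idx_b = np.random.randint(len(images), size=n_pairs)
    a, b = labels[idx_a], labels[idx_b]
    s, p = a + b, a * b
    targets = [s % 10, s // 10, p % 10, p // 10, (a > b).astype("float32")]
    return [images[idx_a], images[idx_b]], targets

train_inputs, train_targets = make_dataset(x_train, y_train, train_size)
test_inputs, test_targets = make_dataset(x_test, y_test, 10_000)
```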

Now we are ready to train our model!
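Compile and fit, with sparse categorical cross-entropy for the multiclass heads and binary cross-entropy for the binary ones (epoch and batch-size values are illustrative):

```python
math_model.compile(
    optimizer="adam",
    loss={
        "sum_units": "sparse_categorical_crossentropy",
        "sum_tens": "binary_crossentropy",
        "prod_units": "sparse_categorical_crossentropy",
        "prod_tens": "sparse_categorical_crossentropy",
        "greater": "binary_crossentropy",
    },
    metrics=["accuracy"],
)

math_model.fit(
    train_inputs,
    train_targets,
    validation_split=0.1,
    epochs=20,
    batch_size=128,
)
```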

At the end of the training, the accuracy on all outputs is pretty good (in the 90+% range). Let’s see first how the model performs on the test data.
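For example:

```python
# Evaluate the five outputs on unseen pairs
results = math_model.evaluate(test_inputs, test_targets, verbose=0)
for name, value in zip(math_model.metrics_names, results):
    print(f"{name}: {value:.4f}")
```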

Results are still in the 95+% range. Let’s show a random sample of the model predictions.
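A sketch of how the raw outputs can be decoded back into numbers for a few random test pairs:

```python
# Turn the five output vectors back into human-readable results
preds = math_model.predict(test_inputs)
for i in np.random.randint(len(test_inputs[0]), size=5):
    pred_sum = int(np.argmax(preds[0][i])) + 10 * int(round(float(preds[1][i][0])))
    pred_prod = int(np.argmax(preds[2][i])) + 10 * int(np.argmax(preds[3][i]))
    pred_greater = bool(preds[4][i][0] > 0.5)
    print(f"pair {i}: sum={pred_sum}, product={pred_prod}, first>second={pred_greater}")
```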

Voilà! We could improve the accuracy by training the model on more random samples (increasing the train_size value) or by tweaking the model architecture. With that being said, we managed to build a neural network that is capable of solving basic arithmetic operations on handwritten digits without explicitly computing their values. Mission accomplished.

Conclusion and future work

In this post, we presented an autoencoder trained on MNIST images. During the autoencoder training, the encoder part learns the most important features of the images, in order to reconstruct them later via the decoder. These features can then be used in further operations (via dense or recurrent layers). We trained a dense neural network, in series with the frozen encoder, to learn arithmetic operations. The model achieved more than 95% accuracy on all outputs.

Autoencoders are part of unsupervised learning. We are still scratching the surface of these amazing deep learning techniques. I will continue to explore this area, especially using recurrent neural networks and their applications in natural language processing and time series.

References

The notebook [https://jovian.ai/kara-mounir/mnist-autoencoder]

Machine learning mastery blog by Jason Brownlee [https://machinelearningmastery.com]

My Github [https://github.com/zaitrik]

My LinkedIn [https://www.linkedin.com/in/mounir-kara-zaitri-a01a00208/]
