Digit recognition using TensorFlow: MNIST in JPG + Inception v3 transfer learning
How to use TensorFlow and Google’s Inception v3 model to recognize digits from the MNIST dataset converted to JPG format
Edit: If you would like to get in touch with me, feel free to mail me at
teavanist [at] gmail [dot] com ; Medium is not very conducive to conversations
Anyone who tinkers with Machine Learning will at some point play with the celebrated MNIST dataset. It’s supposedly the Hello World equivalent of machine learning applications.
How is this post different from the rest of the MNIST tutorials that you generally find on the Internet?
- Use of the Inception model — I didn’t see many people using the Inception v3 model to do image recognition on the MNIST dataset. Most tutorials involve building a network from scratch. The methodology I have used is transfer learning on the Inception model instead of creating a network from scratch.
- Conversion to JPG format — the MNIST dataset is in IDX file format, and most MNIST tutorials I saw convert an image to its MNIST equivalent. I saw just one that converts MNIST to PNG format. I also thought that if I build a front end for this in the future, it might be easier to manage if the files are JPG.
Here is what the rest of this post will cover:
- Step 1: Convert MNIST files to JPG format
- Step 2: Re-train Inception
- Step 3: Run the model on the test images
- Analysis
- Next steps
- References
Note: This post assumes that you have already installed TensorFlow. I use a Windows 10 PC.
Step 1: Convert MNIST files to JPG format
If you don’t wish to try out the script for conversion, you can just download the MNIST JPG files ( MNIST Dataset JPG format.zip) directly from my repo and move to Step 2:
https://github.com/teavanist/MNIST-JPG
For the more adventurous ones, the steps to do the conversion are listed below.
You start by downloading the MNIST digit dataset files from the following link:
http://yann.lecun.com/exdb/mnist/
Save all four files to a directory.
You will notice that all four files are compressed in .gz format.
Unzip these files and the dataset will look something like this:
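If you would rather not unzip the archives by hand, the standard library can do it; `unzip_mnist` below is a helper name of my own, not part of any of the scripts in this post:

```python
import glob
import gzip
import os
import shutil

def unzip_mnist(directory="."):
    """Decompress every .gz archive in `directory`, writing the
    uncompressed file next to it (e.g. train-images-idx3-ubyte)."""
    for gz_path in glob.glob(os.path.join(directory, "*.gz")):
        out_path = gz_path[:-3]  # drop the ".gz" suffix
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

unzip_mnist()
```

Run it from the directory where you saved the four downloads and the unzipped files appear alongside the archives.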
Next, create a folder named output in the same location.
Download the mnist_jpg.py file from my repo link below.
https://github.com/teavanist/MNIST-JPG
(This file is a modified version of a script Myle Ott wrote to convert the files to PNG format.)
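For context, the IDX format such a script has to parse is simple: a big-endian header (a magic number, an item count, and, for images, the row and column sizes) followed by raw bytes. A minimal sketch of that parsing, with helper names of my own (this is not the actual code from the script):

```python
import struct

def read_idx_images(data):
    """Parse an IDX image buffer: magic 2051, count, rows, cols, then pixels."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    assert magic == 2051, "not an IDX image file"
    size = rows * cols  # bytes per image (28 * 28 for MNIST)
    return [data[16 + i * size : 16 + (i + 1) * size] for i in range(count)]

def read_idx_labels(data):
    """Parse an IDX label buffer: magic 2049, count, then one byte per label."""
    magic, count = struct.unpack(">II", data[:8])
    assert magic == 2049, "not an IDX label file"
    return list(data[8 : 8 + count])
```

The conversion script then pairs each image with its label and writes the pixels out as a JPG into the matching 0-9 sub-folder.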
Your directory should look something like this:
Before running the script, you need to install pillow as it is a dependency. You can do that by opening a Windows command prompt and typing the following command:
pip install pillow
Now you need to execute the mnist_jpg.py script. To do so, open a command prompt in Windows, navigate to your directory and type the following command:
> python mnist_jpg.py output
The script will start converting the files and you will see messages like the below:
Once done, you will find that the script has created two sub-folders within the output folder:
- training
- testing
Note: For my convenience, I renamed these to training_images and testing_images respectively; so you will find these names in the screenshots that follow.
Each of two folders will in turn have sub-folders 0 to 9:
The training folder has around 60,000 images in various sub-folders while the testing folder has around 10,000 images. Some snapshots of the sub-folders containing images are shown below:
Below is a single image opened in MS Paint. As you can see, it is a 28 by 28 picture in greyscale.
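You can sanity-check any converted file with Pillow instead of MS Paint; `check_digit` is just an illustrative helper of mine, and the path you pass it is whatever file you want to inspect:

```python
from PIL import Image

def check_digit(path):
    """Return (width, height, mode) of a converted digit image.
    For the MNIST JPGs this should be (28, 28, "L"), i.e. 28x28 greyscale."""
    with Image.open(path) as img:
        return img.size + (img.mode,)
```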
Step 2: Re-train Inception
Download the retrain.py script from the following location:
https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
Next, let us re-organize the folder before we execute the script. (This is optional; you can still run the script by changing its arguments to point to the correct locations.)
Create four new folders in the root directory:
- bottleneck
- inception
- output_graph
- output_labels
Your main folder should look like this:
Next, rename the testing and training sub-folders, which were automatically generated under the output folder during the MNIST conversion, to testing_images and training_images.
Move both these folders to the root directory. Once you have done that, you can delete the output folder as it is not needed anymore. Your main folder content will look like this:
You need to install tensorflow_hub as it is a dependency for the retrain.py script. To install it, open a command prompt and type the following:
> pip install tensorflow_hub
You will see the following message once the installation is complete.
You can now run the script to retrain the Inception v3 model on the MNIST images.
To run the script, open the command prompt, navigate to your main folder location (in my case C:\Users\admin\Desktop\MNIST Dataset JPG format) and type the following command:
python retrain.py --bottleneck_dir=bottleneck/ --model_dir=inception/ --output_labels=output_labels/retrained_labels.txt --output_graph=output_graph/retrained_graph.pb --image_dir=training_images/
The script will start running and you will find messages like the below:
Note: While running this script a second time, I found that it stopped when processing one of the files. It turned out the image that stopped the process was a 0 KB file. I assume this happened due to some issue when I performed the MNIST to JPG conversion. After I removed the file and ran the script again, it worked fine.
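If you want to hunt down such zero-byte files before retraining, a few lines of standard library code will do it; `find_empty_images` is my own helper name:

```python
import os

def find_empty_images(root):
    """Return the paths of all 0 KB files under `root`."""
    empty = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return empty
```

Run it against your training_images folder and delete whatever it returns before starting the retrain.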
As the training progresses, you will see training statistics being displayed:
Once the training is over, the script output will look something like this (in case you don’t see it on your command prompt, scroll up a little):
It looks like the re-trained model has a training accuracy of around 94%.
Once this is complete, you can check how accurately the model performs on the test images.
Step 3: Run the model on the test images
The Inception model is now trained. You can check how good the model is by checking how many of the test images it classifies correctly.
Download label_imageV3.py from my repo:
https://github.com/teavanist/ml-files
Some notes about this script:
- The original label_image.py script is from Google and is used to classify a single image using a pre-trained model
- This version has support for directories: the original script could classify only a single image, while this version can predict the label for every image in a specified folder
- The result of the labeling is also saved to a CSV file. This is very helpful to do your analysis later
- After analyzing all the images, the script also shows how many images it classified correctly
- Since I am a newbie to Python, I am sure the code that I added is not the most elegant…
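The directory support essentially boils down to pairing each image with the label encoded in its parent folder name; a minimal sketch of that walk (the helper name is my own, not from the script):

```python
import os

def iter_labeled_images(root):
    """Yield (true_label, path) pairs; the sub-folder name (0-9) is the label."""
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue
        for name in sorted(os.listdir(label_dir)):
            yield label, os.path.join(label_dir, name)
```

Each yielded path gets fed to the model, and the predicted label is compared against the folder name to decide whether the classification was correct.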
Your folder should look something like this:
To test your model on the MNIST test images, open a command prompt and navigate to your main folder. Then, type the following command:
python label_imageV3.py --graph=output_graph/retrained_graph.pb --labels=output_labels/retrained_labels.txt --input_layer=Placeholder --output_layer=final_result --directory_name=testing_images/ --process_mode=2
The script will display the following:
For me this script took more than 24 hours to complete on my home Windows PC. Check out my CPU workload during that time.
Once the script finishes execution, it will tell you how many images in the testing_images folder it was able to predict correctly.
The script also generates a CSV file to show statistics about each label. This CSV is super useful for further analysis. My sample is attached.
Your folder will look something like this:
Analysis
Overall, the model was able to classify only about 93.31% of the digits correctly.
A quick analysis using the data in the CSV file shows that the model made most of its errors on digits 2 and 5!
The CSV file also tells you which specific files were wrongly classified.
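A per-label error tally from the CSV takes only a few lines of standard library code. The column names `actual` and `predicted` below are assumptions on my part, so adjust them to whatever headers your CSV actually has:

```python
import csv
from collections import Counter

def per_label_errors(csv_path):
    """Tally wrong predictions per true label, worst offenders first.
    Assumes columns named "actual" and "predicted"; rename to match your CSV."""
    errors = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["actual"] != row["predicted"]:
                errors[row["actual"]] += 1
    return errors.most_common()
```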
Below are some of the digits that the model could not classify correctly:
Next steps
I plan to spend the next few days trying to improve the accuracy of the model and then write up my experience.
References
- If you want to know more about the MNIST dataset, check out the link below:
https://corochann.com/mnist-dataset-introduction-1138.html
- Myle Ott’s original script I used as a basis for the JPG conversion:
https://github.com/myleott/mnist_png
- Inception model: a good write-up by Google on the TensorFlow repo about Inception:
https://github.com/tensorflow/models/tree/master/research/inception
- Bharat Raj has a good Medium post about the different versions of Inception:
https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
- A good short video about transfer learning:
https://www.youtube.com/watch?v=kSJCLxDJ2Wc