Fastai v2: An End-to-End Deep Learning Tutorial for Arabic Character Recognition

Anel Music · Published in Analytics Vidhya · May 7, 2020

The complete code is available on GitHub, featuring an all-in-one Jupyter notebook.

In a nutshell:

With today’s rapid progress in machine learning, it is hard to imagine that only a few years ago robust character recognition posed a real challenge. Since then, the famous MNIST data set has been used by researchers around the world to train new algorithms, and today it represents practically no challenge for modern neural networks. Inspired by this, a research team set out in 2017 to classify the Arabic alphabet: Ahmed El-Sawy, Mohamed Loey and Hazem EL-Bakry achieved an accuracy of 94.1% in their work “Arabic Handwritten Characters Recognition using Convolutional Neural Network, WSEAS, 2017”. Our goal is to show how easy it is to beat this state-of-the-art result from 2017 using modern training methods.

Aim of this article:

  • Download and unzip the data set
  • Load and format images and write a custom labeling function
  • Split the data set into training, validation and test sets
  • Use transfer learning and the ResNet-18 architecture to train a multi-class classifier
  • Use fine-tuning and discriminative learning rates to achieve 97% accuracy within 22 epochs of training
  • Plot and interpret training results

Step 1: Getting the data

The data set contains 16,800 images of handwritten characters from the Arabic alphabet. Each image is 32×32 pixels. The training set (13,440 images, 480 images per character) and the validation set (3,360 images, 120 images per character) are already predefined. You can download the data set directly from here. Once downloaded, you can extract the tar.gz file using:

tar -xzf arabic_mnist_dataset.tar.gz 
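If you prefer to stay inside Python, the standard library’s tarfile module does the same job (a minimal sketch, assuming the archive sits in the current working directory):

import tarfile

# Extract the downloaded archive next to the notebook
with tarfile.open('arabic_mnist_dataset.tar.gz', 'r:gz') as tar:
    tar.extractall()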

Step 2: Loading the data

After importing the fastai modules:

from utils import *
from fastai2.vision.all import *

we can create a path object by providing the full path to the directory containing the extracted data set:

# Show what's inside the directory
path = Path('/notebooks/storage/data/arabic_mnist')
path.ls()

Output:

(#3) [Path('/notebooks/storage/data/arabic_mnist/train'),Path('/notebooks/storage/data/arabic_mnist/.ipynb_checkpoints'),Path('/notebooks/storage/data/arabic_mnist/test')]

Here, we can see that the data set consists of a train and test set. In order to use fastai’s data set splitting method

GrandparentSplitter(train_name='train', valid_name='valid')

we can either pass the names of the folders containing the data or rename the folders to match the defaults. Here, we will rename the test directory:

# Rename test to valid
! mv {path / 'test'} {path / 'valid'}
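Equivalently, the rename can be done from Python with pathlib (a hedged alternative to the shell command above):

# Rename the 'test' folder to 'valid' so GrandparentSplitter's defaults apply
(path/'test').rename(path/'valid')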

Step 3: Looking at the data filenames

To better understand the data and the naming convention we can print the filenames of the training set:

train_data = path/"train/train"
train_data.ls()[:10], len(train_data.ls())

Output:

((#10) [Path('/notebooks/storage/data/arabic_mnist/train/train/id_5374_label_28.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_8290_label_1.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_3424_label_8.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_6270_label_28.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_6026_label_26.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_5237_label_11.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_7807_label_24.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_4456_label_25.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_2777_label_12.png'),Path('/notebooks/storage/data/arabic_mnist/train/train/id_9003_label_6.png')],13440)

We can see that the creators of the data set used numeric labels from 1 to 28, where each number corresponds to a character of the Arabic alphabet. Although it is perfectly possible to use these numeric labels, we would like to use string labels instead to make interpretation easier later on. For this purpose, the image filenames have to be mapped to the corresponding identifiers.

We will start by creating a simple list containing the labels that we want to use instead of the corresponding numeric label:

arabic_mnist_labels = ['alef','beh','teh','theh','jeem','hah','khah','dal','thal','reh',
                       'zain','seen','sheen','sad','dad','tah','zah','ain','ghain','feh',
                       'qaf','kaf','lam','meem','noon','heh','waw','yeh']

Now, we need to think about how to map the filenames to our arabic_mnist_labels string identifiers. One convenient way is to use regular expressions.

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern

We could analyze the path /notebooks/storage/data/arabic_mnist/train/train/id_5374_label_28.png and look for “label” followed by an underscore “_”, followed by a number such as “28”, followed by “.png”, capturing only the number.

It turns out that one way of defining such a pattern is to use the following regular expression:

regex = "label_(.+).png"

With this regular expression we can now define our own labeling function that returns the corresponding string label for each image filename. Keep in mind that the numeric labels run from 1 to 28, whereas our arabic_mnist_labels list is indexed from 0 to 27.

def get_arabic_mnist_labels(file_path):
    regex = "label_(.+).png"
    label = re.search(regex, str(file_path)).group(1)
    return arabic_mnist_labels[int(label)-1] # adapt index
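A quick sanity check on a filename from the listing in Step 3 (label 28 should map to the last list entry):

get_arabic_mnist_labels(path/'train/train/id_5374_label_28.png')  # -> 'yeh'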

Step 4: Creating a DataBlock object

Fastai’s DataBlock allows us to set up PyTorch DataLoaders for our training and validation sets, to split the data according to our needs, and to perform transformations such as resizing, rotating and normalizing. Here, we will use a concept called presizing:

arab_mnist = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       splitter=GrandparentSplitter(),
                       get_y=get_arabic_mnist_labels,
                       item_tfms=Resize(460),
                       batch_tfms=[*aug_transforms(do_flip=False,
                                                   size=224, min_scale=0.85),
                                   Normalize.from_stats(*imagenet_stats)])

dls = arab_mnist.dataloaders(path)

Presizing: First, resize the images to relatively large dimensions, that is, dimensions significantly larger than the target training size (here 460 instead of 224).

Second, compose all of the common augmentation operations (including the resize to the final target size) into one combined operation, and perform it on the GPU only once at the end of processing, rather than applying each transform individually and interpolating multiple times.
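To verify that presizing really produced batches of the final target size, we can pull a single batch from the DataLoaders (a quick sanity check; the batch size of 64 is fastai’s default and an assumption here):

xb, yb = dls.one_batch()
xb.shape, yb.shape  # expected: (torch.Size([64, 3, 224, 224]), torch.Size([64]))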

Finally, we can display a few samples to make sure our custom labeling function works as expected:

dls.show_batch(nrows=3, ncols=3)

Output:

So far the data looks as expected. Our labeling function mapped the numeric labels to string identifiers.

Step 5: Create the Neural Net

As our task is to classify image data, we will use a convolutional neural network. To be more precise, we will use the ResNet-18 architecture:

ResNet-18 is a convolutional neural network that is 18 layers deep. You can load a pretrained version of the network trained on more than a million images from the ImageNet database

# Set up the neural net
learn = cnn_learner(dls, resnet18, pretrained=True, loss_func=CrossEntropyLossFlat(),
                    metrics=accuracy, model_dir="/tmp/model/")

To use the power of transfer learning, we set the pretrained parameter to True (although it is True by default). Since we want a classifier that can distinguish more than two categories but is limited to one label per image, we use cross-entropy as our loss function. Here again, fastai would have picked the appropriate loss function automatically based on our DataBlock, where we defined the blocks parameter to consist of an ImageBlock and a CategoryBlock (see Step 4). Keep in mind that the loss function is what the computer needs to compute gradients, whereas the accuracy is a metric that we humans can understand and interpret.
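To see this automatic inference in action, you can omit the loss_func argument and inspect what fastai chose (a small sketch; learn2 is just a throwaway name):

# Without an explicit loss_func, fastai infers it from the CategoryBlock
learn2 = cnn_learner(dls, resnet18, metrics=accuracy)
learn2.loss_func  # e.g. FlattenedLoss of CrossEntropyLoss()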

Step 6: Train the Neural Net

Before we start training the neural net, we can use the learning rate finder to pick an appropriate learning rate:

learn.lr_find()

Output:

A good rule of thumb is to pick a learning rate on the steepest downward slope of the loss curve, close to the minimum but not at the minimum itself. In this example we would pick lr = 1e-2.

Basic principle: Start training the model while increasing the learning rate from a very low to a very large value, and stop when the loss starts to get out of control. Plot the losses against the learning rates and pick a value a bit before the minimum, where the loss is still improving.

For more information on the learning rate finder, please refer to this paper by Leslie N. Smith.
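Depending on your fastai version, lr_find also returns numeric suggestions that you can use instead of reading the value off the plot (the exact field names vary across versions, so treat this as a sketch):

# lr_find returns suggested learning rates alongside the plot
suggestions = learn.lr_find()
print(suggestions)  # e.g. SuggestedLRs(lr_min=..., lr_steep=...)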

  1. Training

First, we will train the neural net for 15 epochs using the one-cycle training policy and monitor the train_loss as well as the valid_loss to make sure we are neither underfitting nor overfitting:

One-cycle policy: By first slowly increasing the learning rate up to the maximum value provided by the learning rate finder and then decreasing it again, it is possible to train complex networks faster and with the same or better precision.

lr = 1e-2
learn.fit_one_cycle(15, lr)

Output:

The results show that after 15 epochs of training we achieved a classification accuracy of 96.67% and have therefore already beaten the state-of-the-art result from 2017. However, we can do better. By default, the early layers of the pretrained ResNet-18 are frozen, which means our training does not influence these layers and their weights are not updated. We can unfreeze them to train the complete network:

Fine-tuning: Fine-tuning is the process of taking a network model that has already been trained for a given task and making it perform a second, similar task.

learn.unfreeze()

Now, we can pick a new learning rate using the learning rate finder:

learn.lr_find()

Output:

As the first layers are now unfrozen, we will use discriminative learning rates: the first layers are trained with a lower learning rate, whereas the last layers, which need to adapt to our specific domain, are trained with higher learning rates. We can tell fastai to use discriminative learning rates by providing a slice object containing the minimum and maximum learning rates:

learn.fit_one_cycle(7, lr_max=slice(10e-6, 1e-4))

Output:

As we can see, our final accuracy reached 97.08%. At this point, we are satisfied with the result. Don’t be startled by the slight overfitting, visible when the train_loss is a little smaller than the valid_loss. Some deep learning theorists would probably advise against this, but in practice a slight overfit is actually good. What we are ultimately interested in is that our metric (here, accuracy) is as high as possible and that the valid_loss is as low as possible.

Step 7: Analyze results

One way of analyzing the results is looking at the confusion matrix.

Confusion matrix: A confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm. Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class

Since the data set has 28 classes, analyzing the full confusion matrix is a bit cumbersome.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

Output:

Instead of the matrix, we can also just look at the categories that were most frequently confused with each other:

interp.most_confused(min_val=3)

Output:

[('zain', 'reh', 7), ('qaf', 'feh', 6), ('noon', 'teh', 5), ('zah', 'tah', 5), ('seen', 'sad', 4), ('thal', 'dal', 4), ('thal', 'zain', 4), ('ain', 'khah', 3), ('dad', 'sad', 3), ('dal', 'reh', 3), ('feh', 'qaf', 3), ('teh', 'noon', 3), ('theh', 'teh', 3), ('zain', 'thal', 3)]

Finally, we can look at the classifications that caused the highest loss, i.e. contributed the most to lowering our model’s accuracy:

interp.plot_top_losses(9, figsize=(10,10))

After talking to native Arabic speakers, we found that cleaning the data set would increase the accuracy dramatically, since many of the characters are poorly written. A quick Google search confirmed that the first image of the output above (feh/noon) would probably be misclassified by humans as well.

Final Words

No final words. Just head over to GitHub, clone the repository and get started yourself.

I hope you enjoyed it.
