Classification of American Sign Language using PyTorch

Helik Thacker
3 min read · Jul 2, 2020


American Sign Language is a natural language that serves as the predominant sign language of Deaf communities in the United States and most of Anglophone Canada. -Wikipedia

The dataset we will use is the ASL Alphabet dataset from Kaggle: https://www.kaggle.com/grassknoted/asl-alphabet

It has 87,000 images of size 200×200 pixels. There are 26 classes for the letters A to Z and 3 classes for SPACE, DELETE and NOTHING, so in total there are 29 classes. Our task will be to classify each image into one of these classes. We will go through all the steps, from processing the dataset to building a neural network and then predicting the class of an image from the test set.

Processing the dataset

The images are 200×200 pixels, but we want training to finish fairly quickly, so we crop them down to 32×32 pixels. We also normalize the images so that no single channel dominates during training.

The dataset is split into 66,000 training, 14,000 test, and 7,000 validation images.
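One way to produce this split is `torch.utils.data.random_split`; the exact split method in the original notebook may differ. The list here is a stand-in for the real `ImageFolder` dataset.

```python
import torch
from torch.utils.data import random_split

torch.manual_seed(42)  # illustrative seed for a reproducible split

dataset = list(range(87_000))  # stand-in for the real 87,000-image dataset
train_ds, test_ds, val_ds = random_split(dataset, [66_000, 14_000, 7_000])
```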

Visualizing the images

We can visualize the dataset by using the plotting features of the matplotlib package.

Normalized data arranged in a grid

Neural Network Models

Model 1 — Deep Neural Network

We will first build a simple feed-forward neural network with just 2 linear layers and a single ReLU activation. The code is as simple as it gets; the layer configuration of the network can be seen in the following snippet.
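A plausible version of that configuration is sketched below. The hidden width is an assumption (the post's snippet image is not reproduced here); the input is the flattened 3×32×32 image and the output has 29 class scores.

```python
import torch.nn as nn

input_size = 3 * 32 * 32   # RGB image, 32x32 pixels, flattened
hidden_size = 1024         # assumed hidden width
num_classes = 29           # A-Z plus SPACE, DELETE, NOTHING

model = nn.Sequential(
    nn.Flatten(),                          # (N, 3, 32, 32) -> (N, 3072)
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, num_classes),   # raw class scores (logits)
)
```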

We end up with roughly 85% accuracy on both the validation and the test sets after just 10 epochs of training with a learning rate of 0.001 and the Adam optimizer.
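A minimal training-loop sketch matching those settings (Adam, lr=0.001, cross-entropy loss); `model` and `train_loader` are assumed to be defined as in the previous steps.

```python
import torch
import torch.nn.functional as F

def fit(model, train_loader, epochs=10, lr=0.001):
    """Train with Adam and cross-entropy loss for the given number of epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```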

Validation set accuracy

Model 2 — Deep Convolution Neural Network

Our second model uses convolution layers, because the inputs are images and convolutions preserve their spatial relations.
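A small CNN in the spirit of this model might look like the following; the exact architecture from the post is not reproduced, and the channel counts are assumptions.

```python
import torch.nn as nn

cnn_model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                 # 32x32 -> 16x16
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                 # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 29),       # 29 output classes
)
```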

We reach a very good accuracy of 98.8% on the validation set and 99% on the test set after 10 epochs. A learning rate of 0.0001 was used, with Adam as the optimizer.

Validation set accuracy

Model 3 — ResNet9

The last model we discuss is the ResNet9 model. It has residual blocks, in which we add the input tensors back to the output tensors. We will also use the one-cycle learning-rate policy along with batch normalization, gradient clipping and weight decay. These techniques are known to improve the accuracy of the model.
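A minimal residual block sketch showing that add-the-input-back idea; the fixed channel count and layer arrangement are assumptions, not the post's exact ResNet9 definition.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv + batch-norm layers whose output is added back to the input."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # add the input tensor back to the output
```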

We reach a validation accuracy of 99.9% and a test set accuracy of 100% after just 8 epochs. The maximum learning rate was 0.01, and gradient clipping and weight decay were set to 0.1 and 0.0001 respectively. As in the other two models, Adam was the optimizer.
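The one-cycle policy, gradient clipping and weight decay can be wired together in PyTorch roughly as follows, with the numbers above as defaults; `model` and `train_loader` are assumed from the earlier steps, and the exact loop in the original notebook may differ.

```python
import torch
import torch.nn.functional as F

def fit_one_cycle(model, train_loader, epochs=8, max_lr=0.01,
                  grad_clip=0.1, weight_decay=1e-4):
    """Train with Adam, weight decay, one-cycle LR, and gradient clipping."""
    optimizer = torch.optim.Adam(model.parameters(), max_lr,
                                 weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs,
        steps_per_epoch=len(train_loader))
    for epoch in range(epochs):
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            # clip gradient values to keep updates stable
            torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            sched.step()  # one-cycle: LR is updated after every batch
```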

Validation set accuracy

Predicting on a single example with this model

This is an image from the test set, which means the model has not seen it during its training. Still, the model outputs the correct result.
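A typical way to get that single prediction; a trained `model` and a `classes` list of the 29 class names (e.g. from the dataset's folder names) are assumed.

```python
import torch

def predict_image(img, model, classes):
    """Return the predicted class name for one image tensor of shape (C, H, W)."""
    with torch.no_grad():
        out = model(img.unsqueeze(0))      # add a batch dimension: (1, C, H, W)
        _, pred = torch.max(out, dim=1)    # index of the highest class score
    return classes[pred.item()]
```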

At this resolution the sign is not so clear to the eye, but we have our neural network to save us!

Conclusion

As we have seen, it is not that hard to code a neural network for a fairly simple dataset and achieve decent accuracy. We can also reuse almost the same code for other datasets such as MNIST, CIFAR-10, etc.

The Jupyter notebook can be found at ASL.

Please feel free to suggest any corrections or improvements.

This blog post accompanies the course project of the free course Deep Learning with PyTorch: Zero to GANs. Do check out the course if you want to learn about deep learning; it is very beginner friendly.
