Programming an AI to Recognize Sign Language with TensorFlow and Keras

Alexander Chow · Published in Nerd For Tech · Feb 20, 2021 · 6 min read

Context

For fun, I decided to program a deep learning model to recognize the alphabet of American Sign Language (ASL). You can find the dataset on Kaggle here. Let’s get to the code!

Code

Note that I used Google Colab for this project, so I had to mount my Google Drive in the first few lines of code. If you are not using Google Colab, skip the first two lines.

Remember to modify the train_path and test_path variables, as the paths in the code I am about to show you are specific to my Google Drive.

from google.colab import drive
drive.mount('/content/drive')

# imports
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Activation, Dropout
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow_hub as hub
import numpy as np
import matplotlib.pyplot as plt

# setting variables and directories for training and testing paths
img_size = 224
batch_size = 32
epochs = 10
train_path = '/content/drive/My Drive/ASL-recognition/asl_alphabet_train/asl_alphabet_train'
test_path = '/content/drive/My Drive/ASL-recognition/asl_alphabet_test/asl_alphabet_test'

# define image data generators for data augmentation and rescaling
augment_train_data = ImageDataGenerator(horizontal_flip=True,
                                        rotation_range=50,
                                        zoom_range=0.2,
                                        width_shift_range=0.2,
                                        height_shift_range=0.2,
                                        rescale=1./255)
augment_test_data = ImageDataGenerator(rescale=1./255)

# run image data generators on training and testing datasets
train_dataset = augment_train_data.flow_from_directory(train_path,
                                                       shuffle=True,
                                                       classes=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K',
                                                                'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
                                                                'W', 'X', 'Y', 'Z', 'space', 'del', 'nothing'],
                                                       target_size=(img_size, img_size),
                                                       batch_size=batch_size)
test_dataset = augment_test_data.flow_from_directory(test_path,
                                                     classes=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K',
                                                              'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
                                                              'W', 'X', 'Y', 'Z', 'space', 'del', 'nothing'],
                                                     target_size=(img_size, img_size),
                                                     batch_size=batch_size)

# showing 8 images from the training dataset
fig = plt.figure(figsize=(15, 10))
for i in range(1, 9):
    plt.subplot(4, 2, i)
    plt.imshow(train_dataset[0][0][i-1])
plt.show()

# getting pretrained model for transfer learning and defining the model
url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4"
download_model = hub.KerasLayer(url, input_shape=(img_size, img_size, 3))
model = Sequential([
    download_model,
    Dense(29),
    Activation("softmax")
])

# compiling the model
model.compile(optimizer=Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=['accuracy'])

# training the model
print("\n Model summary: ")
print(model.summary())
print("\n Model Training: ")
model.fit(train_dataset,
          batch_size=batch_size,
          epochs=epochs)

# evaluating the model
print("\n Model Evaluation: ")
model.evaluate(test_dataset)

# saving the model
model.save("/content/drive/My Drive/ASL-recognition/h5/asl_model.h5")

# loading the saved model
load_model = tf.keras.models.load_model("/content/drive/My Drive/ASL-recognition/h5/asl_model.h5",
                                        custom_objects={"KerasLayer": hub.KerasLayer})
print(load_model.summary())

What does this code mean? I’ll explain:

from google.colab import drive
drive.mount('/content/drive')
# imports
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Activation, Dropout
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow_hub as hub
import numpy as np
import matplotlib.pyplot as plt

Here, we import all the libraries and modules from TensorFlow, Keras, NumPy, and Matplotlib that we will need for the rest of the program.

# setting variables and directories for training and testing paths
img_size = 224
batch_size = 32
epochs = 10
train_path = '/content/drive/My Drive/ASL-recognition/asl_alphabet_train/asl_alphabet_train'
test_path = '/content/drive/My Drive/ASL-recognition/asl_alphabet_test/asl_alphabet_test'

We set the variables that we will need to train and test the deep learning model: the desired image size, the batch size, and the number of epochs. The batch size is the number of images propagated through the network before the model’s weights are updated.
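To make the batch size concrete, here is a small arithmetic sketch (assuming your copy of the dataset matches the standard Kaggle version, which contains 87,000 training images, 3,000 per class across 29 classes):

import math

# Illustration of what batch_size implies for one epoch.
# 87,000 is the standard Kaggle ASL Alphabet training set size;
# adjust num_images if your copy differs.
num_images = 87000
batch_size = 32
steps_per_epoch = math.ceil(num_images / batch_size)
print(steps_per_epoch)  # 2719 batches, i.e. 2719 weight updates per epoch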

# define image data generators for data augmentation and rescaling
augment_train_data = ImageDataGenerator(horizontal_flip=True,
                                        rotation_range=50,
                                        zoom_range=0.2,
                                        width_shift_range=0.2,
                                        height_shift_range=0.2,
                                        rescale=1./255)
augment_test_data = ImageDataGenerator(rescale=1./255)

# run image data generators on training and testing datasets
train_dataset = augment_train_data.flow_from_directory(train_path,
                                                       shuffle=True,
                                                       classes=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K',
                                                                'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
                                                                'W', 'X', 'Y', 'Z', 'space', 'del', 'nothing'],
                                                       target_size=(img_size, img_size),
                                                       batch_size=batch_size)
test_dataset = augment_test_data.flow_from_directory(test_path,
                                                     classes=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K',
                                                              'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
                                                              'W', 'X', 'Y', 'Z', 'space', 'del', 'nothing'],
                                                     target_size=(img_size, img_size),
                                                     batch_size=batch_size)

We define two image data generators: one for training and one for testing. The training generator performs data augmentation, manipulating the images with horizontal flips, rotations, zooms, and vertical/horizontal shifts; both generators also rescale the images so that each pixel has a value between 0 and 1. The testing generator only rescales, since we want to evaluate on unmodified images. If everything works correctly, flow_from_directory should print a “Found … images belonging to 29 classes” message for each of the training and testing datasets.
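If you want to sanity-check the generators yourself, you can pull a single batch and inspect its shape. This is just a quick sketch; the exact image counts printed by flow_from_directory depend on your copy of the dataset:

# pull one batch from the training generator and check its shape
images, labels = next(train_dataset)
print(images.shape)  # (32, 224, 224, 3): a batch of RGB images
print(labels.shape)  # (32, 29): one-hot labels for the 29 classes
print(images.min(), images.max())  # should lie within [0.0, 1.0] after rescaling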

# showing 8 images from the training dataset
fig = plt.figure(figsize=(15, 10))
for i in range(1, 9):
    plt.subplot(4, 2, i)
    plt.imshow(train_dataset[0][0][i-1])
plt.show()

This simply sets up a figure and shows the first eight images from our training dataset using Matplotlib. If all goes correctly, you should see a 4×2 grid of augmented hand signs, although the individual images will differ from run to run since the dataset is shuffled.

# getting pretrained model for transfer learning and defining the model
url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4"
download_model = hub.KerasLayer(url, input_shape=(img_size, img_size, 3))
model = Sequential([
    download_model,
    Dense(29),
    Activation("softmax")
])

We download Google’s MobileNetV2 model, pretrained on the ImageNet dataset, from TensorFlow Hub for transfer learning, and append a new dense layer with a softmax activation to output a probability for each of the 29 classes we defined. You can learn more about transfer learning here.
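As a side note, a common variant of this setup (a sketch, not what the code above does) is to use the feature-vector version of the same MobileNetV2 model and freeze its pretrained weights, so that only the new dense layer is trained:

# alternative sketch: frozen MobileNetV2 feature extractor + trainable head
feature_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(feature_url,
                                   input_shape=(img_size, img_size, 3),
                                   trainable=False)  # freeze pretrained weights
model = Sequential([
    feature_extractor,
    Dense(29, activation="softmax")  # one probability per ASL class
])

Freezing the base usually speeds up training and avoids overwriting the pretrained features, at the cost of some flexibility.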

# compiling the model
model.compile(optimizer=Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=['accuracy'])

We compile the model, specifying how its loss is calculated (categorical cross-entropy), the optimizer used to update the weights (Adam, with a learning rate of 1e-3), and how we measure the model’s performance (accuracy).
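Note that categorical cross-entropy matches the one-hot labels that flow_from_directory produces by default (class_mode='categorical'); the sparse_categorical_crossentropy we imported earlier would only apply if the labels were integer indices. A tiny illustration with made-up numbers, showing that both conventions compute the same loss:

# one prediction over 3 hypothetical classes; the true class is index 2
y_pred = np.array([[0.1, 0.2, 0.7]])

# one-hot labels -> categorical_crossentropy (our case)
print(tf.keras.losses.categorical_crossentropy(np.array([[0., 0., 1.]]), y_pred).numpy())

# integer labels -> sparse_categorical_crossentropy (same value, ~0.357)
print(tf.keras.losses.sparse_categorical_crossentropy(np.array([2]), y_pred).numpy())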

# training the model
print("\n Model summary: ")
print(model.summary())
print("\n Model Training: ")
model.fit(train_dataset,
          batch_size=batch_size,
          epochs=epochs)

Finally, we get to train our model. First we print a summary of the model, and then we train it with model.fit() on our training dataset using the specified batch size and number of epochs. If all of your code is correct, Keras will print a progress bar with the loss and accuracy for each of the 10 epochs.
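model.fit() also returns a History object, and since Matplotlib is already imported, you can plot the training curve with a few extra lines. A sketch, replacing the plain model.fit() call above (the 'accuracy' key assumes a recent TensorFlow 2.x; some older versions use 'acc'):

# capture the History object and plot training accuracy per epoch
history = model.fit(train_dataset,
                    batch_size=batch_size,
                    epochs=epochs)
plt.plot(history.history['accuracy'])
plt.xlabel('Epoch')
plt.ylabel('Training accuracy')
plt.show()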

# evaluating model
print("\n Model Evaluation: ")
model.evaluate(test_dataset)

Once our model has been trained to a satisfactory accuracy, we can evaluate it on the testing dataset to determine whether it has overfit to the training data. model.evaluate() prints the loss and accuracy on the test set.
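model.evaluate() also returns the loss and each compiled metric as a list, so you can capture the numbers explicitly instead of only reading them off the console:

# evaluate() returns [loss, accuracy] given our compile() settings
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")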

# saving model
model.save("/content/drive/My Drive/ASL-recognition/h5/asl_model.h5")

If we are satisfied with the model, we can save it to a specified directory for later use. I decided to save it in the HDF5 (.h5) format, as it packs the architecture and weights into a single, reasonably compact file.
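If you are curious how much space the file actually takes, here is a quick check (using the same path as above):

import os

# size of the saved .h5 file in megabytes
path = "/content/drive/My Drive/ASL-recognition/h5/asl_model.h5"
print(f"Saved model size: {os.path.getsize(path) / 1e6:.1f} MB")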

# loading the saved model
load_model = tf.keras.models.load_model("/content/drive/My Drive/ASL-recognition/h5/asl_model.h5",
                                        custom_objects={"KerasLayer": hub.KerasLayer})
print(load_model.summary())

If we want to use our model later, we can load it back from the directory above (passing custom_objects so Keras knows how to rebuild the TensorFlow Hub layer) and do all the things one would do with a normal model (e.g. train, predict, show a summary, evaluate, etc.).
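As a usage example, here is how you might classify a single image with the loaded model, applying the same resizing and rescaling as the generators. This is a sketch: the image path is hypothetical, so point it at one of your own images:

# sketch: classify one image with the loaded model
class_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K',
               'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
               'W', 'X', 'Y', 'Z', 'space', 'del', 'nothing']

# hypothetical path: replace with an image of your own
img = tf.keras.preprocessing.image.load_img(
    "/content/drive/My Drive/ASL-recognition/sample.jpg",
    target_size=(img_size, img_size))
x = tf.keras.preprocessing.image.img_to_array(img) / 255.0  # same rescaling as training
x = np.expand_dims(x, axis=0)  # add a batch dimension: (1, 224, 224, 3)

probs = load_model.predict(x)[0]
print(class_names[np.argmax(probs)])  # most likely class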

Conclusion

If you want to see the completed code, you can view it at my GitHub repository here.

With that being said, I hope you enjoyed my article! Feel free to check out my other articles, with many more coming soon!

If you have any questions or would like to connect, feel free to email me at: alexander.chow911@gmail.com

To learn more about me: LinkedIn
