Sign Language Recognition with Keras VGG19 Image Model — Part 1

Mihir Garimella
4 min read · Aug 4, 2022


Detecting Sign Language Letters in Real Time Using MediaPipe and Keras

Before we begin: if you have already read this article, or are looking for the next section in which we explain the implementation of the model, you can find the link to the second part of this article below:

Sign Language Recognition with Keras VGG19 Image Model — Part 2 | by Mihir Garimella | Aug, 2022 | Medium

Image of Sign Language ‘F’ from Pexels

Sign Language is a form of communication used primarily by people who are deaf or hard of hearing. This gesture-based language allows them to convey ideas and thoughts easily, overcoming the barriers created by hearing impairment.

A major issue with this convenient form of communication is that the vast majority of the global population does not know the language. As with any other language, learning Sign Language takes considerable time and effort, which discourages the larger population from learning it.

However, a promising solution to this issue can be found in the world of Machine Learning and image detection.

Many large training datasets for Sign Language are available on Kaggle, a popular resource for data science. The one used in developing this model has about 3,000 images for each of the 26 English letters in Sign Language, plus an additional ‘Space’ sign.

Significant (ASL) Sign Language Alphabet Dataset | Kaggle

Sign Language ‘A’, taken from the dataset mentioned above

The first step in preparing the data for training is to create a validation set and a test set from the images provided. The model does not rely heavily on validation/test data to produce results, so simply setting aside the last few images in each letter category for the two datasets is sufficient.
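A minimal sketch of that split is below. It assumes the Kaggle dataset has been extracted into a data/train directory with one sub-folder per class; the directory names and the number of images set aside per class are illustrative choices, not values taken from the original dataset.

```python
import os
import shutil

# Hypothetical paths: adjust to wherever the Kaggle dataset was extracted.
TRAIN_DIR = "data/train"
VAL_DIR = "data/val"
TEST_DIR = "data/test"
IMAGES_PER_SPLIT = 30  # illustrative: images per class set aside for each split

for class_name in sorted(os.listdir(TRAIN_DIR)):
    class_dir = os.path.join(TRAIN_DIR, class_name)
    if not os.path.isdir(class_dir):
        continue

    images = sorted(os.listdir(class_dir))

    # Take the last few images of each class for the validation and test sets.
    val_images = images[-2 * IMAGES_PER_SPLIT:-IMAGES_PER_SPLIT]
    test_images = images[-IMAGES_PER_SPLIT:]

    for split_dir, split_images in [(VAL_DIR, val_images), (TEST_DIR, test_images)]:
        os.makedirs(os.path.join(split_dir, class_name), exist_ok=True)
        for image_name in split_images:
            shutil.move(os.path.join(class_dir, image_name),
                        os.path.join(split_dir, class_name, image_name))
```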

After the datasets are made, we move on to the preprocessing phase of the Keras model.
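A minimal sketch of this preprocessing is shown below. It assumes the data/train and data/val directories from the previous step and VGG19's standard 224×224 input size; the batch size is an illustrative choice.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values; images are resized to the 224x224 input VGG19 expects.
train_datagen = ImageDataGenerator(rescale=1.0 / 255)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    "data/train",              # hypothetical path to the training images
    target_size=(224, 224),    # VGG19's expected input size
    batch_size=32,
    class_mode="categorical",  # 27 classes: A-Z plus 'Space'
)

val_generator = val_datagen.flow_from_directory(
    "data/val",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)
```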

The code above imports the required Keras utilities and uses ImageDataGenerator to rescale and resize the images in the training data to suit the VGG19 model. The parameters around the train_datagen generator reshape the images in the training dataset so that the model can read the input image files.

After processing the images, the model must be set up to recognize all of the classes in the data, namely the 27 different groups of images.
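A sketch of how the model might be assembled is below. The ImageNet weights, the frozen convolutional base, and the single Flatten + Dense head are assumptions; the essential points are the VGG19 base and the 27-way softmax output.

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Assumed setup: VGG19 pre-trained on ImageNet, without its original classification head.
base_model = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # keep the convolutional base frozen

model = Sequential([
    base_model,
    Flatten(),
    Dense(27, activation="softmax"),  # condense the output to the 27 sign classes
])

model.summary()
```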

Notice how the network is initialized: the VGG19 model is added as the base, and the output is condensed down to 27 classes.

Finally, defining the loss function and metrics, then fitting the model to the data, completes our Sign Language Recognition system. Note the model.save() call at the end of the block: because of the length of time required to train the model, re-training it for every use can take hours.
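The sketch below shows that final step. The Adam optimizer, the epoch count of 10, and the file name are assumed values, and the block is laid out so that the compile call sits on line 1, the fit call spans lines 2–5, and the save call lands on line 7, matching the walkthrough that follows.

```python
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])  # optimizer, loss, and tracked metric
model.fit(
    train_generator,               # training data from the generator above
    validation_data=val_generator, # validation data monitored each epoch
    epochs=10)                     # number of passes over the data (assumed value)

model.save("sign_language_vgg19.h5")  # persist the trained model to disk
```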

This code has a lot to unpack. Let's look at it in sections.

Line 1:

The model.compile() function takes many parameters, of which three are shown in the code. The optimizer and loss parameters work together, along with the epochs setting in the following lines, to efficiently reduce the model's error by incrementally adjusting it as it passes over the data.

Along with this, the metric of choice is accuracy, which tracks how close the model gets to the best accuracy achievable over the set number of epochs.

Line 2–5:

The function called here fits the previously defined model to the data from the generators created in the first block of code. It also sets the number of epochs, or iterations over the data, that the model has to improve the accuracy of its image detection.

Line 7:

Of all of the statements in this block, the model.save() call may be the most important, as it can save hours of time when the model is later put to use.

Image of Sign Language ‘X’ from Pexels

The model developed here detects and classifies Sign Language symbols with about 80% accuracy. With help from camera libraries such as OpenCV, we will create an app that can capture live images of hand gestures and predict, with confidence, which letter is being displayed.

All of the code used in this article is on my GitHub page, linked below:

mg343/Sign-Language-Detection (github.com)

In the next part of this article, we explain how to implement the model we created and integrate it with other image-processing libraries to produce real-time predictions of Sign Language from your computer's camera. The second article can be found at the link below:

Sign Language Recognition with Keras VGG19 Image Model — Part 2 | by Mihir Garimella | Aug, 2022 | Medium
