Sign Language Translation in Real Time

Vishodu Shaozae
5 min read · Oct 30, 2019

Over 5% of the world’s population suffers from disabling hearing loss.

The problem

A large proportion of the hearing-impaired population uses sign language as its primary means of communication, yet cannot do so satisfactorily in online video communication because of a lack of technological investment.

With the growing adoption of video communication and its applications such as telehealth, people who are hearing impaired need a way to communicate properly and naturally with their healthcare network, irrespective of whether or not the practitioner knows how to sign.

Also..

There is no universal sign language, and creating one is a near-impossible task because sign languages are incredibly dynamic and there are over 200 dialects of sign language in the world. Popular sign languages have been fortunate enough to attract the attention of some projects and have received enough technological investment to produce results.

However, most of the less popular sign languages have been left out and are not so fortunate. There is little to no work being done in those communities. I thought I could help.

The Solution

My aim was a towering one: develop a web application that translates sign language in real time, using the web camera to capture the subject signing. This goal would require:

  • Creating and gathering data.
  • Training a machine learning model to recognize the sign language.
  • Developing the user interface.

Building the image data set

Machine learning is a branch of AI (artificial intelligence) concerned with teaching systems how to learn. In short, we do this by giving computers a lot of instances of “labelled” data (for example: here is a picture, and it is a dog) and teaching the computer to find similarities between objects with the same label, a procedure called “supervised learning”.
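
To make “labelled data” concrete, here is a toy example (not from this project) of what a supervised data set boils down to in code: a list of inputs paired with their labels.

```python
# Each example pairs an input (here, an image file) with its label.
labelled_data = [
    ("images/dog_001.jpg", "dog"),
    ("images/cat_014.jpg", "cat"),
    ("images/dog_027.jpg", "dog"),
]
# A supervised learner is shown these pairs and then asked to predict
# the label for inputs it has never seen before.
```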

To train the machine learning model to recognize a subject signing the letters of a sign language alphabet, we needed a diverse set of images of people signing those letters, along with the English letter each image depicted. The model would also have to learn where the signing hands are in each photo, and for that we needed bounding boxes around the hands in the photos of the data set.
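
As a rough illustration, an annotation for a single image might look like the snippet below. The exact fields and format here are my own assumption, not the project’s actual schema.

```python
# Hypothetical annotation for one training image (format is assumed,
# not the project's actual schema).
annotation = {
    "image": "frames/subject1/frame_00042.jpg",
    "letter": "A",                   # class label: which letter is signed
    "bbox": [212, 140, 388, 332],    # x_min, y_min, x_max, y_max in pixels
}
```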

Data sets for popular sign languages like ASL do exist but not for less popular ones.

Using various public online videos (from YouTube, Skillshare, etc.) of people demonstrating the sign language, we extracted every frame from every video and then manually drew bounding boxes around the hands and flagged each frame with its letter, so the model could learn to recognize the sign and predict where the hands are. Since diverse input is always encouraged in machine learning, we also manually captured around 700 images so far from 3 different subjects and in a range of lighting conditions, from low to mid to bright light, to increase accuracy during real-time execution. The images were resized to a resolution of 640 x 480 pixels to speed up the model’s processing without degrading the data in a way that would hurt accuracy.
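
The post does not name the tooling used for frame extraction and resizing, but a minimal sketch of that step, assuming OpenCV, could look like this:

```python
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, size=(640, 480)) -> int:
    """Extract every frame from a video, resize it, and save it as a JPEG."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        frame = cv2.resize(frame, size)  # size is (width, height)
        cv2.imwrite(f"{out_dir}/frame_{count:05d}.jpg", frame)
        count += 1
    cap.release()
    return count

# e.g. extract_frames("videos/sign_alphabet_demo.mp4", "frames/demo")
```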

Training the model

To classify the diverse set of letters, we trained a neural network. A neural network is a series of computations that attempts to recognize underlying relationships in a data set through a process that imitates the way the human brain works. In this sense, neural networks refer to systems of neurons, either biological or artificial in nature.

Since convolutional neural networks have proven successful at image classification, we applied them to our problem. We trained a convolutional neural network using PyTorch (a Python machine learning framework) to predict both the position of the hands (each point of the bounding box) and the class of the image (the letter). With the final push, we reached a model that could predict the signed letters of the alphabet with an accuracy of about 90%.
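
The exact architecture is not described in the post, so the following is only a minimal PyTorch sketch of the general idea: a small CNN with two heads, one classifying the letter and one regressing the four bounding-box coordinates.

```python
import torch
import torch.nn as nn

class SignNet(nn.Module):
    """Toy CNN with a letter-classification head and a bounding-box head."""
    def __init__(self, num_letters: int = 26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_letters)  # which letter
        self.bbox = nn.Linear(32 * 8 * 8, 4)                  # x1, y1, x2, y2

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h), self.bbox(h)

# Training combines a classification loss with a box-regression loss.
model = SignNet()
letter_logits, boxes = model(torch.randn(1, 3, 480, 640))  # dummy 640x480 image
target_letter = torch.tensor([0])                          # e.g. letter "A"
target_box = torch.tensor([[0.33, 0.29, 0.61, 0.69]])      # normalized box
loss = (nn.CrossEntropyLoss()(letter_logits, target_letter)
        + nn.SmoothL1Loss()(boxes, target_box))
```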

The Endgame

Our aim was to interpret sign language in real time.

There were two final components left in the project: a backend service that would return the predicted letter when provided with an image of the sign, and a front end that used the subject’s webcam to capture and display video while querying the backend for predictions.

For the backend, the model was wrapped in a Flask app (Python): a POST request with the image as the payload returns the four points of the bounding box and the letter (class) of the image. On the client side, JavaScript captures the user’s webcam with the browser’s getUserMedia method; an invisible canvas grabs a frame from the video every 200 ms, asks the backend for a prediction, and displays the result.
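
The route name, response fields, and the predict() helper below are assumptions for illustration; the post does not describe the actual API. A minimal Flask sketch of such an endpoint could look like this:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(image_bytes):
    """Placeholder: run the trained CNN on the image, return (letter, box)."""
    raise NotImplementedError

@app.route("/predict", methods=["POST"])  # hypothetical endpoint name
def predict_sign():
    image_bytes = request.files["image"].read()   # image sent as the payload
    letter, box = predict(image_bytes)
    return jsonify({"letter": letter,
                    "box": {"x1": box[0], "y1": box[1],
                            "x2": box[2], "y2": box[3]}})

if __name__ == "__main__":
    app.run(port=5000)
```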

User Interface

Since the project’s potential users may be speech, hearing, or vision impaired, the user interface has been kept simple so that it is both attractive and practical for the end user. To keep the user experience as good as possible, the UI has a simple color scheme as well as text-to-speech to better serve vision-impaired users. The web app will also be seamlessly integrated with video chat to maintain the theme of ease of access and simplicity. Every part of the interface is directed towards making things simpler for the end user, and the web application will be accessible across all age groups.

Where to now?

With a larger data set and more tweaking of our models, we believe we could create accurate and dependable technology for signing. Of course, sign language involves more than just hands and letters; it combines facial expressions and gestured sequences to form full sentences. While a full solution for sign language interpretation is still an open problem, this project represents an ever so slight push towards improving the lives of the hearing impaired.
