Transcribing Sign Language using PowerAI Vision — an Introduction

Peter Fox
Published in Systems AI
3 min read · May 10, 2019

Written by Edward ffrench and Peter Fox

Early in 2019, Systems AI brought in six new faces to shake things up in London Southbank. We set ourselves the challenge of producing an engaging and enterprising demonstration of IBM’s PowerAI Vision technology. PowerAI Vision is IBM’s platform for deep-learning-based image and video analysis. Initial discussions led quickly to the suggestion of using AI to interpret sign language, with clear use cases for the deaf community. We wanted to build, using PowerAI Vision, a system that could take a video of somebody signing and translate those signs into text. This might then be readily extended to produce an audio output.
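
To make that goal concrete, here is a minimal sketch of the very last step of such a pipeline: collapsing the letter predicted for each video frame into fingerspelled text. The function, the confidence threshold and the frame counts below are illustrative assumptions, not the code behind our demo.

```python
# Illustrative sketch only (not the project's actual code): collapse a stream of
# per-frame letter predictions into fingerspelled text. It assumes each frame has
# already been classified into a (letter, confidence) pair by an upstream model.

from typing import List, Tuple

def frames_to_text(predictions: List[Tuple[str, float]],
                   min_confidence: float = 0.8,
                   min_run: int = 3) -> str:
    """Emit a letter once it has been seen in `min_run` consecutive confident frames."""
    text = []
    current, run = None, 0
    for letter, confidence in predictions:
        if confidence < min_confidence:
            continue  # ignore low-confidence detections
        run = run + 1 if letter == current else 1
        current = letter
        if run == min_run:
            text.append(letter)  # emit exactly once per run of the same letter
    return "".join(text)

# Example: a short clip in which the signer fingerspells "IBM"
frames = [("I", 0.95)] * 5 + [("B", 0.91)] * 6 + [("M", 0.88)] * 4
print(frames_to_text(frames))  # -> "IBM"
```

Requiring a letter to persist over several consecutive frames smooths over the jitter you get when classifying a video frame by frame.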

Choose your medium

Los tres amigos! Ed, Zoë and I signing “IBM”

We considered several internationally recognised sign languages but settled on American Sign Language (ASL) as our preferred medium. As well as being arguably the most widely used sign language internationally, ASL uses single-handed fingerspelling for its alphabet, which makes it particularly well suited to computer vision and object detection. Furthermore, all but two letters (J and Z, which involve movement) can be identified from a single frame, which is not the case for many of the alternatives. It was also a convenient bonus that IBM is an American company!

Whilst the precise details of the final product were yet to be ascertained, we knew enough to get started.

Obtaining a “good” dataset

Typical image from our training data set

One of the greatest challenges in deep learning is sourcing a dataset large enough for your model to perform accurately. For us, this meant we needed a lot of images of people signing the letters of the alphabet. The internet is often a good source of data; however, we were unable to find any substantial existing datasets of high enough quality, so we needed to create our own from scratch.

In practice, this involved team members taking thousands of photos using mobile and desktop devices. That created an entirely new set of problems of its own, which we will elaborate on in a later post. Although we now have a working model, we are still adding to the dataset to improve future models, right up to the writing of this very blog post. Another issue we encountered while gathering the data was the need to capture each letter signed at different positions relative to the body, from different angles, and with either hand.
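
One common trick that can help with the either-hand requirement, sketched below purely as an illustration (the folder layout and the use of Pillow are assumptions, not a description of our pipeline), is to mirror each photo horizontally so that a right-handed example of a static letter also serves as a left-handed one.

```python
# Sketch of a simple augmentation pass, under the assumption that labelled photos
# are organised as dataset/<letter>/<image>.jpg (the layout is hypothetical).
# Mirroring each photo horizontally gives the model examples of the same static
# letter as signed with the other hand.

from pathlib import Path
from PIL import Image, ImageOps  # pip install pillow

DATASET_DIR = Path("dataset")  # hypothetical location of the labelled images

for image_path in DATASET_DIR.glob("*/*.jpg"):
    if image_path.stem.endswith("_mirrored"):
        continue  # don't mirror an image twice
    mirrored_path = image_path.with_name(image_path.stem + "_mirrored.jpg")
    if mirrored_path.exists():
        continue  # already generated on a previous run
    with Image.open(image_path) as img:
        ImageOps.mirror(img).save(mirrored_path)
```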

Coming up next time from the ASL team… more on our data cleaning process, including some cool bespoke Python coding solutions!

Developed by Edward ffrench (AI Specialist), Peter Fox (Technical Sales Specialist) and Zoë Osorio (Machine Learning Engineer) at IBM
