Best Machine Learning Case Study I have ever done (For Blind and Deaf People)

Nishesh Gogia
Published in Analytics Vidhya · 11 min read · Aug 20, 2021

A Machine Learning case study…

It is a very difficult world when we are unable to interact with the people around us. Imagine the world for the deaf and mute: they face far more challenges in communicating than we do. They can of course use sign language, but how many of us are fluent in it? We recognized this problem and set out to make the interaction easier. We use a Leap Motion Controller for gesture recognition, and we built a Machine Learning model that detects a gesture and relays its meaning to people who do not know sign language, making interaction easier for everyone.

Human Computer Interaction (HCI) plays an important role in today’s world with the increasing use of computers in everyday life. HCI aims to make human-machine interaction more natural, like interaction between humans. Hand gestures can represent ideas and actions through different hand shapes, which can be identified by a gesture recognition system.

For many hearing-impaired individuals, sign language is the primary means of interaction. These gestures can be classified as manual and non-manual. Manual gestures involve only hand movements, whereas non-manual gestures include other features such as facial expressions and lip movements. The gestures used in our model are manual gestures.

We have used the basic alphabet (A to Z) for gesture recognition; these letters are signed with manual gestures. The data was collected with the help of the Leap Motion Controller and then separated into training and testing sets. These were then fed to the ML model, which was evaluated on the test set.

The Leap Motion Controller (LMC) is a device developed by Leap Motion for gesture-based interaction.

The project aims to aid the deaf and mute. The work started by setting up the LMC in Python and loading its libraries. We then wrote the data collection code and collected the data. After collection, data preprocessing and feature extraction were done, and these features were used for our ML model. The model was trained and tested, and finally integrated with our application.

LITERATURE SURVEY

This project had many different aspects. As I went through the Internet, I found different ways gesture recognition has been done, and I researched which kinds of sensors and algorithms to use. I went through several research papers and found the LMC to be the most suitable.

The Leap Motion sensor, or controller, is a small, promising and highly accurate device that connects to a PC or Mac and enables users to manipulate digital objects with hand motions. It is also used for user-friendly gesture recognition services, works alongside other hardware, and can provide accurate and robust results. Until this point, nearly all interaction with computer programs required an intermediary device (mouse, keyboard, etc.) between the human hand and the digital environment. The Leap Motion Controller is a big step towards bridging this gap and allowing humans to manipulate computer programs in much the same way they manipulate the real world.

Some of the applications of the Leap Motion sensor include gesture-based computing, which allows users to play games, and creating designs with hand movements.

The LMC uses an infrared scanner and sensors to map and track the human hand. This information is used to create, in real time, a digital version of the hand that can manipulate digital objects. The LMC only works with programs that are specifically written for it.

LMC is a computer hardware sensor device that supports hand and finger motions as input like a keyboard or a mouse, requiring no hand contact or touching.

Most of the work in this project is inspired by the beautiful research paper by B. P. Pradeep Kumar (Department of ECE, Jain University, Bangalore) and M. B. Manjunatha (Department of ECE, AIT, Tumkur):

Hand Gesture Recognition Using Leap Motion Controller for …http://ipco-co.com › PET_Journal › Acecs-2016

In this paper they used a variety of algorithms such as KNN (k-nearest neighbours), SVM (support vector machine) and decision trees.

BUSINESS PROBLEM

The business problem is very simple.

Let’s say I own a McDonald’s or a Domino’s store that stays busy all the time. One day a mute couple comes to my store and tries to place an order, but the staff at the counter do not understand their language, and mind you, the store is very busy; people are highly impatient when it comes to ordering their food. My staff can’t spend too much time on one customer, so they ask the couple to wait for a while.

For everyone around, it’s not a big issue, but for the couple it can be highly embarrassing. As a store owner I don’t want to give customers a bad experience, so I decided to think about this problem.

One solution could be to train every staff member in sign language, which would be costly and impractical.

Another solution is to add self-ordering machines, like ATMs, where anybody can walk up, place an order via the machine and grab their meal. This could be a good solution, but as an owner I want to give those customers special treatment.

A third solution is to add a Machine Learning system to the store that can understand sign language and translate it into English/Hindi for the staff. This gives these customers special treatment and could also be fun for other customers, especially kids.

CONSTRAINTS

1. Low latency (in real time, the model should translate sign language into English within a few milliseconds).

2. Errors can be especially embarrassing for mute/deaf customers, and that can affect the brand value.

MACHINE LEARNING FORMULATION

This is a very simple Multi-Class classification problem where we have 26 classes (A to Z)

DATA

The data was produced manually with the Leap Motion sensor for every letter. A total of 26,000 rows were captured from the sensor, and every five rows represent one instance of a letter, so we have 200 instances (1,000 rows) for every letter.

PERFORMANCE METRIC

Now, this is a multi-class classification problem where we want both high precision and high recall, so the F1 score is a good metric; and since the data has no class imbalance, F1_weighted will work fine.

Another useful metric is the multi-class log loss.
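As a quick illustration, both metrics are available in scikit-learn; the arrays below are toy placeholders, not values from the project:

```python
import numpy as np
from sklearn.metrics import f1_score, log_loss

# Toy placeholder labels and probabilities just to show the calls;
# in the project these would come from the test split of the gesture data.
y_true = np.array([0, 1, 2, 1])            # encoded letters
y_pred = np.array([0, 1, 1, 1])            # hard predictions from the model
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.1, 0.7, 0.2],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.8, 0.1]])       # predicted class probabilities

print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))
print("Multi-class log loss:", log_loss(y_true, y_prob))
```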

DATA COLLECTION

The LMC was connected to Python to extract the values for training our model. To connect the LMC with Python we used the Leap.py file provided by the LMC API; we imported Leap.py from the folder provided by Leap.

The code in Leap.py exposes the different gesture libraries, such as circle, key-tap and swipe gestures. I have taken into consideration the finger names, the bones in each finger and their start and end points. In each frame the LMC gives us, for each finger, its position and its distal bone coordinates, along with the palm width, palm position and palm (sphere) radius, as shown in the picture.

I have used Indian Sign Language to generate the data shown in the picture.

I have collected the following values for each set of hands:

  1. Palm position
  2. Sphere position
  3. Finger position
  4. Distal bone coordinates for each finger.

The picture below shows the data collected for the letter “A”. This data is then transformed in the upcoming steps into features for the machine learning model.
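To give a feel for the collection loop, here is a minimal sketch assuming the legacy Leap.py bindings that ship with the Leap Motion SDK v2; the attribute names follow that SDK, while the CSV file name is just illustrative:

```python
# Sketch of per-frame data collection, assuming the legacy Leap.py (SDK v2) bindings.
import sys
import Leap

class CollectListener(Leap.Listener):
    def on_frame(self, controller):
        frame = controller.frame()
        for hand in frame.hands:
            # Palm centre plus the sphere fitted to the hand's curvature.
            row = [hand.palm_position.x, hand.palm_position.y, hand.palm_position.z,
                   hand.sphere_center.x, hand.sphere_center.y, hand.sphere_center.z,
                   hand.sphere_radius, hand.palm_width]
            for finger in hand.fingers:
                tip = finger.tip_position                    # fingertip coordinate
                distal = finger.bone(Leap.Bone.TYPE_DISTAL)  # distal bone of this finger
                row += [tip.x, tip.y, tip.z,
                        distal.next_joint.x, distal.next_joint.y, distal.next_joint.z]
            # One row per hand per frame; five consecutive rows are later
            # grouped into a single training instance for one letter.
            with open("raw_gestures.csv", "a") as f:         # illustrative file name
                f.write(",".join(str(v) for v in row) + "\n")

listener = CollectListener()
controller = Leap.Controller()
controller.add_listener(listener)
print("Recording... press Enter to stop")
sys.stdin.readline()
controller.remove_listener(listener)
```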

DATA PRE-PROCESSING/FEATURE EXTRACTION

Now, for one letter, say “A”, we have five rows in the data. We have to convert these five rows into one row, because together they describe one instance in which the LMC gives us the coordinates of the five fingers for one letter (here, A). For every finger we have a different set of values (refer to the image above). I have labelled the data manually.

In the raw data we had 26,000 rows, and every 5 rows represent one letter instance, so in total we have 200 instances for each letter.
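One way to collapse every block of five raw rows into a single wide instance, assuming the raw capture was saved as a purely numeric CSV (the file name is a placeholder):

```python
import pandas as pd

# Placeholder file name; five consecutive raw rows form one gesture instance.
raw = pd.read_csv("raw_gestures.csv", header=None).to_numpy()

# Reshape so that each block of five rows becomes one wide row (one instance).
rows_per_instance = 5
instances = raw.reshape(raw.shape[0] // rows_per_instance, -1)

# 26000 raw rows / 5 = 5200 instances, i.e. 200 instances per letter.
print(instances.shape)
```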

Leap Motion returns a set of relevant hand points and some hand pose features. The extracted key points are the coordinates of the finger positions from the input gesture: the centre of the palm (say A), the tip of the thumb (B), index finger (C), middle finger (D), ring finger (E) and pinky finger (F) for each hand. Let the coordinates be A(x1, y1, z1), B(x2, y2, z2), C(x3, y3, z3), D(x4, y4, z4), E(x5, y5, z5) and F(x6, y6, z6).

The first feature extracted is the EUCLIDEAN DISTANCE. Feature points corresponding to each gesture are stored in a database file. During testing, the current gesture is captured and its key points are extracted. At run time, distances are calculated from the extracted feature points using the Euclidean distance formula:

dᵢ = √((xᵢ − xᵢ₊₁)² + (yᵢ − yᵢ₊₁)² + (zᵢ − zᵢ₊₁)²)

where i = 1 to 8 for single-handed gestures, giving the eight distances d1 to d8 listed below.

Distances from the palm centre to the fingertips form one kind of features:

d1 (distance from palm centre to tip of thumb)

d2 (distance from palm centre to tip of index finger)

d3 (distance from palm centre to tip of middle finger)

d4 (distance from palm centre to tip of ring finger)

d5 (distance from palm centre to tip of pinky finger)

Distances between adjacent fingertips form extra features (a short numeric sketch of all eight distances follows this list):

d6 (distance from pinky finger to ring finger)

d7 (distance from ring finger to middle finger)

d8 (distance from middle finger to index finger)
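As a quick numerical illustration of how d1 to d8 are computed (the coordinates below are dummy values, not real LMC output):

```python
import numpy as np

# Dummy 3D key points: palm centre (A) and the five fingertips (B-F).
A = np.array([0.0, 0.0, 0.0])    # palm centre
B = np.array([4.0, 1.0, 0.0])    # thumb tip
C = np.array([5.0, 3.0, 0.0])    # index tip
D = np.array([5.5, 4.0, 0.0])    # middle tip
E = np.array([5.0, 5.0, 0.0])    # ring tip
F = np.array([4.0, 6.0, 0.0])    # pinky tip

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return np.linalg.norm(p - q)

# Palm-to-fingertip distances (d1-d5) and adjacent fingertip distances (d6-d8).
d1, d2, d3, d4, d5 = dist(A, B), dist(A, C), dist(A, D), dist(A, E), dist(A, F)
d6, d7, d8 = dist(F, E), dist(E, D), dist(D, C)
print(d1, d2, d3, d4, d5, d6, d7, d8)
```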

The third kind of features extracted were the cosine angles between two adjacent fingers:

COSPR (cosine of the angle between the pinky and ring fingers)

COSRM (cosine of the angle between the ring and middle fingers)

COSMI (cosine of the angle between the middle and index fingers)

COSIT (cosine of the angle between the index finger and thumb)

In total I have extracted 12 features from the raw data (d1, d2, d3, d4, d5, d6, d7, d8, cospr, cosrm, cosmi, cosit). I made a new dataframe with these 12 features and saved it as “BBV.csv”.
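The cosine features can be computed the same way by treating each fingertip as a vector from the palm centre; the sketch below reuses dummy key points and shows how one 12-column row of BBV.csv could be assembled (the column names mirror the feature names above, everything else is illustrative):

```python
import numpy as np
import pandas as pd

# Dummy key points: palm centre (A), then thumb, index, middle, ring, pinky tips.
A, B, C, D, E, F = [np.array(p, dtype=float) for p in
                    [(0, 0, 0), (4, 1, 0), (5, 3, 0), (5.5, 4, 0), (5, 5, 0), (4, 6, 0)]]

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return np.linalg.norm(p - q)

def cos_angle(palm, tip1, tip2):
    """Cosine of the angle between two fingers, viewed as vectors from the palm centre."""
    v1, v2 = tip1 - palm, tip2 - palm
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

row = {
    "d1": dist(A, B), "d2": dist(A, C), "d3": dist(A, D), "d4": dist(A, E), "d5": dist(A, F),
    "d6": dist(F, E), "d7": dist(E, D), "d8": dist(D, C),
    "cospr": cos_angle(A, F, E), "cosrm": cos_angle(A, E, D),
    "cosmi": cos_angle(A, D, C), "cosit": cos_angle(A, C, B),
}

# One such row per gesture instance is appended to build BBV.csv.
bbv = pd.DataFrame([row])
print(bbv)
```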

dataframe after extracting features

DATA ANALYSIS

Data analysis is a very important step in understanding the data. I have done univariate and bivariate analysis of the data.

It’s very difficult to put everything in one blog, so for the detailed data analysis please refer to:

Correlation Between different features

As we can see, very few features are correlated with each other, so we don’t have to drop any features for now.

DIMENSION REDUCTION

This is a very important step because we are using distance metrics, and we know the curse of dimensionality, which says:

“Given an observation of interest, find its nearest neighbours (in the sense that these are the points with the smallest distance from the query point). But in high dimensions a curious phenomenon arises: the ratio between the nearest and farthest points approaches 1, i.e. the points essentially become uniformly distant from each other. This phenomenon can be observed for a wide variety of distance metrics, but it is more pronounced for the Euclidean metric than for, say, the Manhattan metric. The premise of nearest-neighbour search is that ‘closer’ points are more relevant than ‘farther’ points, but if all points are essentially uniformly distant from each other, the distinction becomes meaningless.”

Dimension reduction is also a wonderful technique for visualizing data in lower dimensions.

There are mainly two techniques I am going to use in this project:

  1. PCA (Principal Component Analysis). The idea is very simple: we want to find the direction f’ such that the variance of the projections of the xi’s onto f’ is maximum, where the xi’s are the data points.
  2. t-SNE (t-distributed Stochastic Neighbor Embedding) is a more complex idea, but intuitively it tries to embed higher-dimensional points into lower dimensions so that they are easy to visualize.

PCA

For code please refer:
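As a rough illustration, here is a minimal sketch of the 2D PCA projection, assuming BBV.csv holds the 12 features plus a “label” column (that column name is an assumption):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("BBV.csv")                    # 'label' column name is an assumption
X = StandardScaler().fit_transform(data.drop(columns=["label"]))
y = data["label"]

# Project the 12 standardized features onto the top two principal components.
X_2d = PCA(n_components=2).fit_transform(X)

plt.figure(figsize=(8, 6))
for letter in sorted(y.unique()):
    mask = (y == letter).to_numpy()
    plt.scatter(X_2d[mask, 0], X_2d[mask, 1], s=8, label=letter)
plt.legend(ncol=4, fontsize=6)
plt.title("PCA projection of the 26 gesture classes")
plt.show()
```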

Observations from PCA

  1. It is not trivial to separate 26 different clusters when some hand gestures are so close to each other.
  2. We can see some clear clusters of letters, which means the hand gestures for those letters can be easily differentiated.

TSNE

For the code, please refer to this:
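And an equivalent sketch for t-SNE, under the same assumptions about BBV.csv; perplexity (and the iteration count, depending on the scikit-learn version) are the knobs that were varied:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("BBV.csv")                    # 'label' column name is an assumption
X = StandardScaler().fit_transform(data.drop(columns=["label"]))
y = data["label"]

# One illustrative perplexity setting; rerun with different values to compare embeddings.
X_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)

plt.figure(figsize=(8, 6))
for letter in sorted(y.unique()):
    mask = (y == letter).to_numpy()
    plt.scatter(X_2d[mask, 0], X_2d[mask, 1], s=8, label=letter)
plt.legend(ncol=4, fontsize=6)
plt.title("t-SNE embedding of the 26 gesture classes")
plt.show()
```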

Observations from t-SNE

1. It is a bit complex and difficult to visualize 26 classes in 2D, but t-SNE did a good job, if not a great one.

2. I used different perplexities and different numbers of iterations and arrived at the result above because, beyond that, the embedding was not changing much.

3. We don’t get a perfectly clear picture, but we can see some clusters, and that is quite interpretable because some hand gestures are really close to each other.

MODEL TRAINING, EVALUATION AND HYPERPARAMETER TUNING

We tried various algorithms such as SVM, KNN, Logistic Regression and Random Forest, and the algorithm that gave the best performance metric was selected for deployment.
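A hedged sketch of that comparison with scikit-learn, scoring each candidate with weighted F1 via a grid search; the hyperparameter grids below are illustrative, not the exact ones used:

```python
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

data = pd.read_csv("BBV.csv")                    # 'label' column name is an assumption
X, y = data.drop(columns=["label"]), data["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Illustrative hyperparameter grids for each candidate model.
candidates = {
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 11]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
    "Logistic Regression": (LogisticRegression(max_iter=2000), {"C": [0.1, 1, 10]}),
    "Random Forest": (RandomForestClassifier(), {"n_estimators": [100, 300], "max_depth": [None, 10]}),
}

for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="f1_weighted", cv=5)
    search.fit(X_tr, y_tr)
    test_f1 = f1_score(y_te, search.predict(X_te), average="weighted")
    print(f"{name}: best params {search.best_params_}, test weighted F1 = {test_f1:.3f}")
```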

Please refer to this for more understanding…

Future Scope/Conclusion

This project gave me a way to help the deaf and mute and provide an aid to them. The work comprised a few stages, such as data collection, in which we collected the dataset for each letter, capturing around five kinds of values such as the palm radius, palm position and fingertip position for each finger.

This dataset was then preprocessed and features were extracted; the next step was training and testing the ML model.

KNN gave us the best F1 score, while a Random Forest, which needs less memory and compute at prediction time, can also be deployed on any hardware.

The model can be used in various further applications. One of them is to use this model in hotels and restaurants: since not all waiters know the sign language used by the deaf and mute, the model can help them understand these customers’ orders.

Thanks for reading…

Nishesh Gogia
