Recognising Eye Blinking With Tensorflow.js

Dominik Sasko
Published in The Web Tub · Apr 1, 2022

Blink-To-Text application.

For many years, developing Machine Learning (ML) projects has been mostly limited to Python users. Fortunately for web developers, this is changing with the rise of libraries like Tensorflow.js, which allow us to train and run ML models directly in the browser or in a Node.js environment.

Today, we will explore the world of “hybrid machine learning development” by converting eye blinks into text using the Face Landmark Detection model. The app uses Vue.js and Framework7 for the frontend, but feel free to use your preferred technologies. After finishing the tutorial, you will have a functioning app that runs on a computer as well as on iOS or Android devices (with the help of Monaca deployment).

Just a disclaimer: previous experience with ML is beneficial, but it is not necessary to follow along.

Link to the repository: Blink-To-Text.

Terminology

Let’s look at the technologies applied in this project.

Tensorflow.js
Tensorflow.js is a library for machine learning in JavaScript. It provides many pre-trained models that are ready to use, but also gives the user an option to train and build their own model directly in JavaScript. The whole list of pre-trained models can be found here.
If you are a complete beginner and want to try a simpler approach, I suggest starting with ml5.js, which is built on top of Tensorflow.js and provides a user-friendly interface for building ML applications.

Vue.js & Pinia
If you are a frontend developer, you have almost certainly heard of Vue.js. In short, it is a JavaScript framework for building user interfaces that divides code into logical segments called components. The motivation for choosing Vue.js over React or Angular was the framework’s light weight and gentle learning curve.
As for state management throughout our app, Pinia will do the work. It is similar to the Vuex library, but has a simpler API and does not require as much boilerplate code.

Monaca
“Cross-platform hybrid mobile app development platform and tools in the cloud.” In this app, Monaca will serve as an easy solution for deployment to a mobile device. You can also develop in the cloud with Monaca Cloud IDE, so you do not have to download anything to your computer.

Prerequisites:

  • Node v14.19.0 or higher,
  • a free Monaca account.

Workflow:

  1. Implement functions to capture video from the camera (on both computer and mobile).
  2. Load Tensorflow.js model and use the video from camera as an input.
  3. Capture the prediction and convert it into text with a Morse Code table.

App Wireframe

Blink-To-Text app wireframe.

Setting up The Project

To simplify the development, we can start from a template provided by Monaca CLI. After creating an account, open the console and write the following commands:

$ npm install -g monaca           // install Monaca CLI
$ monaca login                    // log in with your new account
$ monaca create Blink-To-Text     // create a new project

When prompted for a template, choose: Onsen UI and Vue.js -> Framework7 Vue3 Minimal.

After this, you should see a new folder “Blink-To-Text” created. Open it in your preferred code editor and we can start coding.

Capturing Video

Firstly, we need to install the dependencies:

$ npm install com.virtuoworks.cordova-plugin-canvascamera
$ npm install cordova-plugin-file

Go to “./js” and create a file named “hybridFunctions.js”. Start with three helper functions to detect the type of device:
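The snippet below is a minimal sketch of how these helpers could look (the userAgent checks are a common pattern; the exact code lives in the repository):

// ./js/hybridFunctions.js
export const isAndroid = () => /Android/i.test(navigator.userAgent);
export const isIos = () => /iPhone|iPad|iPod/i.test(navigator.userAgent);
export const isMobile = () => isAndroid() || isIos();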

Next comes accessing the camera. The options object passed in can contain settings for the camera (e.g. width, height, etc.).
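As a rough sketch, the browser camera can be requested with getUserMedia, while the mobile build goes through the CanvasCamera plugin. The plugin call signature below is an assumption based on its README, so double-check it against your installed version:

// Browser: hand the camera stream to the callback.
export const loadBrowserCamera = (callback, options = { width: 640, height: 480 }) => {
  navigator.mediaDevices
    .getUserMedia({ video: options })
    .then((stream) => callback(stream))
    .catch((error) => console.error('Camera access denied:', error));
};

// Mobile: CanvasCamera writes each captured frame to a file on the device.
export const loadMobileCamera = (callback, options = {}) => {
  window.CanvasCamera.start(
    options,
    (error) => console.error('CanvasCamera error:', error),
    (data) => callback(data)
  );
};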

The mobile camera needs one more step to be loaded. The CanvasCamera plugin saves the data on the device, so we access it through the readImageFile function:
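Using cordova-plugin-file, the helper can resolve the saved frame and return it as a data URL:

// Read the image file written by CanvasCamera and return a base64 data URL.
export const readImageFile = (filePath, callback) => {
  window.resolveLocalFileSystemURL(
    filePath,
    (fileEntry) => {
      fileEntry.file((file) => {
        const reader = new FileReader();
        reader.onloadend = () => callback(reader.result);
        reader.readAsDataURL(file);
      });
    },
    (error) => console.error('Could not resolve file:', error)
  );
};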

The purpose of the callback passed as an argument is to obtain the video and display it on the frontend side of the app. For this, we need to export the functions and create a Vue.js component in “./pages” called “PredictingPage.vue”. There we assign the returned values to <video> and <img> HTML elements for the computer and mobile camera respectively.
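In outline, the component’s script could wire the callbacks up like this (a sketch only; the shape of the CanvasCamera data object depends on the plugin version):

// ./pages/PredictingPage.vue (script part only)
import { isMobile, loadBrowserCamera, loadMobileCamera, readImageFile } from '../js/hybridFunctions';

export default {
  mounted() {
    if (isMobile()) {
      loadMobileCamera((data) => {
        // Each frame arrives as a file path; convert it and show it in <img>.
        readImageFile(data.output.images.fullsize.file, (dataUrl) => {
          this.$refs.image.src = dataUrl;
        });
      });
    } else {
      // The browser stream can be attached to <video> directly.
      loadBrowserCamera((stream) => {
        this.$refs.video.srcObject = stream;
      });
    }
  },
};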

The other two components are “LoadingPage.vue” (shown while the model is loading) and “MorseCodePage.jsx” (which includes instructions for the app). They do not include any logic, so the code should be very straightforward.

Now that the videos are being captured, we can move on to another part.

Predicting Blinks

Let’s install all dependencies for Tensorflow.js:

$ npm install @tensorflow-models/face-landmarks-detection
$ npm install @tensorflow/tfjs-backend-webgl
$ npm install @tensorflow/tfjs-converter
$ npm install @tensorflow/tfjs-core

The code for blink prediction was inspired by this package. The author describes the process of capturing a blink in detail here. Simply put:

  1. The Face Landmark Detection model captures points of the eye.
  2. The Eye Aspect Ratio (EAR) is calculated from the Euclidean distance between points on the top and bottom of the eye.
  3. We get a result of either a closed or an open eye.

Create a file “./js/blinkPrediction.js” that will encapsulate all the Tensorflow.js logic. Import the libraries and load the model:
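A sketch of the setup, assuming the 0.0.x version of the face-landmarks-detection package that was current when this article was written (the 1.x releases switched to a createDetector API):

// ./js/blinkPrediction.js
import * as faceLandmarksDetection from '@tensorflow-models/face-landmarks-detection';
import '@tensorflow/tfjs-backend-webgl';

let model;

// Load the MediaPipe Facemesh package of the Face Landmark Detection model.
export async function loadModel() {
  model = await faceLandmarksDetection.load(
    faceLandmarksDetection.SupportedPackages.mediapipeFacemesh,
    { maxFaces: 1 }
  );
}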

An EAR_THRESHOLD value of 0.27 proved to be accurate for blink prediction. The Euclidean distance is calculated by the function below.
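Something along these lines:

const EAR_THRESHOLD = 0.27;

// Straight-line distance between two 2D landmark coordinates.
function getEuclideanDistance(x1, y1, x2, y2) {
  return Math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2);
}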

This function is called after we obtain the predicted points for each eye.

Here we use the video as input to our model and determine whether the eye is closed from the EAR value.
Since Morse Code needs two different inputs, long blinks are assigned a dash (“-”) and short ones a dot (“.”). We count how many consecutive frames contained closed eyes, which lets us estimate the length of the blink.
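Putting it together, the prediction function might look roughly like this (the Facemesh keypoint indices and the five-frame threshold for a long blink are assumptions; check the model’s keypoint map and tune the threshold to your frame rate):

let closedFrames = 0;

// Eye Aspect Ratio: eye opening height divided by eye width.
function getEAR(upper, lower, left, right) {
  return getEuclideanDistance(upper[0], upper[1], lower[0], lower[1]) /
         getEuclideanDistance(left[0], left[1], right[0], right[1]);
}

export async function startPrediction(video) {
  const predictions = await model.estimateFaces({ input: video });
  if (!predictions.length) return null;

  // scaledMesh holds [x, y, z] coordinates for each of the 468 keypoints.
  const mesh = predictions[0].scaledMesh;
  // Indices 159/145 (upper/lower lid) and 33/133 (corners) describe one eye.
  const ear = getEAR(mesh[159], mesh[145], mesh[33], mesh[133]);

  if (ear < EAR_THRESHOLD) {
    closedFrames += 1;                   // eye closed: keep counting frames
    return null;
  }
  if (closedFrames === 0) return null;   // eye was already open

  // Eye just reopened: long closures become dashes, short ones dots.
  const blink = closedFrames > 5 ? '-' : '.';
  closedFrames = 0;
  return blink;
}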

To see the predictions, load the model and run a first prediction on an empty image in “./components/app.vue” while the Vue.js app is loading.
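For example (a simplified script for app.vue):

import { loadModel, startPrediction } from '../js/blinkPrediction';

export default {
  async mounted() {
    await loadModel();
    // Warm-up: the first call compiles WebGL shaders and is slow,
    // so run it once on an empty canvas while the loading page is shown.
    const warmup = document.createElement('canvas');
    warmup.width = 640;
    warmup.height = 480;
    await startPrediction(warmup);
  },
};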

After the camera is loaded, send the video to the startPrediction(video) function in “PredictingPage.vue” and get a result. We need to await the result, since the prediction takes some time.
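In its simplest form:

// Called repeatedly while the camera is running.
const result = await startPrediction(video);  // resolves to '.', '-' or null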

Connecting Parts of the App Together with Pinia

Pinia will be our state management library, allowing all parts of the app to share data. Install it with npm install pinia and register it in “./js/app.js”.
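The registration itself is only a few lines (the Framework7 wiring is omitted here for brevity):

// ./js/app.js
import { createApp } from 'vue';
import { createPinia } from 'pinia';
import App from '../components/app.vue';

const app = createApp(App);
app.use(createPinia());   // register Pinia before mounting
app.mount('#app');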

Create a folder “./store” and inside it a file “blinkStore.js”. The content of the file is as follows:
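The exact code lives in the repository; below is a sketch based on the description that follows (the action and field names, such as addBlink and convertBlinksToLetter, are my own):

// ./store/blinkStore.js
import { defineStore } from 'pinia';
import { convertBlinksToLetter } from '../js/morseCodeTable';

export const useBlinkStore = defineStore('blink', {
  state: () => ({
    capturedBlinks: [],   // '.' and '-' symbols collected in one window
    text: '',             // letters decoded so far
    capturing: false,
  }),
  actions: {
    startCapturingBlinks() {
      this.capturing = true;
      this.capturedBlinks = [];
      // After 7 seconds, decode whatever was captured.
      setTimeout(() => this.stopCapturingBlinks(), 7000);
    },
    addBlink(blink) {
      this.capturedBlinks.push(blink);
      // No letter is encoded with more than four symbols, so stop early.
      if (this.capturedBlinks.length === 4) this.stopCapturingBlinks();
    },
    stopCapturingBlinks() {
      if (!this.capturing) return;   // ignore the timer after an early stop
      this.capturing = false;
      this.text += convertBlinksToLetter(this.capturedBlinks.join(''));
    },
  },
});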

The store defines the app’s state together with the actions that modify it. In startCapturingBlinks() you can see the 7-second timer, after which the captured sequence of blinks is sent to the “./js/morseCodeTable.js” file containing the Morse Code table, which returns the converted letter.
We also call stopCapturingBlinks() early once the number of captured blinks reaches four, because no letter is encoded with more than four characters.
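For reference, a minimal morseCodeTable.js could look like this (only a few letters shown; the real file covers the whole alphabet):

// ./js/morseCodeTable.js
const MORSE_TABLE = {
  '.-': 'A', '-...': 'B', '-.-.': 'C', '-..': 'D', '.': 'E',
  '..-.': 'F', '--.': 'G', '....': 'H', '..': 'I', '.---': 'J',
};

export function convertBlinksToLetter(sequence) {
  // Unknown sequences map to an empty string instead of throwing.
  return MORSE_TABLE[sequence] ?? '';
}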

To access and show the items of the store, we must import the store and define a variable as shown below:
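For example:

// In any component that needs the shared state:
import { useBlinkStore } from '../store/blinkStore';

export default {
  setup() {
    const blinkStore = useBlinkStore();
    return { blinkStore };  // e.g. {{ blinkStore.text }} in the template
  },
};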

Now the individual components can talk to each other through the Pinia store. If you followed all the steps, great job! You should be able to recognise eye blinks and convert them into text!

Conclusion

In this article, we have seen how to access the video from your computer or mobile camera and predict whether the eyes made a short or a long blink. Conversion to text is only one of many possible applications of this prediction. You could count blinks to detect whether a person is tired, or control your app by winking the left or right eye… just use your imagination!
If you want to deploy and test it on your own mobile device, check this tutorial to guide you through the Monaca deployment process.

In case of any questions or suggestions for improvements, do not hesitate to contact me.

Good luck and have fun coding! :)

Note: the prediction accuracy will vary depending on your phone’s capabilities.
