Multi-gesture detection with Tensorflow API ‘TFOD’

MOHAMED BEN ALI
Linagora LABS
Jan 25, 2022

What is LinTO ?

LinTO is an intelligent Open Source assistant based only on Open Source technologies. It is GAFAM-free and easy to install on a cloud.

Research and development in the LinTO project pursued two paths. The first was to create functionalities common to personal assistants, allowing users to use their voice to read their mails, consult their agendas, open and share documents, search for information, and so on. The second consisted in developing advanced options to produce high-quality transcriptions of full multi-party conversations and follow conversational exchanges in order to take notes, produce relevant real-time recommendations, and automatically generate meeting minutes.

LinTO has been developed both as a device that runs on the Raspberry Pi (or other ARM smart boards) and as an open-source platform that can be deployed, adopted, or maintained either directly on a company’s premises or remotely as a service (SaaS).

At the end of a meeting, a report has to be drawn up. This is where LinTO comes in, with its ability to perform voice and gesture detection.

Let’s consider a use case in which six members of a team are present in a work meeting.

During this meeting, let's assume that three people spoke and four people voted for a new internal regulation.

During my scientific research with LINAGORA Labs, I worked mainly on visual detection approaches such as the detection of people and their gestures.
In this article, I will explain the different steps to build a dataset of images with several gestures and how to train and evaluate a model with the Tensorflow API tool.

In the upcoming parts, I will detail the process that is necessary to perform facial and gestural detection through the construction of a dataset of images.

We’ll start with the setup.

Step 1: Prepare development environment

First of all, we have to set up the environment on an Ubuntu 20.04 machine to run the object detection application.

  • Install and configure python and pip3:
# At first you will update & upgrade your ubuntu machine
sudo apt -y update && sudo apt -y upgrade
# Install python 3.8 repository
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.8
# Install pip for python3 on ubuntu 20.04
sudo apt install python3-pip
# Verify the installation by checking the pip3 version
pip3 --version
# Output
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
  • Install Jupyter

Jupyter notebooks are an open document format based on JSON. They contain a complete record of the user's sessions and include code, narrative text, equations, and more.

To install jupyter on your machine, you need to type the following command:

pip3 install jupyter

Jupyter will be installed on the Ubuntu 20.04 system. To run it, type the command below:

jupyter notebook
  • Install and configure OpenCV

To create a model for object detection, you will install OpenCV, a graphics library specialized in image processing for both still images and video.

# Refresh the packages index and install the OpenCV package
sudo apt update
sudo apt install python3-opencv
# Verify the installation
python3 -c "import cv2; print(cv2.__version__)"
# Output
3.2.0

To install the latest OpenCV version from the source, perform the following steps:

# Install the required dependencies
sudo apt install build-essential cmake git pkg-config libgtk-3-dev \
    libavcodec-dev libavformat-dev libswscale-dev libv4l-dev \
    libxvidcore-dev libx264-dev libjpeg-dev libpng-dev libtiff-dev \
    gfortran openexr libatlas-base-dev python3-dev python3-numpy \
    libtbb2 libtbb-dev libdc1394-22-dev zip nano
# Clone the OpenCV and OpenCV contrib repositories
mkdir ~/opencv_build && cd ~/opencv_build
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
# Create a build directory
cd ~/opencv_build/opencv
mkdir build && cd build
# Set up the OpenCV build with CMake
cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D INSTALL_C_EXAMPLES=ON \
    -D INSTALL_PYTHON_EXAMPLES=ON \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_build/opencv_contrib/modules \
    -D BUILD_EXAMPLES=ON ..

You will see something like the screen below:

Start the compilation process:

make -j64

Modify the -j flag according to your processor. If you do not know the number of cores in your processor, you can find it by typing nproc.

The compilation may take several minutes or more, depending on your system configuration.

Install OpenCV with make:

sudo make install

To verify the installation, type the following commands and you should see the OpenCV version.

C++ bindings:

pkg-config --modversion opencv4

Output:

4.5.3

Python bindings:

python3 -c "import cv2; print(cv2.__version__)"

You will see something like below:

  • Install Tensorflow and Tensorboard

TensorFlow is a free and open-source software library for machine learning and artificial intelligence.

TensorBoard is a tool for providing the measurements and visualizations needed during the machine learning workflow.

To install the two tools, type the command mentioned below:

# Install Tensorflow
pip3 install tensorflow
# Install Tensorboard
pip3 install tensorboard

To verify the installation, run the following lines in a Python interpreter:

import tensorflow as tf
print(tf.__version__)

Output:

2.6.0

Or you can type the following commands to verify:

# Show installation of tensorflow and version
pip3 show tensorflow

Output:

# Show installation of tensorboard and version
pip3 show tensorboard

Output:

  • Install the Tensorflow Object Detection API

As we are going to build an object detection model with the Tensorflow Object Detection API, we need to install all its dependencies. You can find all the steps on the official installation page.

Tensorflow Object Detection API depends on the following libraries:

  • Protobuf
  • setuptools
  • cython
  • Pillow
  • lxml
  • tf_Slim
  • CocoAPI
  • Matplotlib

To install all these libraries, type the following commands:

# The remaining libraries can be installed on Ubuntu via apt
sudo apt install protobuf-compiler python3-pil python3-lxml
# Install python packages with pip3
pip3 install matplotlib
pip3 install tf_slim
pip3 install setuptools
pip3 install cython

We need to install the Tensorflow Object Detection API using the following commands:

# Create a folder named Tensorflow (for example)
mkdir Tensorflow
cd Tensorflow
# Use Git to clone the Model Garden for TensorFlow
git clone https://github.com/tensorflow/models.git
# Clone the COCO API
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
# Edit the Makefile, changing python to python3, then build and copy pycocotools
make
cp -r pycocotools ../../models/research/
# Install protobuf in models/research/
cd ../../models/research
mkdir protoc
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip -d protoc
./protoc/bin/protoc object_detection/protos/*.proto --python_out=.
# Install the Object Detection API
# From within TensorFlow/models/research/
cp object_detection/packages/tf2/setup.py .
python3 -m pip install .

Output:

Successfully installed object-detection-0.1

Add research/slim to your PYTHONPATH:

# From within TensorFlow/models/research/slim
export PYTHONPATH=$PYTHONPATH:TensorFlow/models/research/slim

To test the installation, run the following command from within Tensorflow/models/research:

python3 object_detection/builders/model_builder_tf2_test.py

You should observe a printout similar to the one below:

After restarting the machine, you must run the following commands in your terminal before using the Tensorflow Object Detection API again.

# From within Tensorflow/models/research
protoc/bin/protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:TensorFlow/models/research/slim

Step 2: Construction of gesture images dataset

To build our dataset of gesture images, we must choose the set of gestures we will use. It’s best to use universal gestures, such as those documented here:

Subsequently, I chose 10 gestures from the list above.

Gesture name = hello

Gesture name = hand_up (handup)

Gesture name = victory

Gesture name = I love you (iloveyou)

Gesture name = yes

Gesture name = call me (callme)

Gesture name = no

Gesture name = ok

Gesture name = okay

Gesture name = thanks

  • Build dataset of gesture images

The objective of this part is to record a video for each of the 10 gestures mentioned in the previous part. You just have to use your computer's camera; on Ubuntu you can use the Cheese tool, as shown below:

Once you have finished recording a video for each gesture, apply the following code to extract frames from each gesture video.
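As a minimal sketch (not necessarily the exact script used in the project), frames can be extracted with OpenCV as follows; the paths hello.mp4 and images/hello are placeholders to adapt to your own recordings:

# Minimal frame-extraction sketch with OpenCV (placeholder paths)
import os
import cv2

video_path = "hello.mp4"        # recorded gesture video
output_dir = "images/hello"     # where the extracted frames will be written
os.makedirs(output_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_id = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # keep one frame out of five to avoid near-duplicate images
    if frame_id % 5 == 0:
        cv2.imwrite(os.path.join(output_dir, f"hello_{frame_id:04d}.jpg"), frame)
    frame_id += 1
cap.release()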

The result of this code is as follows:

Before starting the annotation of the gesture image dataset, the labelImg tool must be installed.

To install labelImg, type the following commands:

# From github clone this repository
git clone https://github.com/tzutalin/labelImg
cd labelImg
# Install pyqt5
sudo apt-get install pyqt5-dev-tools
sudo pip3 install -r requirements/requirements-linux-python3.txt
make qt5py3
# Run the tool
python3 labelImg.py

This annotation step is very important: the more annotated images you have, the better the results at inference time. The images must be annotated consistently, in Pascal VOC (XML) format.
You can follow this demo video:

You can open the image directory in labelImg, draw a bounding box, and assign a label name. When you save, it creates a .xml file with the x, y coordinates of the bounding box (xmin, ymin, xmax, ymax) and the label name.
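As an illustration, the saved coordinates can be read back from such an annotation file with a few lines of Python (the file name hello_0001.xml is a hypothetical example):

# Read the bounding boxes back from a labelImg Pascal VOC annotation (hypothetical file name)
import xml.etree.ElementTree as ET

tree = ET.parse("hello_0001.xml")
for obj in tree.getroot().findall("object"):
    label = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
    xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
    print(label, xmin, ymin, xmax, ymax)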

The following figure shows you the dimensions of the x, y coordinates obtained by the .xml file.

By the end of this step, you should have a folder with the .jpg images and one .xml file per image. The dataset folder will look like this:

Step 3: Prepare data for Transfer Learning

In this part, we prepare the data for transfer learning by dividing it into 90% training data and 10% testing data, using the script located in the scripts folder of the project, as shown in the next command:

# From scripts folder in the project
python3 partition_dataset.py -x -i <path_to_dataset_folder> -r 0.1
  • Create Label Map file

Once you have created the train and test folders, you must create the label map file, which contains the 10 labels (you will find the instructions in the .ipynb file). The following code will allow you to create the label map file:
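A minimal sketch of such a cell could look like this; the output path annotations/label_map.pbtxt is an assumption and the annotations folder must already exist:

# Sketch: write the 10 labels to annotations/label_map.pbtxt (path assumed)
labels = ["hello", "yes", "no", "thanks", "callme", "ok",
          "okay", "vectory", "handup", "iloveyou"]
with open("annotations/label_map.pbtxt", "w") as f:
    for i, name in enumerate(labels, start=1):
        f.write("item {\n")
        f.write(f"name:'{name}'\n")
        f.write(f"id:{i}\n")
        f.write("}\n")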

Output:

item {
name:'hello'
id:1
}
item {
name:'yes'
id:2
}
item {
name:'no'
id:3
}
item {
name:'thanks'
id:4
}
item {
name:'callme'
id:5
}
item {
name:'ok'
id:6
}
item {
name:'okay'
id:7
}
item {
name:'vectory'
id:8
}
item {
name:'handup'
id:9
}
item {
name:'iloveyou'
id:10
}
  • Generate TFrecords

TFRecord is TensorFlow's binary storage format. We can generate a TFRecord file for the train and test directories using the Python code in the scripts folder. Run the following commands:

# Run generate_tfrecord.py for train
python3 generate_tfrecord.py -x <path_to>/train/ -l <path_to>/annotations/label_map.pbtxt -o <path_to>/annotations/train.record
# Run generate_tfrecord.py for test
python3 generate_tfrecord.py -x <path_to>/test/ -l <path_to>/annotations/label_map.pbtxt -o <path_to>/annotations/test.record

Output:

Successfully created the TFRecord file:/annotations/train.record
Successfully created the TFRecord file:/annotations/test.record
  • Download a pretrained model from the TF2 Detection Model Zoo

For this project, we have chosen SSD MobileNet as the base model. SSD models offer high speed and are ideal for detection on video streams, and the MobileNet backbone is suitable for low-resource devices as it consumes less space while still giving decent precision. To download the ssd_mobilenet_v2 model, just type the following command:

wget  http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
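The archive then has to be extracted (an additional step implied but not listed above); standard tar usage is enough:

tar -xzvf ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz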
  • Edit pipeline.config file

Before starting the retraining of the mobilenet_v2 model, we must configure the pipeline.config file, which is located in the downloaded model folder.

It suffices to set the number of classes to 10, edit the path of the label map 'label_map.pbtxt', edit the TFRecord paths 'train.record' and 'test.record', and finally change the batch_size to 4.

Run the following code to make the necessary modifications:
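A minimal sketch of these edits, made programmatically through the Object Detection API's protos, could look like the following; the <path_to> values are placeholders, and the fine_tune_checkpoint lines are an extra, commonly needed setting not mentioned above:

# Sketch: edit pipeline.config programmatically (placeholder paths to adapt)
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

config_path = "<path_to>/workspace_mohamed/models/mobilenet_version2/pipeline.config"
label_map_path = "<path_to>/annotations/label_map.pbtxt"

pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with open(config_path, "r") as f:
    text_format.Merge(f.read(), pipeline_config)

pipeline_config.model.ssd.num_classes = 10
pipeline_config.train_config.batch_size = 4
# point fine-tuning at the downloaded checkpoint (path assumed)
pipeline_config.train_config.fine_tune_checkpoint = "<path_to>/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"
pipeline_config.train_input_reader.label_map_path = label_map_path
pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = ["<path_to>/annotations/train.record"]
pipeline_config.eval_input_reader[0].label_map_path = label_map_path
pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = ["<path_to>/annotations/test.record"]

with open(config_path, "w") as f:
    f.write(text_format.MessageToString(pipeline_config))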

Step 4: Retrain the model

Retrain the model using the following command:

python3 <path_to>/models/research/object_detection/model_main_tf2.py \ 
--model_dir=<path_to>/workspace_mohamed/models/mobilenet_version2 \
--pipeline_config_path=<path_to>/workspace_mohamed/models/mobilenet_version2/pipeline.config \
--num_train_steps=15000

Output:

Instructions for updating:
Use fn_output_signature instead
INFO:tensorflow:Step 8100 per-step time 1.150s loss=0.319
I0527 14:49:23.321400 140398352971584 model_lib_v2.py:682] Step 8100 per-step time 1.150s loss=0.319
INFO:tensorflow:Step 8200 per-step time 1.100s loss=0.324
I0527 14:51:14.390439 140398352971584 model_lib_v2.py:682] Step 8200 per-step time 1.100s loss=0.324
INFO:tensorflow:Step 8300 per-step time 1.098s loss=0.332
I0527 14:53:04.773121 140398352971584 model_lib_v2.py:682] Step 8300 per-step time 1.098s loss=0.332
INFO:tensorflow:Step 8400 per-step time 1.106s loss=0.274
I0527 14:55:01.560966 140398352971584 model_lib_v2.py:682] Step 8400 per-step time 1.106s loss=0.274
INFO:tensorflow:Step 8500 per-step time 1.179s loss=0.397
I0527 14:57:07.724631 140398352971584 model_lib_v2.py:682] Step 8500 per-step time 1.179s loss=0.397
INFO:tensorflow:Step 8600 per-step time 1.173s loss=0.284
I0527 14:59:06.614590 140398352971584 model_lib_v2.py:682] Step 8600 per-step time 1.173s loss=0.284
INFO:tensorflow:Step 8700 per-step time 1.271s loss=0.215
I0527 15:01:12.691615 140398352971584 model_lib_v2.py:682] Step 8700 per-step time 1.271s loss=0.215
INFO:tensorflow:Step 8800 per-step time 1.249s loss=0.275
I0527 15:03:16.936001 140398352971584 model_lib_v2.py:682] Step 8800 per-step time 1.249s loss=0.275
INFO:tensorflow:Step 8900 per-step time 1.259s loss=0.300
I0527 15:05:21.362021 140398352971584 model_lib_v2.py:682] Step 8900 per-step time 1.259s loss=0.300
INFO:tensorflow:Step 9000 per-step time 1.306s loss=0.186
I0527 15:07:25.405282 140398352971584 model_lib_v2.py:682] Step 9000 per-step time 1.306s loss=0.186

Step 5: Save graph

You can choose the latest checkpoint file and export it as a graph for inference. Checkpoint files are stored in the model directory with the .index extension.

To export the graph model use the following command:

python3 <path_to>/models/research/object_detection/exporter_main_v2.py \ 
--input_type=image_tensor \
--pipeline_config_path=<path_to>/workspace_mohamed/models/mobilenet_version2/pipeline.config \
--trained_checkpoint_dir=<path_to>/workspace_mohamed/models/mobilenet_version2/ \
--output_directory=<path_to>/workspace_mohamed/models/mobilenet_version2/export

Step 6: Evaluate model

The objective of the evaluation phase is to measure the performance of the trained model in terms of mAP (mean Average Precision).

Type the following command to evaluate the model:

python3 <path_to>/models/research/object_detection/model_main_tf2.py \
--model_dir=<path_to>/workspace_mohamed/models/mobilenet_version2 \
--pipeline_config_path=<path_to>/workspace_mohamed/models/mobilenet_version2/pipeline.config \
--checkpoint_dir=<path_to>/workspace_mohamed/models/mobilenet_version2

Output:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.646
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.792
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.646
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.683
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.695
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.695
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.695
INFO:tensorflow:Eval metrics at step 15000
I0921 10:30:51.929377 139872141592384 model_lib_v2.py:988] Eval metrics at step 15000
INFO:tensorflow: + DetectionBoxes_Precision/mAP: 0.646000
I0921 10:30:51.934390 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Precision/mAP: 0.646000
INFO:tensorflow: + DetectionBoxes_Precision/mAP@.50IOU: 1.000000
I0921 10:30:51.935504 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Precision/mAP@.50IOU: 1.000000
INFO:tensorflow: + DetectionBoxes_Precision/mAP@.75IOU: 0.792357
I0921 10:30:51.936339 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Precision/mAP@.75IOU: 0.792357
INFO:tensorflow: + DetectionBoxes_Precision/mAP (small): -1.000000
I0921 10:30:51.937054 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Precision/mAP (small): -1.000000
INFO:tensorflow: + DetectionBoxes_Precision/mAP (medium): -1.000000
I0921 10:30:51.937712 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Precision/mAP (medium): -1.000000
INFO:tensorflow: + DetectionBoxes_Precision/mAP (large): 0.646009
I0921 10:30:51.938401 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Precision/mAP (large): 0.646009
INFO:tensorflow: + DetectionBoxes_Recall/AR@1: 0.683333
I0921 10:30:51.939035 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Recall/AR@1: 0.683333
INFO:tensorflow: + DetectionBoxes_Recall/AR@10: 0.694762
I0921 10:30:51.939728 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Recall/AR@10: 0.694762
INFO:tensorflow: + DetectionBoxes_Recall/AR@100: 0.694762
I0921 10:30:51.940590 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Recall/AR@100: 0.694762
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (small): -1.000000
I0921 10:30:51.941257 139872141592384 model_lib_v2.py:991] + DetectionBoxes_Recall/AR@100 (small): -1.000000
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (medium): -1.000000

There is another way to evaluate the model, using TensorBoard. Type the following command:

# in your terminal
tensorboard --logdir <path_to_dir>/eval
# Output
2021-11-03 14:46:06.061849: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-03 14:46:06.061879: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.4.1 at http://localhost:6007/ (Press CTRL+C to quit)
# go to this link
http://localhost:6007/

Results with Tensorboard:

Step 7: Real-Time gestures detection

In this part of the project, we will show the results of gesture detection in real-time using the following code:
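As a sketch (not necessarily the exact code used in the project), real-time webcam detection with the exported SavedModel and OpenCV can be done as follows; the export path and the label map path are placeholders:

# Sketch: real-time gesture detection from the webcam with the exported model (placeholder paths)
import cv2
import numpy as np
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

detect_fn = tf.saved_model.load("<path_to>/workspace_mohamed/models/mobilenet_version2/export/saved_model")
category_index = label_map_util.create_category_index_from_labelmap(
    "<path_to>/annotations/label_map.pbtxt", use_display_name=True)

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    input_tensor = tf.convert_to_tensor(np.expand_dims(frame, 0), dtype=tf.uint8)
    detections = detect_fn(input_tensor)

    # strip the batch dimension and keep only the valid detections
    num = int(detections.pop("num_detections"))
    detections = {key: value[0, :num].numpy() for key, value in detections.items()}

    viz_utils.visualize_boxes_and_labels_on_image_array(
        frame,
        detections["detection_boxes"],
        detections["detection_classes"].astype(np.int64),  # add 1 here if labels look shifted
        detections["detection_scores"],
        category_index,
        use_normalized_coordinates=True,
        max_boxes_to_draw=10,
        min_score_thresh=0.5)

    cv2.imshow("LinTO gesture detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()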

Output:

Conclusion

In this article, we presented the different steps to build a gesture detection model with the Tensorflow Object Detection API by retraining an SSD MobileNet V2 model.

In addition, we built a dataset of gesture images, which is available in the following repository: https://github.com/linto-ai/multi-hand-gesture-dataset.git
With this dataset of gesture images, we obtained an mAP of 64.6%; the precision can easily be improved by annotating more images for each label.


MOHAMED BEN ALI
Linagora LABS

Research Engineer — Deep Learning & Big Data Engineering