How to Train a Real-Time Facemask Object Detector With the TensorFlow Object Detection API (TFOD2)

horczech · Published in The Startup · Nov 11, 2020 · 10 min read

The goal of this tutorial is to provide a detailed, step-by-step guide that can be (hopefully 🤞) followed by anyone with elementary programming skills who is interested in training an object detection model on their own dataset.

Prerequisites

  1. Set up the TensorFlow Object Detection API (see the tutorial)
  2. Download the facemask dataset from Kaggle
  3. Download the helper files from my GDrive
  4. Make a big cup of coffee ☕️

The GDrive contains many little cheats that can make your life a little bit better.

  • scripts/tfrecord_converter.py : just change the paths to the downloaded Kaggle dataset and it will convert it to TFRecord files and generate a label map
  • scripts/web_cam.py : runs your checkpoint on a webcam stream, just change a few paths at the bottom of the file
  • scripts/environment.yaml : replicates my virtual environment for TensorFlow 2 using Anaconda
  • trained_model_checkpoint: contains a reasonably trained model for face-mask detection together with the pipeline.config file so you can try detecting objects without any training!
  • training_resources: just copy this folder to your Docker container and start training. All you need is there, just run the following command
python models/research/object_detection/model_main_tf2.py \
--pipeline_config_path ~/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/training/001/pipeline.config \
--model_dir ~/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/training/001

Some of the scripts used in this tutorial require Python 3.6 and TensorFlow 2 installed on your computer. If you don't have them, simply install Anaconda and create a virtual environment from my exported scripts/environment.yaml using this command:

conda env create -f environment.yaml

then you can activate the environment using

conda activate tf2_cpu

Dataset Preprocessing

After downloading the facemask dataset from Kaggle, we have to convert both the image data and the annotations into the TFRecord file format, which is supported by TensorFlow.

What is TFRecord?

The TensorFlow documentation describes TFRecords as a simple format for storing a sequence of binary records. Data converted into this format takes up less disk space, which makes every operation performed on the dataset faster. The format also handles large datasets well, since they can be split into multiple files.

The TFRecord file is nothing more than a list of tf.train.Example entries, one for each image and its annotation. The tf.train.Example is a {key: value} dictionary containing, for example, the byte array of the image data, the coordinates of the bounding boxes, the class ids and other necessary features. The “key” is a string holding the name of the particular feature and the “value” is the feature itself, which must be one of the tf.train.Feature types (tf.train.BytesList, tf.train.FloatList or tf.train.Int64List).

Example of tf.train.Example for single image
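Here is a minimal sketch of how such an example can be assembled for a single annotated image. The feature keys follow the convention used by the TFOD dataset tools; the file name, image size and box coordinates are made-up placeholders:

import tensorflow as tf

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

def float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# read the raw image bytes (placeholder file name)
with tf.io.gfile.GFile('image0.png', 'rb') as f:
    encoded_image = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': bytes_feature([encoded_image]),
    'image/format': bytes_feature([b'png']),
    'image/height': int64_feature([512]),
    'image/width': int64_feature([512]),
    # one entry per object, box coordinates normalized to [0, 1]
    'image/object/bbox/xmin': float_feature([0.31]),
    'image/object/bbox/xmax': float_feature([0.48]),
    'image/object/bbox/ymin': float_feature([0.22]),
    'image/object/bbox/ymax': float_feature([0.55]),
    'image/object/class/text': bytes_feature([b'with_mask']),
    'image/object/class/label': int64_feature([1]),
}))

# a TFRecord file is just many serialized examples written one after another
with tf.io.TFRecordWriter('train.record') as writer:
    writer.write(example.SerializeToString())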

The conversion to TFRecord

To convert your custom dataset, you can either write your own script from scratch using this tutorial from the TFOD documentation, or you can modify one of the existing scripts found in the downloaded repository in the models/research/object_detection/dataset_tools/ folder.

In the case of the facemask dataset, I've created my own script where you just have to set the path to the annotation folder, the folder with images and the output folder. The script will automatically split the dataset into a training and an evaluation part and create the “train.record” and “eval.record” files that can be used for training. On top of that, the script will also create a “label_map.pbtxt” file. The label map is another file that is essential for model training: a simple text file mapping each class id to its class name. See the example of the label map below:
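The class names below are the ones used in the Kaggle annotations; note that id 0 is reserved for the background class, so the ids start at 1.

item {
  id: 1
  name: 'with_mask'
}
item {
  id: 2
  name: 'without_mask'
}
item {
  id: 3
  name: 'mask_weared_incorrect'
}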

Model Training

Now, we are finally getting to the fun part! We can go to the TFOD Model Zoo and choose a model to train. The model zoo contains many models that are already pre-trained on the COCO 2017 dataset. A pre-trained model saves us a lot of training time since it is already able to detect 90 object categories. For the purpose of a real-time face mask detector, I have downloaded one of the smallest and fastest models available, SSD MobileNet V2 FPNLite 320x320. The downloaded folder contains a checkpoint folder with the model checkpoint (the model variables) that we will use for our training, a saved_model folder with a ready-to-use model (variables + computational graph), and a pipeline.config file that will be used to configure the training.
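If you work from the command line, the model can be fetched and unpacked like this (the URL is the one the Model Zoo linked to at the time of writing, so double-check it against the current Model Zoo page):

wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
tar -xzf ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz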

Training configuration

The training is configured in the pipeline.config file. The configuration seems very confusing at first since it contains a shitload of options, but in order to start training without any optimization, we have to change “only” 8 parameters, listed below.

  1. Row 3, num_classes: Change the number of classes to 3 since we are detecting only three classes: people with a mask, people without a mask, and people not wearing a mask correctly.
  2. Row 138, batch_size: The default batch size is typically too large for people without supercomputers or without datasets of tens of thousands of images, so I usually decrease it to something between 5 and 32. Keep in mind that if you are training on a GPU, the whole batch + model has to fit into GPU memory. If it doesn't fit, the training fails with an out-of-memory error.
  3. Row 162, fine_tune_checkpoint: Set the path to the downloaded model checkpoint without any suffix, e.g. path_to_downloaded_model_folder/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0
  4. Row 168, fine_tune_checkpoint_type: Change the checkpoint type to “detection”.
  5. Row 172 & Row 182, label_map_path: Set the path to the label_map.pbtxt file.
  6. Row 174, input_path: Set the path to the generated TFRecord file with the training data.
  7. Row 184, input_path: Set the path to the generated TFRecord file with the evaluation data.

In the snippet below, you can find the parts of the pipeline.config file with the lines that you have to change:

pipeline.config
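This is an abbreviated sketch rather than the full file: only the rows mentioned above are shown, already filled in with the container paths used later in this tutorial (“...” marks omitted parts; row numbers refer to the unmodified file):

model {
  ssd {
    num_classes: 3  # row 3
    ...
  }
}
train_config {
  batch_size: 16  # row 138, default is 128
  ...
  fine_tune_checkpoint: "/home/tensorflow/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"  # row 162
  num_steps: 50000  # row 163
  ...
  fine_tune_checkpoint_type: "detection"  # row 168, default is "classification"
  ...
}
train_input_reader {
  label_map_path: "/home/tensorflow/training_resources/dataset/label_map.pbtxt"  # row 172
  tf_record_input_reader {
    input_path: "/home/tensorflow/training_resources/dataset/train.record"  # row 174
  }
}
eval_input_reader {
  ...
  label_map_path: "/home/tensorflow/training_resources/dataset/label_map.pbtxt"  # row 182
  tf_record_input_reader {
    input_path: "/home/tensorflow/training_resources/dataset/eval.record"
  }
}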

To further optimize your model training, you can change the train_config part of the config file (it starts at row 137). For example, you can change num_steps (row 163), which defines for how many steps the model will be trained. One step means one batch of images followed by the recalculation of the model weights (variables). Another commonly changed parameter is learning_rate. When the training is too slow, you can increase the learning rate, and when the training loss is too unstable, you can decrease it. Also, when you are not using a constant learning rate (the default learning rate function for our model is cosine decay), make sure that the total_steps of the learning rate is set to the same value as num_steps in order to slowly decrease the learning rate with an increasing number of steps all the way to zero (learn more about learning rates here). I also like to change the data_augmentation_options, which let you easily add data augmentation. This can help to prevent early overfitting in case you have a small dataset.
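For illustration, adding augmentations is as simple as appending more data_augmentation_options blocks to train_config. The horizontal flip below ships with the default config; the brightness and crop options are examples picked from preprocessor.proto, with illustrative values:

data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  random_adjust_brightness {
    max_delta: 0.2
  }
}
data_augmentation_options {
  random_crop_image {
    min_aspect_ratio: 0.75
    max_aspect_ratio: 1.5
  }
}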

Unfortunately, there is no documentation covering all the configuration options, so you have to check the comments in the source files to find out more. Go to models/research/object_detection/protos/, where you will find all the proto files with the options. The entry file is pipeline.proto. For example, all possible data augmentation options can be found in preprocessor.proto.

Start training!

In order to start training, we have to start the Docker container with the TFOD environment that we prepared in the setup tutorial, copy the downloaded model, training data, evaluation data, and label map to the container, and start the training.

First of all, start the Docker Desktop app and open PowerShell. In the PowerShell window, run “docker images -a” to check that the Docker image for object detection called “od” was created.

Before starting the container, it is a good idea to create a folder, e.g. “training_resources”, that we will mount to the container. That way, we can easily share files between the container and our computer. Afterward, we can start the container with the shared folder simply by running:

docker run -it -v path_to_shared_folder\training_resources:/home/tensorflow/training_resources od

where “-v” mounts the shared folder; the path to the folder on the host and the path where it should be mounted in the container are separated by a colon without any spaces. The “od” at the end of the command is the name of the Docker image.

In the shared folder, create a new folder “dataset” and copy eval.record, train.record and label_map.pbtxt into it; next to the dataset folder, copy the downloaded model (so you get the same file structure that is on my GDrive, sketched below). Lastly, you have to modify the default pipeline.config file so it works with your dataset and copy it in as well. I like to create a folder for each training run with its own config file and put it inside the model folder, e.g.

"...\training_resources\ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8\training\001\pipeline.config"

When you set the paths in your config file, make sure to use the full absolute path from the root directory of the Docker container. For example, the path to the checkpoint file will be

"/home/tensorflow/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"

FINALLY, we can start the training by running the following command:

python models/research/object_detection/model_main_tf2.py \
--pipeline_config_path ~/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/training/001/pipeline.config \
--model_dir ~/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/training/001

Where “~” is a shortcut for the home directory, which is “/home/tensorflow/”, so you don't have to write the full path all the time, and “model_dir” is the folder where the training output will be saved.

If everything goes right, the training will print a log message to PowerShell every 100 steps with the current step number and loss.

How to tell when the model is trained?

The main indicator of the training progression is the loss function. The loss function tells us how bad the model's predictions are on the training and evaluation data. Here we have to make a clear distinction between the training and evaluation datasets. The training dataset is used to train the network, meaning that the training data are used to modify the model variables. On the other hand, the evaluation dataset is used only to find out how good or bad our model is at making predictions on new data that it didn't see during the training. So the evaluation data do NOT affect the model variables.

The ideal training (blue line) and evaluation (red line) loss curves that we want to achieve are shown in the image below. You can notice that the evaluation loss is always higher than the training loss, which makes sense since the optimization searches for the model with the smallest possible loss on the training data, and the model will always perform a little bit worse on brand new data. As long as both losses decrease, we can keep training and get better and better results. The situation changes when the evaluation loss stops decreasing (or even starts increasing). From that moment on, the model starts losing its ability to generalize, which results in worse performance on new data. You can also imagine it as if the model starts memorizing the training dataset by heart instead of deducing general knowledge that would be transferable to new data.

Our goal is to find the “optimal capacity” point where we get the best possible model.

source: https://srdas.github.io/DLBook/ImprovingModelGeneralization.html

TFOD helps us monitor the training process by generating training summaries for TensorBoard. In order to generate the evaluation loss, you have to open a new PowerShell window and connect to the Docker container where you are training your model using the command:

docker exec -it 3d2966956350 bash

where “3d2966956350” is the id of the container (you can find the id of a running container with “docker ps”). Now you can run the evaluation script using the same command as the training script with an extra flag “checkpoint_dir” pointing to the same directory as “model_dir”:

python models/research/object_detection/model_main_tf2.py \
--pipeline_config_path ~/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/training/001/pipeline.config \
--model_dir ~/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/training/001 \
--checkpoint_dir ~/training_resources/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/training/001

That will run the COCO evaluation every 5 minutes and generate the files for TensorBoard. You can start TensorBoard in yet another PowerShell window (this is the last PowerShell 😅) using the command

tensorboard --host=127.0.0.1 --logdir=path\to\the\training\output

Where the “--logdir” path is the path to the training output (the same path as “model_dir”). Afterward, you can open a browser and go to 127.0.0.1:6006 (the default TensorBoard port), where TensorBoard will be running and where you can find the loss functions, the learning rate, the mAP (mean average precision) and detection examples on evaluation images.

Using the model with a webcam

Great! You have trained your first model! Hooray🎉🎉

This is the simplest step: all you have to do is download my web_cam.py and change the paths to the label_map.pbtxt, pipeline.config, and checkpoint files that were generated in the output folder. Keep in mind that each checkpoint consists of 2 files, e.g. “ckpt-31.index” and “ckpt-31.data-00000-of-00001”, so when you manipulate a checkpoint, don't forget to keep both.
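If you are curious what happens under the hood, here is a minimal sketch of the standard TFOD2 pattern for running a checkpoint on a webcam stream: rebuild the model from pipeline.config, restore the checkpoint, and run detection frame by frame. The paths below are placeholders that you would point at your own training output:

import cv2
import numpy as np
import tensorflow as tf
from object_detection.builders import model_builder
from object_detection.utils import config_util, label_map_util
from object_detection.utils import visualization_utils as viz_utils

# placeholder paths -- point them to your own training output
PIPELINE_CONFIG = 'training/001/pipeline.config'
CHECKPOINT = 'training/001/ckpt-31'  # no suffix; both checkpoint files must be present
LABEL_MAP = 'dataset/label_map.pbtxt'

# rebuild the model from the config and restore the trained variables
configs = config_util.get_configs_from_pipeline_file(PIPELINE_CONFIG)
model = model_builder.build(model_config=configs['model'], is_training=False)
ckpt = tf.train.Checkpoint(model=model)
ckpt.restore(CHECKPOINT).expect_partial()

category_index = label_map_util.create_category_index_from_labelmap(LABEL_MAP)

@tf.function
def detect(image_tensor):
    # preprocess -> predict -> postprocess is the standard TFOD2 inference chain
    image, shapes = model.preprocess(image_tensor)
    prediction_dict = model.predict(image, shapes)
    return model.postprocess(prediction_dict, shapes)

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    input_tensor = tf.convert_to_tensor(frame[np.newaxis, ...], dtype=tf.float32)
    detections = detect(input_tensor)
    viz_utils.visualize_boxes_and_labels_on_image_array(
        frame,
        detections['detection_boxes'][0].numpy(),
        # postprocess returns zero-based classes; the label map is one-based
        detections['detection_classes'][0].numpy().astype(int) + 1,
        detections['detection_scores'][0].numpy(),
        category_index,
        use_normalized_coordinates=True,
        min_score_thresh=0.5)
    cv2.imshow('facemask detector', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()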

Now, you can kick back, start the script and enjoy the sweet fruits of your labor 👏👏👏
