ML Tips&Tricks / TF2 OD API — training

Construction feat. TF2 Object Detection API

Are you protected?

Ivan Ralašić
Dec 9, 2020 · 11 min read
Collection of Construction Safety Helmet by skitterphoto from Pexels.com

Introduction

The construction industry is one of the most dangerous industries, according to statistics from OSHA:

Out of 4,779 worker fatalities in private industry in calendar year 2018, 1,008 or 21.1% were in construction — that is, one in five worker deaths last year were in construction. The leading causes of private sector worker deaths (excluding highway collisions) in the construction industry were falls, followed by struck by object, electrocution, and caught-in/between. These “Fatal Four” were responsible for more than half (58.6%) the construction worker deaths in 2018, BLS reports. Eliminating the Fatal Four would save 591 workers’ lives in America every year.

1. Falls — 338 out of 1,008 total deaths in construction in CY 2018 (33.5%)

2. Struck by Object — 112 (11.1%)

3. Electrocutions — 86 (8.5%)

4. Caught-in/between — 55 (5.5%)

Wearing proper Construction Personal Protective Equipment (PPE) is incredibly important for construction worker safety and can help reduce the number of serious construction-related injuries. Construction PPE includes eye and face protection, foot protection, hand protection, head protection, and hearing protection. The most important part of the construction worker’s PPE is probably the helmet. Helmets are mandatory wherever there is a potential for objects falling from above, bumps to the head from fixed objects, or accidental head contact with electrical hazards.

Monitoring proper PPE usage is a time-consuming and difficult task for safety managers on construction sites, since the environment is highly dynamic and many workers from different trades are present on site simultaneously.

At Forsight, a construction tech startup focused on construction safety and security, we strongly believe that a combination of technology and in-depth domain knowledge can go a long way in making construction sites a safer and better environment where the risk of a serious injury is reduced to a minimum.

In this article, we’ll show you the first steps towards building a safety monitoring solution for construction sites. We’ll show how the TF2 OD API can be used to train a basic object detection model for safety helmets, and share a few tips&tricks on how to efficiently manage your data and experiments. Let’s dive in.

P.S. The supplementary Google Colab notebook should help you reproduce the results and the training process in no time!

Dataset

Black and Gray Mining Rig by cookie-cutter from Pexels.com

Getting the data

An excellent helmet dataset by Liangbin Xie from Northeastern University in China was published on Harvard Dataverse. It represents a good starting point for developing a monitoring solution for proper PPE usage, although to create a solution for a real-world use case you’ll need to ensure that all the previously mentioned variables are covered. The dataset originally consists of 7,035 images (5,269 in the training set and 1,766 in the validation set) and includes three classes: helmet, head, and person.

We’ve gone a step further and labeled all persons and safety vests in addition to the previously existing labels. Courtesy of Forsight, we’re sharing the improved version of the dataset in the form of binary TFRecord files that can be used out of the box to train your model using the provided Google Colab notebook. If you want to use additional data that you’ve collected and labeled yourself, you’ll have to convert it into TFRecord format using the instructions provided here; a minimal sketch of the conversion is shown below.
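If you’re curious what such a conversion looks like, here is a minimal, illustrative sketch of building a single tf.train.Example in the format the TF OD API expects. The feature keys are the API’s standard TFRecord fields, but the create_example helper, its arguments, and the file names are hypothetical; see the linked instructions for a complete converter.

# Minimal, illustrative TFRecord conversion for one labeled image.
# The create_example helper and its inputs are hypothetical; the
# feature keys are the standard ones used by the TF OD API.
import tensorflow as tf

def create_example(jpg_bytes, filename, width, height,
                   boxes, class_names, class_ids):
    """boxes: list of (xmin, ymin, xmax, ymax) in absolute pixels."""
    def bytes_list(values):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
    def float_list(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=values))
    def int64_list(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': bytes_list([jpg_bytes]),
        'image/format': bytes_list([b'jpeg']),
        'image/filename': bytes_list([filename.encode()]),
        'image/source_id': bytes_list([filename.encode()]),
        'image/width': int64_list([width]),
        'image/height': int64_list([height]),
        # Box coordinates are stored normalized to [0, 1].
        'image/object/bbox/xmin': float_list([b[0] / width for b in boxes]),
        'image/object/bbox/ymin': float_list([b[1] / height for b in boxes]),
        'image/object/bbox/xmax': float_list([b[2] / width for b in boxes]),
        'image/object/bbox/ymax': float_list([b[3] / height for b in boxes]),
        'image/object/class/text': bytes_list([n.encode() for n in class_names]),
        'image/object/class/label': int64_list(class_ids),
    }))

# Write all examples into a single TFRecord file.
with tf.io.TFRecordWriter('train.tfrecord') as writer:
    writer.write(create_example(
        open('worker.jpg', 'rb').read(), 'worker.jpg', 1920, 1080,
        boxes=[(100, 50, 300, 250)], class_names=['helmet'], class_ids=[1],
    ).SerializeToString())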

Advanced Dataset Management

As our team and efforts grew, so did our data. Managing it, in particular keeping track of which model was trained on which data, became a time-consuming ordeal: it forced us to rely on manual documentation, an error-prone process that in turn wasted even more time whenever we tried to reproduce results.

Having already used Allegro Trains as our research management solution, it was an easy choice for us to upgrade to the enterprise solution, which offers dataset management along with other features. Allegro.ai’s dataset management allows researchers to accurately reproduce experiments by tying each experiment to the specific data version the network was trained on.

Screenshot of the Allegro.ai UI for dataset management (image by the author)

The dataset manager provides full visibility into which data is available and allows our researchers to write queries and retrieve specific subsets of the dataset, without data duplication and through an easy-to-use interface.

A GIF of Allegro’s workflow for dataset management (image by the author)

This is also when we abandoned TFRecord files and integrated directly with Allegro’s system. This allows us to fetch frames on the fly, as needed, from our centralized data storage instead of copying the whole dataset to each new training machine. Allegro Trains’ caching ensures that the data persists on the machine, so there’s no need to copy it for each new experiment.

We also used the dataset manager to balance and debias our data. Naturally, some events and objects occur less frequently than others. If you want an object detection model to handle those rare events properly, you have to make sure the gap is bridged during training, using either targeted augmentations or synthetic data.

Having a single control panel for our entire machine learning stack proved invaluable. Being able to change the training dataset with a few mouse clicks reduced the amount of custom data-mangling code we had to write, and helped us avoid the even worse option: integrating yet another tool to manage our data and then writing custom glue code between the data management tool and the experiment management one.

Training, evaluation, testing

We’ve prepared a Google Colab notebook so you can start training your helmet detection model within seconds. We’re using a fork of the TF OD API repo, since we’ve applied a patch to pycocotools that enables per-category mAP metrics. This gives more granular insight into the performance of the trained model: for example, you might be interested in how the model performs on each individual class rather than averaged over all of them. The aggregated statistics can’t give you such insights, while the proposed patch can! Furthermore, we’ll be using Trains, which offers powerful visualization capabilities and makes performance analysis of the trained model much easier.
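For reference, hooking Trains into a training script typically takes only a couple of lines; a single Task.init call sets up experiment tracking and auto-logs TensorFlow scalars, plots, and console output. The project and task names below are placeholders, not the ones used in our notebook:

# Typical Trains bootstrapping; names are placeholders.
from trains import Task

task = Task.init(project_name='helmet-detection',
                 task_name='efficientdet-d0-training')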

Per category mAP metrics for EfficientDet D0 model on helmet dataset (image by author)

Finally, we’ve shared a script that allows you to perform model evaluation without having to stop the training. This has been one of the biggest reported frustrations when switching from the TF1 to the TF2 OD API, since the simultaneous train/eval workflow doesn’t work out of the box! The script simply handles both the train and eval parts using subprocesses, and it accepts the same parameters as the model_main_tf2.py script.

Script to run train and eval for TF2. Usage instructions:

python object_detection/model_main_tf2_train_eval.py \
    --model_dir PATH/TO/MODEL_DIR \
    --pipeline_config_path PATH/TO/DIR/pipeline.config \
    --num_train_steps <TRAIN_STEPS> \
    --sample_1_of_n_eval_examples 1 \
    --eval_time <EVAL_EVERY_N_SECONDS> \
    --task_name <TRAINS_TASK_NAME>
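The shared script is more complete, but the core idea can be sketched in a few lines: launch training and continuous evaluation as two concurrent model_main_tf2.py subprocesses. Passing --checkpoint_dir is what switches model_main_tf2.py into evaluation mode, where it watches the model directory and evaluates each new checkpoint. The paths below are placeholders, and hiding the GPU from the eval process is a common workaround rather than something the API requires:

# Minimal sketch of the subprocess-based train/eval approach.
import os
import subprocess
import sys

MODEL_DIR = "PATH/TO/MODEL_DIR"                 # placeholder paths
PIPELINE_CONFIG = "PATH/TO/DIR/pipeline.config"

common = [
    sys.executable, "object_detection/model_main_tf2.py",
    f"--model_dir={MODEL_DIR}",
    f"--pipeline_config_path={PIPELINE_CONFIG}",
]

# Training process (gets the GPU).
train_proc = subprocess.Popen(common)

# Evaluation process: --checkpoint_dir switches the script into eval mode.
# Hiding the GPU avoids fighting the trainer for memory (common workaround).
eval_proc = subprocess.Popen(
    common + [f"--checkpoint_dir={MODEL_DIR}",
              "--sample_1_of_n_eval_examples=1"],
    env=dict(os.environ, CUDA_VISIBLE_DEVICES="-1"),
)

train_proc.wait()      # block until training finishes...
eval_proc.terminate()  # ...then stop the evaluation watcher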

Configuring the training and evaluation pipeline

Close-up of Machine Part in Factory by pixabay from Pexels.com

The TensorFlow Object Detection API uses protobuf files to configure the training and evaluation process. The first part of the pipeline.config file covers the general model settings (i.e., meta-architecture and feature extractor). The only thing you need to change in this part of the config file is num_classes, the number of classes present in your dataset!

# SSD with EfficientNet-b0 + BiFPN feature extractor,
# shared box predictor and focal loss (a.k.a. EfficientDet-d0).
# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070
# See Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from an EfficientNet-b0 checkpoint.
#
# Train on TPU-8

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 90  # change this: our dataset has 4 classes
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 64
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 3
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b0_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 3
        num_filters: 64
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true
          decay: 0.99
          epsilon: 0.001
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

The next part of the config file defines the parameters used during the training step (i.e., optimizer parameters, input preprocessing, and feature extractor initialization values). Since we’re going to use a pre-trained checkpoint in our training process, we need to set fine_tune_checkpoint to the path of our checkpoint file. Reusing pre-trained classification or object detection checkpoints (the choice is defined by fine_tune_checkpoint_type) speeds up the training process drastically, since we’re not starting from randomly initialized model weights.

Another significant parameter in the train_config is the batch_size. On the one hand, the available GPU memory directly limits the maximum batch size, so you may need to reduce it until a training batch fits into GPU memory (see the sketch after the config below). On the other hand, the batch size impacts how quickly a model learns and the overall stability of the learning process, so it is a very important hyperparameter that should be well understood and tuned!

train_config: {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "classification"
  batch_size: 128
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: true
  num_steps: 300000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 512
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 8e-2
          total_steps: 300000
          warmup_learning_rate: .001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
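If the reference batch_size of 128 doesn’t fit into your GPU’s memory, a common heuristic, though not part of the original config, is to scale the base learning rate roughly linearly with the batch size. A hypothetical single-GPU adjustment might look like this:

train_config: {
  batch_size: 8  # reduced from 128 to fit a single GPU
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 5e-3      # roughly 8e-2 * (8 / 128)
          total_steps: 300000
          warmup_learning_rate: 1e-4    # scaled down along with the base rate
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}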

The train_input_reader defines which dataset the model is trained on. The TensorFlow Object Detection API accepts inputs in the TFRecord file format by default. You have to specify the locations of both the training and evaluation files. You should also specify a label map, which defines the mapping between class IDs and class names; the label map should be identical for the training and evaluation datasets (an example follows the snippet below).

train_input_reader: {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord"
  }
}
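For completeness, the label_map.txt referenced above might look like the following for our four classes. The exact names and IDs must match those used when the TFRecords were created, so treat these as illustrative:

item {
  id: 1
  name: 'helmet'
}
item {
  id: 2
  name: 'head'
}
item {
  id: 3
  name: 'person'
}
item {
  id: 4
  name: 'vest'
}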

The eval_config part of the pipeline.config file determines which set of metrics is reported during evaluation. We’re using the COCO object detection evaluation metrics, which are defined here.

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
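As mentioned earlier, our fork patches pycocotools to produce per-category mAP. To actually request the per-category breakdown, the API’s eval.proto exposes an include_metrics_per_category flag; to our knowledge, it only yields the full per-class numbers when combined with the patched pycocotools:

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
  include_metrics_per_category: true  # per-class mAP; relies on the patched pycocotools
}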

Finally, the eval_input_reader part of the pipeline.config file defines the evaluation dataset.

eval_input_reader: {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord"
  }
}

After tuning the parameters discussed above, we just need to start the training process, wait for the loss to converge… and sit back and relax!

Testing the trained model

Person in a Construction Site by aleksey from Pexels.com (with inference results overlaid)

Conclusion

In order to help ML/AI enthusiasts, and anyone interested in helping to solve this problem in general, we’ve shared a sample dataset for training a safety helmet detector that can be used in real-world scenarios. Furthermore, we’ve created and shared a Google Colab notebook which gives you the ability to train your own helmet detector and use it on your own data in a matter of minutes.

If you’re interested in this topic and you would love to work on preventing accidents on construction sites, please reach out to us.

References

  1. Xie, Liangbin, 2019, “Hardhat”, https://doi.org/10.7910/DVN/7CBGOS, Harvard Dataverse, V1
  2. TensorFlow Object Detection API, https://github.com/tensorflow/models/tree/master/research/object_detection
  3. COCO — Common Objects in Context, https://cocodataset.org/#home
