Face Detection for CCTV surveillance


Step-by-step tutorial for detection of faces in a surveillance frame using Tensorflow Object Detection API and customization of the pre-trained models in it.

Shameless plug: We are a data annotation platform that makes it super easy for you to build ML datasets. Just upload data, invite your team and build datasets super quick. Check us out.

INSTALLATION:

Since this work falls under machine learning, the basic packages for any deep learning learner are TensorFlow and Keras. Keras isn't actually used here (everything is done in pure TensorFlow), but to install it anyway:

pip install keras

If you are working with a CPU only, without an Nvidia GPU:

sudo pip install tensorflow

And for those with an Nvidia GPU, make sure you check the driver compatibility and install TensorFlow with the proper dependencies: head over to the official TensorFlow installation instructions and follow them step by step to get your installation done properly.
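Whichever route you take, a quick two-line check confirms the install and, on the GPU build, that the GPU is actually visible. This uses the TF 1.x-era API this tutorial targets:

# Sanity check that TensorFlow is installed and (optionally) sees the GPU.
import tensorflow as tf

print(tf.__version__)
print(tf.test.is_gpu_available())  # True if a CUDA-enabled GPU is visible to TensorFlow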

SETUP OF MODELS DIRECTORY:

The next step is to set up the models directory from the tensorflow/models repository, which we can simply clone:

git clone https://github.com/tensorflow/models

Then, to set up the Object Detection API, follow the official documentation. Make sure the protoc compilation and the python slim path are set up from the research folder; this is the usual fix whenever you hit any "not found" errors:

protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Finally, let's install the object_detection library formally by doing the following from within the models directory:

sudo python3 setup.py install
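A quick way to confirm the API is wired up correctly (a minimal check, assuming the PYTHONPATH export above is active in the current shell) is to import the package from any directory:

# Check that the Object Detection API is importable after the PYTHONPATH export above.
import object_detection
from object_detection.utils import label_map_util

print("object_detection is importable")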

With that, we are done setting up TensorFlow and the Object Detection API. Next, let's move on to dataset creation.

DATASET CREATION:

We made our face detection dataset free to download and use by anyone.

Face Detection Dataset on Dataturks

For this task, we chose the Grimace faces dataset. It has 20 images each of 18 individuals, who make different expressions over time under varied lighting conditions. It is regarded as one of the better face datasets because of its diverse backgrounds and lighting. Since CCTV footage suffers from similar conditions, with frames that often aren't very bright, we decided this dataset would serve the purpose.

Few sample images of the individuals in the dataset

Next, we need to annotate these images with bounding boxes, for which we used the bounding box tool available on dataturks.com, as shown below.

After annotating all the images by drawing boxes over the faces, we can download a JSON file that contains, for each image, the label and the (x, y) coordinates of the bounding box. It looks like this:

json file after annotating the images
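The exact file is not reproduced here, but based on what the converter script below reads, each line of the JSON is structured roughly like this (an illustrative example with a made-up URL; the point coordinates are normalized to the 0-1 range, and "label" may also be a list of labels):

{"content": "https://example.com/images/face_001.jpg",
 "annotation": [{"label": "face",
                 "points": [{"x": 0.31, "y": 0.22}, {"x": 0.58, "y": 0.61}],
                 "imageWidth": 720,
                 "imageHeight": 576}]}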

The TensorFlow Object Detection API makes it easy to train on custom objects when the dataset is in PASCAL VOC format, where every image has its own .xml file, so we will use a Python script to convert this JSON to PASCAL VOC format.

#PASCAL_VOC_CONVERTER.py
import argparse
import sys
import os
import json
import logging
import requests
from PIL import Image

################### INSTALLATION NOTE #########################
## pip install requests
## pip install pillow
################################################################

# enable info logging.
logging.getLogger().setLevel(logging.INFO)


def maybe_download(image_url, image_dir):
    """Download the image if it does not already exist, return the local path."""
    fileName = image_url.split("/")[-1]
    filePath = os.path.join(image_dir, fileName)
    if os.path.exists(filePath):
        return filePath

    # else download the image
    try:
        response = requests.get(image_url)
        if response.status_code == 200:
            with open(filePath, 'wb') as f:
                f.write(response.content)
            return filePath
        else:
            raise ValueError("Not a 200 response")
    except Exception as e:
        logging.exception("Failed to download image at " + image_url + " \n" + str(e) + "\nignoring....")
        raise e


def get_xml_for_bbx(bbx_label, bbx_data, width, height):
    # We store the left top and right bottom as point '0' and point '1'.
    xmin = int(bbx_data['points'][0]['x'] * width)
    ymin = int(bbx_data['points'][0]['y'] * height)
    xmax = int(bbx_data['points'][1]['x'] * width)
    ymax = int(bbx_data['points'][1]['y'] * height)

    xml = "<object>\n"
    xml = xml + "\t<name>" + bbx_label + "</name>\n"
    xml = xml + "\t<pose>Unspecified</pose>\n"
    xml = xml + "\t<truncated>Unspecified</truncated>\n"
    xml = xml + "\t<difficult>Unspecified</difficult>\n"
    xml = xml + "\t<occluded>Unspecified</occluded>\n"
    xml = xml + "\t<bndbox>\n"
    xml = xml + "\t\t<xmin>" + str(xmin) + "</xmin>\n"
    xml = xml + "\t\t<xmax>" + str(xmax) + "</xmax>\n"
    xml = xml + "\t\t<ymin>" + str(ymin) + "</ymin>\n"
    xml = xml + "\t\t<ymax>" + str(ymax) + "</ymax>\n"
    xml = xml + "\t</bndbox>\n"
    xml = xml + "</object>\n"
    return xml


def convert_to_PascalVOC(dataturks_labeled_item, image_dir, xml_out_dir):
    """Convert a dataturks labeled item to a Pascal VOC XML file.
    Args:
        dataturks_labeled_item: JSON of one labeled image from dataturks.
        image_dir: Path to the directory of downloaded images (or a directory already having the images).
        xml_out_dir: Path to the dir where the xml needs to be written.
    Returns:
        True on success, False otherwise.
    """
    try:
        data = json.loads(dataturks_labeled_item)
        width = data['annotation'][0]['imageWidth']
        height = data['annotation'][0]['imageHeight']
        image_url = data['content']

        filePath = maybe_download(image_url, image_dir)
        # prefer the actual image dimensions over the ones stored in the JSON.
        with Image.open(filePath) as img:
            width, height = img.size

        fileName = filePath.split("/")[-1]
        image_dir_folder_Name = image_dir.split("/")[-1]

        xml = "<annotation>\n<folder>" + image_dir_folder_Name + "</folder>\n"
        xml = xml + "<filename>" + fileName + "</filename>\n"
        xml = xml + "<path>" + filePath + "</path>\n"
        xml = xml + "<source>\n\t<database>Unknown</database>\n</source>\n"
        xml = xml + "<size>\n"
        xml = xml + "\t<width>" + str(width) + "</width>\n"
        xml = xml + "\t<height>" + str(height) + "</height>\n"
        xml = xml + "\t<depth>Unspecified</depth>\n"
        xml = xml + "</size>\n"
        xml = xml + "<segmented>Unspecified</segmented>\n"

        for bbx in data['annotation']:
            bbx_labels = bbx['label']
            # handle both a list of labels and a single label.
            if not isinstance(bbx_labels, list):
                bbx_labels = [bbx_labels]
            for bbx_label in bbx_labels:
                xml = xml + get_xml_for_bbx(bbx_label, bbx, width, height)

        xml = xml + "</annotation>"

        # output to a file.
        xmlFilePath = os.path.join(xml_out_dir, fileName + ".xml")
        with open(xmlFilePath, 'w') as f:
            f.write(xml)
        return True
    except Exception as e:
        logging.exception("Unable to process item " + dataturks_labeled_item + "\n" + "error = " + str(e))
        return False


def main():
    # make sure everything is set up.
    if not os.path.isdir(image_download_dir):
        logging.exception("Please specify a valid directory path to download images, " + image_download_dir + " doesn't exist")
        return
    if not os.path.isdir(pascal_voc_xml_dir):
        logging.exception("Please specify a valid directory path to write Pascal VOC xml files, " + pascal_voc_xml_dir + " doesn't exist")
        return
    if not os.path.exists(dataturks_JSON_FilePath):
        logging.exception("Please specify a valid path to dataturks JSON output file, " + dataturks_JSON_FilePath + " doesn't exist")
        return

    lines = []
    with open(dataturks_JSON_FilePath, 'r') as f:
        lines = f.readlines()

    if not lines or len(lines) == 0:
        logging.exception("Please specify a valid path to dataturks JSON output file, " + dataturks_JSON_FilePath + " is empty")
        return

    count = 0
    success = 0
    for line in lines:
        status = convert_to_PascalVOC(line, image_download_dir, pascal_voc_xml_dir)
        if status:
            success = success + 1
        count += 1
        if count % 10 == 0:
            logging.info(str(count) + " items done ...")

    logging.info("Completed: " + str(success) + " items done, " + str(len(lines) - success) + " items ignored due to errors")


def create_arg_parser():
    """Creates and returns the ArgumentParser object."""
    parser = argparse.ArgumentParser(description='Converts Dataturks output JSON file for Image bounding box to Pascal VOC format.')
    parser.add_argument('dataturks_JSON_FilePath',
                        help='Path to the JSON file downloaded from Dataturks.')
    parser.add_argument('image_download_dir',
                        help='Path to the directory where images will be downloaded (if not already found in the directory).')
    parser.add_argument('pascal_voc_xml_dir',
                        help='Path to the directory where Pascal VOC XML files will be stored.')
    return parser


if __name__ == '__main__':
    arg_parser = create_arg_parser()
    parsed_args = arg_parser.parse_args(sys.argv[1:])

    # setup paths needed across the script.
    dataturks_JSON_FilePath = parsed_args.dataturks_JSON_FilePath
    image_download_dir = parsed_args.image_download_dir
    pascal_voc_xml_dir = parsed_args.pascal_voc_xml_dir
    main()

To run this script:

python PASCAL_VOC_CONVERTER.py output.json image_download_folder pascal_voc_out_folder

Then, go to the dataset folder and split the annotated files in an 80-20 train-test ratio, putting the training files into a train folder and the test files into a test folder inside the dataset folder. The dataset folder itself holds all the images and xml files, and the two directories inside it hold the train and test images along with their xml files. A sketch of one way to automate the split is shown below.
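This split script is not part of the original repo; it assumes the converter above was run in the dataset folder, so that every image foo.jpg has a matching foo.jpg.xml next to it:

# Rough 80-20 split of image/xml pairs into train/ and test/ (illustrative sketch).
import glob
import os
import random
import shutil

random.seed(42)
xml_files = glob.glob('*.xml')
random.shuffle(xml_files)
split = int(0.8 * len(xml_files))

for subset, files in [('train', xml_files[:split]), ('test', xml_files[split:])]:
    os.makedirs(subset, exist_ok=True)
    for xml_path in files:
        image_path = os.path.splitext(xml_path)[0]  # the converter names xml files <image>.jpg.xml
        shutil.copy(xml_path, subset)
        if os.path.exists(image_path):
            shutil.copy(image_path, subset)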

Now we need to convert these XML files into single CSV files that can then be converted into TFRecord files. To do this, we'll use xml_to_csv.py with a few changes, where:

def main():
    image_path = os.path.join(os.getcwd(), 'annotations')
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('raccoon_labels.csv', index=None)
    print('Successfully converted xml to csv.')

should be changed to :

def main():
    for directory in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')

Here data is the dataset folder. Also make sure that we create a directory called Object detection that looks like this:

Object detection
-data/
--test_labels.csv
--train_labels.csv
-images/
--test/
---testingimages.jpg
--train/
---trainingimages.jpg
--...yourimages.jpg
-training/
-xml_to_csv.py
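For reference, the xml_to_csv helper that main() calls simply walks a directory of Pascal VOC .xml files and flattens every <object> into one CSV row. A sketch of it is below; the stock version in the referenced repo indexes XML children by position, whereas this one looks tags up by name, which also tolerates the extra fields our converter writes:

# Sketch of the xml_to_csv() helper: Pascal VOC xml files -> one DataFrame.
import glob
import os
import xml.etree.ElementTree as ET

import pandas as pd


def xml_to_csv(path):
    rows = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        root = ET.parse(xml_file).getroot()
        filename = root.find('filename').text
        width = int(root.find('size/width').text)
        height = int(root.find('size/height').text)
        for obj in root.findall('object'):
            bbox = obj.find('bndbox')
            rows.append((filename, width, height, obj.find('name').text,
                         int(bbox.find('xmin').text), int(bbox.find('ymin').text),
                         int(bbox.find('xmax').text), int(bbox.find('ymax').text)))
    columns = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(rows, columns=columns)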

For now, create an empty training directory and put it in this folder; it will later hold all the files required for training. Now, to generate the TFRecords, let's use generate_tfrecord.py, which needs one change:

# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'face':
        return 1
    else:
        return None

After this, we can run the generate_tfrecord.py script. We run it twice: once for the train TFRecord and once for the test TFRecord.

python3 generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python3 generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record

Now, in our data directory, we should have train.record and test.record.
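Optionally, a couple of lines of Python (using the TF 1.x tf.python_io API) can confirm that both record files are non-empty and hold the expected number of examples:

# Optional sanity check: count the examples written into each TFRecord file.
import tensorflow as tf

for split in ['train', 'test']:
    path = 'data/{}.record'.format(split)
    count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print('{}: {} examples'.format(path, count))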

CONFIGURATION SETTING:

To train a new model, or to customize a pre-trained one, we need to set up a configuration file. We need the images, the matching TFRecords for the training and testing data, and the model configuration; then we can train. Here we use transfer learning, taking advantage of quicker training with less data.

TensorFlow ships a few pre-trained models, such as Faster R-CNN and SSD MobileNet trained on COCO, whose speeds have been benchmarked and whose hyperparameters have been properly tuned.

different models with speeds in ms

Here we use ssd_mobilenet trained on COCO for faster detection, but if you are aiming for higher precision, Faster R-CNN would serve the purpose. We will use the MobileNet checkpoint along with its configuration file to customize the training process for faces. A few other checkpoint options can be checked out in the detection model zoo.

wget https://raw.githubusercontent.com/tensorflow/models/master/object_detection/samples/configs/ssd_mobilenet_v1_pets.config
wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz

We place the config in the training directory and extract the ssd_mobilenet_v1 archive in the models/object_detection directory. In the configuration file we need to search for all the PATH_TO_BE_CONFIGURED entries and change them to our paths. Here we also set the number of classes to 1, and batch_size can be altered as needed. The default batch size is 24; if we run into memory issues, we can reduce it so the model fits in VRAM.

Next, we need to create a face-detection.pbtxt in the training folder, which looks like this:

item {
id: 1
name: 'face'
}
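If we had more than one class, each one would simply get its own item block with a unique id and name, for example (the second label here is purely hypothetical):

item {
  id: 1
  name: 'face'
}
item {
  id: 2
  name: 'person'
}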

Overall, our configuration file will look like this:

                  ssd_mobilenet_v1_pets.config
# SSD with Mobilenet v1, configured for the face dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "${YOUR_GCS_BUCKET}" to find the fields that
# should be configured.

model {
ssd {
num_classes: 1
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v1'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
loss {
classification_loss {
weighted_sigmoid {
anchorwise_output: true
}
}
localization_loss {
weighted_smooth_l1 {
anchorwise_output: true
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 0
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}

train_config: {
batch_size: 24
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint:"ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
from_detection_checkpoint: true
num_steps:20000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
}

train_input_reader: {
tf_record_input_reader {
input_path: "data/train.record"
}
label_map_path: "data/face-detection.pbtxt"
}

eval_config: {
num_examples: 40
}

eval_input_reader: {
tf_record_input_reader {
input_path: "data/test.record"
}
label_map_path: "training/face-detection.pbtxt"
shuffle: false
num_readers: 1
}

Before jumping into the training phase, let's have a look at MobileNet. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks.
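To see why this makes the network light, compare the weight counts of a standard 3x3 convolution and a depthwise separable one for the same channel sizes (a back-of-the-envelope calculation, not specific to this exact model):

# Weights in a standard 3x3 conv vs a depthwise separable conv (bias terms ignored).
k, c_in, c_out = 3, 256, 256

standard = k * k * c_in * c_out            # dense 3x3 convolution
separable = k * k * c_in + c_in * c_out    # 3x3 depthwise conv + 1x1 pointwise conv

print(standard, separable, round(standard / float(separable), 1))
# -> 589824 67840 8.7 (roughly 8-9x fewer weights for this layer)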

The architecture below has been used on iPhones for super fast applications like face recognition.

Network topology

Training:

Before proceeding, as a point of note: this model was trained on a GCP instance with an NVIDIA Tesla K80 (12 GB of GPU memory) and 7 GB of RAM. If you are training such models on a CPU, then please leave your CPU alone in a room for 10 days and go enjoy a holiday trip!

Our training directory has to be placed in the object_detection folder and is organized as:

training
-train.record
-test.record
-face-detection.pbtxt
-pipeline.config
-ssd_mobilenet_v1_pets.config

There is a train.py already provided in the object_detection folder, so run it as:

python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

And whoop! We are training. Make sure the number of steps is set in the config file. We ran it for 35000 steps, until the loss was nearly 1. The model performs better once the loss drops below 1, but for now this should be fine.

Apart from training, we also need to evaluate our model, don't we? For this purpose, TensorFlow provides TensorBoard to visualize the model both during and after training. To run it:

tensorboard --logdir=${PATH_TO_MODEL_TRAINING_DIRECTORY}

After this, run the following command in another terminal in order to view TensorBoard in your browser:

ssh -i public_ip -L 6006:localhost:6006

Now open your browser and go to localhost:6006.

TensorBoard then loads, and you can see the loss graphs:

all loss graphs vs epochs
batch v/s epochs
steps v/s epochs

Looking at the graphs above, we can conclude that the training process proceeded normally as expected, and we could further tune parameters such as the learning rate and batch size to achieve better results in fewer steps. In the classification_loss plot we can clearly see that the loss has approached roughly 1.

TESTING:

In order to test the model, we hosted a Jupyter notebook on port 8008 on the GCP instance we used. Forward the port so you can reach the notebook from your local machine:

ssh -i public_ip -L 8008:localhost:8008

Then, on the local system, open localhost:8008 and use the notebook that is already there in the object_detection folder, object_detection_tutorial.ipynb.

Before using the notebook, we need to export the inference graph. Luckily for us, the models/object_detection directory contains a script that does this: export_inference_graph.py.

To run this, you just need to pass in your checkpoint, your pipeline config, and the directory where you want the inference graph to be placed. For example:

python3 export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/ssd_mobilenet_v1_pets.config \
--trained_checkpoint_prefix training/model.ckpt-35000 \
--output_directory face_graph

Then, in the Jupyter notebook, the following changes have to be made:

# What model to use (the inference graph we just exported).
MODEL_NAME = 'face_graph'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('training', 'face-detection.pbtxt')

NUM_CLASSES = 1

After this, we can delete the model-download part of the code. Then, in the detection section, change the TEST_IMAGE_PATHS variable to:

TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

where all the test images should be in that folder, named image1.jpg, image2.jpg, and so on.
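For readers who prefer code to notebook screenshots, the detection cells of that notebook boil down to roughly the following (a sketch using the TF 1.x frozen-graph API, with PATH_TO_CKPT and TEST_IMAGE_PATHS as defined above; the actual notebook additionally draws the boxes with its visualization utilities):

# Rough outline of the notebook's inference loop (TF 1.x frozen-graph API).
import numpy as np
import tensorflow as tf
from PIL import Image

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

with detection_graph.as_default(), tf.Session(graph=detection_graph) as sess:
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    for image_path in TEST_IMAGE_PATHS:
        image_np = np.array(Image.open(image_path))
        # The graph expects a batch dimension: [1, height, width, 3].
        out_boxes, out_scores, out_classes = sess.run(
            [boxes, scores, classes],
            feed_dict={image_tensor: np.expand_dims(image_np, 0)})
        # out_boxes / out_scores / out_classes now hold the detections for this frame.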

Then, from the Run drop-down, click Run All. Below are a few outputs on CCTV footage.

image1.jpg
image2.jpg
image3.jpg
image4.jpg

CONCLUSION:

As we can see, the results are quite impressive, but accuracy does not go above 90% because the loss is still greater than 1. If we train until the loss drops below 1, or close to zero, we can get much more accurate results.

Faster R-CNN would give better accuracy than MobileNet, but MobileNet is far more responsive for real-time applications. When tested on 60 images, 52 faces were detected properly, and an 87% confidence level was recorded.

That's it for now; we hope this blog was useful.

I would love to hear any suggestions or queries. Please write to me at sameer.gadicherla@dataturks.com

Shameless plug: We are a data annotation platform that makes it super easy for you to build ML datasets. Just upload data, invite your team and build datasets super quick. Check us out.
