Object Detection Using SSD MobileNet and the TensorFlow Object Detection API: Detect Any Single Class from the COCO Dataset

6 min read, Jul 7, 2020

SSD (Single Shot MultiBox Detector) is a popular object detection algorithm. It is generally faster than Faster R-CNN. In this post, I will briefly explain what object detection is, what the TensorFlow Object Detection API is, what the idea behind the neural network is, and specifically how the SSD architecture works. Then I’ll walk step by step through running an SSD MobileNet v1 model, pretrained on the COCO dataset, with the TensorFlow Object Detection API. With this tutorial you can detect any single class from the classes provided by the COCO dataset. After this, I believe you can implement your own SSD with some patience. In this post, I will follow the original architecture from the paper.

What Is Covered

Object Detection
SSD (Single Shot Detector) Architecture
TensorFlow API for Object Detection
Implementation

What is Object Detection?

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.

Object detection, as the term suggests, is the procedure of detecting real-world objects in an image, for example dogs, cars, humans, or birds. A detector can find any still object with relative ease, and it can also detect multiple objects in a single frame. For example, in the image below the SSD model has detected a mobile phone, a laptop, coffee, and glasses in a single shot.

Output from SSD Mobilenet Object Detection Model

SSD MobileNet Architecture

The SSD architecture is a single convolutional network that learns to predict bounding box locations and classify these locations in one pass. Hence, SSD can be trained end-to-end. The SSD network consists of a base architecture (MobileNet in this case) followed by several convolution layers.

With SSD, we need only a single shot to detect multiple objects within the image, whereas region proposal network (RPN) based approaches such as the R-CNN series need two shots: one to generate region proposals and one to detect the object in each proposal. Thus, SSD is much faster than two-shot RPN-based approaches.
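To make the "single shot" idea more concrete, here is a small sketch that counts how many default boxes an SSD head scores in one forward pass. The feature-map sizes and boxes-per-cell values are those of the SSD300 configuration described in the original paper and are used here only as an assumption for illustration; the MobileNet-based model used later in this post has a different layout. The names `SSD300_LAYERS` and `count_default_boxes` are made up for this example.

```python
# (feature-map side length, default boxes per cell) for SSD300,
# as described in the SSD paper. Assumed here for illustration only.
SSD300_LAYERS = [
    (38, 4),
    (19, 6),
    (10, 6),
    (5, 6),
    (3, 4),
    (1, 4),
]

def count_default_boxes(layers):
    """Total number of default boxes scored in one forward pass."""
    return sum(size * size * boxes_per_cell for size, boxes_per_cell in layers)

total = count_default_boxes(SSD300_LAYERS)
print(total)  # 8732
```

Every one of these boxes gets class scores and box offsets in the same pass, which is why no separate proposal stage is needed.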

For more details about the SSD architecture and how it works, please read its official paper here

TensorFlow Object Detection API

Before moving to the TensorFlow API, let’s understand what an API is. To explain it better, let’s take an example:

Imagine you’re sitting at a table in a restaurant with a menu of choices to order from. The kitchen is the part of the “system” that will prepare your order. What is missing is the critical link to communicate your order to the kitchen and deliver your food back to your table. That’s where the waiter, or API, comes in. The waiter is the messenger (the API) that takes your request or order and tells the kitchen (the system) what to do. Then the waiter delivers the response back to you. In this case, it is the food. So, here your helpful waiter is analogous to your API.

Now, similarly, the TensorFlow Object Detection API is a framework for creating deep learning networks that solve object detection problems.

The framework already ships pretrained models, which it refers to as the Model Zoo. This includes a collection of pretrained models trained on the COCO dataset, the KITTI dataset, and the Open Images Dataset.

These models are also useful for initializing your own models when training on a novel dataset. The various architectures used in the pretrained models are described in this table.

For more about the TensorFlow Object Detection API, visit their GitHub repo here.

Implementation

Start with a new Colab notebook and follow the steps one by one.

step 1

Install TensorFlow version 2 or higher:

!pip install -U --pre tensorflow=="2.*"

step 2

Make sure to install pycocotools, which the COCO detection API needs:

!pip install pycocotools

step 3

Get tensorflow/models by cloning the repository:

import os
import pathlib

if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('models').exists():
  !git clone --depth 1 https://github.com/tensorflow/models

Move (cd) to the research directory of the repo:

%cd models/research

step 4

Compile the protobufs:

!protoc object_detection/protos/*.proto --python_out=.

Install the object_detection Python package:

!pip install object_detection

step 5

Import the required libraries:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from IPython.display import display

Install the tf_slim Python package:

!pip install tf_slim

Import the object detection modules:

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

step 6

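Define a function to load your model:

def load_model(model_name):
  # Download and unpack the model archive (the download is cached locally).
  base_url = 'http://download.tensorflow.org/models/object_detection/'
  model_file = model_name + '.tar.gz'
  model_dir = tf.keras.utils.get_file(
    fname=model_name,
    origin=base_url + model_file,
    untar=True)

  # The SavedModel lives in the "saved_model" subdirectory of the archive.
  model_dir = pathlib.Path(model_dir)/"saved_model"

  # Load the SavedModel and return its default serving signature.
  model = tf.saved_model.load(str(model_dir))
  model = model.signatures['serving_default']

  return model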

This is the code to load your label map. A label map maps indices to category (class) names. For example, when the network predicts 1, it corresponds to the “person” class, and when it predicts 18, it corresponds to the “dog” category.

PATH_TO_LABELS = 'object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
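To show what such a mapping looks like, here is a minimal sketch that parses a two-entry sample written in the style of mscoco_label_map.pbtxt into a dictionary keyed by class id. It is a simplified stand-in, not the implementation used by label_map_util; the function name `parse_label_map` and the abridged sample text are assumptions made for this example.

```python
import re

# Two entries in the style of mscoco_label_map.pbtxt (abridged sample).
LABEL_MAP_TEXT = '''
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/0bt9lr"
  id: 18
  display_name: "dog"
}
'''

def parse_label_map(text):
    """Return {id: {'id': id, 'name': display_name}} for each item block."""
    index = {}
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        class_id = int(re.search(r"\bid:\s*(\d+)", block).group(1))
        name = re.search(r'display_name:\s*"([^"]*)"', block).group(1)
        index[class_id] = {"id": class_id, "name": name}
    return index

sample_index = parse_label_map(LABEL_MAP_TEXT)
print(sample_index[1]["name"])   # person
print(sample_index[18]["name"])  # dog
```

Looking up a predicted class id in this dictionary gives the human-readable name that is drawn next to each box.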

This is the path to your test images, which lets you check the model’s detections for the given class. You can change the test images by adding or replacing files in models/research/object_detection/test_images to check how well SSD MobileNet detects the given class.

PATH_TO_TEST_IMAGES_DIR = pathlib.Path('object_detection/test_images')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))
TEST_IMAGE_PATHS
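If you add your own test images, note that the "*.jpg" pattern above misses files named .JPG or .jpeg on a case-sensitive filesystem. The following sketch is one possible alternative that matches the suffix case-insensitively; the helper name `list_test_images` and the `IMAGE_SUFFIXES` set are assumptions for this example, and the demo uses a temporary directory so that it runs anywhere.

```python
import pathlib
import tempfile

# Suffixes treated as JPEG images (an assumption for this sketch).
IMAGE_SUFFIXES = {".jpg", ".jpeg"}

def list_test_images(directory):
    """Return JPEG files in `directory`, sorted by path."""
    directory = pathlib.Path(directory)
    return sorted(
        path for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )

# Small demo in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.JPG", "a.jpeg", "notes.txt"):
        (pathlib.Path(tmp) / name).touch()
    demo_names = [path.name for path in list_test_images(tmp)]

print(demo_names)  # ['a.jpeg', 'b.JPG']
```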

step 7

Load the SSD MobileNet v1 model for object detection:

model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
detection_model = load_model(model_name)

Now, check the model’s input signature and output types.

print(detection_model.inputs)
detection_model.output_dtypes

Now add this wrapper function, which calls the model and returns the output.

def run_inference_for_single_image(model, image):
  image = np.asarray(image)
  # The model expects a batch of images, so convert to a tensor and add a batch axis.
  input_tensor = tf.convert_to_tensor(image)
  input_tensor = input_tensor[tf.newaxis,...]

  # Run inference.
  output_dict = model(input_tensor)

  # All outputs are batched tensors. Take index [0] to drop the batch
  # dimension and keep only the first num_detections entries.
  num_detections = int(output_dict.pop('num_detections'))
  output_dict = {key:value[0, :num_detections].numpy()
                 for key,value in output_dict.items()}
  output_dict['num_detections'] = num_detections

  # detection_classes should be ints.
  output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

  # Handle models with masks: reframe the box masks to the image size.
  if 'detection_masks' in output_dict:
    detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
              output_dict['detection_masks'], output_dict['detection_boxes'],
               image.shape[0], image.shape[1])
    detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                       tf.uint8)
    output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

  return output_dict
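The post-processing in this wrapper can be illustrated without TensorFlow. The sketch below uses plain nested lists as a stand-in for the model's batched, padded output tensors; the `fake_outputs` values and the helper name `trim_detections` are hypothetical and chosen only for this example.

```python
def trim_detections(raw_outputs):
    """Mimic the wrapper's post-processing on plain nested lists.

    Every value in `raw_outputs` has a leading batch dimension of size 1
    and is padded to a fixed number of detection slots.
    """
    outputs = dict(raw_outputs)  # shallow copy so the caller's dict is untouched
    num_detections = int(outputs.pop("num_detections")[0])
    # Drop the batch dimension and keep only the real detections.
    trimmed = {key: value[0][:num_detections] for key, value in outputs.items()}
    trimmed["num_detections"] = num_detections
    # Class ids come back as floats; convert them to ints.
    trimmed["detection_classes"] = [int(c) for c in trimmed["detection_classes"]]
    return trimmed

# Hypothetical padded output for one image with 2 real detections.
fake_outputs = {
    "num_detections": [2.0],
    "detection_classes": [[1.0, 18.0, 1.0, 1.0]],
    "detection_scores": [[0.92, 0.81, 0.0, 0.0]],
    "detection_boxes": [[[0.1, 0.1, 0.5, 0.5],
                         [0.2, 0.3, 0.9, 0.8],
                         [0.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 0.0]]],
}

result = trim_detections(fake_outputs)
print(result["detection_classes"])  # [1, 18]
print(result["detection_scores"])   # [0.92, 0.81]
```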

step 8

This is the main step: pass the class id of the category you want to detect as the class_id parameter of the function below (the category must be present in the COCO dataset). You can check the class ids and their respective classes here.

def show_inference(model, image_path, class_id):
  # Load the image into a numpy array; boxes are drawn onto it in place.
  image_np = np.array(Image.open(image_path))
  output_dict = run_inference_for_single_image(model, image_np)

  # Keep only detections of the requested class with a score above 0.5.
  boxes = []
  classes = []
  scores = []
  for i, x in enumerate(output_dict['detection_classes']):
    if x == class_id and output_dict['detection_scores'][i] > 0.5:
      classes.append(x)
      boxes.append(output_dict['detection_boxes'][i])
      scores.append(output_dict['detection_scores'][i])
  boxes = np.array(boxes)
  classes = np.array(classes)
  scores = np.array(scores)

  # Draw the remaining boxes and labels on the image.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      boxes,
      classes,
      scores,
      category_index,
      instance_masks=output_dict.get('detection_masks_reframed', None),
      use_normalized_coordinates=True,
      line_thickness=2)

  display(Image.fromarray(image_np))
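The filtering logic inside this function can be tried on its own. Here is a minimal sketch that applies the same rule (matching class id and score above a threshold) to plain Python lists; the helper name `filter_detections`, the `min_score` parameter, and the sample values are assumptions for illustration.

```python
def filter_detections(output_dict, class_id, min_score=0.5):
    """Keep only detections of `class_id` whose score exceeds `min_score`."""
    kept = {"detection_boxes": [], "detection_classes": [], "detection_scores": []}
    for box, cls, score in zip(output_dict["detection_boxes"],
                               output_dict["detection_classes"],
                               output_dict["detection_scores"]):
        if cls == class_id and score > min_score:
            kept["detection_boxes"].append(box)
            kept["detection_classes"].append(cls)
            kept["detection_scores"].append(score)
    return kept

# Hypothetical detections: two persons (one low-confidence) and one dog.
sample = {
    "detection_boxes": [[0.1, 0.1, 0.5, 0.5],
                        [0.2, 0.3, 0.9, 0.8],
                        [0.4, 0.4, 0.6, 0.6]],
    "detection_classes": [1, 18, 1],
    "detection_scores": [0.92, 0.81, 0.30],
}

persons = filter_detections(sample, class_id=1)
print(persons["detection_scores"])  # [0.92]

dogs = filter_detections(sample, class_id=18)
print(dogs["detection_boxes"])  # [[0.2, 0.3, 0.9, 0.8]]
```

Keeping the filter in a separate function like this would also make it easy to change the score threshold without touching the drawing code.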

step 9

This is the final step: set class_id and run detection on the test images to see your output.

class_id = 1  # 1 corresponds to "person" in the COCO label map
for image_path in TEST_IMAGE_PATHS:
  show_inference(detection_model, image_path, class_id)

Results

Suppose we have two images in the test_images directory and pass class_id = 1, which corresponds to “person”. In both images, only persons are detected.

for image_path in TEST_IMAGE_PATHS:
  show_inference(detection_model, image_path, 1)

In this case, only persons are detected in the second image, and no other object is detected.

Now, let’s change class_id to 18, which corresponds to the “dog” category.

for image_path in TEST_IMAGE_PATHS:
  show_inference(detection_model, image_path, 18)

Here, only dogs are detected and no other object.
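The boxes drawn in these results are stored by the model in normalized coordinates, ordered [ymin, xmin, ymax, xmax], which is why the drawing call uses use_normalized_coordinates=True. If you want pixel positions instead, for example to crop a detected dog, a conversion could look like the following sketch; the helper name `to_pixel_box` and the sample box and image size are made up for illustration.

```python
def to_pixel_box(box, image_height, image_width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel
    coordinates, returned as (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    return (round(xmin * image_width), round(ymin * image_height),
            round(xmax * image_width), round(ymax * image_height))

# Hypothetical detection on an image that is 400 pixels high and 600 wide.
pixel_box = to_pixel_box([0.25, 0.5, 0.75, 1.0], image_height=400, image_width=600)
print(pixel_box)  # (300, 100, 600, 300)
```

The (left, top, right, bottom) order was chosen here because it matches the box format that PIL's Image.crop expects.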

In the next post, I’ll discuss how to train and implement your own custom object detection model using the TensorFlow Object Detection API.