Effortless Object Detection In TensorFlow With Pre-Trained Models

Jun 3, 2024

Object detection is a core computer vision task: identifying and locating objects within an image or a video stream. Deep learning libraries such as TensorFlow have made it far more accessible to implement.

In this blog post, we will walk through the process of performing object detection using a pre-trained model in TensorFlow, complete with code examples. Let’s start.

Steps to Build Object Detection Using Pre-Trained Models in TensorFlow

Before diving into the code, you must set up your environment and prepare your dataset. In this example, we’ll use a pre-trained model called ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 from TensorFlow’s model zoo, which is trained on the COCO dataset.

1. Data Preparation

First, let’s organize our data into the required directory structure:

import os
import shutil
import glob
import xml.etree.ElementTree as ET
import pandas as pd

# Create the data directory (same location the dataset is unzipped into below)
os.makedirs('/content/data', exist_ok=True)

# Unzip your dataset into the 'data' directory
# My dataset is 'Fruit_dataset.zip'
# Replace the path with your dataset's actual path
!unzip /content/drive/MyDrive/Fruit_dataset.zip -d /content/data

# Move image and annotation files to their respective folders
# Adjust paths according to your dataset structure
# This code assumes that your dataset contains both 'jpg' and 'xml' files
# and organizes them into 'annotations_train', 'images_train', 'annotations_test', and 'images_test' folders.
# You may need to adapt this structure to your dataset.
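# images & annotations for test data
test_root = '/content/data/test_zip/test'
test_targets = {'.xml': os.path.join(test_root, 'annotations_test'),
                '.jpg': os.path.join(test_root, 'images_test')}
for target in test_targets.values():
    os.makedirs(target, exist_ok=True)  # shutil.move needs the folders to exist

for dir_name, _, filenames in os.walk(test_root):
    if dir_name in test_targets.values():
        continue  # skip files that are already in place
    for filename in filenames:
        extension = os.path.splitext(filename)[1].lower()
        if extension not in test_targets:
            continue  # ignore anything that is not an image or annotation
        shutil.move(os.path.join(dir_name, filename), test_targets[extension])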

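# images & annotations for training data
train_root = '/content/data/train_zip/train'
train_targets = {'.xml': os.path.join(train_root, 'annotations_train'),
                 '.jpg': os.path.join(train_root, 'images_train')}
for target in train_targets.values():
    os.makedirs(target, exist_ok=True)  # shutil.move needs the folders to exist

for dir_name, _, filenames in os.walk(train_root):
    if dir_name in train_targets.values():
        continue  # skip files that are already in place
    for filename in filenames:
        extension = os.path.splitext(filename)[1].lower()
        if extension not in train_targets:
            continue  # ignore anything that is not an image or annotation
        shutil.move(os.path.join(dir_name, filename), train_targets[extension])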
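
Before moving on, it can help to confirm that every image ended up with a matching annotation file. The following is a minimal sketch and not part of the original workflow: `find_unpaired` is a hypothetical helper, and it assumes each image and its XML annotation share the same base name (for example `apple_1.jpg` and `apple_1.xml`).

```python
import os


def find_unpaired(image_files, xml_files):
    """Return (images without XML, XML files without image) as sorted base names."""
    image_stems = {os.path.splitext(name)[0] for name in image_files
                   if name.lower().endswith('.jpg')}
    xml_stems = {os.path.splitext(name)[0] for name in xml_files
                 if name.lower().endswith('.xml')}
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)


# Example with the folders used in this tutorial:
# missing_xml, missing_jpg = find_unpaired(
#     os.listdir('/content/data/train_zip/train/images_train'),
#     os.listdir('/content/data/train_zip/train/annotations_train'))
```

Two empty lists mean every image has an annotation and vice versa.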

2. Convert XML Annotations to CSV

To train a model, we first convert the XML annotation files into CSV, an intermediate format that the TFRecord generation script reads in the next step. We’ll create a function for this purpose:

import glob
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(path):
    classes_names = []
    xml_list = []

    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
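        for member in root.findall('object'):
            # Look fields up by tag name so the order of child elements does not matter
            class_name = member.find('name').text
            box = member.find('bndbox')
            classes_names.append(class_name)
            value = (root.find('filename').text,
                     int(root.find('size').find('width').text),
                     int(root.find('size').find('height').text),
                     class_name,
                     int(box.find('xmin').text),
                     int(box.find('ymin').text),
                     int(box.find('xmax').text),
                     int(box.find('ymax').text))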
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    classes_names = list(set(classes_names))
    classes_names.sort()
    return xml_df, classes_names

# Convert XML annotations to CSV for both training and testing data
for label_path in ['/content/data/train_zip/train/annotations_train', '/content/data/test_zip/test/annotations_test']:
    xml_df, classes = xml_to_csv(label_path)
    xml_df.to_csv(f'{label_path}.csv', index=None)
    print(f'Successfully converted {label_path} xml to csv.')

This code outputs CSV files summarizing image annotations. Each file details the bounding boxes and corresponding class labels for all objects within an image.

3. Create TFRecord Files

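The next step is to convert our data into TFRecords, the input format the TensorFlow Object Detection API reads during training. We’ll use a generate_tfrecord.py helper script for this. Scripts with this name are typically distributed alongside Object Detection API tutorials rather than in the official repository, so the usage below assumes a version that takes the CSV file, the label map, the image directory, and the output file as positional arguments.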

#Usage:  
#!python generate_tfrecord.py output.csv output.pbtxt /path/to/images output.tfrecords

# For train.record
!python generate_tfrecord.py /content/data/train_zip/train/annotations_train.csv /content/label_map.pbtxt /content/data/train_zip/train/images_train/ train.record

# For test.record
!python generate_tfrecord.py /content/data/test_zip/test/annotations_test.csv /content/label_map.pbtxt /content/data/test_zip/test/images_test/ test.record
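
For orientation, conversion scripts of this kind generally group the CSV rows by image and scale the pixel box coordinates to the [0, 1] range before writing each record, since the Object Detection API stores boxes in normalized form. The sketch below illustrates only that preprocessing step in plain Python. `group_and_normalize` is a hypothetical name, the sample rows are made up, and this is not the code of generate_tfrecord.py itself. It assumes the column names produced by `xml_to_csv` above.

```python
import csv
import io
from collections import defaultdict


def group_and_normalize(csv_text):
    """Group annotation rows by filename and scale boxes to the [0, 1] range."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        width, height = int(row['width']), int(row['height'])
        grouped[row['filename']].append({
            'class': row['class'],
            'xmin': int(row['xmin']) / width,
            'ymin': int(row['ymin']) / height,
            'xmax': int(row['xmax']) / width,
            'ymax': int(row['ymax']) / height,
        })
    return dict(grouped)


# Made-up sample rows in the same layout as annotations_train.csv
sample = (
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "mixed_9.jpg,200,100,apple,20,10,100,50\n"
    "mixed_9.jpg,200,100,banana,100,50,200,100\n"
)
boxes = group_and_normalize(sample)
print(boxes['mixed_9.jpg'])
```

Both rows end up under the single key `mixed_9.jpg`, which mirrors how one record holds all boxes for one image.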

Ensure that you have a label_map.pbtxt file containing class labels and IDs in your working directory or adjust the path accordingly.
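
If you do not have a label map yet, one can be generated from the sorted class list that `xml_to_csv` returns. The sketch below is one possible way to do this and not part of the original workflow: `build_label_map` is a hypothetical helper and the class names are placeholders. IDs start at 1 because the Object Detection API reserves 0 for the background class.

```python
def build_label_map(class_names):
    """Return label map text in pbtxt format, with IDs starting at 1."""
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        entries.append(
            "item {\n"
            f"  id: {class_id}\n"
            f"  name: '{name}'\n"
            "}\n"
        )
    return "\n".join(entries)


# Placeholder class names; use the `classes` list returned by xml_to_csv instead.
label_map_text = build_label_map(['apple', 'banana', 'orange'])
print(label_map_text)

# To save it where the later steps of this tutorial expect it:
# with open('/content/label_map.pbtxt', 'w') as f:
#     f.write(label_map_text)
```

Whatever order you choose, the same class-to-ID mapping has to be used when generating the TFRecords and when running inference.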

4. Download Pre-trained Model

Download the pre-trained model archive ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz and extract it:

# Download the pretrained model
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
!tar -xzvf ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz

This model provides a good trade-off between speed and accuracy and is suitable for many real-time object detection tasks.

5. Training the Model

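Now it’s time to train your object detection model. The commands below assume the TensorFlow models repository is cloned to /content/models and the Object Detection API is installed. Adjust the pipeline_config_path and model_dir values to match your environment: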

%cd /content/models/research/object_detection

!python model_main_tf2.py --pipeline_config_path=/content/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config --model_dir=/content/training --alsologtostderr

This command will start the training process using the specified configuration file and save the trained model checkpoints in the training directory.
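
The configuration file passed to the command above normally has to be adapted to the custom dataset first, because the copy shipped with the pre-trained model still describes COCO. The sketch below shows one way to patch the relevant fields with regular expressions. It is illustrative only: `patch_pipeline_config` is a hypothetical helper, the sample text is a heavily shortened stand-in for a real pipeline.config, and the checkpoint path and class count are placeholder values.

```python
import re


def patch_pipeline_config(text, num_classes, checkpoint, label_map):
    """Return config text with the dataset-specific fields replaced."""
    text = re.sub(r'num_classes: \d+', f'num_classes: {num_classes}', text)
    text = re.sub(r'fine_tune_checkpoint: ".*?"',
                  f'fine_tune_checkpoint: "{checkpoint}"', text)
    # Fine-tune from a detection checkpoint rather than a classification backbone
    text = re.sub(r'fine_tune_checkpoint_type: ".*?"',
                  'fine_tune_checkpoint_type: "detection"', text)
    text = re.sub(r'label_map_path: ".*?"',
                  f'label_map_path: "{label_map}"', text)
    return text


# Shortened stand-in for the real file, for illustration only
sample = '''model {
  ssd {
    num_classes: 90
  }
}
train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
}
'''

patched = patch_pipeline_config(
    sample,
    num_classes=3,  # placeholder: number of classes in your label map
    checkpoint='/content/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0',  # assumed location
    label_map='/content/label_map.pbtxt',
)
print(patched)
```

The `input_path` entries for train.record and test.record, and usually `batch_size`, need the same kind of adjustment. Editing the file by hand works just as well.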

6. Export the Trained Model

Once training is complete, export the trained model for inference:

!python /content/models/research/object_detection/exporter_main_v2.py --trained_checkpoint_dir=/content/training --pipeline_config_path=/content/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config --output_directory /content/inference_graph

Your trained model will be saved in the inference_graph directory, ready for object detection inference.

7. Optional: Customize Label Text Font

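If you want larger label text in your visualizations, you can download a TrueType font and raise the font size that visualization_utils.py uses (here from 24 to 50). Run these commands from the object_detection directory:

# Download the Arial font (optional)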
!wget https://freefontsdownload.net/download/160187/arial.zip
!unzip arial.zip -d .

%cd utils/
!sed -i "s/font = ImageFont.truetype('arial.ttf', 24)/font = ImageFont.truetype('arial.ttf', 50)/" visualization_utils.py
%cd ..

8. Loading the Saved Model and Running Inference

Now that we’ve trained our object detection model and exported it, it’s time to load the saved model and run inference on an example image. This process will help us identify and visualize objects within the image.

# Import necessary libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

warnings.filterwarnings('ignore')

# Figure size in inches (width, height) used when displaying the result
IMAGE_SIZE = (8, 6)

# Load the saved model
PATH_TO_SAVED_MODEL = "/content/inference_graph/saved_model"  
# Update with your saved model path
print('Loading model...', end='')

# Load the saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)
print('Done!')

# Load the label map
category_index = label_map_util.create_category_index_from_labelmap("/content/label_map.pbtxt", use_display_name=True)
# Update the path above if your label map is stored elsewhere.

# Define a function to load an image into a NumPy array
def load_image_into_numpy_array(path):
    # Convert to RGB so grayscale or RGBA files still yield an (H, W, 3) uint8 array
    return np.array(Image.open(path).convert('RGB'))

# Specify the path to the image for inference
image_path = "/content/data/train_zip/train/images_train/mixed_9.jpg"
image_np = load_image_into_numpy_array(image_path)

# The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
input_tensor = tf.convert_to_tensor(image_np)
# The model expects a batch of images, so add an axis with `tf.newaxis`.
input_tensor = input_tensor[tf.newaxis, ...]

# Run inference on the image
detections = detect_fn(input_tensor)

# All outputs are batched tensors.
# Convert to numpy arrays, and take index [0] to remove the batch dimension.
# We're only interested in the first num_detections.
num_detections = int(detections.pop('num_detections'))
detections = {key: value[0, :num_detections].numpy()
              for key, value in detections.items()}
detections['num_detections'] = num_detections

# Detection classes should be integers.
detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

9. Visualizing Object Detections

Now that we have run inference on our example image, let’s visualize the detected objects by drawing bounding boxes and labels on the image.

# Create a copy of the image with bounding boxes and labels
image_np_with_detections = image_np.copy()
# Visualize the detections on the image
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_detections,
    detections['detection_boxes'],
    detections['detection_classes'],
    detections['detection_scores'],
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=5,
    min_score_thresh=0.1,  # Minimum confidence score a detection needs in order to be drawn
    agnostic_mode=False,
    line_thickness=1
)
# Display the image with detections
%matplotlib inline
plt.figure(figsize=IMAGE_SIZE, dpi=200)
plt.axis("off")
plt.imshow(image_np_with_detections)
plt.show()

In this code, we first load our saved model and perform inference on a sample image. Then, we use the TensorFlow Object Detection API’s utility functions to visualize the detections by drawing bounding boxes and labels on the image.

You can adjust the min_score_thresh parameter to control the minimum confidence score required for an object to be displayed in the visualization.
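
The same threshold can be applied outside the visualization helper when you need the detections as data. The sketch below is a plain-Python illustration with made-up values, and `filter_detections` is a hypothetical helper. It relies on the Object Detection API convention that each box is `[ymin, xmin, ymax, xmax]` in normalized coordinates, and converts the kept boxes to pixels.

```python
def filter_detections(boxes, classes, scores, image_width, image_height, min_score=0.5):
    """Keep detections scoring at least min_score and convert boxes to pixels."""
    def to_pixels(fraction, size):
        return int(round(float(fraction) * size))

    kept = []
    for (ymin, xmin, ymax, xmax), class_id, score in zip(boxes, classes, scores):
        if score < min_score:
            continue
        kept.append({
            'class_id': int(class_id),
            'score': float(score),
            # Pixel box as (left, top, right, bottom)
            'box': (to_pixels(xmin, image_width), to_pixels(ymin, image_height),
                    to_pixels(xmax, image_width), to_pixels(ymax, image_height)),
        })
    return kept


# Made-up values standing in for the entries of the `detections` dictionary
boxes = [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 0.3, 0.3]]
classes = [1, 2]
scores = [0.92, 0.08]
results = filter_detections(boxes, classes, scores, image_width=640, image_height=480)
print(results)
```

With real model output you would pass `detections['detection_boxes']`, `detections['detection_classes']`, and `detections['detection_scores']`, along with `image_np.shape[1]` and `image_np.shape[0]` as the width and height.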

Here are the results of object detection on an image:

These steps allow you to use your trained object detection model for real-world applications, such as identifying and localizing objects in images or videos.

Conclusion

In this blog post, we’ve explored the fascinating world of object detection using pre-trained models in TensorFlow. Object detection is crucial in various applications, from autonomous vehicles to surveillance systems and retail analytics. Leveraging deep learning and TensorFlow, we’ve demonstrated how to build and deploy a custom object detection model.