Object Detection in ROS2 with PyTorch’s Faster R-CNN

Kabilankb
Jun 28, 2024


In robotics, detecting and identifying objects in the environment is crucial. This ability allows robots to navigate spaces, interact with objects, and better understand their surroundings. In this blog post, we will explore how to set up an object detection system in ROS2 using PyTorch’s Faster R-CNN, a state-of-the-art model for object detection.

Understanding the Basics of Object Detection

Object detection involves identifying and locating objects within an image, a crucial task for applications such as autonomous driving, robotics, and video surveillance. Modern object detection models can identify multiple objects in a single image, providing bounding boxes and labels for each detected object.

Why Faster R-CNN?

Faster R-CNN (Region-based Convolutional Neural Network) is a powerful and widely used object detection model. It builds on the earlier R-CNN and Fast R-CNN models by introducing a Region Proposal Network (RPN), which shares full-image convolutional features with the detection network. This enhancement significantly improves both speed and accuracy.

The Model Used

For this project, we use Faster R-CNN with a ResNet-50 backbone and Feature Pyramid Network (FPN). This model is available in PyTorch’s torchvision library and comes pre-trained on the COCO (Common Objects in Context) dataset.

Why ResNet-50?

  • Residual Networks (ResNet): ResNet introduces skip connections or shortcuts to jump over some layers. ResNet-50 is a 50-layer residual network that helps in training deep networks by mitigating the vanishing gradient problem.
  • Feature Pyramid Network (FPN): FPN builds feature pyramids inside the neural network to improve the detection of objects at different scales. (Both components can be verified on the loaded model, as the check below shows.)
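Both components are easy to verify on the loaded torchvision model. A quick sanity check (assuming torch and torchvision are installed; the weights download on first use):

import torchvision
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights

# Load the pre-trained detector; the weights are downloaded and cached on first use.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)

print(type(model.backbone).__name__)  # BackboneWithFPN: ResNet-50 wrapped with a feature pyramid
print(type(model.rpn).__name__)       # RegionProposalNetwork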

The COCO Dataset

The COCO dataset is a large-scale object detection, segmentation, and captioning dataset. It contains over 200,000 labeled images spanning 80 common object categories, making it one of the most comprehensive datasets for object detection.

Why COCO?

  • Diverse Objects: The dataset includes a wide range of objects commonly found in everyday scenes (the snippet after this list shows the full category set).
  • High Quality: COCO provides rich annotations for object detection, segmentation, and keypoints.
  • Benchmarking: It is a standard benchmark for object detection models, allowing for easy comparison of performance.
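The category names themselves ship with the pre-trained weights, so they can be listed directly; a short check:

from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights

# The weights' metadata includes the COCO category list used at training time.
categories = FasterRCNN_ResNet50_FPN_Weights.DEFAULT.meta['categories']
print(len(categories))  # 91 entries: the 80 classes plus background and placeholder slots
print(categories[:5])   # ['__background__', 'person', 'bicycle', 'car', 'motorcycle']

The same list is what the node below uses to turn predicted label IDs into readable names.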

Integrating Faster R-CNN with ROS2

ROS2 (Robot Operating System 2) is an open-source framework for developing robot software. It offers services for a heterogeneous computer cluster, including hardware abstraction, device control, implementation of common functionalities, message passing between processes, and package management.

Steps to Set Up Object Detection in ROS2:

  1. Create a ROS2 Package: Set up a new ROS2 package and include necessary dependencies like rclpy, sensor_msgs, and cv_bridge.
  2. Implement the Object Detection Node: Write a node that subscribes to an image topic, performs object detection using Faster R-CNN, and publishes the results.
  3. Visualization: Use rqt_image_view or RViz to visualize the detected objects with bounding boxes and labels.

Step-by-Step Implementation

1. Create a ROS2 Package

First, create a new ROS2 package:

ros2 pkg create opencv_tools --build-type ament_python

Install necessary dependencies:

pip install torch torchvision

Add an entry point for the node in setup.py, and declare the ROS2 dependencies (rclpy, sensor_msgs, cv_bridge) in package.xml, as sketched below.
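A minimal sketch of the resulting setup.py (the entry-point name object_detection is an assumption here; match it to your script name). The key part is registering the node as a console script:

from setuptools import setup

package_name = 'opencv_tools'

setup(
    name=package_name,
    version='0.0.1',
    packages=[package_name],
    data_files=[
        # Standard ament_python install locations created by `ros2 pkg create`.
        ('share/ament_index/resource_index/packages', ['resource/' + package_name]),
        ('share/' + package_name, ['package.xml']),
    ],
    install_requires=['setuptools'],
    entry_points={
        'console_scripts': [
            # Lets `ros2 run opencv_tools object_detection` start the node.
            'object_detection = opencv_tools.object_detection:main',
        ],
    },
)

In package.xml, declare rclpy, sensor_msgs, and cv_bridge with <exec_depend> tags.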

2. Implement the Object Detection Node

In your package, create a Python script (e.g., object_detection.py) with the following content:

Imports

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge, CvBridgeError
import cv2
import torch
import torchvision
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights
  • rclpy: ROS2 client library for Python, used for creating nodes, publishers, and subscribers.
  • Node: Base class for ROS2 nodes.
  • Image: Message type for images in ROS2.
  • CvBridge: Library to convert between ROS image messages and OpenCV images.
  • cv2: OpenCV library for image processing.
  • torch and torchvision: PyTorch libraries for building and deploying neural networks, specifically for using pre-trained models.

ObjectDetectionNode Class

class ObjectDetectionNode(Node):

    def __init__(self):
        super().__init__('object_detection_node')
        self.subscription = self.create_subscription(Image, 'image_raw', self.listener_callback, 10)
        self.publisher_ = self.create_publisher(Image, 'detection_image', 10)
        self.bridge = CvBridge()
        weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
        self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights)
        self.model.eval()
        # COCO category names shipped with the pre-trained weights
        self.coco_labels = weights.meta['categories']

__init__: Constructor to initialize the node.

  • super().__init__('object_detection_node'): Initializes the node with the name 'object_detection_node'.
  • self.subscription: Subscribes to the 'image_raw' topic to receive images.
  • self.publisher_: Publishes the detection results to the 'detection_image' topic.
  • self.bridge: Initializes the CvBridge for converting ROS images to OpenCV images.
  • self.model: Loads the pre-trained Faster R-CNN model with the ResNet-50 backbone and Feature Pyramid Network.
  • self.model.eval(): Sets the model to evaluation mode.
  • self.coco_labels: Reads the list of COCO category names from the weights metadata, used to map predicted label IDs to readable names.

Listener Callback

    def listener_callback(self, msg):
        try:
            cv_image = self.bridge.imgmsg_to_cv2(msg, 'bgr8')
        except CvBridgeError as e:
            self.get_logger().error('Failed to convert image: %s' % str(e))
            return

listener_callback: Callback function triggered when a new image is received.

  • cv_image = self.bridge.imgmsg_to_cv2(msg, 'bgr8'): Converts the ROS image message to an OpenCV image in BGR format.
  • CvBridgeError: Catches any conversion errors, logs them, and returns early so the node keeps running.

Image Processing and Object Detection

        rgb_image = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)  # the model expects RGB input
        transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
        image_tensor = transform(rgb_image)
        with torch.no_grad():
            outputs = self.model([image_tensor])[0]
  • rgb_image = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB): Converts from OpenCV's BGR channel order to the RGB order the pre-trained model expects.
  • transform: Defines a transform that converts the image to a tensor with values scaled to [0, 1].
  • image_tensor = transform(rgb_image): Applies the transform to the RGB image.
  • with torch.no_grad(): Disables gradient tracking, which saves memory and compute during inference.
  • outputs = self.model([image_tensor])[0]: Passes the image tensor through the model to get the detection outputs (the standalone sketch below shows their structure).
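To see the structure of these outputs without a camera attached, the model can be run on a random tensor; a standalone sketch (the 640x480 size is arbitrary):

import torch
import torchvision
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()

with torch.no_grad():
    outputs = model([torch.rand(3, 480, 640)])[0]  # dummy RGB image, values in [0, 1]

print(outputs['boxes'].shape)   # [N, 4] corner coordinates (x1, y1, x2, y2)
print(outputs['labels'].shape)  # [N] COCO category indices
print(outputs['scores'].shape)  # [N] confidence scores, sorted in descending order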

Drawing Bounding Boxes

        for box, score, label in zip(outputs['boxes'], outputs['scores'], outputs['labels']):
            if score >= 0.5:
                x1, y1, x2, y2 = box.int().tolist()
                label_name = self.coco_labels[label.item()]
                cv2.rectangle(cv_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
                overlay = cv_image.copy()
                cv2.rectangle(overlay, (x1, y1), (x2, y2), (0, 255, 0), -1)
                alpha = 0.4
                cv_image = cv2.addWeighted(overlay, alpha, cv_image, 1 - alpha, 0)
                cv2.putText(cv_image, label_name, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2)

for box, score, label in zip(outputs['boxes'], outputs['scores'], outputs['labels']): Loops through each detected object.

  • if score >= 0.5: Filters detections based on a confidence score threshold.
  • x1, y1, x2, y2 = box.int().tolist(): Extracts the coordinates of the bounding box.
  • label_name = self.coco_labels[label.item()]: Gets the label name for the detected object.
  • cv2.rectangle: Draws a green bounding box around the detected object.
  • overlay = cv_image.copy(): Creates a copy of the image for overlay.
  • cv2.rectangle(overlay, (x1, y1), (x2, y2), (0, 255, 0), -1): Fills the box region on the overlay with solid green; the blend in the next step makes it appear semi-transparent.
  • alpha = 0.4: Defines the transparency level.
  • cv_image = cv2.addWeighted(overlay, alpha, cv_image, 1 - alpha, 0): Blends the overlay with the original image.
  • cv2.putText: Adds the label name above the bounding box.

Publishing the Detection Image

        try:
            detection_image = self.bridge.cv2_to_imgmsg(cv_image, 'bgr8')
            self.publisher_.publish(detection_image)
        except CvBridgeError as e:
            self.get_logger().error('Failed to convert image: %s' % str(e))
  • detection_image = self.bridge.cv2_to_imgmsg(cv_image, 'bgr8'): Converts the OpenCV image back to a ROS image message.
  • self.publisher_.publish(detection_image): Publishes the detection image.

Main Function

def main(args=None):
    rclpy.init(args=args)
    node = ObjectDetectionNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

main: Entry point of the script.

  • rclpy.init(args=args): Initializes the ROS2 Python client library.
  • node = ObjectDetectionNode(): Creates an instance of the ObjectDetectionNode.
  • rclpy.spin(node): Keeps the node running, allowing it to process callbacks.
  • node.destroy_node(): Destroys the node when shutting down.
  • rclpy.shutdown(): Shuts down the ROS2 Python client library.
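Assuming the object_detection entry point from the setup.py sketch earlier, the node can be built and launched like any other ROS2 node, with rqt_image_view as the viewer:

colcon build --packages-select opencv_tools
source install/setup.bash
ros2 run opencv_tools object_detection
ros2 run rqt_image_view rqt_image_view

In rqt_image_view, select the detection_image topic to see the boxes and labels; any camera driver that publishes sensor_msgs/Image on image_raw (or a remapped topic) will feed the node.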

Application of Robotics Using NVIDIA Jetson with Object Detection

Integrating object detection capabilities into robotic systems opens up a wide array of applications, especially when using powerful edge computing devices like NVIDIA Jetson. Below, we’ll explore how this technology can be utilized in various robotic applications, highlighting the benefits and potential use cases.

NVIDIA Jetson for Edge AI

NVIDIA Jetson platforms, such as the Jetson Nano, Jetson Xavier NX, and Jetson AGX Xavier, are designed to deliver GPU-accelerated computing power in a compact form factor. They are ideal for deploying deep learning models at the edge, allowing for real-time processing of visual data without the need for cloud connectivity. This makes them perfect for applications where latency, bandwidth, or privacy concerns are critical.

Benefits of Using NVIDIA Jetson

Real-Time Processing: NVIDIA Jetson’s GPU capabilities enable real-time inference with deep learning models, which is essential for applications requiring immediate responses (a rough latency check is sketched after this list).

Edge Computing: By processing data locally on the device, Jetson platforms reduce latency and dependency on cloud services, ensuring consistent performance even in areas with poor connectivity.

Energy Efficiency: Jetson devices are designed to provide high performance with low power consumption, making them suitable for mobile and battery-powered robots.

Scalability: The Jetson platform supports a range of devices, allowing developers to scale their applications from prototypes to production systems seamlessly.

Rich Ecosystem: NVIDIA provides a comprehensive ecosystem, including SDKs like JetPack, DeepStream, and TensorRT, which simplifies the development and deployment of AI applications.
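To put a rough number on the real-time claim above, the model’s per-frame latency can be timed on whatever device is available; this is a quick sketch, not a rigorous benchmark:

import time
import torch
import torchvision
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval().to(device)

image = torch.rand(3, 480, 640, device=device)  # dummy frame
with torch.no_grad():
    model([image])                      # warm-up run (CUDA init, allocations)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(10):
        model([image])
    if device.type == 'cuda':
        torch.cuda.synchronize()        # wait for queued GPU work to finish
elapsed = (time.perf_counter() - start) / 10
print(f'{device}: {elapsed * 1000:.1f} ms per frame')

On CPU this typically takes hundreds of milliseconds per frame, which is exactly why GPU acceleration on a Jetson matters for live robot perception.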

Conclusion

In this blog post, we have explored how to set up an object detection system in ROS2 using PyTorch’s Faster R-CNN with a ResNet-50 backbone. We discussed the importance of object detection, the features of the Faster R-CNN model, and the COCO dataset used for training the model. By integrating these powerful tools, we can enable robots to better understand and interact with their environments, paving the way for more advanced and intelligent robotic systems.

Feel free to dive deeper into the implementation details and customize the node according to your specific use case. With the ever-growing advancements in AI and robotics, the possibilities are endless!
