Human Pose Estimation with PyTorch and ROS2: A Complete Guide

Kabilankb
4 min read · Jul 2, 2024


In this blog post, we will explore how to perform human pose estimation using PyTorch’s Keypoint R-CNN model and integrate it with ROS2 to visualize body joints and skeletons in RViz.

Table of Contents

  1. Introduction
  2. Model and Dataset
  3. Code Explanation
  4. Image Encoding and Decoding
  5. Inference and Visualization
  6. Conclusion

1. Introduction

Human pose estimation is a critical task in computer vision, involving the detection of key body joints in images or videos. This technology has applications in fields like robotics, healthcare, and sports analytics. In this guide, we will use the PyTorch framework and a pre-trained Keypoint R-CNN model to detect human poses and visualize the results in ROS2.

2. Model and Dataset

Model: Keypoint R-CNN

Keypoint R-CNN, introduced by Facebook AI Research (FAIR) and available as a pre-trained model in torchvision, extends the Faster R-CNN object detector with a keypoint-prediction head. For each detected person, the model predicts 17 keypoints, corresponding to body parts such as the nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

Dataset: COCO (Common Objects in Context)

The COCO dataset is a large-scale object detection, segmentation, and captioning dataset. It includes over 200,000 labeled images with annotations for keypoints of people, making it an ideal dataset for training pose estimation models.

3. Code Explanation

We will create a ROS2 package named opencv_tools and a node that:

  1. Subscribes to an image topic.
  2. Uses a pre-trained Keypoint R-CNN model to estimate human poses.
  3. Publishes the image with keypoints and skeletons overlaid.
  4. Publishes MarkerArray messages for visualization in RViz.

Installing Dependencies

Ensure you have the necessary Python dependencies installed (cv_bridge and rclpy are provided by your ROS2 installation and are not installed via pip):

pip install torch torchvision opencv-python

Step-by-Step Walkthrough

1. Importing Necessary Libraries

These imports bring in the necessary libraries for working with ROS2 (rclpy, Node, Image, Marker, MarkerArray), OpenCV (cv2), and PyTorch (torch, torchvision). CvBridge is used for converting between ROS images and OpenCV images.

2. Defining the PoseEstimationNode Class

This defines a class PoseEstimationNode which inherits from Node. The constructor (__init__) initializes the node with the name 'pose_estimation_node' and logs an initialization message.

3. Setting Up Subscriptions and Publishers

  • self.create_subscription: Subscribes to the input_image topic to receive images. The image_callback method is called whenever a new image is received.
  • self.create_publisher: Creates publishers for output_image and visualization_marker_array topics to publish processed images and marker arrays.

4. Initializing CvBridge and Model

  • self.bridge = CvBridge(): Initializes CvBridge for converting between ROS and OpenCV images.
  • self.device = torch.device("cpu"): Specifies the device to run the model on (CPU in this case).
  • self.model: Loads the pre-trained Keypoint R-CNN model with COCO weights and moves it to the specified device. The model is set to evaluation mode using self.model.eval().
  • self.transforms: Defines image transformations (conversion to tensor) to be applied before feeding the image to the model.

5. Image Callback Function

  • self.bridge.imgmsg_to_cv2: Converts the ROS image message to an OpenCV image. If the image encoding is not bgr8, it converts the encoding appropriately.
  • Error handling: Logs an error if the conversion fails.

6. Processing the Image for Pose Estimation

  • The image is transformed to a tensor and passed through the model to get the outputs.
  • torch.no_grad(): Disables gradient calculation for inference, which reduces memory usage and speeds up computations.

7. Handling Model Outputs

  • Extracts keypoints and scores from the model output.
  • Filters keypoints with confidence scores greater than 0.5.
  • Initializes a MarkerArray to store the markers for visualization.
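Step 7 boils down to a few lines of tensor indexing; this is a sketch, with the 0.5 threshold taken from the description above:

```python
import torch


def keep_confident_people(output, score_threshold=0.5):
    """Return the (num_people, 17, 3) keypoints of confident detections."""
    keypoints = output['keypoints']   # per person: 17 rows of (x, y, visibility)
    scores = output['scores']         # per-person confidence score
    return keypoints[scores > score_threshold]
```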

8. Drawing Keypoints and Skeleton

  • For each keypoint with high confidence, a circle is drawn on the image and a Marker is created for visualization.
  • Skeleton pairs are defined to connect keypoints.
  • For each valid pair, a line is drawn on the image and a Marker is created for visualization.
  • The MarkerArray is published to visualize keypoints and skeleton in RViz.
  • The processed image is converted back to a ROS message and published.

9. Main Function

  • Initializes the ROS2 node, creates an instance of PoseEstimationNode, and keeps the node running.
  • Destroys the node and shuts down ROS2 when done.

4. Image Encoding and Decoding

In ROS2, images are published and subscribed to in the form of sensor_msgs/Image messages. These messages need to be converted to OpenCV images for processing, and back to ROS messages for publishing. We use the CvBridge library for these conversions.

Decoding: the incoming sensor_msgs/Image message is converted to an OpenCV image with imgmsg_to_cv2.

Encoding: after processing, the annotated image is converted back to a ROS message with cv2_to_imgmsg for publishing.

5. Inference and Visualization

The inference involves passing the image through the Keypoint R-CNN model and extracting keypoints with high confidence scores. We then draw circles at these keypoints and lines connecting them to form a skeleton.

For visualization in RViz, we publish MarkerArray messages with markers representing the keypoints and lines.

6. Conclusion

In this blog post, we’ve walked through the complete process of setting up human pose estimation using PyTorch and integrating it with ROS2 for visualization in RViz. The Keypoint R-CNN model provides robust performance on the COCO dataset, allowing for accurate detection and visualization of human poses.
