NVIDIA Jetson Nano: Publish TF Between Object and Camera Frames with ROS 2

Kabilankb
6 min read · Jun 5, 2024


Introduction

Welcome, robotics and computer vision enthusiasts! In this blog post, we’ll embark on a journey to publish a transform (TF) between an object frame and a camera frame using the NVIDIA Jetson Nano, a compact single-board computer well suited to embedded AI applications. Publishing this transform unlocks real-time object pose estimation, enabling robots and intelligent systems to perceive and interact with their surroundings more effectively.

Prerequisites

  • A keen interest in robotics and computer vision
  • Basic understanding of Python programming
  • Familiarity with ROS 2 (Robot Operating System)
  • An NVIDIA Jetson Nano development board
  • A webcam or other camera module

Setting Up the Environment

  1. Install ROS 2: Follow the official ROS 2 installation guide for your Jetson Nano operating system.
  2. Create a ROS 2 Package: Establish a new ROS 2 package to house your Python code. Refer to the ROS 2 documentation for detailed instructions.
  3. Install Necessary Dependencies: rclpy and tf2_ros ship with your ROS 2 installation; install the remaining Python libraries the node uses (OpenCV, which is imported as cv2, numpy, and cv_bridge) as shown below.
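On a Jetson Nano this typically means something like the following. The package names assume a Debian-based ROS 2 install; replace humble with your ROS 2 distribution.

pip install opencv-python numpy
sudo apt install ros-humble-cv-bridge ros-humble-tf2-ros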

Robots See the World with Transforms: Grabbing that Coffee Cup with ROS 2

Imagine you’re a robot arm tasked with grabbing a coffee cup. You have a fancy camera as your eye, but it only tells you the cup’s position relative to itself. How do you translate that into the coordinates your arm needs to reach and grab it?

This is where ROS 2 Transforms come in! They act like a universal translator for your robot, allowing it to understand the positions of objects relative to different parts of itself (like the camera) and the world (like the table the cup sits on).

Frames of Reference: Your Robot’s Worldview

Think of your robot’s environment as a scene in a play. Each actor (camera, gripper, cup) exists on a “stage” called a frame. Each frame has a unique name and defines an origin and orientation, and an object’s position is always expressed relative to one of these frames.

For example, the camera might have a frame named “camera_link” with the origin at the center of its lens. The cup might be in a frame named “cup_frame” based on the camera’s detection.

The Power of Transformation: From Here to There

ROS 2 Transforms are like stage directions for your robot. They tell it how to translate the position of an object from one frame to another.

Here’s the magic: You don’t need to do complex math yourself. ROS 2 uses the tf2 library to handle the heavy lifting. You provide the library with the names of the two frames (e.g., “camera_link” and “base_link” for the robot’s base) and the timestamp (to ensure everything is in sync), and it delivers the transformed position.
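To make this concrete, here is a minimal lookup sketch with tf2_ros. It is only a sketch: it assumes something else is already broadcasting the camera_link-to-base_link transform.

from rclpy.node import Node
from rclpy.time import Time
from tf2_ros.buffer import Buffer
from tf2_ros.transform_listener import TransformListener

class TfLookupExample(Node):
    def __init__(self):
        super().__init__('tf_lookup_example')
        # The buffer caches recent transforms; the listener fills it from /tf
        self.tf_buffer = Buffer()
        self.tf_listener = TransformListener(self.tf_buffer, self)

    def camera_in_base_frame(self):
        # Ask tf2 for the latest transform that expresses camera_link in base_link
        return self.tf_buffer.lookup_transform(
            'base_link',    # target frame (the robot's base)
            'camera_link',  # source frame (the camera)
            Time())         # Time() = "latest available transform"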

Beyond Coffee Cups: A World of Possibilities

ROS 2 Transforms are powerful for various robotics tasks. They enable robots to:

  • Navigate through obstacles by understanding their positions relative to the robot’s body.
  • Manipulate objects with different grippers by knowing their positions relative to the end-effector.
  • Collaborate with other robots by understanding their positions in the shared workspace.

Creating the Package

ros2 pkg create --build-type ament_python object_tracking
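For the ros2 run command used later to find the node, the package’s setup.py needs a console-script entry point. A minimal sketch, assuming the node script is object_tracking/object_tf_publish.py with a main() function:

# In setup.py, inside the setup(...) call
entry_points={
    'console_scripts': [
        # "ros2 run object_tracking object_tf_publish" calls main() in object_tf_publish.py
        'object_tf_publish = object_tracking.object_tf_publish:main',
    ],
},

After editing setup.py, rebuild with colcon build and source the workspace so ros2 run can find the executable.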

Create a Python file for the node (e.g. object_tracking/object_tf_publish.py) and fill it in following this skeleton:

import rclpy
from rclpy.node import Node
import cv2
import numpy as np
from geometry_msgs.msg import Pose, TransformStamped
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from tf2_ros import TransformBroadcaster


class ObjectTracker(Node):
    def __init__(self):
        # ROS 2 node initialization
        super().__init__('object_tracker')
        # ... publisher, TF broadcaster, timer, and camera capture (self.cap) creation

        # Background subtraction for object detection
        self.back_sub = cv2.createBackgroundSubtractorMOG2()
        # ... other object detection and pose estimation variables
        #     (self.object_points, self.camera_matrix, self.dist_coeffs, self.bridge, ...)

    def timer_callback(self):
        # Capture a frame from the camera
        ret, frame = self.cap.read()

        # Perform background subtraction and apply morphology
        fg_mask = self.back_sub.apply(frame)
        # ... further processing for image filtering

        # Extract the object contour and bounding box
        contours, _ = cv2.findContours(fg_mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        # ... process contours and determine the object center

        # Estimate the object pose using PnP
        image_points = np.array([ ... ], dtype=np.float32)  # object bounding box corners in image space
        _, rvec, tvec = cv2.solvePnP(self.object_points, image_points,
                                     self.camera_matrix, self.dist_coeffs)

        # Construct and publish the pose message
        pose = Pose()
        pose.position.x = float(tvec[0][0])
        # ... fill the other pose attributes (y, z, orientation)

        # Publish the TF between the camera and object frames
        # (quaternion is derived from rvec; see the conversion sketch below)
        self.publish_transform(tvec, quaternion)

        # Publish the image with bounding box and pose information
        self.publish_image(frame)

    def publish_transform(self, tvec, quaternion):
        # Create a TransformStamped message
        t = TransformStamped()
        # ... populate the header, translation, and rotation fields
        self.tf_broadcaster.sendTransform(t)

    def publish_image(self, frame):
        # Convert the frame to a ROS image message and publish it
        image_message = self.bridge.cv2_to_imgmsg(frame, encoding='bgr8')
        self.image_publisher_.publish(image_message)


def main(args=None):
    rclpy.init(args=args)
    # ... ObjectTracker creation and rclpy.spin()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
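The skeleton elides how the Rodrigues rotation vector from solvePnP becomes a quaternion and how the TransformStamped fields are filled in. One possible version is sketched below; scipy is an extra dependency here, and camera_frame / object_frame are assumed frame names (camera_frame matches the Fixed Frame used in RViz later):

import cv2
from scipy.spatial.transform import Rotation
from geometry_msgs.msg import TransformStamped

def rvec_to_quaternion(rvec):
    # solvePnP returns a Rodrigues rotation vector; convert it to a 3x3 matrix first
    rot_mat, _ = cv2.Rodrigues(rvec)
    # scipy returns the quaternion in (x, y, z, w) order, matching geometry_msgs
    return Rotation.from_matrix(rot_mat).as_quat()

def publish_transform(self, tvec, quaternion):
    t = TransformStamped()
    # Header: the transform is expressed in the camera's frame
    t.header.stamp = self.get_clock().now().to_msg()
    t.header.frame_id = 'camera_frame'   # parent frame (assumed name)
    t.child_frame_id = 'object_frame'    # child frame (assumed name)
    # Translation from solvePnP's tvec (object position in camera coordinates)
    t.transform.translation.x = float(tvec[0][0])
    t.transform.translation.y = float(tvec[1][0])
    t.transform.translation.z = float(tvec[2][0])
    # Rotation as a quaternion
    t.transform.rotation.x = float(quaternion[0])
    t.transform.rotation.y = float(quaternion[1])
    t.transform.rotation.z = float(quaternion[2])
    t.transform.rotation.w = float(quaternion[3])
    self.tf_broadcaster.sendTransform(t)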

Run the Node

ros2 run object_tracking object_tf_publish 
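To verify that the node is actually broadcasting, you can echo the transform and the pose from another terminal (the frame names assume the camera_frame / object_frame naming from the sketch above):

ros2 run tf2_ros tf2_echo camera_frame object_frame
ros2 topic echo /object_tracking/pose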

Explanation:

Node Initialization and Publishers/Subscribers:

  • The code establishes a ROS 2 node named object_tracker to manage communication within the ROS ecosystem.
  • Publishers are created to disseminate the object coordinates, camera images, and object poses, along with a TF broadcaster for the camera-to-object transform (a sketch of this elided initialization follows below).
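The publisher and broadcaster creation elided in the skeleton might look roughly like this. The topic names match the /object_tracking/image_raw and /object_tracking/pose topics referenced below; the camera index, timer period, intrinsics, and object dimensions are placeholders to replace with your own values (imports are the same as in the main listing):

# Inside ObjectTracker.__init__, after super().__init__('object_tracker')
self.image_publisher_ = self.create_publisher(Image, '/object_tracking/image_raw', 10)
self.pose_publisher_ = self.create_publisher(Pose, '/object_tracking/pose', 10)
self.tf_broadcaster = TransformBroadcaster(self)
self.bridge = CvBridge()

# Open the webcam and process frames at roughly 10 Hz
self.cap = cv2.VideoCapture(0)
self.timer = self.create_timer(0.1, self.timer_callback)

# Placeholder pinhole intrinsics; replace with values from a real calibration
self.camera_matrix = np.array([[600.0, 0.0, 320.0],
                               [0.0, 600.0, 240.0],
                               [0.0, 0.0, 1.0]], dtype=np.float32)
self.dist_coeffs = np.zeros((5, 1), dtype=np.float32)

# 3D model points of the tracked object's bounding box corners, in meters
self.object_points = np.array([[0.0, 0.0, 0.0],
                               [0.1, 0.0, 0.0],
                               [0.1, 0.1, 0.0],
                               [0.0, 0.1, 0.0]], dtype=np.float32)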

Visualize the Output in RViz2

Launch RViz2 in another terminal:

rviz2

Then configure the displays:

  • Add a new display by clicking the “Add” button.
  • Select “Image” from the display types and set the “Image Topic” to /object_tracking/image_raw.
  • Add a new display by clicking the “Add” button again.
  • Select “TF” from the display types to visualize the TF tree.
  • Set the Fixed Frame (under Global Options) to camera_frame so the TF tree resolves.

Now, RViz should display the video feed from your webcam with the bounding box and centroid coordinates overlaid on the image. Additionally, the estimated pose of the tracked object will be published to the /object_tracking/pose topic, and the transformation between the camera and the object will be published and visualized in RViz.

Conclusion:

In this blog post, we’ve successfully navigated the process of publishing a transform (TF) between an object frame and a camera frame on the NVIDIA Jetson Nano. By leveraging the power of computer vision, ROS 2, and the Jetson Nano’s capabilities, we’ve unlocked a valuable technique for real-time object pose estimation.

The Benefits:

Enhanced Robot Perception: Robots can now accurately perceive the location and orientation of objects in their environment, enabling them to grasp objects, navigate obstacles, and interact with the world more effectively.

Improved Object Tracking: By continuously publishing TFs, we can track objects across multiple frames, facilitating tasks like motion analysis and path planning.

Next Steps:

Calibration: Fine-tune the camera matrix and distortion coefficients for enhanced pose estimation accuracy.
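A rough outline of that calibration with OpenCV’s chessboard routine (the 9x6 pattern size and the image folder are placeholders):

import glob
import cv2
import numpy as np

# 3D grid points for a 9x6 inner-corner chessboard, one square = 1 unit
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob('calib_images/*.jpg'):  # your saved calibration shots
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (9, 6))
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the camera matrix and distortion coefficients used by solvePnP
ret, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)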

Multiple Object Tracking: Extend the code to handle tracking and pose estimation for multiple objects simultaneously.

Integration with Robotic Manipulators: Employ the object pose information to control a robotic arm for tasks like object grasping or manipulation.

Call to Action:

Share your thoughts and questions in the comments below! Let’s embark on a collaborative exploration of this exciting field. If you’ve successfully implemented this code on your Jetson Nano, feel free to share your experiences and insights.
