Understanding how robots discover objects and their pose

Amitabha · Programming Robots · Nov 18, 2020

One of the most popular videos on my YouTube channel is an illustration of how one can draw on Cozmo’s face simply by moving the cube around. The program is simple to write using the Cozmo Code Lab, but there are very important concepts at work here. One of the most essential jobs of commercial robots is to grasp and move objects. Behind this technology is the art of determining the pose of an object, known as 6D Object Pose Estimation (6D OPE). Before going into a deep dive on 6D OPE, I would first like you to review the following video.

Why 6D? Because the pose of any rigid object can be described using (i) the 3D coordinates of the approximate center of the object, and (ii) its 3D orientation, which can be represented as a rotation matrix or, more compactly, as a quaternion. The quaternion is a powerful concept by which any 3D rotation can be expressed using four numbers constrained to have unit norm, leaving exactly the three rotational degrees of freedom. An excellent tutorial which gives great intuitive insight into the concept of the quaternion is available here. An excellent research paper available here summarizes the state of the art in 6D OPE. Another excellent research paper, taking a different approach from the previous one, is available here. A comprehensive survey of the different approaches used for 6D OPE is available here; the authors of that last paper do a great job of balancing classical versus modern deep-learning-based approaches.
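To make the rotation part concrete, here is a minimal, SDK-independent Python sketch of the standard conversion from a unit quaternion (q0, q1, q2, q3) = (w, x, y, z) to a 3x3 rotation matrix. The example values (a 90-degree rotation about the z-axis) are mine, chosen for illustration:

import math

def quaternion_to_matrix(q0, q1, q2, q3):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q0, q1, q2, q3
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]

# A 90-degree rotation about the z-axis: w = cos(45 deg), z = sin(45 deg)
m = quaternion_to_matrix(math.cos(math.pi/4), 0.0, 0.0, math.sin(math.pi/4))
for row in m:
    print([round(v, 3) for v in row])  # [[0, -1, 0], [1, 0, 0], [0, 0, 1]]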

The Anki Cozmo and Vector SDKs allow us to explore programmatically how 6D OPE works in practice. Here is an example of detecting the light cube using the Vector SDK.

import time
import argparse

import anki_vector
from anki_vector.events import Events
from anki_vector.util import degrees


def handle_object_appeared(robot, event_type, event):
    # This will be called whenever an object comes into view.
    print(f"Vector started seeing an object: \n{event.obj}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    args = anki_vector.util.parse_command_args(parser)
    with anki_vector.Robot(serial=args.serial,
                           default_logging=False,
                           show_viewer=True,
                           enable_nav_map_feed=True) as robot:
        # Place Vector's cube where he can see it.
        robot.events.subscribe(handle_object_appeared, Events.object_appeared)
        # Lower the lift and level the head so the camera has a clear view.
        robot.behavior.set_lift_height(0.0)
        robot.behavior.set_head_angle(degrees(0.0))
        # Keep the script alive so the event handler can fire (Ctrl+C to exit).
        while True:
            time.sleep(0.5)

In this program, we subscribe to an event that is raised whenever an object is seen by Vector. The event handler prints all the details of the object, including the object pose.

Here is sample output produced when the light cube comes into view (the viewer window also shows what Vector's camera sees):

Vector started seeing an object: 
<LightCube object_id=1 pose=<Pose: <Position x: 174.72 y: -58.31 z: 44.92> <Quaternion q0: 0.58 q1: 0.48 q2: 0.42 q3: 0.50 <Angle Radians: 1.42 Degrees: 81.63>> <Origin Id: 23>> is_visible=True>

In the above output, the 3D coordinates of Vector’s light cube are identified as (x=174.72, y=-58.31, z=44.92). These coordinates are relative to Vector’s current origin, which is usually the point at which Vector first woke up, or the point at which Vector was placed after being lifted. In this case, the origin is represented by Id 23. The 3D rotation is represented by the quaternion (q0=0.58, q1=0.48, q2=0.42, q3=0.50). Note the defining feature of unit quaternions here: q0² + q1² + q2² + q3² = 1.
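We can quickly verify the unit-norm property against the sample output above (the values in the log are rounded to two decimals, so the sum is only approximately 1):

q0, q1, q2, q3 = 0.58, 0.48, 0.42, 0.50
norm = q0**2 + q1**2 + q2**2 + q3**2
print(norm)  # ~0.99, equal to 1 up to the rounding in the log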

Now that Vector has estimated the pose of the object, how is this helpful? Let’s say that we instruct Vector to reach its light cube. Vector will invoke its path-planning algorithm to plot a way to the light cube (using the estimated pose). Here is a small modification to the above code that makes Vector drive to the cube.

def handle_object_appeared(robot, event_type, event):
    # This will be called whenever an object comes into view.
    print(f"Vector started seeing an object: \n{event.obj}")
    robot.behavior.go_to_pose(event.obj.pose)

So the simple go_to_pose command instructs Vector to begin path planning in order to reach the detected pose.
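If the goal is to actually dock with the cube rather than just drive to its pose, the SDK also offers a dock_with_cube behavior that uses the same estimated pose to line Vector up against the cube. A sketch of that variation of the handler, assuming the object that appeared is the light cube:

def handle_object_appeared(robot, event_type, event):
    # Variation on the handler above: instead of driving to the cube's
    # pose, ask Vector to dock with the cube it just observed.
    print(f"Vector started seeing an object: \n{event.obj}")
    robot.behavior.dock_with_cube(event.obj)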

You are not limited to detecting the pose of a light cube. You can capture the pose of any object that Vector can recognize: custom objects (defined as described here), any human face that Vector has detected, or the charger. As an example, if you lift Vector, he loses all orientation; he can then return to the charger by seeing it and detecting its pose.
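As an illustration of the custom-object case, here is a minimal sketch using the SDK's define_custom_cube call. The marker choice and dimensions (a 60 mm cube tagged with the Circles2 marker printed at 38 mm) are example values of mine, not requirements:

import anki_vector
from anki_vector.objects import CustomObjectMarkers, CustomObjectTypes

with anki_vector.Robot() as robot:
    # Declare a cube-shaped custom object. Once defined, Vector raises
    # the same object_appeared events for it as for the light cube,
    # complete with an estimated 6D pose.
    robot.world.define_custom_cube(custom_object_type=CustomObjectTypes.CustomType00,
                                   marker=CustomObjectMarkers.Circles2,
                                   size_mm=60.0,
                                   marker_width_mm=38.0,
                                   marker_height_mm=38.0,
                                   is_unique=True)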

PS: Vector is simply the best consumer gadget for educational purposes that I have seen. If you would like to learn Artificial Intelligence with the help of Vector, I have a course available at http://robotics.thinkific.com. I will feel honored to have you as a student. Alternatively, if you wish to gift this course to someone who owns a Vector, I am sure it will enable that person to learn a lot. I am also the editor of a Medium publication, Programming Robots, where you will find other articles on Vector and other robots.
