What is In-Hand Pose Estimation?

Felix von Drigalski
OMRON SINIC X
Jun 1, 2020

A robot’s representation of its environment is never perfect. When the robot grasps an object, there will generally be a slight offset, so the object is not exactly where the robot expected it to be:

The offset between the real and predicted object pose is shown in red.

Humans intuitively correct their estimate of the object’s pose in their hand by interacting with it. We call this process of determining the position of a grasped object in-hand pose estimation, and in the project described in this post, we work towards giving robots this ability.

This post is based on our ICRA 2020 paper “Contact-based in-hand pose estimation using Bayesian state estimation and particle filtering” [PDF]. If you prefer to watch a video rather than read this post, please head over to this 10-minute presentation. Otherwise, skim this video summary and read on:

Why is this needed?

For many industrial tasks, such as insertion, the object’s pose needs to be known with high precision. Even small offsets can cause the task to fail and the robot to lock up. This is true for the majority of industrial robots, which are highly rigid and position-controlled.

If the peg is not at the correct position, moving the robot forward would cause a protective stop

Common approaches to this problem include impedance control, but this requires either a robot with adequate sensors and control speed, or lower-level access that robot operators and integrators usually do not have. Visual servoing is complicated by the camera’s sensitivity to lighting conditions and by the fact that the problem needs to be set up for each part, which requires a significant system modeling effort.

We wanted a method that requires no complicated modeling and works using only the shape of the grasped object. We also wanted to avoid the need for extensive calibration, which is a common pain point for novice users and small businesses considering automating their repetitive tasks. We were convinced that, with the right problem formulation, our robots would become more autonomous and ultimately more helpful. We explain our view of the problem next.

Taking actions to gain knowledge

The robot moves in a world that is full of uncertainty. Consider that the position of the object may be estimated via a camera mounted on the robot’s wrist. There are many sources of noise and imprecision: the computer vision algorithm estimating the position of the object, the resolution of the camera, and the calibration of the camera itself. Thus, while we have an idea about the pose of the object, there is uncertainty associated with it. We call this pair (pose estimate + uncertainty) our belief.

Before we proceed to the next task after grasping an object (e.g. “insert peg into bearing”), we need to reduce this uncertainty. We do this by interacting with the environment — in our case by touching the environment with the object. We call this step an action. Each action should improve our belief. A more accurate belief will be closer to the true pose of the object.

For our paper, we chose to implement only the touch action, and we represented the belief as a 6D pose in the gripper’s frame of reference with a normal distribution in each degree of freedom. We assume that the object does not move in the gripper, that the robot can detect collisions, and that we know which surface the object is touching.
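As a rough sketch of this representation (using our own naming and NumPy, not the paper’s code), the belief can be stored as a mean pose and a per-axis standard deviation, with helpers to sample particles and to fit a new distribution to a set of particles:

```python
import numpy as np

class Belief:
    """Belief over the object pose in the gripper frame: a 6D mean
    (x, y, z, roll, pitch, yaw) with an independent normal distribution
    in each degree of freedom."""

    def __init__(self, mean, std):
        self.mean = np.asarray(mean, dtype=float)  # shape (6,)
        self.std = np.asarray(std, dtype=float)    # one std. deviation per DOF

    def sample_particles(self, n):
        """Draw n candidate object poses from the current belief."""
        return np.random.normal(self.mean, self.std, size=(n, 6))

    @staticmethod
    def from_particles(particles):
        """Fit per-DOF normal distributions to a set of surviving particles."""
        return Belief(particles.mean(axis=0), particles.std(axis=0))
```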

During the “touch action”, when the robot detects that the object has made contact with the surface, we sample particles from the belief and use a standard collision-checking library (FCL) to determine the distance of each randomly sampled particle to the surface. Particles that are either too far away from the surface or penetrating it are discarded, and a new normal distribution is calculated from the remaining particles. This constitutes the new belief.
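A minimal sketch of this update step might look as follows. Here `signed_distance(pose)` is a stand-in for the collision-checking query (e.g. via FCL), returning the distance from the object at a candidate pose to the touched surface (negative if penetrating); the particle count and tolerance are illustrative values, not the ones used in the paper:

```python
def touch_update(belief, signed_distance, n_particles=1000, tol=1e-3):
    """Update the belief after a touch action has detected contact."""
    particles = belief.sample_particles(n_particles)

    # Distance of each sampled pose to the touched surface.
    distances = np.array([signed_distance(p) for p in particles])

    # Discard particles that penetrate the surface or are too far from it;
    # the survivors are consistent with the detected contact.
    survivors = particles[(distances >= 0.0) & (distances <= tol)]

    if len(survivors) == 0:
        return belief  # no consistent particles; keep the previous belief
    return Belief.from_particles(survivors)
```

Refitting a normal distribution to the survivors keeps the belief in the same form before and after every action, so further updates, whether from another touch or from a different sensing modality, can be applied in the same way.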

Results

We were glad to see that our method converged very quickly to the correct values. However, we also found limitations in our first implementation: the state representation is sensitive to the object’s center of rotation, the number of particles needs to be tuned, and the robot needs to move quite slowly when touching the environment, which limits the possible applications.

One of the main advantages of our method is its ease of setup. No modeling of the robot’s gripper is required, and only very common force sensors are used. The method is lightweight and generic, and it is formulated in a way that allows it to be combined with other types of actions — for example, common computer-vision-based pose estimation or tactile sensing. This will be our next step.

What next?

Besides extending the project to multiple modalities in further papers, we plan to release a ROS package within the year. In the future, we hope to combine this work with grasp planning under uncertainty, so that it can be used in a straightforward fashion on real problems.

Are you a graduate student or researcher interested in working on this or similar topics? Contact us for internship opportunities and collaborations!

Post based on:

Felix von Drigalski, Shohei Taniguchi, Robert Lee, Takamitsu Matsubara, Masashi Hamaya, Kazutoshi Tanaka and Yoshihisa Ijiri, “Contact-based in-hand pose estimation using Bayesian state estimation and particle filtering”, ICRA 2020 [PDF] [Summary video] [Presentation video]
