Mono-camera object-pose estimation in autonomous driving systems

Mikhail Chekanov
Evocargo
Sep 15, 2022 · 7 min read

Enclosing an obstacle in an optimal bounding box

Autonomous vehicles must operate safely on the road and select optimal routes. Overall safety and efficiency are established through the many small choices a vehicle makes in real time, including how it detects obstacles and selects its path around them.

In this article, I’ll share my algorithm for efficient, optimal estimation of the obstacle’s pose. First, I’ll give a brief overview of object detection, which is needed for a clear understanding of what the algorithm takes as input, and then formulate the problem that the algorithm is designed to solve.

Object detection

An autonomous car sees people, cars, and other obstacles on the road as boxes. These might be 2D or 3D boxes depending on the type of detector you’ve chosen for your project.

I won’t advocate for one detector over another; let’s just agree that in this article I’ll treat 2D boxes as the input. 2D detectors are well suited to many projects (see the links at the end of the post): there are plenty of datasets and neural network architectures for them, and they don’t require as much computational power as 3D detectors. (And fast computation is of utmost importance for real-time localization and planning.)

Figure 1. A 2D detector localizes objects in an image from a camera.

So, let’s return to our case. I have a 2D box enclosing a big truck, the obstacle that the autonomous vehicle needs to drive around. The lower edge of the box covers too much of the useful road area, so I refine it with a drivable-area segmentation neural network. That gives me a set of points along the object’s lower edge that partially defines its pose.

Figure 2. The drivable area segmentation (to the left). The lower edge point pattern (joined in a blue polyline for better illustration) gets closer to the real shape of the object (to the right).

The autonomous vehicle’s planning algorithms work in bird’s-eye view (BEV), so the point pattern is projected into that view. Let’s look at our situation from above.

Figure 3. Schematic view of the road, my autonomous vehicle, and the truck with the points along its edges.
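
This conversion step can be illustrated with a minimal Python sketch. It assumes a calibrated camera and a flat ground plane, so that a single homography maps ground coordinates to image pixels; the homography, the function name, and the argument names are illustrative assumptions rather than parts of the actual pipeline.

```python
import numpy as np

def pixels_to_bev(pixels_uv: np.ndarray, H_ground_to_image: np.ndarray) -> np.ndarray:
    """Lift lower-edge pixels (assumed to lie on the ground plane) into bird's-eye view.

    pixels_uv         : (N, 2) array of pixel coordinates (u, v) along the object's lower edge.
    H_ground_to_image : 3x3 homography mapping ground-plane points (X, Y, 1) to image points
                        (u, v, 1); it comes from the camera calibration and is assumed known.
    Returns an (N, 2) array of (X, Y) points in the vehicle's BEV frame.
    """
    H_image_to_ground = np.linalg.inv(H_ground_to_image)
    uv1 = np.hstack([pixels_uv, np.ones((len(pixels_uv), 1))])  # homogeneous pixel coordinates
    xyw = uv1 @ H_image_to_ground.T                             # back-project onto the ground plane
    return xyw[:, :2] / xyw[:, 2:3]                             # dehomogenize to metric (X, Y)
```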

Finally, in the BEV image, I need to enclose the big truck in a rectangular bounding box and build the route around this box.

So, what’s the problem?

Cameras installed on an autonomous vehicle often capture only one or two sides of an obstacle, not the whole of it. Hence, in BEV, the point pattern excludes the sides that are out of the cameras’ view. If I draw a minimal rectangle only considering these points, I get a bounding box that is unacceptably far from the real obstacle, both in size and pose. Moreover, it overlaps the drivable area and literally blocks the way, so the autonomous vehicle will have to stop. Obviously, an autonomous vehicle designed for safe and efficient transportation needs a more accurate solution.

Figure 4. To the left — A faulty bounding box (in red). To the right — The desired bounding box (in blue) that will allow the autonomous vehicle to calculate the best drive-around trajectory.

Here is the box I need:

  1. The whole object fits into the rectangular bounding box, including any part beyond the cameras’ range.
  2. The rectangle is close to the object’s real size and orientation, so it doesn’t block drivable area that the obstacle doesn’t actually occupy.

Solution: The L-shape algorithm

Now we are finally getting to the interesting part. To build a tight, well-oriented bounding box in BEV, I suggest using my L-shape algorithm. The main idea is to fit the points of the object’s two visible sides with two perpendicular lines that resemble an L (hence the name of the algorithm), and then build a rectangle based on this L-shape.

The magic behind the L-shape algorithm involves geometry and some formulas, so I hope you love math as much as I do.

The L-shape algorithm involves two steps:

1 — Define the axis of the rectangle’s orientation

2 — Define the rectangle’s center and size

Orientation

First, I have the point pattern in BEV. By fitting the points with two perpendicular lines, I define the orientation of the rectangle’s axis.

A line perpendicular to the line y = kx + d₁ (where k describes the slope of this line) is given by the equation x = -ky + d₂.

The vectors (k, 1)ᵀ and (1, -k)ᵀ are indeed perpendicular to one another: their dot product is k·1 + 1·(-k) = 0.

The points have to be split between the two lines. Let’s say nₓ points belong to the first line and the remaining nᵧ points belong to the second. The task is then to find k, d₁, and d₂ that minimize the total squared error of both fits. Stacking one equation per point turns this into a standard linear least-squares problem, and the normal equations give its solution.
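
One way to write this down in the notation of the two lines above (a sketch of the idea rather than a canonical form):

$$
\min_{k,\,d_1,\,d_2}\;\sum_{i=1}^{n_x}\bigl(k x_i + d_1 - y_i\bigr)^{2}\;+\;\sum_{j=1}^{n_y}\bigl(-k y_j + d_2 - x_j\bigr)^{2},
$$

where index $i$ runs over the points assigned to the first line and index $j$ over the points assigned to the second. Collecting the unknowns into $\theta = (k,\; d_1,\; d_2)^{\mathsf T}$ and stacking one row per point,

$$
A=\begin{pmatrix}
x_1 & 1 & 0\\
\vdots & \vdots & \vdots\\
x_{n_x} & 1 & 0\\
-y_1 & 0 & 1\\
\vdots & \vdots & \vdots\\
-y_{n_y} & 0 & 1
\end{pmatrix},
\qquad
b=\begin{pmatrix}
y_1\\ \vdots\\ y_{n_x}\\ x_1\\ \vdots\\ x_{n_y}
\end{pmatrix},
$$

the task becomes $\min_{\theta}\lVert A\theta - b\rVert^{2}$, and its solution is $\theta = (A^{\mathsf T}A)^{-1}A^{\mathsf T}b$.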

I want to go through all possible splits of the points between the two perpendicular lines and choose the split with the lowest mean squared error (MSE). It would be time-consuming to build the matrices from scratch for every candidate split, so I want to keep the whole search at O(n) time complexity. To do so, I take a piece of paper and write out the matrices AᵀA and Aᵀb in terms of sums over the points assigned to each line.
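
With the per-line sums written out explicitly (index $i$ over the $n_x$ points of the first line, index $j$ over the $n_y$ points of the second), the matrices take roughly the following form:

$$
A^{\mathsf T}A=\begin{pmatrix}
\sum_i x_i^{2}+\sum_j y_j^{2} & \sum_i x_i & -\sum_j y_j\\
\sum_i x_i & n_x & 0\\
-\sum_j y_j & 0 & n_y
\end{pmatrix},
\qquad
A^{\mathsf T}b=\begin{pmatrix}
\sum_i x_i y_i-\sum_j x_j y_j\\
\sum_i y_i\\
\sum_j x_j
\end{pmatrix}.
$$

Every entry depends only on the per-line sums of x, y, x², y², xy and on the point counts nₓ and nᵧ.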

The beautiful thing about this approach is that I compute all the sums once for the initial split (nₓ = 0, every point assigned to the second line) in O(n) time, and then move the points one by one from the second line to the first. Each move adds the moved point’s x, y, x², y², and xy terms to one set of sums and subtracts them from the other, so the sums are recalculated in constant time.

Thus, by going through the points twice, I know the orientation of the rectangle’s axis.
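
Here is a minimal Python sketch of this orientation search. It assumes the BEV points are ordered along the object’s visible edges, so that every split of the form “the first i points versus the rest” is meaningful; the function name and the exact structure are illustrative, not a drop-in implementation.

```python
import numpy as np

def fit_l_shape_orientation(points: np.ndarray):
    """Fit two perpendicular lines y = k*x + d1 and x = -k*y + d2 to an ordered
    point pattern, trying every split of the points between the two lines and
    keeping the split with the smallest squared fitting error.

    points : (N, 2) array of BEV points ordered along the object's visible edges.
    Returns (k, d1, d2, split_index).
    """
    x, y = points[:, 0], points[:, 1]
    n = len(points)

    # Running sums [sum x, sum y, sum x^2, sum y^2, sum xy] for the first line
    # (initially empty) and the second line (initially all points).
    s1 = np.zeros(5)
    s2 = np.array([x.sum(), y.sum(), (x ** 2).sum(), (y ** 2).sum(), (x * y).sum()])

    best = None
    for i in range(n + 1):                     # i points on line 1, n - i points on line 2
        n1, n2 = i, n - i
        # Normal equations A^T A * theta = A^T b assembled from the running sums.
        AtA = np.array([[s1[2] + s2[3], s1[0], -s2[1]],
                        [s1[0],         n1,     0.0  ],
                        [-s2[1],        0.0,    n2   ]])
        Atb = np.array([s1[4] - s2[4], s1[1], s2[0]])
        theta, *_ = np.linalg.lstsq(AtA, Atb, rcond=None)   # tolerant to degenerate splits
        # Squared error ||A*theta - b||^2 expanded through the same sums (O(1) per split).
        btb = s1[3] + s2[2]
        err = theta @ AtA @ theta - 2.0 * theta @ Atb + btb
        if best is None or err < best[0]:
            best = (err, theta[0], theta[1], theta[2], i)
        # Move point i from the second line to the first one in constant time.
        if i < n:
            update = np.array([x[i], y[i], x[i] ** 2, y[i] ** 2, x[i] * y[i]])
            s1 += update
            s2 -= update

    _, k, d1, d2, split = best
    return k, d1, d2, split
```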

Center and size

I need all the points of the object (the truck in our example) to be inside the box.

Let’s consider the line equation y − kx − d = 0, where k is the slope that sets the line’s orientation and d is its shift. When I substitute one of the points into the left-hand side, I get the distance (up to a constant factor and sign) between this point and the line. By going through all the points, I find the minimal and maximal shifts of the two perpendicular lines such that all the points are enclosed between them. This gives me the following four lines:
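
In the notation above they can be written as follows, with c_min, c_max, e_min, e_max used here simply as names for the extreme shifts:

$$
y = kx + c_{\min},\qquad y = kx + c_{\max},\qquad x = -ky + e_{\min},\qquad x = -ky + e_{\max},
$$

where $c_{\min} = \min_i\,(y_i - kx_i)$, $c_{\max} = \max_i\,(y_i - kx_i)$, $e_{\min} = \min_i\,(x_i + ky_i)$, $e_{\max} = \max_i\,(x_i + ky_i)$, and the minima and maxima are taken over all the points.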

Hence the size of the rectangle formed by these lines is given by the distances between the two pairs of parallel lines, with a normalization by the coefficients of the line equations.
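
In this notation, the two side lengths come out as

$$
\frac{e_{\max} - e_{\min}}{\sqrt{1 + k^{2}}}
\quad\text{(along the first line)}
\qquad\text{and}\qquad
\frac{c_{\max} - c_{\min}}{\sqrt{1 + k^{2}}}
\quad\text{(across it)}.
$$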

Previously, I found the extreme shifts of the lines; averaging each pair gives the middle lines. Since I know where the middle lines are, the rectangle’s center is simply the point where they intersect.

The algorithm can be illustrated as follows:

Figure 5. Shifting the lines for the set of points and selecting the right ones when the rectangle’s center is known
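
This second step is short enough to sketch in Python as well, reusing the slope k found by the orientation search above (again, the function name is illustrative):

```python
import numpy as np

def l_shape_box(points: np.ndarray, k: float):
    """Build the bounding rectangle once the orientation slope k is known.

    Returns (center_xy, size), where size holds the extent along the first
    line's direction and the extent across it.
    """
    x, y = points[:, 0], points[:, 1]
    norm = np.sqrt(1.0 + k * k)

    # Signed shifts of the two families of parallel lines through each point.
    c = y - k * x            # lines of the family y = k*x + c
    e = x + k * y            # lines of the family x = -k*y + e

    # The extreme shifts enclose all points; their averages give the middle lines.
    c_min, c_max = c.min(), c.max()
    e_min, e_max = e.min(), e.max()
    size = np.array([(e_max - e_min) / norm, (c_max - c_min) / norm])

    # The center is the intersection of the two middle lines
    # y = k*x + c_mid and x = -k*y + e_mid.
    c_mid, e_mid = 0.5 * (c_min + c_max), 0.5 * (e_min + e_max)
    cx = (e_mid - k * c_mid) / (1.0 + k * k)
    cy = k * cx + c_mid
    return np.array([cx, cy]), size
```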

Hooray! Now I’ve got a rectangular box that bounds all the points of the object, with an orientation and size close to the object’s real pose and footprint.

Bonus case: Three-point obstacle

If you have only three points, every split gives a pair of lines with MSE = 0 (see the figure below), so it isn’t clear which pair to choose. To avoid this ambiguity, I add a “second round”: if a new MSE value is close to the best one obtained so far, I additionally calculate the rectangle’s area for both orientations and keep the one with the smaller area.

Figure 6. Selecting a smaller rectangle for a three-point object
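
The tiebreak itself is easy to sketch (the tolerance value below is arbitrary and only serves the illustration; the areas are obtained by running the center-and-size step for each candidate orientation):

```python
MSE_TOLERANCE = 1e-6  # arbitrary closeness threshold for this sketch

def is_better(candidate_mse, candidate_area, best_mse, best_area):
    """Prefer a clearly lower MSE; when the MSEs are nearly equal,
    prefer the orientation that yields the smaller rectangle."""
    if candidate_mse < best_mse - MSE_TOLERANCE:
        return True
    if abs(candidate_mse - best_mse) <= MSE_TOLERANCE:
        return candidate_area < best_area
    return False
```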

Conclusion

The L-shape algorithm I presented in this post can help you estimate an object’s real pose when you know the positions of only some of its points. The algorithm can be improved further. For example, you could address the sensitivity to outliers in the input data that is typical of MSE-based fitting: introduce robust estimation methods such as Random Sample Consensus (RANSAC), and you will get even more accurate results on real-world data.

🔗 Links to articles on the use of 2D detectors in various projects:

  1. Object Detection for Autonomous Driving by Shikhar Semwal and Vibhor Sharma (PDF).
  2. Use of CNN(YOLO) in Self Driving vehicles by Maurya Vijayaramachandran and Akilesh B (PDF).
  3. Autonomous Vehicle Object Detector with YOLOv4 and Darknet by Ridwan Alam.
  4. Autonomous Driving Car Detection Application using YOLO algorithm (Tensorflow/Keras).
  5. YOLOv4-v3 Training Automation API for Linux by BMW InnovationLab.


Mikhail Chekanov is a software engineer at Evocargo, working on perception technologies for robotics and autonomous vehicles.