
Lifting 2D object detection to 3D in autonomous driving

Patrick Langechuan Liu
Towards Data Science
10 min read · Nov 4, 2019

Monocular 3D object detection predicts 3D bounding boxes from a single monocular image, typically an RGB image. The task is fundamentally ill-posed, as the critical depth information is missing from the RGB image. Luckily, in autonomous driving, cars are rigid bodies with (largely) known shape and size. The critical question is then how to effectively leverage these strong priors on cars to infer a 3D bounding box on top of conventional 2D object detection.

In contrast to conventional 2D object detection, which yields axis-aligned bounding boxes with 4 degrees of freedom (DoF), namely center (x, y) and 2D size (w, h), 3D bounding boxes in the autonomous driving context generally have 7 DoF: 3D physical size (w, h, l), 3D center location (x, y, z), and yaw. Note that roll and pitch are normally assumed to be zero. The question is now: how do we recover a 7-DoF object from a 4-DoF one?
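To make the parameter counts concrete, here is a minimal Python sketch of the two parameterizations (the class names and field layout are illustrative, not from any particular codebase):

```python
from dataclasses import dataclass

@dataclass
class Box2D:
    """Axis-aligned 2D detection: 4 DoF."""
    x: float  # box center x, in pixels
    y: float  # box center y, in pixels
    w: float  # box width, in pixels
    h: float  # box height, in pixels

@dataclass
class Box3D:
    """Oriented 3D box in the autonomous driving setting: 7 DoF."""
    x: float    # center x, in meters (camera frame)
    y: float    # center y, in meters
    z: float    # center z (depth), in meters
    w: float    # physical width, in meters
    h: float    # physical height, in meters
    l: float    # physical length, in meters
    yaw: float  # rotation about the vertical axis, in radians
                # (roll and pitch assumed zero)
```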

Tight constraint between 2D and 3D bounding boxes (source)

One popular approach, proposed by the pioneering work of Deep3DBox (3D Bounding Box Estimation Using Deep Learning and Geometry, CVPR 2017), is to regress the observation angle (or local yaw, or allocentric yaw, as explained in my previous post) and the 3D object size (w, h, l) from the image patch enclosed by the 2D bounding box. Both the local yaw and the 3D object size (which usually follows a unimodal distribution with small variance around the subtype mean) are strongly tied to…
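To make the geometry concrete, below is a minimal Python sketch of this pipeline, assuming KITTI-style conventions (camera frame with y pointing down, yaw about the vertical axis, box dimensions in meters). This is not the authors' code: the function names are hypothetical, and the exhaustive corner-to-edge enumeration is a simplification of the pruned search in the paper. The idea is that each edge of the 2D box must touch the projection of some corner of the 3D box; for a hypothesized corner-to-edge assignment, each edge contributes one equation that is linear in the unknown translation T = (tx, ty, tz), so the four edges give an over-determined system solvable by least squares.

```python
import numpy as np
from itertools import product

def local_to_global_yaw(alpha, u_center, fx, cx):
    """Observation angle (local yaw, alpha) -> global yaw, via the angle of
    the viewing ray through the 2D box center. Follows the common KITTI-style
    relation rotation_y = alpha + theta_ray; sign conventions vary."""
    return alpha + np.arctan2(u_center - cx, fx)

def solve_translation(K, box2d, dims, yaw):
    """Recover the 3D box center T = (tx, ty, tz) from the 2D-3D tight
    constraint, given camera intrinsics K (3x3), a 2D box
    (xmin, ymin, xmax, ymax) in pixels, dims (w, h, l) in meters, and
    global yaw in radians."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    xmin, ymin, xmax, ymax = box2d
    w, h, l = dims

    # 8 corners of the 3D box in the object frame, centered at the origin.
    corners = np.array(list(product((-l / 2, l / 2),
                                    (-h / 2, h / 2),
                                    (-w / 2, w / 2))))
    # Rotation about the vertical (camera y) axis by the global yaw.
    R = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                  [ 0,           1, 0          ],
                  [-np.sin(yaw), 0, np.cos(yaw)]])
    rotated = corners @ R.T  # corners in camera orientation, up to translation

    # A 2D edge constraint, e.g. u = xmin for the projection of corner X, is
    #   fx * (X[0] + tx) + (cx - xmin) * (X[2] + tz) = 0,
    # which is linear in T. The four edges as (focal, pixel value, axis):
    edges = [(fx, xmin, 0), (fx, xmax, 0), (fy, ymin, 1), (fy, ymax, 1)]

    best_T, best_err = None, np.inf
    # Hypothesize which of the 8 corners touches each of the 4 edges.
    for assignment in product(range(8), repeat=4):
        A, b = [], []
        for (focal, val, axis), corner_idx in zip(edges, assignment):
            X = rotated[corner_idx]
            c = cx if axis == 0 else cy
            row = np.zeros(3)
            row[axis], row[2] = focal, c - val
            A.append(row)
            b.append(-(focal * X[axis] + (c - val) * X[2]))
        A, b = np.array(A), np.array(b)
        T = np.linalg.lstsq(A, b, rcond=None)[0]
        err = np.sum((A @ T - b) ** 2)
        if err < best_err:
            best_T, best_err = T, err
    return best_T
```

The original paper prunes the set of valid corner-to-edge assignments using the estimated yaw and the zero roll/pitch assumption rather than enumerating all of them, but the per-assignment least-squares structure is the same.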
