Exploring the KITTI 3D object detection data set
From directory structure to 2D bounding boxes
The KITTI 3D detection data set was developed for learning 3D object detection in a traffic setting. In upcoming articles I will discuss different aspects of this data set.
- From directory structure to 2D bounding boxes
Code and notebooks are in this repository: https://github.com/sjdh/kitti-3d-detection
Motivation for 3D detection
Autonomous robots and vehicles track positions of nearby objects. These can be other traffic participants, obstacles and drivable areas.
For path planning and collision avoidance, detection of these objects is not enough. To make informed decisions, the vehicle also needs to know relative position, relative speed and size of the object.
The 3D detection task
The task of 3D detection consists of several subtasks. Objects need to be detected, classified, and located relative to the camera. Finally, each object has to be enclosed in a tightly fitting bounding box.
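To make the subtasks concrete, here is a minimal sketch of what a single 3D detection could carry. The class name `Detection3D`, its field names and the example values are my own assumptions for illustration and are not part of the data set or its tooling.

```python
from dataclasses import dataclass


@dataclass
class Detection3D:
    """Hypothetical container for the result of the subtasks described above."""
    label: str       # classification result, e.g. "Car"
    center: tuple    # (x, y, z) position relative to the camera
    size: tuple      # (height, width, length) of the bounding box
    yaw: float       # rotation of the box around the vertical axis, in radians

    def volume(self):
        """Volume of the bounding box, in cubic units of `size`."""
        height, width, length = self.size
        return height * width * length


# Made-up values, for illustration only
car = Detection3D(label="Car", center=(2.0, 1.5, 20.0), size=(1.5, 1.6, 4.0), yaw=1.57)
print(car.label, car.volume())
```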
Directory structure
The KITTI data set has the following directory structure:
{training,testing}/image_2/id.png
{training,testing}/image_3/id.png
training/label_2/id.txt
{training,testing}/velodyne/id.bin
{training,testing}/calib/id.txt
There are two color cameras and a Velodyne laser scanner. The image_2 directory holds the images of the left camera and image_3 those of the right camera.
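As a small sketch of how the files of one sample could be collected, assuming that `id` is a frame number zero-padded to six digits (for example 000007.png). The function name `sample_paths` and the root directory name `kitti` are hypothetical.

```python
from pathlib import Path


def sample_paths(root, split, sample_id):
    """Return the files that belong to one sample as a dict of Paths."""
    base = Path(root) / split
    stem = f"{sample_id:06d}"  # assumption: ids are zero-padded to six digits
    return {
        "image_2": base / "image_2" / f"{stem}.png",
        "image_3": base / "image_3" / f"{stem}.png",
        # label files may not be present for every split
        "label_2": base / "label_2" / f"{stem}.txt",
        "velodyne": base / "velodyne" / f"{stem}.bin",
        "calib": base / "calib" / f"{stem}.txt",
    }


paths = sample_paths("kitti", "training", 7)
print(paths["image_2"].as_posix())
```

Returning a dict keeps all files of one frame together, so later steps can ask for `paths["calib"]` without rebuilding the file name.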
The two cameras can be used for stereo vision. Overlaying the images of the two cameras looks like this:
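A minimal sketch of how such an overlay could be produced, assuming both images are loaded as uint8 NumPy arrays of the same shape. The function name `overlay` is my own, and the arrays below are synthetic stand-ins rather than real KITTI images. With OpenCV, `cv2.addWeighted` performs a comparable weighted blend.

```python
import numpy as np


def overlay(left, right, alpha=0.5):
    """Blend two equally sized uint8 images: alpha * left + (1 - alpha) * right."""
    if left.shape != right.shape:
        raise ValueError("images must have the same shape")
    blended = alpha * left.astype(np.float32) + (1.0 - alpha) * right.astype(np.float32)
    # Round and clip back into the valid uint8 range
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Synthetic stand-ins for an image_2 / image_3 pair
left = np.full((2, 3, 3), 200, dtype=np.uint8)
right = np.full((2, 3, 3), 100, dtype=np.uint8)
blend = overlay(left, right)
print(blend[0, 0])
```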
Labels
The label file looks like this
I wrote a gist for reading it into a pandas DataFrame. Here is the parsed table.
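The following is not the gist itself but a minimal sketch of the same idea using only the standard library. It assumes the usual KITTI label layout of one object per line with 15 space-separated fields: the object type followed by 14 numbers. Apart from bbox_xmin, the column names are my own choice, and the example line is made up.

```python
# Column names are assumptions, chosen to match the bbox_xmin naming used in this article
COLUMNS = [
    "type", "truncated", "occluded", "alpha",
    "bbox_xmin", "bbox_ymin", "bbox_xmax", "bbox_ymax",
    "dim_height", "dim_width", "dim_length",
    "loc_x", "loc_y", "loc_z", "rotation_y",
]


def parse_labels(text):
    """Parse the text of one label file into a list of dicts, one per object."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = {"type": fields[0]}
        # All remaining fields are numeric; they are stored as floats for simplicity
        for name, value in zip(COLUMNS[1:], fields[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


# Made-up example line, not taken from the data set
example = "Car 0.00 0 1.57 100.0 150.0 200.0 250.0 1.5 1.6 4.0 2.0 1.5 20.0 1.57"
rows = parse_labels(example)
print(rows[0]["type"], rows[0]["bbox_xmin"])
```

Passing the result to `pandas.DataFrame(rows)` would turn the list of dicts into a table with one row per object.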
2D bounding boxes
The first step in 3D object detection is to locate the objects in the image itself. The corners of the 2D bounding boxes can be found in the columns bbox_xmin etc.
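A short sketch of how these columns could be turned into corner points that OpenCV accepts, assuming the remaining columns are named bbox_ymin, bbox_xmax and bbox_ymax. The helper name `bbox_corners` is hypothetical and the row below is made up. OpenCV drawing functions expect integer pixel coordinates, so the values are rounded.

```python
def bbox_corners(row):
    """Return ((xmin, ymin), (xmax, ymax)) as integer pixel coordinates."""
    top_left = (int(round(row["bbox_xmin"])), int(round(row["bbox_ymin"])))
    bottom_right = (int(round(row["bbox_xmax"])), int(round(row["bbox_ymax"])))
    return top_left, bottom_right


# Made-up row, for illustration only
rows = [{"bbox_xmin": 100.4, "bbox_ymin": 150.6, "bbox_xmax": 200.0, "bbox_ymax": 250.2}]
boxes = [bbox_corners(row) for row in rows]
print(boxes)
```

Each entry of `boxes` is a pair of opposite corners, which is enough to draw the corner points or the full rectangle.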
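Here the corner points are plotted as red dots on the image:
import cv2

# corners: for each object, a pair of integer (x, y) corner points
for box in corners:
    for corner in box:
        # cv2.circle(img, center, radius, color, thickness)
        # (255, 0, 0) is red when img is in RGB channel order
        cv2.circle(img, corner, 1, (255, 0, 0), 5)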
Getting the bounding boxes is a matter of connecting the dots:
import matplotlib.pyplot as plt
for i, box in enumerate(boxes):
    # Draw the rectangle between the two opposite corners and label it with its index
    img = cv2.rectangle(img, box[0], box[1], (0, 255, 0))
    img = cv2.putText(img, str(i), (box[0][0] + 10, box[0][1] - 4), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
plt.imshow(img)
Code
The full code can be found in this repository: https://github.com/sjdh/kitti-3d-detection