Visual Perception for Self-Driving Cars! Part 3: 3D Object Detection

Learn concepts by coding! Explore how deep learning and computer vision are used for different visual tasks in autonomous driving.

Shahrullohon Lutfillohonov
6 min read · Aug 29, 2022

This article is part of a series. Check out the full series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6!

In Part 1 and Part 2, we talked about 2D object detection and 2D object tracking for self-driving cars. Although 2D detection methods have shown outstanding results, 3D detection has several advantages over them. Because it predicts the locations, sizes, and categories of the objects near a vehicle, it provides more precise information about the environment, which in turn helps to estimate the future motion of objects and avoid collisions through better motion planning.

First, we will consider an overview of 3D object detection, then implement one of the SOTA algorithms for 3D detection in Python. Let’s start!

A Brief Introduction to 3D Object Detection

The goal of 3D object detection is to predict the attributes of 3D objects in driving scenarios from sensory data. Based on the sensor type, 3D object detection can be divided into LiDAR (Light Detection and Ranging)-based, camera-based, and multi-modal 3D object detection. An illustration can be seen in the following figure.

3D object detection for autonomous driving — Source: Link

Since multi-modal 3D object detection is essentially built up from several single-sensor models, it makes sense to focus on camera-based and LiDAR-based methods in this blog.

Camera-based 3D Object Detection

Cameras are cheap and easily available, and they can collect scene information from a specific viewpoint. Depending on the camera setup, detection can be monocular, stereo-based, or multi-camera 3D object detection. Despite their low cost and easy accessibility, cameras have significant drawbacks that limit their adoption in autonomous driving. First, cameras only gather visual information, so the 3D structure of a scene cannot be obtained from them directly. Second, accurate localization cannot be derived from images alone. Last but not least, image-based detection often suffers in harsh weather and poor lighting conditions. For example, it is much harder to detect objects in images on rainy or snowy days than on sunny, bright days.

LiDAR-based 3D Object Detection

On the other hand, LiDAR sensors overcome these limitations by emitting laser beams and measuring the reflected signal. A LiDAR sensor produces a range image in which each pixel normally has three channels: range, azimuth, and inclination in the spherical coordinate system. Range images are the raw LiDAR data and can be further transformed into point clouds by converting spherical coordinates into Cartesian coordinates. Both point clouds and range images provide precise 3D data obtained straight from the LiDAR sensor. That is why LiDARs are better suited for object detection in 3D space, and also less susceptible to time-of-day and weather variations than cameras.
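To make the spherical-to-Cartesian conversion concrete, here is a minimal NumPy sketch (the function and variable names are mine for illustration, not taken from any specific library):

# Convert spherical LiDAR returns to Cartesian x, y, z coordinates
import numpy as np

def range_image_to_points(rng, azimuth, inclination):
    # rng in meters, azimuth and inclination in radians; arrays of equal shape
    x = rng * np.cos(inclination) * np.cos(azimuth)
    y = rng * np.cos(inclination) * np.sin(azimuth)
    z = rng * np.sin(inclination)
    return np.stack([x, y, z], axis=-1)  # (..., 3) array of points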

That was a brief introduction to 3D object detection. Please refer to this paper for a full description of 3D object detection for autonomous driving.

Super Fast and Accurate 3D Object Detection

Enough theory, now let’s get our hands dirty!

We use the SFA3D model for 3D detection on LiDAR data. Here is the link to the original code on GitHub.

Dataset

For training and testing the model, we use the KITTI 3D object detection dataset. You can download the dataset from the official website, but organizing it takes time. An alternative is to get the KITTI dataset from Kaggle. Download it from this link or simply via your terminal if you have the Kaggle API:

kaggle datasets download -d garymk/kitti-3d-object-detection-dataset
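The dataset arrives as a single zip archive. A quick way to unpack it into the folder layout used below (the archive name is whatever Kaggle saved it as, so adjust the paths to your setup):

# Extract the downloaded archive into the dataset folder
import zipfile

with zipfile.ZipFile("kitti-3d-object-detection-dataset.zip") as zf:
    zf.extractall("dataset/kitti")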

Create a new environment and install dependencies

It is helpful to create a virtual environment to manage dependencies and isolate our project.

# Create new conda environment
conda create -n (your env name) python=3.9

Then activate it:

# activate the conda environment
conda activate (your env name)

Now, let’s clone the repository and install the requirements:

git clone https://github.com/maudzung/SFA3D.git SFA3D
cd SFA3D/
pip install -r requirements.txt
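Before going further, it is worth confirming that PyTorch installed correctly and can see your GPU (a simple sanity check, not part of the repo):

# Quick check that PyTorch is installed and CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"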

Please make sure that your source code and dataset directories are structured as follows:

${ROOT}
├── checkpoints/
│   └── fpn_resnet_18/
│       └── fpn_resnet_18_epoch_300.pth
├── dataset/
│   └── kitti/
│       ├── ImageSets/
│       │   ├── test.txt
│       │   ├── train.txt
│       │   └── val.txt
│       ├── training/
│       │   ├── image_2/ (left color camera)
│       │   ├── calib/
│       │   ├── label_2/
│       │   └── velodyne/
│       └── testing/
│           ├── image_2/ (left color camera)
│           ├── calib/
│           └── velodyne/
├── sfa/
│   ├── config/
│   │   ├── train_config.py
│   │   └── kitti_config.py
│   ├── data_process/
│   │   ├── kitti_dataloader.py
│   │   ├── kitti_dataset.py
│   │   └── kitti_data_utils.py
│   ├── models/
│   │   ├── fpn_resnet.py
│   │   ├── resnet.py
│   │   └── model_utils.py
│   ├── utils/
│   │   ├── demo_utils.py
│   │   ├── evaluation_utils.py
│   │   ├── logger.py
│   │   ├── misc.py
│   │   ├── torch_utils.py
│   │   ├── train_utils.py
│   │   └── visualization_utils.py
│   ├── demo_2_sides.py
│   ├── demo_front.py
│   ├── test.py
│   └── train.py
├── README.md
└── requirements.txt
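Before moving on, you can sanity-check the layout with a short script run from the repository root (the paths follow the tree above; this helper is mine, not part of SFA3D):

# Verify that the expected dataset folders and split files exist
from pathlib import Path

required = [
    "dataset/kitti/ImageSets/train.txt",
    "dataset/kitti/training/velodyne",
    "dataset/kitti/training/calib",
    "dataset/kitti/training/label_2",
    "dataset/kitti/testing/velodyne",
]
for rel in required:
    print("ok" if Path(rel).exists() else "MISSING", rel)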

Visualize the dataset

Visualize the 3D point clouds with 3D boxes to make sure our data and source code are working fine so far:

cd sfa/data_process/
python kitti_dataset.py

(If you run into an error here, replace line 87 in sfa/data_process/kitti_bev_utils.py with the following.)

# Draw the box edge, casting the corner coordinates to int explicitly
cv2.line(img, (int(corners_int[0, 0]), int(corners_int[0, 1])), (int(corners_int[3, 0]), int(corners_int[3, 1])), (255, 255, 0), 2)

Here is the result:

Visualization of 3D point clouds with 3D boxes — Image by Author
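If you want to inspect the raw LiDAR data yourself, each velodyne .bin file is a flat float32 array of (x, y, z, reflectance) values. A minimal loading sketch (the sample file name is just an example):

# Load one KITTI velodyne scan as an (N, 4) array
import numpy as np

scan = np.fromfile("dataset/kitti/training/velodyne/000000.bin", dtype=np.float32)
points = scan.reshape(-1, 4)  # columns: x, y, z, reflectance
print(points.shape)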

Model Training

Now it is time to train the model. Refer to this link to see all arguments for training the model. Run this for a single machine with a single GPU. Training runs for 300 epochs by default; you can change this at your convenience.

python train.py --gpu_idx 0 --batch_size 64 --num_epochs 300

or this one for distributed data-parallel training:

python train.py --multiprocessing-distributed --world-size 1 --rank 0 --batch_size 64 --num_workers 8

Inference

For a custom-trained model:

python test.py --pretrained_path (path for trained model) --gpu_idx 0 --peak_thresh 0.2

or use the pre-trained model, which was pushed to the repo and is used by default if you do not specify a path:

python demo_2_sides.py --gpu_idx 0 --peak_thresh 0.2

The inference data will be downloaded automatically after you run the above command.

Let’s see the result.

Custom Inference of 3D Object Detection Model on LiDAR Data — Image by Author

Our trained model shows decent precision and runs smoothly.

Conclusion

In today’s blog, we briefly talked about 3D object detection and how the choice of method depends on the sensor type. LiDARs can not only detect objects but also support mapping and planning by generating 3D maps and point clouds, and they are robust to different weather and lighting conditions, which makes them the dominant sensor for detection in autonomous driving. Recent advances in LiDAR technology enable self-driving cars to function with high accuracy and precision.

References

[1] Computer Vision: Algorithms and Applications (Texts in Computer Science) 2nd ed. 2022 Edition

[2] 3D Object Detection for Autonomous Driving: A Review and New Outlooks

[3] Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds

[4] The KITTI Vision Benchmark Suite

I hope you enjoyed reading. If you have any questions or suggestions, please feel free to leave a comment. You can also find me on LinkedIn or email me directly. I’d love to hear from you!

We will discuss visual perception for self-driving cars further in the following posts.
