Analyze Driving Scenes with GluonCV 0.8

Yi Zhu
Published in Apache MXNet
Sep 15, 2020

Author: Yi Zhu, Applied Scientist at Amazon

Driving scene analysis models are showing up in more and more applications, from generating street scenes in AI-rendered virtual worlds to building self-driving cars. Recently, OpenBot used a smartphone and a small electric vehicle to build a low-cost robot that supports workloads like pedestrian following and real-time autonomous navigation. Do you want to build more advanced features into your own applications or robots? Given the surging interest in autonomous driving, we add a new task, depth estimation, along with more semantic segmentation models in this latest GluonCV 0.8 release.

Depth Estimation

Depth estimation is a long-standing computer vision task and an important step toward inferring scene geometry from 2D images. The goal is to predict the depth value of each pixel, given either a single RGB image (monocular setting) or left-right image pairs (stereo setting). Recently, self-supervised depth estimation has come to dominate the field thanks to its strong performance and its independence from annotated depth data. Monodepth and Monodepth2 are important milestones that established new single-camera depth estimation baselines.

Given Monodepth2’s popularity and strong performance, we provide a GluonCV implementation of Monodepth2 in this release. We provide pretrained models on KITTI and training logs for all three settings: monocular, stereo, and monocular+stereo. All results are reproducible and closely match the numbers reported in the original publication.
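As a quick illustration, here is a minimal sketch of loading a pretrained Monodepth2 model from the model zoo and predicting a disparity map for a single image. The model name and 640x192 input resolution below follow the stereo KITTI checkpoint used in our tutorials; the file name street.png is just a placeholder, so adapt both to your setup.

```python
import mxnet as mx
import PIL.Image as pil
from mxnet.gluon.data.vision import transforms
import gluoncv

ctx = mx.cpu(0)

# Load the input image and resize it to the resolution the model was trained on.
img = pil.open('street.png').convert('RGB')  # placeholder file name
original_width, original_height = img.size
img = img.resize((640, 192), pil.LANCZOS)
img = transforms.ToTensor()(mx.nd.array(img)).expand_dims(0).as_in_context(ctx)

# Pretrained Monodepth2 (stereo setting on KITTI) from the GluonCV model zoo.
model = gluoncv.model_zoo.get_model('monodepth2_resnet18_kitti_stereo_640x192',
                                    pretrained_base=False, ctx=ctx, pretrained=True)

# The network outputs multi-scale disparity maps; take the finest one and
# resize it back to the original image resolution for visualization.
outputs = model.predict(img)
disp = outputs[("disp", 0)]
disp_resized = mx.nd.contrib.BilinearResize2D(disp,
                                              height=original_height,
                                              width=original_width)
```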

To get you started, we provide a number of detailed tutorials, covering tasks such as depth prediction and trajectory estimation.

Once you have the estimated depth and trajectory, it will be easier for your autonomous bot to avoid obstacles. Note that the monocular and monocular+stereo settings were added after the release date, so if you are eager to try them out right away, feel free to install the nightly version of GluonCV.

Better/Faster Segmentation Models

We include two new semantic segmentation models in this release: DANet and FastSCNN. DANet is among the state-of-the-art models on several segmentation benchmarks, and FastSCNN is one of the most popular real-time segmentation models. Their performance on the Cityscapes validation set is listed in the GluonCV model zoo.

Our FastSCNN model is an improved variant trained with the semi-supervised learning method from our recent paper: it reaches 72.3 mIoU, compared with the 68.6 mIoU reported in the original FastSCNN paper. To the best of our knowledge, 72.3 mIoU is the highest-scoring FastSCNN implementation and one of the best results among real-time semantic segmentation models. The model runs at 80.8 fps on a single V100 GPU with 1024x2048 input video frames.
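Running one of the new segmentation models looks much like the existing GluonCV segmentation demos. The sketch below assumes the FastSCNN Cityscapes checkpoint is exposed under the model-zoo name fastscnn_citys and that the input file is street.jpg; check the model zoo for the exact identifier before running it.

```python
import mxnet as mx
from mxnet import image
import gluoncv
from gluoncv.data.transforms.presets.segmentation import test_transform
from gluoncv.utils.viz import get_color_pallete

ctx = mx.cpu(0)

# Load an image and apply the standard segmentation test-time transform.
img = image.imread('street.jpg')  # placeholder file name
img = test_transform(img, ctx)

# Pretrained FastSCNN trained on Cityscapes (model name is an assumption,
# see the GluonCV model zoo for the exact identifier).
model = gluoncv.model_zoo.get_model('fastscnn_citys', pretrained=True, ctx=ctx)

# Per-pixel class predictions, colorized with the Cityscapes palette.
output = model.predict(img)
predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()
mask = get_color_pallete(predict, 'citys')
mask.save('output.png')
```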

If you want to train a segmentation model on your own dataset or location, we provide tutorials on how to train and test segmentation models. You can also try our semi-supervised learning method to automatically generate pseudo labels, so you don’t need to label your own data. We have demonstrated strong cross-domain generalization performance in our paper.

Summary

GluonCV v0.8 adds Monodepth2, DANet, and FastSCNN to our model zoo, which can help you analyze driving scenes for your own applications. You can use these models for much more than driving scenes, though; indoor scene analysis is one example. So open up your mind, the sky is the limit. Please check out our tutorials and model zoo for more details.

Acknowledgement

We sincerely thank the following contributors:
@zhreshold, @KuangHaofei, @xdeng7, @ytian8, @FrankYoungchen, @bryanyzhu, @Jerryzcn, @yezqNLP, @LauLauThom, @karan6181, @chinakook, @tkhe, @tirkarthi, @mseth10, @ksindwan, @Neutron3529, @Aktcob, @tmyapple, @chongruo, @xinyu-intel

Please Like/Star/Fork/Comment/Contribute if you like GluonCV!
