Potential Applications of Semi-Supervised and Self-Supervised Learning for Automated Map Making and Autonomous Vehicles — CVPR 2021

Use Unlabeled Data in Machine Learning


Authors: Dr. Xiaoying Jin and Dr. Sanjay Boddhu

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision and machine learning conferences in the world. In a previous blog, we discussed the recent improvements in perception proposed at CVPR 2021. In this blog post, we highlight some trends and advances in semi-supervised learning and self-supervised learning, along with their potential applications for automated map making and autonomous vehicles.

Perception is a key component in automated map making and autonomous vehicles. In autonomous vehicles, multiple car-mounted sensors (optical cameras, LiDAR, and radar) are used with AI and machine learning methods to extract static and dynamic objects such as signs, lane markings, pedestrians, and cars. In automated map making, we use multi-source data to create a digital representation of reality. The multi-source data for map making includes crowdsourced OEM sensor data, industrial-capture vehicle sensor data (LiDAR and street-level imagery), overhead imagery, dashcam videos, and other sources of street-level imagery. Lane markings and road boundaries are used to build a lane model. Lane models, together with signs, poles, and traffic lights, help with vehicle localization. Features such as signs, lane markings, traffic lights, stop lines, and crosswalks are useful for Advanced Driver-Assistance Systems (ADAS).

Source: The Self-Healing Map from HERE Technologies

Deep learning object detection and segmentation for perception in production usually rely on supervised learning, which requires a huge amount of manually labeled data. In automated map making and autonomous driving applications, the labeled data can run to millions of images or more. However, copious amounts of labeled data are very costly to collect, which limits the applications of supervised learning.

By contrast, humans can learn from only a few examples of a class and then properly recognize new examples of that class. In addition, humans can learn general representations of data, such as its features, structure, and the similarity or dissimilarity between examples.

Inspired by human vision, semi-supervised learning and self-supervised learning have recently been hot topics in computer vision. Semi-supervised learning is a machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training. Some successful semi-supervised learning methods are based on teacher-student models: the teacher model generates pseudo labels for unlabeled data, and these pseudo-labeled data are then combined with labeled data to train the student model. Self-supervised learning is a means of training computers to perform tasks without humans providing labeled data. It is a subset of unsupervised learning in which the supervision signal is derived from the data itself: the machine labels, categorizes, and analyzes information on its own, then draws conclusions based on connections and correlations. Some of the popular self-supervised learning methods are based on representation learning. In this vein, below are highlights of some recent advances proposed at CVPR 2021.
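The teacher-student pseudo-labeling loop described above can be sketched in a few lines. This is a minimal, self-contained illustration, not any particular paper's implementation; the `toy_teacher` model and the confidence threshold of 0.9 are hypothetical choices for the sake of the example.

```python
import numpy as np

def pseudo_label_round(teacher_predict, unlabeled, threshold=0.9):
    """Hypothetical helper: keep each unlabeled sample whose teacher
    confidence exceeds the threshold, paired with its argmax class.
    The surviving pairs would then be mixed with labeled data to
    train the student model."""
    pseudo = []
    for x in unlabeled:
        probs = teacher_predict(x)
        cls = int(np.argmax(probs))
        if probs[cls] >= threshold:
            pseudo.append((x, cls))
    return pseudo

# Toy teacher: a fixed softmax over 3 classes driven by the feature sum.
def toy_teacher(x):
    logits = np.array([x.sum(), -x.sum(), 0.0])
    e = np.exp(logits - logits.max())
    return e / e.sum()

unlabeled = [np.array([3.0, 2.0]), np.array([0.1, -0.1])]
pseudo = pseudo_label_round(toy_teacher, unlabeled, threshold=0.9)
# The confident example (feature sum 5.0) receives pseudo label 0;
# the ambiguous one (near-uniform softmax) is dropped.
```

Thresholding on teacher confidence is the standard way to keep noisy pseudo labels from polluting the student's training set; the classic fixed-teacher setup simply repeats rounds of this loop.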

Meta Pseudo Labels (paper and code)

👏 Meta Pseudo Labels is a semi-supervised learning method developed by Pham et al. at Google AI. It achieves ✨a new state-of-the-art top-1 accuracy of 90.2% on ImageNet✨.

Instead of keeping the teacher model fixed, the key idea in Meta Pseudo Labels is that the teacher learns from the student's performance feedback on the labeled data, generating better pseudo labels that best help the student's learning. Meta Pseudo Labels reaches its 90.2% top-1 accuracy by training on the ImageNet labeled data together with the extra JFT 300M+ unlabeled images.
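The feedback loop at the heart of Meta Pseudo Labels is a bilevel optimization: the teacher's parameters are updated according to how well the student, after training on the teacher's pseudo labels, performs on labeled data. The scalar toy below illustrates that loop only; it uses a finite-difference gradient as a stand-in for the paper's meta-gradient, and every model, value, and learning rate here is an invented example.

```python
def student_step(s, x, p, lr=0.1):
    # One SGD step of a scalar student s on the pseudo-labeled pair (x, p),
    # minimizing the squared error (s*x - p)**2.
    grad = 2 * (s * x - p) * x
    return s - lr * grad

def labeled_loss(s, xl, yl):
    # Student's loss on a labeled pair (xl, yl).
    return (s * xl - yl) ** 2

def teacher_feedback_step(t, s, x_u, xl, yl, t_lr=0.05, eps=1e-4):
    """Update the scalar teacher t from the student's labeled-data
    performance. A central finite difference stands in for the
    meta-gradient through the student's update."""
    def outer_loss(t_val):
        p = t_val * x_u                    # teacher's pseudo label
        s_new = student_step(s, x_u, p)    # student trains on it
        return labeled_loss(s_new, xl, yl) # student judged on labeled data
    g = (outer_loss(t + eps) - outer_loss(t - eps)) / (2 * eps)
    return t - t_lr * g

# Toy run: the true mapping is y = 2x; the teacher starts badly at t = 0.5.
t, s = 0.5, 0.0
x_u, xl, yl = 1.0, 1.0, 2.0
for _ in range(300):
    t = teacher_feedback_step(t, s, x_u, xl, yl)
    s = student_step(s, x_u, t * x_u)
# The teacher drifts toward pseudo labels that make the student fit y = 2x.
```

Even in this toy, the teacher converges to producing the "correct" pseudo labels purely because those labels improve the student's labeled-data loss, which is the mechanism the paper formalizes at scale.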

Highlights:

✅ Meta Pseudo Labels is a semi-supervised learning method that can leverage a huge amount of unlabeled data together with a small set of labeled data for training.

✅ The teacher is trained along with the student, based on the student's performance on labeled data.

✅ Meta Pseudo Labels achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the previous state-of-the-art.

✅ On standard low-resource benchmarks such as CIFAR-10-4K and SVHN-1K, Meta Pseudo Labels even outperforms supervised learning on the full dataset.

Source: Pham et al. Top-1 and Top-5 accuracy of Meta Pseudo Labels and previous state-of-the-art methods on ImageNet with extra data.

Exploring Simple Siamese Representation Learning (paper and code)

👏 Chen et al. at Facebook AI explored simple Siamese networks (SimSiam) and showed their effectiveness for self-supervised representation learning. The paper received ✨a best paper honorable mention✨ at CVPR 2021.

Siamese networks have become a common structure in various recent models for self-supervised representation learning. They are weight-sharing neural networks applied to two or more inputs to compute comparable outputs. Given two augmented views of one image, recent models maximize the similarity of the corresponding two outputs. Previous Siamese architectures such as SimCLR, SwAV, and BYOL rely on negative samples, large batches, and/or a momentum encoder to prevent collapsing. The simple Siamese network (SimSiam) proposed by Chen et al. can learn meaningful representations without any of these strategies; the stop-gradient operation plays the essential role in preventing collapse in SimSiam.
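SimSiam's objective is the symmetrized negative cosine similarity between the predictor output of one view and the stop-gradient encoder output of the other view. The NumPy sketch below computes that loss only; since there is no autograd here, the stop-gradient is indicated by which argument serves as the detached target, and the `f`/`h` callables are hypothetical stand-ins for the encoder and the prediction head.

```python
import numpy as np

def neg_cosine(p, z):
    # Negative cosine similarity between prediction p and target z.
    # In the real method, z is detached (stop-gradient): no gradient
    # flows back through the target branch.
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return -float(p @ z)

def simsiam_loss(f, h, x1, x2):
    """Symmetrized SimSiam loss for two augmented views x1, x2.
    f: shared encoder, h: prediction head (both hypothetical callables).
    Each side predicts the other side's encoder output."""
    z1, z2 = f(x1), f(x2)
    p1, p2 = h(z1), h(z2)
    # z2 and z1 appear only as targets, i.e. behind the stop-gradient.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

# Toy check with identity encoder/predictor and identical views:
f = h = lambda v: v
x = np.array([1.0, 0.0])
loss = simsiam_loss(f, h, x, x.copy())
# Identical views reach the loss minimum of -1.
```

The paper's finding is that this asymmetry — predictor on one branch, stop-gradient on the other — is what keeps the trivial constant solution from being reachable by gradient descent, without any negatives, large batches, or momentum encoder.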

Highlights:

✅ SimSiam, with its simplified design, is capable of self-supervised representation learning without collapsing.

✅ SimSiam achieves the highest accuracy under 100-epoch pre-training on ImageNet linear evaluation, compared with SimCLR, MoCo v2, SwAV, and BYOL.

✅ SimSiam's representations transfer to other tasks such as VOC object detection and COCO object detection and instance segmentation, where SimSiam is competitive among leading methods.

✅ All of the Siamese-network-based methods compared were highly successful for transfer learning: they are superior or comparable to their ImageNet supervised pre-training counterparts in all tasks. This general success suggests the fundamental role of Siamese networks in representation learning.

Source: Chen et al. Comparison on Siamese architectures. The dash lines indicate the gradient propagation flow. The components in red are those missing in SimSiam.

Want to know more about AI & Machine Learning in Automated Map Making? Follow us and Machine Learning & AI in Digital Cartography.👈

