The history of autonomous vehicle datasets and 3 open-source Python apps for visualizing them
Special thanks to Plotly investor NVIDIA for their help in reviewing these open-source Dash applications for autonomous vehicle R&D, and to Lyft for initial data visualization development in Plotly.
Author: Xing Han Lu, @xhlulu
📌 To learn more about how to use Dash for Autonomous Vehicle and AI Applications, view our recorded webinar with Xing Han Lu, Plotly’s Lead AI/ML Engineer.
Imagine eating your lunch, streaming your favorite TV shows, or chatting with friends over a video call from the comfort of your car as it drives you from home to work, to an appointment, or to a completely different city hours away. It would drive you through rain and snow storms, avoid potholes, and identify pedestrians crossing the street in low-light conditions, all while ensuring that everyone stays safe. To achieve this, automotive manufacturers will need to reach what is called “level 5 driving automation” (L5), which means that the “automated driving features will not require you to take over driving” and “can drive everywhere in all conditions”, as defined by SAE International.
Achieving this level of driving automation has been the dream of many hardware engineers, software developers, and researchers. The first automotive manufacturer to achieve L5 would immediately pull ahead of the competition. It could also mean safer roads for everyone by mitigating human error, which causes 94% of serious crashes in the United States. For these reasons, various models, datasets, and visualization libraries have been publicly released over the years in the push towards fully autonomous vehicles (AV). To support the development of autonomous driving and help researchers better understand how self-driving cars view the world, we present three Dash apps that cover self-driving car trip playback, real-time sensor data reading and visualization, and video frame detection and annotation.
From KITTI to Motional and Lyft: 7 years of open-access datasets for the AV community
Published alongside a research paper in 2012, the KITTI Vision Benchmark was one of the first publicly available benchmarks for evaluating computer vision models on automotive navigation tasks. Over 6 hours of car trips around the mid-size German city of Karlsruhe, it collected real-time sensor data: GPS coordinates, video feeds, and point clouds (from a laser scanner). Additionally, it offered 3D bounding box annotations as well as benchmarks for optical flow, stereo vision, and various other tasks. By building models capable of accurately predicting those annotations, researchers could help autonomous vehicle systems localize cars and pedestrians and estimate the distance of objects and roads.
In 2019, researchers at Motional released nuScenes, an open-access dataset of over 1,000 scenes collected in Boston and Singapore. It contains a total of 1.5M color images and 400k lidar point clouds captured under varied conditions, including rain and nighttime driving. To help navigate the dataset, the authors also released a Python devkit for easily retrieving and reading the sensor data collected for a given scene. It also includes capabilities for plotting static 2D renderings of the point clouds onto each image.
In the same year, Lyft released their Level 5 (L5) perception dataset, which encompasses over 350 scenes and over 1.3M bounding boxes. To explore this new dataset, they created a fork of the nuScenes devkit and added conversion tools, video rendering improvements, and interactive visualizations in Plotly. To encourage the community to leverage the dataset for building models capable of detecting objects in the 3D world, they launched a Kaggle competition with a prize pool of $25,000.
Following the contributions of Motional and Lyft, various automotive companies and research groups released their own AV datasets, including Argo, Audi, Berkeley, Ford, Waymo, and many others.
Web-based visualization tools for LIDAR point clouds and bounding boxes
The datasets released by academic and industrial researchers are not only large, but also highly complex, and they require visualization tools capable of handling their multi-modality. Whereas videos can be effectively displayed frame by frame, point clouds and 3D bounding boxes require more sophisticated solutions. Although you can visualize them in two dimensions using top-down views or by projecting them onto video frames, it’s hard to fully understand everything that’s happening at a given point in time without being able to interact with the data by moving around and zooming in and out.
One library that can be used to solve this problem is deck.gl. Created at Uber and now maintained by the vis.gl team, it offers interactive 3D visualizations that can be directly integrated with maps through Mapbox. Its wide range of layer types includes point clouds and polygons, which can be used to display and interact with LIDAR points and 3D bounding box annotations, respectively.
Although the various tools offered by deck.gl are extremely customizable and can be used in many applications, you still need to implement everything yourself in order to build a user interface for visualizing AV scenes. To streamline this process and accelerate the development of custom AV dashboards, the same team behind deck.gl built streetscape.gl (also known as the Autonomous Visualization System, or AVS), a collection of React components that lets you build a fully custom AV dashboard in a few hundred lines of JavaScript. When using data recorded in the XVIZ format, you can pre-generate a complete scene and load it into the client’s browser, enabling smooth playback of the entire segment.
In addition to visualizing point clouds and bounding boxes, you can also create custom objects such as car meshes, record metrics over time like acceleration and velocity, and offer controllable settings to the end user, enabling more fine-grained control directly in the UI.
Although both libraries are extremely useful for AV research, they require a certain degree of familiarity with JavaScript, React, Node.js, and webpack. This makes it harder for AV researchers and engineers to build customized solutions without a team of developers specialized in front-end technologies, which slows down the development cycle of new AV features and makes those visualization tools harder to integrate with the AV tools and ML libraries already available in Python. For those reasons, we present 3 Dash template apps that can help with R&D of AV software, systems, and tools by abstracting, streamlining, and unifying the various open-source AV libraries and datasets.
Dash App #1: Autonomous Visualization System (AVS) in Dash
All of Dash’s core components, as well as many popular community-made components, are built using React.js, the same framework used to interface with streetscape.gl. It was therefore possible to first build a React UI app and then wrap it as a custom Dash component using the Dash component boilerplate. This makes UIs built with streetscape.gl directly available to Dash apps written in Python, R, or Julia.
For this demo, we built a basic UI inspired by streetscape.gl’s official demo that contains many metrics and controls to help understand the scene.
In this Dash AVS app, you can play an interactive clip containing a scene, which is a segment of a car trip with data collected from LIDAR sensors, cameras, GPS, the car itself, and from human annotators (e.g. the bounding box annotations). At any time, you can stop the clip to:
- move around by dragging the viewer
- zoom in or out with the scroll wheel
- tilt or rotate by holding CTRL and dragging with the mouse
By using Dash’s own Core Components or community-built components like Dash Bootstrap, you can also add custom controls that give the user more control over what is being visualized. You can directly choose various map styles, dataset and scene URLs, and whether you want to use the basic or advanced version of the UI. Given those inputs, you can simply create a BasicUI or an AdvancedUI component and pass the URL containing the logs as an argument, as sketched below.
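Here is a minimal sketch of what that wiring could look like in Python. The module name dash_avs_ui, the log_url keyword argument, and the placeholder log URL are assumptions for illustration; check the app’s source code for the exact component API.

```python
# Minimal sketch of the Dash AVS app described above. The module name
# `dash_avs_ui` and the `log_url` argument are assumptions; check the app's
# source for the exact component API.
import dash
from dash import dcc, html
from dash.dependencies import Input, Output

import dash_avs_ui  # hypothetical wrapper component around streetscape.gl

DEFAULT_LOG_URL = "https://example.com/xviz-logs/scene-0001/"  # placeholder

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.RadioItems(
        id="ui-mode",
        options=[{"label": m.title(), "value": m} for m in ("basic", "advanced")],
        value="basic",
    ),
    html.Div(id="viewer"),
])

@app.callback(Output("viewer", "children"), Input("ui-mode", "value"))
def render_viewer(mode):
    # Both components take the URL of the pre-generated XVIZ logs.
    ui_class = dash_avs_ui.AdvancedUI if mode == "advanced" else dash_avs_ui.BasicUI
    return ui_class(id="avs-viewer", log_url=DEFAULT_LOG_URL)

if __name__ == "__main__":
    app.run_server(debug=True)
```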
This entire app is written in less than 200 lines of Python code, and can be easily deployed and scaled through Dash Enterprise’s app manager.
Dash App #2: Visualizing the Lyft Perception dataset with dash-deck
Our second app lets you explore a specific scene from the Lyft Perception dataset. You can navigate between frames by clicking one of the buttons or by dragging the slider to set the exact time within the scene. Through dash-deck, the interactive viewer displays the point clouds and bounding box annotations at a given frame, in a similar way to the first app. You can choose between various map views, LIDAR devices, and cameras, and toggle various overlays on the image.
The differences between this app and the first become clear once you look at the implementation details. If you are looking for more flexibility in your visualizations and want a fully Python-oriented solution, you might be interested in pydeck, a Python interface for rendering deck.gl views. Although it requires more tweaking and customization, it gives you significantly more control during the development of the app. For example, you can decide the color and opacity of each point cloud, the exact shape of a mesh object, the starting position of the camera, and the point of view (orbit, first person, or map view).
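As a rough illustration of that flexibility, here is a minimal pydeck sketch, assuming the LIDAR returns have already been loaded into a pandas DataFrame; the column names, colors, and view parameters are placeholders rather than the app’s actual settings.

```python
# A rough illustration of the pydeck knobs mentioned above: per-point color,
# opacity, camera start position, and an orbit point of view. The DataFrame
# columns and numeric values are placeholders, not the app's actual settings.
import pandas as pd
import pydeck as pdk

# Pretend these are LIDAR returns with per-point RGB colors already computed.
points = pd.DataFrame({
    "x": [0.0, 1.0, 2.0],
    "y": [0.0, 1.0, 0.5],
    "z": [0.0, 0.2, 0.4],
    "r": [255, 0, 0],
    "g": [0, 255, 0],
    "b": [0, 0, 255],
})

point_cloud = pdk.Layer(
    "PointCloudLayer",
    data=points,
    get_position="[x, y, z]",
    get_color="[r, g, b]",  # choose the color of every single point
    opacity=0.8,            # tune the opacity of the whole cloud
    point_size=2,
)

# Start the camera at the origin and use an orbit view instead of a map view.
view_state = pdk.ViewState(target=[0, 0, 0], zoom=5, rotation_x=-30, rotation_orbit=0)
orbit_view = pdk.View(type="OrbitView", controller=True)

deck = pdk.Deck(layers=[point_cloud], initial_view_state=view_state, views=[orbit_view])
deck.to_html("point_cloud.html")  # or hand the deck to dash-deck, as shown further below
```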
Another major difference is that Dash AVS requires you to first convert your data into XVIZ before serving or streaming a scene to the user. This app, on the other hand, can easily be modified to use any data format, as long as you can preprocess the input into the format accepted by pydeck or dash-deck. The reason is that everything is done dynamically and in real time. In fact, every time the user selects a frame, the following happens (a minimal sketch follows this list):
- The Lyft/nuScenes SDK retrieves the current frame and loads the point clouds, the annotations and the images captured by the cameras
- Pandas is used to preprocess the point clouds
- Pydeck constructs the point clouds and polygon layers
- Pyquaternion projects the point clouds and boxes onto the image
- Dash Deck and Plotly Express respectively render the deck viewer and the image graph
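Here is a condensed sketch of how those steps could be wired into a Dash callback. The load_frame helper and the frame count are hypothetical stand-ins for the Lyft/nuScenes SDK retrieval and pandas preprocessing steps, which are too long to reproduce here; the pydeck and dash-deck calls follow their public APIs.

```python
# Condensed sketch of the per-frame pipeline above, wired into a Dash callback.
# `load_frame` is a hypothetical helper standing in for the Lyft/nuScenes SDK
# and pandas preprocessing; it is assumed to return a DataFrame of LIDAR points
# and a list of box polygons for a given frame index.
import dash
import dash_deck
import pydeck as pdk
from dash import dcc, html
from dash.dependencies import Input, Output

from my_lyft_helpers import load_frame  # hypothetical preprocessing module

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Slider(id="frame", min=0, max=125, step=1, value=0),  # frame count is illustrative
    html.Div(id="deck-container"),
])

@app.callback(Output("deck-container", "children"), Input("frame", "value"))
def update_frame(frame_idx):
    points_df, box_polygons = load_frame(frame_idx)

    point_cloud = pdk.Layer(
        "PointCloudLayer",
        data=points_df,
        get_position="[x, y, z]",
        get_color=[255, 255, 255],
        point_size=2,
    )
    boxes = pdk.Layer(
        "PolygonLayer",
        data=box_polygons,            # one polygon per annotated object
        get_polygon="polygon",
        get_fill_color=[255, 0, 0, 60],
        extruded=True,
        get_elevation="height",
    )
    deck = pdk.Deck(
        layers=[point_cloud, boxes],
        views=[pdk.View(type="OrbitView", controller=True)],
        initial_view_state=pdk.ViewState(target=[0, 0, 0], zoom=5),
    )
    # dash-deck renders the serialized deck JSON in the browser
    return dash_deck.DeckGL(deck.to_json(), id="deck", tooltip=True)

if __name__ == "__main__":
    app.run_server(debug=True)
```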
All six of these libraries are accessible through Python. In fact, the first four libraries were not specifically designed to be used in Dash. They just work out of the box because Dash was designed to seamlessly work with most Python use cases. This means that you could easily modify this app to perform real-time processing such as 3D mesh generation or bounding box predictions using PointNet prior to displaying a frame.
Dash App #3: Object detection and editable annotations for video frames
Our third app lets you replay driving scenes from the Berkeley DeepDrive dataset, which contains 1,100 hours of driving videos. Each video is augmented with 2D bounding box annotations generated by MobileNet v2 and embedded into the video. In addition to replaying the scene, you can stop the video at any time, run the MobileNet detector in real time, and interactively edit or add bounding boxes at that exact timestamp. Once you are happy with the annotations, you can move on to the next frame or the next scene and download the updated and new annotations as CSV files.
This app is our favorite example of computer vision algorithms being used alongside human annotators to accelerate the data collection process. Since the app is fully built in Python and only uses built-in features from dash-player, Plotly.py, and TensorFlow Hub, you can easily personalize it to use more sophisticated models and fulfill annotation requirements that are limited only by your choice of Python libraries. For example, you could store all the new annotations in a SQL database (which comes included with Dash Enterprise), export them to cloud storage with a library like boto3, or generate a snapshot of the working session that you can share with other annotators.
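To make the detection step concrete, here is a minimal sketch of running a single frame through a MobileNet v2 SSD detector from TensorFlow Hub. The model URL and score threshold are assumptions; the app’s actual model and post-processing may differ.

```python
# Minimal sketch of the real-time detection step, assuming a MobileNet v2 SSD
# model from TensorFlow Hub (the exact model URL used by the app may differ).
# The input is a single video frame as an RGB numpy array.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

DETECTOR_URL = "https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2"  # assumption
detector = hub.load(DETECTOR_URL)

def detect_boxes(frame_rgb: np.ndarray, score_threshold: float = 0.5):
    """Return normalized (ymin, xmin, ymax, xmax) boxes, classes, and scores."""
    # The model expects a uint8 tensor of shape [1, height, width, 3].
    inputs = tf.convert_to_tensor(frame_rgb[np.newaxis, ...], dtype=tf.uint8)
    outputs = detector(inputs)
    boxes = outputs["detection_boxes"][0].numpy()
    scores = outputs["detection_scores"][0].numpy()
    classes = outputs["detection_classes"][0].numpy().astype(int)
    keep = scores >= score_threshold
    return boxes[keep], classes[keep], scores[keep]

# Example: run on a random frame, then feed the boxes to the annotation UI.
frame = np.random.randint(0, 255, size=(720, 1280, 3), dtype=np.uint8)
boxes, classes, scores = detect_boxes(frame)
print(f"{len(boxes)} objects detected")
```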
More Dash apps for enhancing self-driving cars visualization
In addition to these three apps, we have built various open-source Dash apps that can support the development of self-driving cars:
- Dash Deck Explorer (demo — code): This explorer lets you try all possible layers that you can build using pydeck and render with dash-deck. Short description and source code are included!
- Dash DETR (demo — code): This app lets you input a URL for an image and apply real-time object detection using the Detection Transformer (DETR), a neural network model created by Facebook AI Research. The code could easily be reused and integrated into various workflows for 2D object detection, such as detecting pedestrians and vehicles in images using PyTorch Hub (a short sketch follows this list).
- Object Detection Explorer (demo — code): Similar to the video detection app, this app lets you replay videos that were previously annotated with a MobileNet detector. Instead of running the model on each frame, it reads logs generated by the detector to derive useful insights, such as the number of pedestrians in the image and a confidence heatmap of the objects currently present on the screen.
- Dash Uber Rides (demo — code): Monitoring and reviewing self-driving car trips will become an important task for ensuring the reliability of AV systems in various conditions and locations. This app lets you visualize Uber trips across New York City and offers various controls for quickly narrowing down on a specific subset of trips.
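As an illustration of the PyTorch Hub route mentioned in the Dash DETR entry, here is a minimal detection sketch following Facebook Research’s public DETR examples (not the app’s exact code); the image path and confidence threshold are placeholders.

```python
# Minimal sketch of 2D object detection with DETR loaded from PyTorch Hub,
# following Facebook Research's public examples. The image path and threshold
# are placeholders.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load("facebookresearch/detr:main", "detr_resnet50", pretrained=True)
model.eval()

transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def detect(image: Image.Image, threshold: float = 0.7):
    """Return (class probabilities, boxes in normalized cxcywh) for confident detections."""
    inputs = transform(image).unsqueeze(0)  # [1, 3, H, W]
    with torch.no_grad():
        outputs = model(inputs)
    # Drop the trailing "no object" class, then keep only confident predictions.
    probs = outputs["pred_logits"].softmax(-1)[0, :, :-1]
    keep = probs.max(-1).values > threshold
    return probs[keep], outputs["pred_boxes"][0, keep]

probs, boxes = detect(Image.open("street_scene.jpg").convert("RGB"))
print(f"{len(boxes)} confident detections")
```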
Are you interested in building applications for autonomous driving, or interested in using advanced visualization components in Dash? Reach out to learn more about how you can take part in the development of next-generation autonomous driving visualization tools!