The Very First Standard We Need for Autonomous Vehicles
Several standardization efforts are under way in the Autonomous Driving Systems (ADS) domain, each aiming to standardize a different aspect of ADS technology, especially safety validation and verification procedures. To name two examples: ISO/PAS 21448 seeks to address safety issues relating to electronic systems that govern the safe operation of a vehicle, thereby complementing ISO 26262’s functional safety role, and the recently published UL 4600 describes safety principles and processes for evaluating fully autonomous vehicles.
All these new standardization activities are needed for the successful and safe deployment of highly automated driving functions. However, we should realize that some even more fundamental standards are required, ones that will not only make ADS technology development more efficient but also make safety validation and verification much more streamlined. The first thing we need is to give our autonomous vehicles a consistent, coherent, and common understanding of the world they are moving through. To do so, we need a standard for perception, starting from data labeling.
Why a standard for perception?
Every ADS is fundamentally built upon its ability to perceive and interpret the world around it. No matter which sensor suite is adopted, whether a redundant mix of LiDARs, cameras, radars, and infrared sensors or just a simple dashcam, the first, fundamental goal of any AV stack is to build a “machine mental model” of the various entities populating the surrounding environment. This task involves detecting and recognizing objects and tracking their movements over time. Today this is often achieved with so-called “deep learning” techniques, a branch of Artificial Intelligence, which must be trained on huge sets of labeled data before they can detect and classify objects for themselves.
The data labeling process is of paramount importance in the ADS data pipeline, since it is the first step in which real and “meaningful” information is introduced into “raw” sensor data. During labeling we assign real-world category names to pixels and point clouds. Those categories are exactly what the ADS will learn to recognize.
Today every organization developing an ADS perception stack uses its own taxonomy of these categories, with its own unique structure. Some teach their systems to discriminate between buses and coaches, others do not. Some see strollers as “personal mobility devices”, others prefer to categorize them as generic objects that can be pushed or pulled, and still others simply label them as an “extension” of a pedestrian. These are just a few of a never-ending list of examples. The reasons behind some of these categorization choices are deeply technical, while others are more arbitrary. Nonetheless, it is clear that such profound differences in world descriptions can be the root cause of a diverse set of significant problems.
First, profoundly different representations of the environment across ADSs may hamper their ability to deal with some complex driving situations, especially when multiple autonomous vehicles encounter the same scenario simultaneously and each of them has its own specific “inner representation” of it.
Second, this fragmented situation makes effective data sharing nearly impossible. It is very hard to extend datasets and share them across organizations if each one uses different label categories. This matters a great deal, since we have already argued that data sharing will most likely play a key role in the near future of ADS development.
Third, common safety assessments and benchmarking, especially for perception, are held back. It is very hard to score systems on unified metrics when they are designed to perceive and build “world understandings” that differ significantly from one another.
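To make the data-sharing problem concrete, here is a minimal Python sketch of what merging two labeled datasets looks like today. The two organization taxonomies, the common target categories, and the helper `to_common` are all hypothetical and chosen only to mirror the bus/coach and stroller examples above; they are not taken from any real dataset or standard.

```python
from collections import Counter

# Hypothetical mappings from each organization's labels to a shared taxonomy.
# Org A distinguishes buses from coaches; Org B has a single "bus" class.
ORG_A_TO_COMMON = {
    "bus": "bus",
    "coach": "bus",  # the fine-grained bus/coach distinction is lost
    "stroller": "personal_mobility_device",
    "pedestrian": "pedestrian",
}
ORG_B_TO_COMMON = {
    "bus": "bus",
    "pushable_pullable": None,  # too generic: stroller, cart, wheelbarrow...
    "pedestrian": "pedestrian",
}


def to_common(labels, mapping):
    """Translate labels into the shared taxonomy.

    Returns (mapped, dropped): the translated labels, and the original
    labels that have no unambiguous counterpart and cannot be reused.
    """
    mapped, dropped = [], []
    for label in labels:
        target = mapping.get(label)
        if target is None:
            dropped.append(label)
        else:
            mapped.append(target)
    return mapped, dropped


# Toy annotation lists standing in for two organizations' datasets.
org_a_labels = ["bus", "coach", "stroller", "pedestrian"]
org_b_labels = ["bus", "pushable_pullable", "pedestrian", "pedestrian"]

mapped_a, dropped_a = to_common(org_a_labels, ORG_A_TO_COMMON)
mapped_b, dropped_b = to_common(org_b_labels, ORG_B_TO_COMMON)

merged_counts = Counter(mapped_a + mapped_b)
print("merged:", dict(merged_counts))
print("dropped:", dropped_a + dropped_b)
```

Even in this toy case the merge is lossy in both directions: Org A's coach annotations collapse into a coarser class, and Org B's generic pushable/pullable annotations cannot be reused without relabeling. A shared labeling standard would remove the need for such hand-written mapping tables.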
Having a widely adopted labeling standard for perception can significantly help in solving the aforementioned issues. Moreover, there are also some other significant practical benefits such a standard can bring.
Today a huge workforce all around the globe, especially in developing countries, annotates massive amounts of sensor data, aided by AI algorithms and a variety of software tools. In the current situation every OEM, AV stack developer, Tier 1 supplier, and so on uses its own taxonomy of world categories, with its own names and structure, which makes data labeling a very fragmented, time-consuming, and costly task. Each project requires a specific setup, a specific labeling workflow, specific human training, and often even the development of ad-hoc tooling features. This translates into long setup times, higher labeling costs, and, most importantly, lower labeling quality. A labeling standard could help immensely in streamlining the annotation process by providing well-orchestrated workflows, standard tooling features, and overall a much better, measurable output quality and improved safety.
One can argue that Autonomous Driving is in a deep R&D phase today, with a lot of experimentation still going on, and that developing a perception standard now would therefore be premature, potentially slowing down innovation in the industry while imposing unnecessary constraints.
We have already seen in part why this is not the case. Data pipeline lifecycles suffer significantly from this labeling fragmentation, in terms of both time and cost. Moreover, organizations often spend significant time and effort developing ad-hoc labeling taxonomies and labeling specifications for perception in-house, in a landscape that is hard to navigate and lacks clear, widely accepted guidelines or best practices. Having a standard today could provide these organizations with a “North Star” direction while cutting down the development effort.
It is important to start developing this standard today, during this R&D-intensive phase. Because the field is still evolving, things are still relatively easy to shape and modify. If we wait to solve this problem until the landscape and the technology are mature and consolidated, it could be too late, and developing such a standard and driving its adoption could be far more complex. The time for a labeling standard for perception is now.
What might a labeling standard for perception look like?
To be truly actionable and useful, this labeling standard should be built on three key pillars: flexibility, interoperability, and composability. Change needs to be built into the DNA of such a standard, so that new knowledge, and thus new entities, categories, attributes and so on, can be added continuously as the requirements and the level of perception detail mature, without changing or restructuring the standard’s foundations. A labeling standard for perception should be an open, living, and evolving body of knowledge that allows anyone building Autonomous Driving Systems to teach their system to see the world from a unified perspective.
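One way to picture the flexibility and composability pillars is a hierarchical taxonomy in which new categories attach to existing ones. The Python sketch below is only an illustration under that assumption: the `Taxonomy` class, its methods, and the category names are hypothetical and do not represent the OpenLABEL format or any other published standard.

```python
class Taxonomy:
    """A tiny hierarchical label taxonomy (hypothetical illustration)."""

    def __init__(self, root="object"):
        self.root = root
        self._parent = {root: None}  # category name -> parent name

    def add(self, name, parent):
        """Attach a new category under an existing one."""
        if parent not in self._parent:
            raise KeyError(f"unknown parent: {parent}")
        if name in self._parent:
            raise ValueError(f"duplicate category: {name}")
        self._parent[name] = parent

    def ancestors(self, name):
        """Return the path from `name` up to the root, inclusive."""
        path = []
        while name is not None:
            path.append(name)
            name = self._parent[name]
        return path

    def is_a(self, name, category):
        """True if `name` equals `category` or descends from it."""
        return category in self.ancestors(name)


# Initial, coarse version of the taxonomy.
tax = Taxonomy()
tax.add("vehicle", "object")
tax.add("bus", "vehicle")
tax.add("pedestrian", "object")

# Later refinements are purely additive: existing entries are untouched.
tax.add("coach", "bus")
tax.add("e_scooter", "vehicle")

print(tax.ancestors("coach"))
print(tax.is_a("coach", "bus"))
```

In this sketch, a system trained only on the coarse categories can still consume data labeled with the finer ones, because a "coach" annotation resolves to "bus" or "vehicle" by walking up the hierarchy. Adding a category never requires restructuring what is already there.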
Luckily, we do not need to start from scratch. The OpenLABEL open standard development effort is already under way at ASAM, a non-profit organization with a worldwide footprint and a long, proven history of developing automotive standards. We at deepen.ai are doing our best to put our expertise to work and help lead the development of the OpenLABEL standard. This effort aims to provide value to the whole community of stakeholders gravitating around Autonomous Driving, and it is closely aligned with the World Economic Forum Safety Pool initiative… And you can join us.
Ultimately, the world is one and we should make sure that our autonomous systems share the same baseline understanding of it through consistent representations.