Why the Self-driving “March of Nines” is a Paper Tiger

Maxwell Andrews
7 min read · May 2, 2019


With speculation swirling around who will be the winner in autonomous vehicles, much has been made of the “march of nines,” which for the uninitiated refers to the long tail of rare corner cases that a vehicle might encounter. Think overturned cars, debris on the roadway, tornados, road defects, and any other rare but absolutely real occurrence that a driver might face. In this framing, “safe” means the fraction of miles completed without an incident, so a vehicle that is 99.999% safe crashes once every 100,000 miles on average, and each nine we add after the decimal increases safety by an order of magnitude. A vocal contingent of armchair analysts purports this to be a problem that will take many more years, perhaps a decade, to solve. I believe that, based on an analysis of self-driving system architecture, this pessimism is misplaced, and that the march of nines is largely a paper tiger, bearing no sharp teeth.
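To make the arithmetic concrete, here is a quick back-of-the-envelope sketch (my own illustration; the only assumption is that each nine refers to the per-mile incident-free rate) converting a count of nines into average miles between incidents:

```python
def miles_between_incidents(nines: int) -> int:
    """Average miles between incidents when the per-mile incident-free
    rate has the given number of nines (e.g. 5 nines -> 99.999%)."""
    return 10 ** nines   # the failure rate is 10**-nines per mile

for n in range(5, 9):
    print(f"{n} nines -> one incident per {miles_between_incidents(n):,} miles")
# 5 nines -> one incident per 100,000 miles
# 6 nines -> one incident per 1,000,000 miles
# 7 nines -> one incident per 10,000,000 miles
# 8 nines -> one incident per 100,000,000 miles
```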

The Human Analog

Human drivers do a remarkably good job of operating motor vehicles, all things considered. Despite poor reaction time, the inability to look in more than one direction at once, distractedness, and several blind spots caused by occluding vehicle structure, humans are involved in an injury collision only about once every 1 million miles, and a fatality only about once per 100 million miles driven. That is an injury-collision avoidance performance of 99.9999%, and a fatal-collision avoidance performance of 99.999999%. But how can this be? Are human drivers not confronted with rare and unusual scenarios as well? Why are we so adept at this? The answer lies in inference: the ability to react to new and unique scenarios by applying the general knowledge we have about the world.

We humans know there is a human in the bear suit; would a self-driving car reach the same conclusion?

By way of example, let’s take a specific scenario: driving to pick up your kids from a friend’s house on Halloween. Naturally, you will encounter many intersections in which children in costume are crossing. Some of the costumes are so inventive and outlandish that the children may not even be recognizable as humans, completely obscuring faces, arms, and legs to complete the illusion. Yet we humans recognize these wacky objects as children because of our cultural knowledge and the context of the ritual. Even more impressively, someone with no cultural knowledge of Halloween would still avoid hitting a costumed pedestrian, since the (safe) human driving algorithm is fundamentally quite simple (sketched in code just after this list):

  • Travel at a similar speed to other cars around me
  • Stay in my lane
  • Stop for or avoid obstacles in my lane
  • Obey the posted signs (more or less)
  • Look ahead for behavior that may require me to take action
  • Slow down when visibility decreases
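A toy, self-contained sketch of that rule set might look like the following; every input name, threshold, and unit here is invented for illustration and is not drawn from any real driving stack:

```python
# A toy sketch of the rule-based "human driving algorithm" above.
# All inputs, thresholds, and units are hypothetical placeholders.

def drive_step(traffic_speed, speed_limit, visibility, obstacle_distance,
               stopping_distance, adjacent_lane_clear):
    """Return a simplified (action, target_speed) decision for one time step."""
    target_speed = min(traffic_speed, speed_limit)   # match traffic, obey signage

    if visibility < 0.5:                             # slow down when visibility drops
        target_speed *= 0.6

    if obstacle_distance is not None:                # something in my lane ahead
        if obstacle_distance <= stopping_distance:
            return ("brake", 0.0)                    # stop for obstacles in my lane
        if adjacent_lane_clear:
            return ("change_lane", target_speed)     # ...or steer around them

    return ("keep_lane", target_speed)               # otherwise, stay in my lane

# Example: foggy night, stopped car 60 m ahead, adjacent lane free.
print(drive_step(traffic_speed=60, speed_limit=65, visibility=0.4,
                 obstacle_distance=60, stopping_distance=45,
                 adjacent_lane_clear=True))
# ('change_lane', 36.0)
```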

Humans can easily recognize obstacles of just about any type, so it stands to reason that a self-driving vehicle needs to follow similar rules, and to improve materially on each element of this driving algorithm in order to increase safety substantially.

Human Weaknesses

Having established the quite impressive statistical performance of human drivers, it’s time we discussed our weaknesses. Human weaknesses can largely be divided into a few main categories:

  • Failure to see other vehicles/pedestrians
  • Distraction/Inattention
  • Reckless/Aggressive driving
  • Fatigue/Impairment
  • Incorrect recovery actions

Of these, it is worth noting that autonomous vehicles inherently make massive progress on every category, and solve some of them entirely: fatigue/impairment, distraction/inattention, and reckless driving.

National Motor Vehicle Crash Causation Survey (2008)

Neural Network Based Vision Systems

Vision-based autonomous vehicle systems, such as the one being developed by Tesla, use deep neural networks to understand and interpret, in real time, the content captured by multiple cameras that together cover a 360-degree view around the vehicle. From a system-level perspective, this architecture improves substantially on the human analog, since there are no blind spots to contend with, and the system is able to “see” all around the vehicle at once. But simply having the 360-degree video data does not solve the problem; the magic is in how accurately the neural network can classify the pixels in the video into representational labels (cars, lane lines, pedestrians, cyclists, curbs, drivable space, etc.).
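To give a flavor of what “classifying pixels into labels” means, here is a deliberately tiny semantic-segmentation skeleton in PyTorch. It is a generic sketch of the idea, not Tesla’s network; the layer sizes, class list, and image dimensions are all invented for illustration:

```python
import torch
import torch.nn as nn

# Generic per-pixel classifier sketch; real systems use far deeper backbones
# and fuse many cameras, but the shape of the output is the essential idea.
CLASSES = ["car", "lane_line", "pedestrian", "cyclist", "curb", "drivable_space", "other"]

class TinySegmenter(nn.Module):
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, num_classes, 1)    # one score per class, per pixel

    def forward(self, frames):                        # frames: (batch, 3, H, W)
        return self.head(self.features(frames))       # scores: (batch, classes, H, W)

# Eight synthetic "camera" frames standing in for a 360-degree view.
frames = torch.rand(8, 3, 96, 160)
label_map = TinySegmenter()(frames).argmax(dim=1)     # per-pixel labels: (8, 96, 160)
print(label_map.shape)                                # torch.Size([8, 96, 160])
```

The essential point is the output: for every pixel of every camera frame the network emits a label, and the accuracy of those labels is what the whole system hinges on.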

https://distill.pub/2017/feature-visualization/

Neural networks are trained by example: they are fed labeled video sequences and self-optimize to reproduce that labeling on new data they have never seen. In the case of autonomous vehicles, the labeled video sequences demarcate which pixels in the video represent cars, pedestrians, traffic lights, and so on. When it is also provided with correct and incorrect driver responses to various scenarios, the neural network learns how to navigate the world based on these video inputs.
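In code, “training by example” reduces to a loop like the one below, shown against the TinySegmenter sketch from the previous block. The synthetic `labeled_clips` list stands in for human-labeled video; everything else is standard supervised-learning machinery, not any particular company’s pipeline:

```python
import torch
import torch.nn as nn

# Stand-in for human-labeled video: (camera frames, per-pixel class ids).
labeled_clips = [(torch.rand(4, 3, 96, 160),
                  torch.randint(0, len(CLASSES), (4, 96, 160)))
                 for _ in range(3)]

model = TinySegmenter()                      # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()              # penalizes disagreement with the labels

for frames, human_labels in labeled_clips:
    logits = model(frames)                   # raw per-pixel class scores
    loss = loss_fn(logits, human_labels)
    optimizer.zero_grad()
    loss.backward()                          # nudge the weights toward the labeling
    optimizer.step()
```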

Of course, in this model the network is only able to “see” objects for which sufficient training data and labels have been supplied. Examples of people wearing bear costumes, or of washed-out roads, occur in numbers far too small to meaningfully train the network to recognize them reliably. This is where the assumption of Herculean effort to chase down those examples comes into play. How many miles need to be driven to gather 100 examples of people in bear costumes? A billion? Ten billion? It is from this reality that the long-tail thesis of “chasing nines” stems. And if implemented this way, it is indeed an intractable problem: there are simply too many variations and permutations of objects and obstacles that recur too infrequently to possibly train the network to recognize them all. And yet the system must be capable of performing in these unforeseen and unlearned scenarios.
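To put rough numbers on that intuition (the encounter rate here is a pure assumption chosen for illustration): if a bear-costumed pedestrian crosses a car’s path once every 10 million miles, a fleet needs on the order of a billion miles of driving just to collect 100 examples.

```python
# Purely illustrative: assume one bear-costume encounter per 10 million miles.
encounter_rate_per_mile = 1 / 10_000_000
examples_needed = 100
miles_required = examples_needed / encounter_rate_per_mile
print(f"{miles_required:,.0f} miles")   # 1,000,000,000 miles for 100 examples
```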

But what this analysis fails to grasp is that there is more than one way to solve the problem. We have been discussing recognition of an object as the primary method of being able to avoid it. But another approach is simply to be sensitive to interruptions in drivable space. Drivable space is a much better-bounded classification problem, as there is no shortage of positive and negative training data that can help train the network to recognize drivable space extremely reliably. An arbitrary obstacle in this case shows up as a “hole” in drivable space that needs to be avoided, with no need to recognize the obstacle with any specificity. Because this approach is generic, it does not need any specific training data to deal with previously unseen examples. All the system needs is sufficient training on drivable space across many locations and weather conditions, so that it can classify everything else as NOT drivable. Additional labels are still needed, of course, for road signs, stop lights, and so on, but there should be no need to encompass a huge variety of specific training data for rare-occurrence obstacles.
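A toy sketch of the idea, using nothing but a binary mask and NumPy (the mask and corridor geometry are invented for illustration, not any shipping system’s logic): once the network emits a drivable/not-drivable map, an unknown obstacle simply appears as non-drivable pixels inside the corridor ahead, whatever the obstacle happens to be.

```python
import numpy as np

# Toy illustration of obstacle-as-hole detection on a binary drivable-space mask.
H, W = 60, 80
drivable = np.ones((H, W), dtype=bool)     # True = drivable, False = not drivable
drivable[25:32, 35:45] = False             # some never-before-seen object ahead

corridor = np.zeros_like(drivable)
corridor[:, 30:50] = True                  # the lane-width strip the car intends to use

blocked = corridor & ~drivable             # "holes" in drivable space inside our path
if blocked.any():
    rows = np.where(blocked.any(axis=1))[0]
    print(f"Obstruction ahead spanning rows {rows.min()}-{rows.max()}; "
          "slow down or steer around it.")
else:
    print("Corridor clear.")
```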

An exaggerated rendering showing the difference between two methods of obstacle avoidance

And this is why I suspect that the problem of the “nines” is overblown, giving confidence to the notion that self-driving cars are closer to reality than we expect. As long as the system is capable of distinguishing between drivable road and obstructed road, the specifics of the edge cases are not so important and don’t need to be learned explicitly. Of course there are still some finer points to consider, such as distinguishing between a plastic bag and a trailer hitch, but I believe there will be sufficient training data for these more common occurrences. As fleet learning accumulates more labels over time, the performance will only increase. But I propose that exhaustive knowledge of all possible objects in the world is NOT a prerequisite for reliable autonomous vehicles.

The Real Threat to Autonomous Vehicle Adoption

If the analysis in this article proves accurate, and corner cases are not the juggernaut they are purported to be, what is the true Achilles’ heel of autonomous vehicle performance? I propose that it is public perception of the failure cases. Even if autonomous vehicles end up being an order of magnitude safer than human-operated vehicles by the numbers, the incidents that do occur will face intense scrutiny. It is crucially important that failures of autonomous driving systems be seen as reasonable from the perspective of a human operator. For example, a car going off the road due to an oil slick will likely be seen as unavoidable by most drivers. Conversely, completely missing a red light and blowing through an intersection, or plowing into the back of a stopped car, are failure cases unlikely to be seen as acceptable by users, even if the statistical performance is beyond reproach. Developers of autonomous driving technology should pay careful attention to this psychological phenomenon, and ensure that the human-analog behavior of the system is extremely robust.


Maxwell Andrews

I am a technologist and solution architect who studies human problems and designs frameworks to help solve them.