Simulation Testing’s Uncanny Valley Problem

Kartik Tiwari
Starsky Robotics 10–4 Labs
5 min read · May 17, 2019

Testing against one’s own assumptions creates security theater.

No one wants to be hurt because they’re inadvertently driving next to an unproven self-driving vehicle. However, the costs of validating self-driving vehicles on real roads are extraordinary. To mitigate this, most autonomous-vehicle developers test their systems in simulation, that is, in virtual environments. Starsky uses limited low-fidelity simulation to gauge the effects of certain system inputs on truck behavior. Simulation helps us learn the proper force an actuator should exert on a steering mechanism to achieve a turn of the desired radius. The technique also helps us model the correct amount of throttle to achieve a certain acceleration. But over-reliance on simulation can actually make the system less safe. To state the issue another way, heavy dependence on testing in virtual simulations has an uncanny valley problem.
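
As a rough illustration of what that kind of low-fidelity simulation can look like, here is a minimal Python sketch. It is not Starsky’s actual tooling; the vehicle model, constants, and function names are all assumptions chosen for the example. It sweeps steering-actuator torque against the resulting turn radius with a kinematic bicycle model, and throttle against acceleration with a point-mass model.

```python
import math

# Assumed, illustrative constants -- not real Starsky truck parameters.
WHEELBASE_M = 6.0                     # tractor wheelbase
STEER_GAIN_RAD_PER_NM = 0.002         # road-wheel angle produced per N*m of actuator torque
MASS_KG = 36_000                      # loaded tractor-trailer
DRIVE_FORCE_PER_THROTTLE_N = 40_000   # peak tractive force at 100% throttle

def turn_radius_m(actuator_torque_nm: float) -> float:
    """Kinematic bicycle model: radius = wheelbase / tan(road-wheel angle)."""
    wheel_angle = STEER_GAIN_RAD_PER_NM * actuator_torque_nm
    return WHEELBASE_M / math.tan(wheel_angle)

def acceleration_mps2(throttle_fraction: float, drag_n: float = 4_000.0) -> float:
    """Point-mass longitudinal model: a = (F_drive - F_drag) / m."""
    drive_force = DRIVE_FORCE_PER_THROTTLE_N * throttle_fraction
    return (drive_force - drag_n) / MASS_KG

if __name__ == "__main__":
    # Sweep inputs to see how the modeled truck responds.
    for torque in (5.0, 10.0, 20.0):
        print(f"torque {torque:5.1f} N*m -> radius {turn_radius_m(torque):7.1f} m")
    for throttle in (0.2, 0.5, 0.8):
        print(f"throttle {throttle:.1f} -> accel {acceleration_mps2(throttle):.2f} m/s^2")
```

Even a crude model like this is enough to answer the narrow questions we use simulation for: how hard to push an actuator, how much throttle to apply.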

First, some context. Simulation has arisen as a method to validate self-driving software as the autonomy stack has increasingly relied on deep-learning algorithms. These algorithms are massively complex. So complex that, given the volume of data the AV sensors provide, it’s essentially impossible to discern why the system made any particular decision. They’re black boxes that even their developers don’t really understand. (I’ve written elsewhere about the problem with deep learning.) Consequently, it’s difficult to eliminate the possibility that they’ll make a decision you don’t like.

Hence the enthusiasm for simulation, which amounts to a mechanism for AV companies to reassure themselves that their autonomy software will perform as they hope. Working in simulated virtual environments, like versions of what you might see in Grand Theft Auto or The Matrix, developers create what are known as autonomous driving scenarios (ADSs). They’ll model a certain intersection and populate it with vehicles, pedestrians, cyclists, vegetation, and whatever else. They’ll give their autonomous vehicle an objective: enter the intersection and turn left, say. Then they’ll let the scenario run. If the AV navigates the task safely, great. Given this set of circumstances, which have been modeled in simulation, the developers can say: we know the system should react this way in the real world.
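
To make the idea concrete, here is a toy sketch of what an ADS might look like in code. The scenario fields, the `run_scenario` stub, and the pass criterion are invented for this illustration; a real simulator would step physics, sensor models, and the planner.

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    kind: str            # "car", "pedestrian", "cyclist", ...
    position_m: tuple    # (x, y) in the intersection's local frame
    speed_mps: float

@dataclass
class Scenario:
    map_name: str
    objective: str
    actors: list = field(default_factory=list)

def run_scenario(scenario: Scenario) -> bool:
    """Stand-in for a simulator run: True if the AV completed the
    objective without a collision or rule violation."""
    # A real simulator would step physics, sensors, and the planner here.
    return True  # placeholder result

left_turn = Scenario(
    map_name="4th_and_main_intersection",
    objective="enter intersection and turn left",
    actors=[
        Actor("car", (12.0, -3.5), 8.0),
        Actor("pedestrian", (-2.0, 6.0), 1.4),
        Actor("cyclist", (5.0, 2.0), 4.0),
    ],
)

print("scenario passed" if run_scenario(left_turn) else "scenario failed")
```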

But this approach quickly becomes intractable. The planning software has made the appropriate decision in one scenario. But what happens if you change some element? The number of possible variations the system might be sensitive to, each of which could change the system’s decision, is exponentially large.
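
A back-of-the-envelope count makes the combinatorics concrete. The variation axes and their values below are made up, but even this small handful multiplies out to thousands of distinct scenarios, and every additional axis multiplies the total again.

```python
from math import prod

# Hypothetical variation axes for a single left-turn scenario (illustrative values).
variations = {
    "oncoming_vehicles": range(0, 6),             # 6 options
    "oncoming_speed_mps": (5, 10, 15, 20),        # 4
    "pedestrians": range(0, 4),                   # 4
    "cyclist_present": (False, True),             # 2
    "weather": ("clear", "rain", "fog", "snow"),  # 4
    "lighting": ("day", "dusk", "night"),         # 3
    "road_surface": ("dry", "wet", "icy"),        # 3
}

total = prod(len(values) for values in variations.values())
print(f"{total:,} distinct scenarios from {len(variations)} axes")  # 6,912

# Ten more axes of ~5 values each pushes the count past 67 billion.
print(f"{total * 5**10:,} with ten additional axes")
```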

And what about the scenarios that are difficult to imagine? To take an overly simple example: what if there’s a car in an adjoining lane? What if, instead of a car, it’s a lane-splitting motorcycle? What does the software do then?

It’s impossible to think of every potential scenario. More scenarios exist than there are grains of sand in the world. No one can model them all — even if you have the backing of one of the world’s richest companies. In fact, the more resources you spend, the closer you get to the fidelity and the feel of the real world, the more confident you become in your software — and the more dangerous your software becomes.

Picture a curve plotting the safety benefit of simulation testing against the simulation’s fidelity over the development cycle. Virtual environments vary widely in fidelity, from relatively low-fidelity renderings to impressively high-fidelity ones. As the simulation grows in fidelity it drops into the uncanny valley, where the simulation is superficially good but functionally bad. Further out, in theory, there’s an asymptote where the benefit should return. But no simulation will ever be able to get there. You’d have to be able to model quantum mechanics, and accurately predict human consciousness and decision-making, in all its randomness. The closer the simulation gets to reality, the more false confidence it creates, and the less safety benefit it provides.

Which raises the possibility, in turn, that the billions of dollars the self-driving industry is spending on simulation are so much security theater. Reliance on high-fidelity simulation creates false confidence. It’s not just the practical impossibility of ensuring that the self-driving software’s deep-learning algorithm reacts appropriately in every conceivable scenario. There’s also a logical impossibility at play: situations will arise in the future that are impossible to predict today. No matter how many billion miles you test in simulation, you’re still testing against your own assumptions. That creates a dangerous feedback loop. Past a point, good results in simulation no longer translate into good results in real life. Reality, with all its uncertainty, is nearly impossible to model in a simulation.

People understandably want to be reassured of the safety of the self-driving software that is dictating the operations of an 80,000-pound tractor-trailer heading down the highway at 60 miles per hour. They want to know that the people who created that technology have taken every last precaution to ensure it operates the way it should. Pouring money into simulation has a reassuring effect. But high-fidelity simulations that achieve a Matrix level of veracity actually indicate that the developers know less about the system than they should. A team that relies heavily on high-fidelity simulation is, in effect, less sure of its system’s performance than is safe.

What Works, And What Doesn’t

Here’s a different approach: rather than solving one of today’s hardest technology challenges with an unproven method, limit the scope of the problem. Keep humans in the loop for the most complex scenarios. Restrict your operational design domain, that is, the environments your self-driving software handles, to comparatively easy-to-solve situations, like highway driving. And if something complex happens there, appeal to human supervisors.
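
Here is a minimal sketch of the gating logic that approach implies. The ODD checks, thresholds, and the supervisor hand-off are hypothetical, but the shape is the point: the software drives only inside a narrow, well-understood domain and escalates everything else to a human.

```python
from dataclasses import dataclass

@dataclass
class DrivingContext:
    road_type: str          # "highway", "surface_street", ...
    speed_limit_mph: int
    construction_zone: bool
    visibility_m: float

def within_odd(ctx: DrivingContext) -> bool:
    """Hypothetical ODD limits for a highway-only system."""
    return (
        ctx.road_type == "highway"
        and ctx.speed_limit_mph >= 45
        and not ctx.construction_zone
        and ctx.visibility_m >= 200.0
    )

def plan(ctx: DrivingContext) -> str:
    if within_odd(ctx):
        return "autonomy: lane keeping + longitudinal control"
    # Outside the ODD: escalate the decision to a remote human supervisor.
    return "escalate: request human supervisor"

print(plan(DrivingContext("highway", 65, False, 500.0)))
print(plan(DrivingContext("highway", 65, True, 500.0)))   # construction zone -> escalate
```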

That means you don’t need complex deep learning, which in turn means you don’t require high-fidelity simulation testing. You sidestep the uncanny valley and end the security theater.

Instead, because you understand every aspect of your technology’s functioning, you can break its operations into a discrete set of modules, which you validate in a series of simple tests just like every other software and hardware product in production today. This results in a more manageable testing pipeline.
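
For instance, a steering-command module with a known, simple model behind it can be validated with ordinary unit tests, no simulator required. The module and expected values below are invented for illustration, reusing the toy kinematic model from the earlier sketch.

```python
import math
import unittest

def steering_torque_for_radius(radius_m: float,
                               wheelbase_m: float = 6.0,
                               gain_rad_per_nm: float = 0.002) -> float:
    """Invert the kinematic model: torque needed to hold a given turn radius."""
    wheel_angle = math.atan(wheelbase_m / radius_m)
    return wheel_angle / gain_rad_per_nm

class SteeringModuleTest(unittest.TestCase):
    def test_round_trip(self):
        # Commanding the torque for a 500 m radius should reproduce that radius.
        torque = steering_torque_for_radius(500.0)
        recovered_radius = 6.0 / math.tan(0.002 * torque)
        self.assertAlmostEqual(recovered_radius, 500.0, places=6)

    def test_larger_radius_needs_less_torque(self):
        self.assertLess(steering_torque_for_radius(1000.0),
                        steering_torque_for_radius(200.0))

if __name__ == "__main__":
    unittest.main()
```

Each module gets its own small, deterministic test suite, and the integration of those modules is something you can reason about directly instead of hoping a simulator catches the gaps.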

This reined-in ambition will not achieve the highest levels of autonomy. But the highest levels of autonomy aren’t necessary to establish a compelling business case that can be deployed with current technology, today. Goods delivery with trucks, for example (and clearly, I have some bias there). All the same, the benefits are hard to deny. Rather than getting lost down the rabbit hole of high-fidelity simulation, engineers stay focused on providing the safety and mobility benefits of self-driving technology, today.

Kartik Tiwari is the chief technology officer and co-founder of Starsky Robotics.
