The (Virtual) Road to Safety

Building worlds where it’s safe to learn from mistakes

At the heart of the self-driving revolution is a push for safety. For those who haven’t heard the numbers, they are striking: over 1.25 million people died globally in road traffic accidents in 2013 alone, and NHTSA attributes 94% of accidents to human error. In search of a solution to this problem, companies are pouring billions of dollars into autonomous vehicle (AV) research.

Current machine learning algorithms must be trained and tested over massive sets of data. However, real-world testing and data collection require real vehicles, real drivers, and real time. The race to collect real-world miles has led companies to maintain huge fleets of cars, drivers, and the infrastructure to support them. This presents significant logistical and safety challenges, as even a tiny risk of collision becomes large when multiplied over millions of miles. Once the scale required to prove AV safety is taken into consideration (upwards of 11 billion miles, according to the RAND Corporation), driving all of these miles in the real world becomes an intractable problem. As a result, the technology behind video games and movies is now powering state-of-the-art simulations aimed at improving the safety and reliability of self-driving cars. But if we need to drive billions of miles in a simulator, how do we build those miles?

With current manual methods, it takes weeks or even months to build just a few high-fidelity city blocks, often with daunting amounts of repetitive labor.

[Figure: the cost of building just the world and its inhabitants for Grand Theft Auto V. The entire game cost ~$265M.]

Enter Parallel Domain and procedural content generation. PD’s software automatically generates the virtual miles and scenarios that autonomous vehicles need to get it right before they hit the real world. Autonomous vehicle companies are using this software to remove the most difficult barrier to large-scale simulation: building the massive variety of complicated environments that a vehicle might encounter.

Why Automatically Generate Virtual Worlds?

Automatically generating these simulation worlds brings some distinct advantages that are unique to Parallel Domain’s approach. With this technology, virtual world generation becomes a giant parameter space that can be tweaked at will by an engineer, a few lines of code, or even by artificial intelligence itself:

  • Prepare for the world that could be, not just the world that exists today: Add a bike lane, scatter garbage, repave the road with new asphalt, advance time so that a tree grows into the road or so that the asphalt weathers and cracks.
  • Isolate the effect of each part of the environment: With perfect reproducibility and a parameterized world, generate multiple runs of the exact same scenario with only one element varying (say the number of lanes, the width of the bike lane, or the condition of the road paint), making it possible to analyze how each environmental change impacts the performance of vehicles.
  • Adversarial networks: With a parameter space that generates a given world, we can use machine learning techniques to explore that parameter space and how it impacts a vehicle’s performance. We now have the opportunity to use generative adversarial techniques to automatically pit our world generator against self-driving cars, making increasingly difficult environments that attempt to exploit a car’s specific weaknesses. Imagine coming in each morning to a generated map of the car’s strengths and weaknesses from yesterday’s changes.
  • Domain randomization: Sometimes it’s just as important to teach machine learning algorithms what doesn’t matter. With a set of parameters that generates the world, it becomes straightforward to use domain randomization to generate massive varieties of data under different conditions, helping ML algorithms transfer their learning from the virtual world to the real one.
  • Unit testing: Until now, it wasn’t possible to generate a full continuum of unit tests at scale. For example, picture a suite of 3-road junctions with the roads entering at every angle in 1 degree increments (or 0.1, or 10; try them all!), then the same for 4-road and 5-road junctions. Now make a variation of each of those junctions with and without a bike lane, with traffic lights vs. stop signs, and so on. Pretty soon you have millions of combinations that form a continuous space of test coverage.
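To make the idea of a "parameter space" concrete, here is a minimal Python sketch of three of the ideas above: a junction test grid, a one-factor sweep, and domain randomization. It is not Parallel Domain’s API; `JunctionParams`, `junction_grid`, `sweep`, and `randomize` are hypothetical names invented for illustration, and describing a junction by a single entry angle is a deliberate simplification.

```python
from dataclasses import dataclass, replace
from itertools import product
import random


@dataclass(frozen=True)
class JunctionParams:
    """Hypothetical description of one generated junction (illustrative only)."""
    num_roads: int               # 3-, 4-, or 5-road junction
    angle_deg: float             # entry angle of one road; a simplification
    bike_lane: bool
    control: str                 # "traffic_light" or "stop_sign"
    paint_wear: float = 0.0      # 0.0 = fresh paint, 1.0 = fully worn
    time_of_day_h: float = 12.0  # local time in hours, 0 to 24


def junction_grid(step_deg=1.0):
    """Yield every combination of the discrete test parameters."""
    n_angles = round(360 / step_deg)
    angles = (i * step_deg for i in range(n_angles))
    for angle, roads, bike, control in product(
        angles, (3, 4, 5), (False, True), ("traffic_light", "stop_sign")
    ):
        yield JunctionParams(roads, angle, bike, control)


def sweep(base, field, values):
    """Copy `base` once per value, changing only `field` (one factor at a time)."""
    return [replace(base, **{field: v}) for v in values]


def randomize(base, rng):
    """Domain randomization: resample nuisance parameters, keep the layout fixed."""
    return replace(
        base,
        paint_wear=rng.random(),
        time_of_day_h=rng.uniform(0.0, 24.0),
    )


if __name__ == "__main__":
    print(sum(1 for _ in junction_grid(1.0)))  # number of 1-degree test cases

    base = JunctionParams(4, 90.0, False, "stop_sign")
    for params in sweep(base, "bike_lane", [False, True]):
        print(params)

    rng = random.Random(42)  # fixed seed keeps the random variants reproducible
    print(randomize(base, rng))
```

Even this toy grid has 360 × 3 × 2 × 2 = 4,320 cases at 1 degree steps, and every additional parameter multiplies that count, which is how the combinations reach the millions. A real generator would feed each parameter set into world construction; here the parameter objects simply stand in for the worlds they would produce.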

The Strengths of Simulated Data

In many ways, simulated driving is the ideal foil to real-world driving, which makes the two a great combination. Sometimes we get the question, “When will synthetic data be ‘as good’ as real-world data?” Our answer is that in some ways, synthetic data can be better than real data. Synthetic data has particular strengths where real data is very weak, and vice versa. They are complementary pieces of the AV puzzle, and we absolutely need both to build the safest, most reliable vehicles possible. Some areas where simulated data provides specific advantages:

  • Real driving risks having accidents / Simulated driving is completely safe
  • Real driving is slow (a car can only drive so many miles per day) / Simulated driving can be fast, potentially thousands of times faster
  • Real driving is expensive per mile (vehicle maintenance, gas, drivers) / Simulated driving is a fraction of the cost (once the simulator and virtual worlds are built)
  • Real driving requires the management of a fleet of real vehicles / Simulation requires a computer
  • Real driving is, as it should be, usually fairly boring and uninformative / Simulations are packed with challenging situations, allowing cars to learn more from every mile driven
  • Real-world data requires error-prone annotation of the resulting data set, one of the largest sources of error in training ML algorithms / Simulations provide perfectly annotated data every time
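To put the speed and scale points in perspective, here is a rough back-of-the-envelope calculation in Python. The 11 billion mile target is the RAND figure cited earlier; the fleet size (100 cars), the average speed (25 mph), round-the-clock driving, and the 1,000x simulation speedup are illustrative assumptions, not measured numbers.

```python
def years_to_drive(target_miles, fleet_size, avg_speed_mph, hours_per_day=24.0):
    """Years needed for a fleet to accumulate `target_miles` of driving."""
    miles_per_year = fleet_size * avg_speed_mph * hours_per_day * 365
    return target_miles / miles_per_year


TARGET_MILES = 11e9   # RAND figure cited in this post
SIM_SPEEDUP = 1000    # assumed: "thousands of times faster" taken as 1,000x

# Assumed fleet: 100 cars averaging 25 mph, driving 24 hours a day.
real_years = years_to_drive(TARGET_MILES, fleet_size=100, avg_speed_mph=25)
sim_years = real_years / SIM_SPEEDUP

print(f"Real-world fleet: about {real_years:.0f} years")
print(f"Simulated at {SIM_SPEEDUP}x: about {sim_years:.2f} years")
```

Under these assumptions the real fleet needs roughly 500 years, while the same mileage in simulation takes about half a year. The exact numbers matter less than the gap between them.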

Why I’m Excited

It’s an incredible time to be sitting at the intersection of computer graphics and artificial intelligence. While virtual worlds have been around for decades in our films, video games, and simulators, we’re passing through an inflection point. These environments are now being used for an entirely non-human audience: to train and test machines that can learn at lightning speed. That brings with it an entirely new scale of demand for simulated virtual worlds and scenarios. Being able to generate these worlds in a truly scalable and flexible way is critical to shortening the time it takes to deploy safe, autonomous transportation. I can’t wait to see what our partners are able to do with this technology. It’s just the first step in a long journey to revolutionize the way we teach AI to help us live better and safer lives.

Stay tuned for deeper dives in future posts! -Kevin


World Health Organization (1.25 million deaths per year)

NHTSA (humans at fault for 94% of accidents)

Driving to Safety, RAND Corporation (11 billion miles)

Gamespot (many more than 1,000 people worked on GTA V)