Shining a Spotlight on the Data Behind Safety Theater

Robbie Miller
Pronto
Jun 12, 2019

My April 17th post on how safety theater undermines the development of autonomous vehicles prompted a decent amount of inbound commentary, questions and debate. While much of it was private, it was gratifying and refreshing to engage in a substantive conversation on AV safety, especially with people who have different perspectives. What we are most lacking as an industry is an intellectually honest debate and a path forward on the safe development of autonomous vehicles.

One thoughtful public response was from Brad Templeton in Forbes. Even though Brad challenged some of my data, assertions, and conclusions, I am grateful that he embraced this type of dialogue to elevate the current safety conversation. I’ve followed Brad’s writings closely for several years; he is a high-integrity individual and a respected thought leader in our field. One specific assertion in my post that Brad and others in the media and across the Twitter-verse keyed in on was that, based on publicly available data, level 4 autonomous vehicle testing fleets in California crash more than average drivers. That seems difficult to believe, and there are several reasons to be skeptical. But if the conclusion holds, it means the AV industry is putting the public at risk far more than it should be. Since this is a serious claim, I want to unpack my thinking and data here.

To give autonomous test fleets the chance to look their best, I decided to use the SHRP 2 dataset, which was age-adjusted in a Virginia Tech study to account for some age groups being over-represented in the original dataset. This data also tries to account for human drivers severely underreporting crashes. Put simply, the SHRP data makes human drivers look worse than other credible sources of data on traditional driving do. (As others have noted, NHTSA data suggests that humans are actually several times better than this. I’ll stick with SHRP to help the prototype robots out, so that they look as good as possible when compared to human drivers.) An interesting side note: it was reported last week that GM Cruise was also using the Virginia Tech modified SHRP 2 dataset to define their launch milestones.
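For readers unfamiliar with that kind of adjustment, the underlying idea is a standard reweighting: compute crash rates per age group, then recombine them using the age distribution of the overall driving population rather than that of the study participants. Here is a minimal sketch in Python; every number in it is a placeholder for illustration, not an actual SHRP 2 or Virginia Tech figure.

```python
# Illustrative sketch of an age adjustment (direct standardization).
# All numbers are placeholders, not the actual SHRP 2 / VTTI figures.

# Crash rate per million miles observed in the study, by age group.
study_rates = {"16-24": 45.0, "25-64": 15.0, "65+": 30.0}

# Share of each age group among study participants vs. the national driving population.
study_share = {"16-24": 0.40, "25-64": 0.40, "65+": 0.20}       # young and elderly over-represented
population_share = {"16-24": 0.15, "25-64": 0.70, "65+": 0.15}

raw_rate = sum(study_rates[g] * study_share[g] for g in study_rates)
adjusted_rate = sum(study_rates[g] * population_share[g] for g in study_rates)

# Reweighting toward the (safer) middle-aged majority pulls the rate down.
print(f"raw: {raw_rate:.1f}, age-adjusted: {adjusted_rate:.1f} crashes per million miles")
```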

The SHRP 2 data was first used to compare human driving to AV fleets several years ago, after a University of Michigan study pointed out that Google was involved in a higher-than-average number of crashes. Google responded by commissioning the Virginia Tech study mentioned in the prior paragraph. That study used SHRP 2 to argue that AV testing fleets were safer than human drivers once unreported crashes were taken into account. Below is a graph pulled directly from that study. SHRP 2 Age-Adjusted refers to manual driving. Note that references to “Levels” in the graph below do not refer to the SAE levels of autonomy, but rather to types of crashes.

Source: Automated Vehicle Crash Rate Comparison Using Naturalistic Data. “Level 1,” “Level 2,” and “Level 3” refer to types of crashes, not SAE levels: Level 1 crashes result in injury, while Level 2 and Level 3 crashes result in property damage only.

An interesting thing to note is that in both the Michigan and Virginia Tech studies, the safety of AV test fleets is supported by pointing to a lack of at-fault crashes involving prototype autonomous vehicles. Such claims have been repeated in many reports and media accounts since then. In my previous blog post, I noted how safety theater uses legal liability rather than preventability to argue that the driving is safe. It turns out that self-driving vehicles are significantly worse than human drivers at avoiding preventable crashes. We all encounter situations on an almost daily basis where a driver could easily cause a crash, even on purpose, and yet not be legally at fault. Anyone who has had a vehicle in front of them brake unnecessarily hard can confirm this.

In my analysis of the most recent data, I tried to use a nearly identical methodology to the Virginia Tech study’s. I reviewed each AV crash report filed with the DMV and only counted crashes that occurred in autonomous mode or immediately after disengagement. If an injury was reported, I counted it as an injury accident (Level 1 in the graph above). The Virginia Tech study used data from inertial measurement units to differentiate between the crash types defined as Level 2 and Level 3; since I did not have access to that data, I combined Level 2 and Level 3 crashes into one field. I did not count incidents where a test vehicle was attacked by angry members of the public (which happens surprisingly often), was accidentally struck by a golf ball, or was otherwise involved in something that would make you think an autonomous “crash” was not really a crash. To get the crash rate, I added up the Total Miles Driven for each fleet (provided in the DMV Disengagement Reports from 2015–2018) and divided the number of injury crashes and property damage-only crashes by those miles.
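To make the arithmetic concrete, here is a minimal sketch of that calculation in Python. The record layout, field names, and figures are illustrative assumptions; the actual DMV data is filed as individual collision report forms and disengagement reports.

```python
# Minimal sketch of the crash-rate calculation described above.
# The record structure and all figures are illustrative, not the real DMV data.

MILLION = 1_000_000

# For each crash report: was the vehicle in autonomous mode (or just disengaged),
# was an injury reported, and was it a "real" crash (not vandalism, a stray golf
# ball, or similar)?
crash_reports = [
    {"autonomous": True,  "injury": False, "real_crash": True},
    {"autonomous": True,  "injury": True,  "real_crash": True},
    {"autonomous": False, "injury": False, "real_crash": True},   # excluded: conventional mode
    {"autonomous": True,  "injury": False, "real_crash": False},  # excluded: vehicle attacked
]

# Total autonomous-mode miles, summed from the disengagement reports (placeholder).
total_autonomous_miles = 2_000_000

counted = [r for r in crash_reports if r["autonomous"] and r["real_crash"]]
injury_crashes = sum(1 for r in counted if r["injury"])           # "Level 1"
property_damage_crashes = len(counted) - injury_crashes           # "Level 2/3" combined

rate_per_million = len(counted) / total_autonomous_miles * MILLION
print(f"{injury_crashes} injury, {property_damage_crashes} property-damage-only, "
      f"{rate_per_million:.2f} crashes per million miles")
```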

Despite all this, my analysis still shows that in the years after the Virginia Tech study was published, the reported rate of “autonomous mode” crashes in California more than tripled: it went from 8.8 crashes to 27.2 crashes per million miles by the end of 2018. The national average, according to that study, was 20.2. And it’s even worse than these numbers suggest, because the 27.2 figure for level 4 fleets includes prior years, when fleets crashed less frequently, while the 20.2 figure for human drivers has improved over time. Also, my figures don’t take into account that most of the testing in California has occurred in Santa Clara County, which has a crash rate that is less than half of the national average in the comparison I’ve outlined. We can’t even begin to do this analysis in Arizona and the other states where the majority of level 4 prototype fleet testing occurs, because there are no reporting requirements on crashes or miles driven.
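A quick illustration of why a cumulative rate flatters recent performance; the yearly miles and crash counts below are placeholders, not the actual DMV totals.

```python
# Illustrative only: a cumulative crash rate understates the most recent year
# because it blends in earlier, lower-crash years. All figures are made up.
years = {
    2016: {"miles": 600_000, "crashes": 6},    # ~10 per million miles
    2017: {"miles": 700_000, "crashes": 16},   # ~23 per million miles
    2018: {"miles": 900_000, "crashes": 38},   # ~42 per million miles
}

total_miles = sum(y["miles"] for y in years.values())
total_crashes = sum(y["crashes"] for y in years.values())

cumulative_rate = total_crashes / total_miles * 1_000_000
latest_rate = years[2018]["crashes"] / years[2018]["miles"] * 1_000_000

# The cumulative figure (~27 per million miles here) sits well below the
# latest-year figure (~42), so the blended number looks better than the
# fleet's current performance.
print(f"cumulative: {cumulative_rate:.1f}, 2018 only: {latest_rate:.1f}")
```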

Source: Human numbers (blue columns) are based on the SHRP 2 dataset with an age adjustment made by the VTTI study to account for the overrepresentation of young and elderly drivers. Self-driving car numbers (green columns) are based on the 2015, 2016, 2017, and 2018 California DMV disengagement reports (for total miles driven) and the DMV crash reports.

The implication of the data is that today in California, a professionally trained driver, whose job it is to monitor the safe road performance of a system that is supposed to be capable of driving more safely than a real person, is actually significantly more dangerous than the average driver. It’s not looking good for 2019, either. We don’t yet have the miles-driven data, so we can’t calculate crash rates, but there has been a noticeable uptick in crashes over the last four months when we compare the number of crashes reported so far this year with the same point last year.

Notably, just two of the 62 companies with California testing permits for level 4 test fleets were involved in 132 of the 156 reported crashes. These two companies also drive by far the most miles in California. The takeaway should be that if companies claim to be so far along in autonomy, they should hold themselves accountable by demonstrating, in a clear and transparent manner, enough progress to justify this amount of driving on public roads. I was encouraged to see Brad’s recent article in Forbes calling for test programs driving many miles to go beyond government-mandated metrics and be more accountable in getting safety right. If your test fleets with safety drivers are crashing far more often than human drivers, it should be clear that you are nowhere near being able to deploy real level 4 vehicles. As a result, the amount of testing should be limited and constrained to areas where it can be performed with safety at least comparable to that of a human driver.

We clearly need to shift the thinking around testing from one where companies try to show progress by driving lots of miles, to one where they are incentivized to make as much progress on as few miles as possible. This latter philosophy is what we subscribe to at Pronto. I am really encouraged to see that we are not alone. Aurora Innovations put it well in their safety assessment report: “We keep the fleet only as large as our engineering team requires to develop the driver. Others have maximized fleet size in order to maximize the number of miles traveled, assuming that such a strategy will maximize their learning. We see such a strategy as an unnecessary expansion of the fleet’s driving risk. Such a situation creates an excess of data at little to no value.”

At Pronto, we made a decision to not have a level 4 prototype fleet today at all. We believe that a full level 4 system is many years away. Our engineering team can get all the learning it needs to improve our software by developing a commercial-grade level 2 product, Copilot. We have made a conscious choice to build a product that is designed to augment, not replace, a skilled driver. We believe this sets the right incentives and leads to safer roads — a human driver plus Copilot is safer than a human driver alone.

An attentive human driver (Copilot includes a driver monitoring system to ensure attentiveness) is the central safety feature and requirement for a robust commercial product. This demands a much higher safety bar and keeps the system simple, while still providing incredibly valuable learning (since our customers would be driving their trucks on the exact same routes even if they had not purchased Copilot). At Pronto, we are excited to be delivering Copilot to customers soon and making the roads safer for everyone.
