When Software Kills
Software bugs, errors, and oversights have been blamed for hundreds of deaths in recent history. What are the takeaways for software engineers?
Preliminary investigations into two fatal crashes involving Boeing 737 MAX aircraft have brought to light that the company’s proprietary flight-control software may have been at least partially responsible for the crashes, in which 346 people were killed.
This is not the only time in recent history that software bugs, errors, or oversights have been blamed for catastrophes. In some cases, hundreds of millions of dollars were lost. In other cases, lives were lost.
How do we ensure that safety-critical systems are supported by reliable software? Is human error always to blame in the event of a software failure? What are the takeaways for software engineers?
A brief history of catastrophic software bugs
Between 1985 and 1987, at least three patients were killed (and others critically injured) when a software-controlled radiation therapy machine, the Therac-25, inadvertently administered massive overdoses of radiation — over 100 times the prescribed dose — due in part to a race condition in its control software.
A similar event happened in 2000, when treatment-planning software written by the American company Multidata caused dozens of Panamanian patients to receive huge overdoses of radiation. Five died soon afterward, and a total of nine affected patients died over the next few years, likely as a result of the same overexposure. The software calculated different doses depending on the order in which data was entered into the system — an error that was exacerbated when doctors worked around the software’s input limitations to enter shielding blocks in unsupported ways. The doctors who failed to double-check the software’s calculations were indicted for murder.
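The underlying failure mode — two entry paths that should be equivalent but compute different results — is easy to reproduce. The sketch below is purely illustrative (not Multidata’s actual code; blocks are modeled as 1-D intervals for simplicity): one path counts overlapping shielding blocks once, the other double-counts the overlap, so the computed beam-on time depends on how the same blocks were typed in.

```python
# Purely illustrative sketch — not Multidata's actual code — of how two
# input paths that should be equivalent can disagree. Path A enters the
# shielding blocks one at a time and counts overlapping area once; path B
# enters them as one composite shape and naively double-counts the overlap.
# Same physical setup, different computed beam-on time, different dose.

def time_blocks_entered_separately(dose, field_area, blocks):
    # Blocks modeled as 1-D intervals; overlapping area is counted once.
    covered = set()
    for start, end in blocks:
        covered |= set(range(start, end))
    return dose * field_area / (field_area - len(covered))

def time_blocks_entered_composite(dose, field_area, blocks):
    # BUG: summing raw areas double-counts wherever blocks overlap.
    shielded = sum(end - start for start, end in blocks)
    return dose * field_area / (field_area - shielded)

blocks = [(10, 30), (25, 45)]          # two overlapping shielding blocks
time_a = time_blocks_entered_separately(100.0, 100.0, blocks)
time_b = time_blocks_entered_composite(100.0, 100.0, blocks)
# The two entry paths disagree, so the delivered dose depends on how the
# same physical blocks were typed into the system.
```

A correct implementation would compute the geometry once, from a single canonical representation, no matter how the data arrived.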
In 1991, during the Gulf War, an Iraqi Scud missile struck a U.S. Army barracks in Dhahran, Saudi Arabia, killing 28 soldiers and injuring 96. A software error — a rounding error that had been accumulating in the Patriot air-defense system’s internal clock — prevented the incoming missile from being intercepted.
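The error itself is one of the best-documented numerical bugs on record. Per the U.S. GAO’s post-incident report, the system counted time in tenths of a second, but 0.1 has no exact binary representation, so the truncated constant drifted — about a third of a second after roughly 100 hours of continuous operation, which at Scud speeds translates to hundreds of meters. The arithmetic can be reproduced in a few lines (an illustration of the published account, not the Patriot’s actual code; the register format is simplified):

```python
# Reconstruction of the arithmetic behind the 1991 Patriot failure, based
# on the published GAO account (an illustration, not the actual Patriot
# code). The system counted time in tenths of a second, but stored the
# constant 0.1 in a 24-bit fixed-point register — effectively 23 binary
# digits after the point — so every tick added a tiny error.

TRUE_TENTH = 0.1
STORED_TENTH = int(0.1 * 2**23) / 2**23        # 0.1 truncated to 23 fractional bits

ticks = 100 * 60 * 60 * 10                     # tenth-second ticks in ~100 hours of uptime
clock_error = ticks * (TRUE_TENTH - STORED_TENTH)   # seconds of accumulated drift

SCUD_SPEED = 1676                              # approximate Scud velocity, m/s
tracking_error = clock_error * SCUD_SPEED      # how far off the tracking window looked

print(f"clock drift after 100 h: {clock_error:.2f} s")   # ~0.34 s
print(f"tracking error: {tracking_error:.0f} m")         # ~575 m
```

The deployed workaround was reportedly to reboot the system periodically, resetting the accumulated drift to zero.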
In 1997, a poorly programmed ground-based altitude warning system was deemed partially responsible for a Korean Air crash that killed 228 people. Although human error was also a factor, the system failed to alert air traffic controllers that the aircraft was dangerously close to the ground; the FAA later acknowledged and corrected the bugs.
In 2018, a software misclassification in one of Uber’s self-driving test cars caused the death of a pedestrian. The car’s sensors did, in fact, detect her, but the software decided she was a “false positive” — as benign as an empty soda bottle in the road — and kept driving.
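The engineering lesson generalizes beyond any one vendor. A minimal, hypothetical sketch (every name and threshold below is invented, not taken from Uber’s system) of how a detection can be made and then thrown away:

```python
# Purely illustrative sketch — not Uber's pipeline; every name and number
# here is invented — of how a system can detect an obstacle and still
# ignore it: a hard confidence threshold treats an uncertain pedestrian
# exactly like roadside debris.

def should_brake(detections, threshold=0.8):
    # Each detection is a (label, confidence) pair. Anything under the
    # threshold is dismissed as a false positive, whatever it might be.
    return any(label == "pedestrian" and conf >= threshold
               for label, conf in detections)

frame = [("pedestrian", 0.55), ("plastic_bag", 0.30)]
print(should_brake(frame))   # False — detected, but dismissed
```

The fix is rarely just “lower the threshold”: a safety-critical pipeline arguably needs asymmetric handling, where an uncertain pedestrian is treated very differently from uncertain debris.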
Some humans have managed to outsmart buggy software, but this isn’t always possible. In 1983, Soviet officer Stanislav Petrov flagged an incoming missile alert as a false alarm after a Soviet early-warning system reported that the U.S. had fired five missiles at the U.S.S.R. But can we really expect all users to identify software bugs and make appropriate accommodations, especially when safety is on the line?
Who is really responsible?
A few factors contribute to software-induced catastrophes, but none is as urgent as users’ inability to report potentially serious bugs — and, perhaps, vendors’ lack of due diligence in triaging the reports that do come in.
Consider Apple’s FaceTime bug, which allowed users to eavesdrop and spy on others while waiting for them to answer a FaceTime call. The woman who made the report found it so difficult to get in touch with Apple’s security team that she nearly gave up. It wasn’t until a developer wrote about the bug in a now-viral post that Apple took action.
I looked for ways to report software bugs at several large companies — Chase (which handles sensitive data) and Dodge (which has millions of cars on the road) were two of them — and I found no way to make a report.
While bugs in Chase’s software might not present an immediate safety risk to users, a bug in Dodge’s software could put users in danger.
And let’s face it — you can’t exactly call customer service and ask to speak to a software engineer.
We can rely on user reports to learn more about minor bugs and security issues, but what happens when we’re working on large-scale systems that hold human lives in the intricacies of their code?
Should we expect pilots to notice and report bugs or oversights in Boeing’s in-flight automation systems, or should software engineers be expected to conduct frequent, thorough testing and user outreach? Can we expect engineers to guess which features might need an additional set of eyes? Had pilots previously reported difficulty controlling Boeing 737 aircraft?
Should those doctors have been indicted for murder if the error in administering the proper dose of radiation was in the code? Are we responsible for testing software as we intend it to be used, or as users might try to use it?
Was there any way for the doctors to request a new feature in their radiation administration software instead of trying to bypass its restrictions? Are engineers responsible if users are able to tamper with software?
Should we expect engineers (who specialize in software development) to identify and question oversights or missing/poorly implemented features that are requested by clients (who specialize in whatever the software will be used for)? How would a developer know that a doctor might try to do X with this software?
Is there enough communication between developers and users of proprietary software?
We can say that users are responsible for what they do with our software, but what happens when safety is on the line?
Who is last in the line of defense?