AI Safety, Leaking Abstractions and Boeing’s 737 Max 8

Photo by Gary Lopater on Unsplash

Present-day air travel is one of the safest modes of travel. Statistics from the US Department of Transportation show that in 2007 and 2016 there were 11 fatalities per trillion miles of commercial air travel. This is in stark contrast to the 7,864 fatalities per trillion miles of travel on the highway (you can check the statistics here: fatalities and miles of travel per mode of transport). The incremental improvement of air travel is a marvel of technical innovation. However, when an aircraft accident does occur, we are forced to take notice because of the magnitude of a single event.
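To put the two figures side by side, here is a small sketch that computes how many times higher the highway rate is. It uses only the two numbers quoted above; the function name is my own and purely illustrative.

```python
# Fatalities per trillion miles of travel, as quoted in the paragraph above.
AIR_FATALITIES_PER_TRILLION_MILES = 11
HIGHWAY_FATALITIES_PER_TRILLION_MILES = 7_864


def relative_risk(rate_a: float, rate_b: float) -> float:
    """Return how many times larger rate_a is than rate_b."""
    return rate_a / rate_b


ratio = relative_risk(HIGHWAY_FATALITIES_PER_TRILLION_MILES,
                      AIR_FATALITIES_PER_TRILLION_MILES)
print(f"Highway travel has roughly {ratio:.0f} times the fatalities per mile")
```

By these figures, highway travel has roughly 715 times as many fatalities per mile as commercial air travel.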

Air travel today is at such a level of technical maturity that when a plane crashes by accident (i.e., not due to man-made causes like terrorism or the misfiring of missiles), the cause is, surprisingly, often not pilot error or physical equipment failure but a computer error. That is, the aircraft accident is caused by a software bug.

Everyone today is intimately familiar with software bugs. Microsoft’s blue screen of death and the use of ctrl-alt-delete have been burned into our experience. Even with the better-designed operating systems that we find in smartphones, it is not uncommon to have to force a reboot. This is so much less common that we often have to look up the procedure, but it does happen nevertheless.

Software is notoriously difficult to make bug-free. It is the nature of the beast. To build bug-free software systems, we need to explicitly list all the scenarios that can go wrong and how, and then test our software for those conditions. Unfortunately, that list tends to be unbounded if our designs don’t restrict the scope of the software’s applicability. In short, software developers manage unbounded complexity by narrowing the scope of applicability. That is why even the most sophisticated “artificial intelligence” applications work well only in the narrowest of areas. It is very easy to get frustrated by the limitations of voice assistants like Alexa, because AI technology has not reached the level of maturity required for open-ended general conversation. Bug-free software depends fundamentally on a narrow scope of application and extensive testing within this narrow scope.
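As a toy illustration of that last point, here is a minimal sketch of a function whose scope is deliberately narrow enough to be tested exhaustively. The function and its limits are hypothetical and chosen only for the example; real systems rarely have a scope this small.

```python
def percent_to_fraction(percent: int) -> float:
    """Convert a whole-number percentage to a fraction.

    Hypothetical helper: its scope is deliberately limited to the
    integers 0..100 so that every supported input can be tested.
    """
    if isinstance(percent, bool) or not isinstance(percent, int):
        raise TypeError("percent must be an int")
    if not 0 <= percent <= 100:
        raise ValueError("percent is outside the supported scope 0..100")
    return percent / 100


def test_entire_scope() -> None:
    """Exhaustive test: feasible only because the scope is 101 values."""
    for p in range(101):
        result = percent_to_fraction(p)
        assert 0.0 <= result <= 1.0
        assert round(result * 100) == p


test_entire_scope()
```

Anything outside the declared scope is rejected rather than handled, which is what keeps the list of scenarios to test bounded.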

As we build more sophisticated software with higher degrees of complexity, we need to understand the scope of each application, because an ever-increasing scope places ever-greater demands on the testing of these systems. To understand this complexity better, we need to understand the kinds of automation we are building.

The USDOT also reports that there were over 37,000 fatalities in highway accidents in 2017 alone. It therefore makes sense to understand how automation affects the safety of road vehicles. The Society of Automotive Engineers (SAE) has an international standard that defines six levels of driving automation (SAE J3016). This is a useful framework for classifying levels of automation in domains beyond cars. A broader prescription is as follows:

Level 0 (Manual Process)

The absence of any automation.

Level 1 (Attended Process)

Users are aware of the initiation and completion of the performance of each automated task. The user may undo a task in the event of incorrect execution. Users, however, are responsible for the correct sequencing of tasks.

Level 2 (Attended Multiple Processes)

Users are aware of the initiation and completion of a composite of tasks. The user, however, is not responsible for the correct sequencing of tasks. An example would be the booking of a hotel, car, and flight. The exact ordering of the bookings may not be a concern of the user. However, a failure in the performance of this task may require more extensive manual remedial actions. An unfortunate example of a failed remedial action is the re-accommodation of United Airlines’ paying customer.

Level 3 (Unattended Process)

Users are only notified in exceptional situations and are required to do the work in these conditions. An example of this is in systems that continuously monitor the security of a network. Practitioners take action depending on the severity of an event.

Level 4 (Intelligent Process)

Users are responsible for defining the end goals of automation; however, all aspects of the process execution, as well as the handling of in-flight exceptional conditions, are handled by the automation. The automation is capable of performing appropriate compensating action in the event of in-flight failure. The user, however, is still responsible for identifying the specific context in which automation can be safely applied.

Level 5 (Fully Automated Process)

This is a final and future state where human involvement is no longer required in the processes. This, of course, may not be the final level because it does not assume that the process is capable of optimizing itself to make improvements.

Level 6 (Self Optimizing Process)

This is automation that requires no human involvement and is also capable of improving itself over time. This level goes beyond the SAE requirements but may be required in certain high-performance competitive environments such as Robocar races and stock trading.
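The levels above can be captured in a small data structure. The following is only a sketch of one possible encoding: the enum member names and the two helper functions are my own shorthand for the descriptions above, not part of SAE J3016.

```python
from enum import IntEnum


class AutomationLevel(IntEnum):
    """The seven levels described above (names are illustrative)."""
    MANUAL = 0
    ATTENDED = 1
    ATTENDED_MULTIPLE = 2
    UNATTENDED = 3
    INTELLIGENT = 4
    FULLY_AUTOMATED = 5
    SELF_OPTIMIZING = 6


def human_handles_exceptions(level: AutomationLevel) -> bool:
    """Up to Level 3, a person is expected to do the work when things go wrong."""
    return level <= AutomationLevel.UNATTENDED


def human_chooses_context(level: AutomationLevel) -> bool:
    """Up to Level 4, a person decides where the automation may safely be used."""
    return level <= AutomationLevel.INTELLIGENT


for level in AutomationLevel:
    print(level.value, level.name,
          human_handles_exceptions(level), human_chooses_context(level))
```

The two helpers mark the boundaries that matter for the rest of this discussion: who handles exceptions, and who decides when automation applies.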

The automobiles of today have extremely sophisticated software that controls many parts of the functioning of the system. This software works at many levels, and at each level the risks are different. Some software works at such a narrow scope that we are unaware that it is operating. For example, a car’s fuel injection system is, in fact, fully automated. We can say the same about many of the functions of a car that deal with its engine performance; many car enthusiasts buy programmers and chips that provide after-market tweaks to a car’s performance characteristics. Failure of any of these kinds of systems can still be fatal. The SAE standard described above, however, applies to driving automation and not engine automation. There is a stark difference between automation that affects steering and automation that maintains the smooth running of engines.

Automation such as traction control or car stabilization does affect steering. These systems are engaged in exceptional, narrow conditions to ensure greater passenger safety. Controlled behavior is injected into a situation so that a driver can gain better control of the vehicle than he could have achieved by himself. In this context, the driver is actually momentarily not controlling the car.

There have been many cases of planes falling from the skies due to software bugs. My earliest memory of this kind of catastrophe is Lauda Air Flight 004 in May 1991, when the thrust reverser on one of the engines deployed in mid-flight, forcing the plane to spiral out of control and crash. There was no official conclusion as to the cause; however, the aviation writer Macarthur Job said that “had that Boeing 767 been of an earlier version of the type, fitted with engines that were controlled mechanically rather than electronically, then that accident could not have happened.”

More recently, there is the case of Air France 447 in 2009. The official conclusion was that there was a “temporary inconsistency between the measured speeds, likely as a result of the obstruction of the pitot tubes by ice crystals, causing autopilot disconnection and reconfiguration.” The verdict was that the human pilots were ultimately partly at fault because of their inability to react appropriately to the anomalous situation. To say this differently, the pilots received incorrect information from the instrumentation and thus took inappropriate action to stabilize the plane.

There are other cases of computer-caused failures. For Qantas Flight 72, it was determined that the CPU of the air data inertial reference unit (ADIRU) corrupted the angle of attack (AOA) data. Malaysia Airlines Flight 124 plunged 200 feet in mid-flight; the instrumentation displayed that the plane was “going too fast and too slow simultaneously”.

In general, it is the responsibility of the pilots to properly perform compensating actions in the case of equipment failure (known as alternate law). The point, though, is that a computer error due to equipment failure should be treated no differently from a regular equipment error, and it is the responsibility of the pilots to take appropriate measures. Typically, on equipment error, the autopilot is disengaged and the plane is flown manually. This is Level 3 (unattended process) automation, where the scope in which automation is in play is explicit. In Level 3, a pilot is made aware of an exceptional condition and takes manual control of the plane.

In Level 4 (intelligent process), a pilot must be able to recognize the exceptional condition and to specify when automation is applicable. Today, we have self-driving cars that are deployed in narrow applications. We have cars that can self-park and cars that can drive on the highway in good weather conditions. These are Level 4 automation, where it is up to the judgment of the driver to engage the automation. Autopilot in planes is Level 4 automation and is engaged in contexts of low complexity.

Then there is the case of the Boeing 737 Max 8’s MCAS. This, I will argue, is Level 5 automation: a fully automated process that is expected to function in all scenarios. Like the electronics that control engine performance, fully automated processes aren’t generally problematic. However, when they involve driving (or steering, for planes), the maturity of this level of automation comes into question.

Airbus has what is called ‘Alpha Protection’:

“Alpha protection” software is built into every Airbus aircraft to prevent the aircraft from exceeding its performance limits in angle of attack, and kicks in automatically when those limits are reached.

From the definition, Alpha protection is automation that is always measuring but isn’t always active. It is like the speed limiter that exists in cars today: it is constantly measuring but is activated only when measurements exceed thresholds. However, what happens when the measurements are incorrect due to faulty sensors? One could argue that this might have been what happened to Air France 447. That is, the automation became active when the pilots did not expect it. Faulty sensors are always problematic, but faulty sensors that can trigger automated behavior can be extremely dangerous.
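The pattern of “always measuring, only sometimes acting” can be sketched in a few lines. This is a hypothetical toy, not Airbus’s or anyone else’s actual logic: the class name, the command strings, and the 15-degree threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EnvelopeProtection:
    """Hypothetical protection that always measures but acts only past a limit."""
    aoa_limit_deg: float = 15.0  # illustrative threshold, not a real aircraft value

    def command(self, measured_aoa_deg: float) -> str:
        # The decision is based on the *measured* value; the code has no
        # access to the true angle of attack.
        if measured_aoa_deg > self.aoa_limit_deg:
            return "nose_down"
        return "no_action"


protection = EnvelopeProtection()
true_aoa_deg = 5.0                    # the plane is actually flying normally
healthy_reading = true_aoa_deg        # a working sensor reports the truth
faulty_reading = true_aoa_deg + 20.0  # a biased sensor reports 25 degrees

print(protection.command(healthy_reading))  # no_action
print(protection.command(faulty_reading))   # nose_down
```

The second call shows the danger: the plane’s true state has not changed, yet the faulty reading alone is enough to trigger an automated response.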

The Boeing 737 Max 8 has a system known as the Maneuvering Characteristics Augmentation System (MCAS). The business motivation behind MCAS is itself quite revealing. Apparently, it is analogous to a software patch that attempts to fix a physical flaw of the aircraft. The Boeing 737, introduced in 1968, is an extremely mature and reliable aircraft. It is the best-selling aircraft in the world, with over 10,000 sold since its inception. It has been favored by many of the short-haul budget airlines that have risen in the past decade. Its main competitor is the Airbus A320, of which over 8,000 have been delivered since its inception in 1988.

In 2008, CFM, a joint American-French company, launched a more fuel- and cost-efficient engine known as the Leap engine. Airbus fitted its new planes (the Airbus A320neo) with this new engine. The economy of the Leap engine is due to its much larger air intake diameter.

To be competitive, the Max 8 was also fitted with this new engine. However, unlike the A320neo, the 737 did not have enough ground clearance for the Leap engine. To compensate for this problem, Boeing reduced the distance between the engine and the underside of the wing. This, however, had the effect of changing the center of mass of the plane. The Max 8 now had a dynamic tendency to raise its nose and, as a consequence, an increased risk of a stall.

To paper over this tendency, Boeing developed MCAS, software dedicated to compensating for this flaw:

Boeing engineers, in turn, came up with another makeshift solution. They developed a software that would work in the background. As soon as the nose of the aircraft pointed upward too steeply, the system would automatically activate the tailplane and bring the aircraft back to a safe cruising plane. The pilots wouldn’t even notice the software’s intervention — at least that was the idea.

Employing software to paper over a plane’s natural instability is not new. Many of the more advanced fighter jets are designed to be unstable to ensure greater maneuverability. The fighter pilots are also trained to anticipate the peculiar flight characteristics of their planes. In contrast, there have been many complaints that pilots of the Max 8 were not properly informed of the existence of the MCAS system:

“There are 1,400 pages and only one mention of the infamous Maneuvering Characteristics Augmentation System (MCAS) … in the abbreviations sections. But the manual does not include an explanation of what it is…”

Perhaps Boeing determined that information about this system wasn’t worth the pilots’ attention. After all, the intention of MCAS was to make the 737 Max 8 give the same “feel” as the previous model, the 737 NG. This is what we in software circles call virtualization. That is, this is software that renders a virtual machine on the pilot’s user interface to the plane, so that it feels and acts like another kind of plane (i.e., one that is structurally balanced).

There is a “law” in software development known as “The Law of Leaky Abstractions”, which states: “All non-trivial abstractions, to some degree, are leaky.” MCAS is perhaps a leaky abstraction; that is, it tries to create a virtual abstraction of a legacy 737 NG without Leap engines in order to hide an unbalanced airplane. Surely, nothing can leak with this kind of abstraction? It is one thing to abstract away virtual machinery, and it’s entirely another thing to attempt to abstract away physical reality. However, in both cases, something will eventually leak through.
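To make the idea of a leak concrete, here is a deliberately simplified sketch. Everything in it is hypothetical: the function names, the linear “nose-up tendency”, and the numbers are toy assumptions of mine and do not represent MCAS’s actual control law or the real flight dynamics of either aircraft.

```python
def ng_pitch_response(stick: float) -> float:
    """Toy model of the older airframe: pitch change simply follows the stick."""
    return stick


def max_pitch_response(stick: float, true_aoa: float) -> float:
    """Toy model of the newer airframe: extra nose-up tendency above 10 degrees."""
    extra_nose_up = 0.5 * max(0.0, true_aoa - 10.0)
    return stick + extra_nose_up


def virtual_ng_response(stick: float, true_aoa: float, sensed_aoa: float) -> float:
    """Abstraction layer: cancel the nose-up tendency using the *sensed* AoA."""
    correction = 0.5 * max(0.0, sensed_aoa - 10.0)
    return max_pitch_response(stick, true_aoa) - correction


# Sensor agrees with reality: the abstraction holds and the plane "feels" like an NG.
print(virtual_ng_response(stick=1.0, true_aoa=14.0, sensed_aoa=14.0))  # 1.0
print(ng_pitch_response(stick=1.0))                                    # 1.0

# Sensor disagrees with reality: the abstraction leaks as an uncommanded nose-down.
print(virtual_ng_response(stick=1.0, true_aoa=2.0, sensed_aoa=25.0))   # -6.5
```

While the sensed value matches the true one, the pilot cannot tell the two airframes apart. Once they diverge, the response matches neither the old plane nor the new one, which is exactly the kind of behavior a pilot trained on the abstraction has no model for.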

So how does MCAS behave when its abstraction begins to leak? Here is what is reported by pilots of the plane:

“On both the NG and the MAX, when you have a runaway trim stab this can be stopped temporarily by pulling the control column in the opposite direction. But when the MCAS is activated because of a high angle of attack, this can only be stopped by cutting the electrical trim motor.”

How a pilot responds to an abstraction leak can be very different from the response to the real thing it is trying to abstract. With faulty sensors, one can turn them off and use one’s understanding of the situation and the plane to make good decisions. However, when one’s understanding of the nature of the plane is virtual and not real, one just can’t revert to reality. Reality is outside of the pilot’s comprehension and is thus a cause of improper decision making. A virtual trash can works like a regular trash can in that you can still recover the documents you place in the trash before it is emptied. Reality, however, is very different from the virtual world; many times there is no undo function!

Then there is this leak in the abstraction, where the plane itself appears to exhibit its own intentions:

But the EFS never acts by itself, so we were astounded when we heard what the real reason was. (…) However, in some cases — as happened on Flight 610 — the MCAS moves by itself.

and this:

MCAS is activated without pilot input and only operates in manual, flaps up flight.

This is because a virtual abstraction of a real plane is the same as Level 5 automation! If MCAS is turned off, the pilots will find themselves flying an entirely different plane. When you abstract away interaction with reality, you cannot avoid introducing a process that mediates between a pilot’s actions and the actual actions of the plane. The behavior of the real plane will depend on the environment that it is in. The behavior of a virtual plane will depend on just the working sensors that are available to render the virtual simulation. Level 5 automation requires a kind of intelligence that is aware of which sensors are faulty and, furthermore, is able to navigate a problem with partial and unobserved information. The smarts to enable this kind of artificial intelligence are simply not available in our current state of technological development.
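For a sense of what even the simplest form of “knowing which sensors are faulty” looks like, here is a minimal sketch of majority voting across redundant sensors. The function name, the tolerance, and the three-sensor minimum are assumptions of mine for illustration; this is not a description of how any particular aircraft handles its sensors.

```python
from statistics import median
from typing import Optional, Sequence


def trusted_reading(readings: Sequence[float],
                    tolerance: float = 2.0) -> Optional[float]:
    """Return a value only when a majority of sensors agree with the median.

    Returns None when no trustworthy value exists, meaning the automation
    should stand down and leave the decision to the pilot.
    """
    if len(readings) < 3:
        return None  # with fewer than three sensors, a single fault cannot be outvoted
    mid = median(readings)
    agreeing = [r for r in readings if abs(r - mid) <= tolerance]
    if len(agreeing) * 2 <= len(readings):
        return None  # no majority agrees, so nothing can be trusted
    return sum(agreeing) / len(agreeing)


print(trusted_reading([5.0, 5.5, 25.0]))   # 5.25 (the outlier is outvoted)
print(trusted_reading([5.0, 25.0]))        # None (two sensors cannot break a tie)
print(trusted_reading([0.0, 10.0, 20.0]))  # None (no majority agreement)
```

Even this only covers the easy case in which most sensors still work and agree. It says nothing about acting sensibly on partial or unobserved information, which is the harder requirement described above.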

In short, Boeing has decided to implement technology that is simply too ambitious. Not all software has the same level of complexity. This is not an issue of insufficient testing to uncover logical flaws in the software. This is not an issue of robustly handling sensor and equipment failure. This is an issue of attempting to implement an overly ambitious and thus dangerous solution.

Air travel is extremely reliable, but introducing software patches as a means to virtualize physical behavior can lead to unintended consequences. The reason that we still fly planes with pilots in them is that we expect pilots to be able to solve unexpected situations that automation cannot handle. MCAS-like virtualization handcuffs pilots, preventing them from differentiating between the real and the simulated. I would thus recommend to regulators that, in the future, MCAS-like virtualization be treated and tested very differently from other automation. It should be treated as Level 5 automation, with a more exhaustive level of scrutiny.
