As always, thanks for reading. Want Snippets delivered to your inbox, a whole day earlier? Subscribe here.
This week’s theme: a counterintuitive lesson about technology and safety in complex systems that we can learn from the Boeing 737-MAX grounding.
As you have likely heard already, there was an awful tragedy followed by some pretty intense drama in the aviation world this week, after an Ethiopian Airlines crash that killed everyone on board. The main character here is a particular airplane: the 737 MAX, an updated version of Boeing’s perennially popular airplane that has been in service for less than two years and has already seen two fatal crashes within five months. That is a very red flag in an industry that puts safety above all else and has the track record to show it. Over the following days, most other countries quickly grounded the airplane, while the US and Canada lagged for several days before eventually grounding it in turn.
Today in Snippets we’re going to revisit an old paper that we talk about on at least a yearly basis, and many of you will remember, because it’s just that good — a short but intensely useful paper on safety, technology and human operation in complex systems that must defend against catastrophic failure. Hopefully by the end of this, you’ll come around to an interesting and counterintuitive conclusion about these airplanes and safety, which is not the angle we’re hearing from the media, but instead is what we’re hearing from pilots and other professionals who have actually had to deal with these kinds of safety issues in real life.
Flying is inherently hazardous. But we do a good job: airplane crashes are rare nowadays. This is not because airplanes are problem-free. It is because we’ve evolved a series of heavily guarded defences against those problems, and pilots have a good understanding of what those problems and defences are. Overall, aviation has a culture that takes safety extremely seriously. Part of this culture is balancing many different kinds of “failure”. Anyone who flies on a regular basis will tell you that small failures happen all the time. Planes have mechanical problems; sometimes there isn’t the right pilot for the right plane on time; computer systems can break down. The airline industry does not function as a flawless system; it functions because of a combination of people, technology, training, regulation, and redundancy that defend against failure. So then what happened with this new plane, and why are people so upset?
Some background knowledge that’s important context: the 737 is a very popular airplane. Part of the reason that airlines like Southwest fly only 737s is so that any pilot in the 737 roster is capable of flying any airplane safely. If you’re at the airport and look out on the tarmac, you’ll see a good number of 737s. They’re pretty easy to recognize, because they have one visual characteristic that’s quite funny looking: how low to the ground they sit. It’s a squat little airplane, particularly compared to its rival, the Airbus A320, and the wings are just high enough off the ground to give clearance to the engines which hang beneath them. Jet engines used to be longer and thinner, so this wasn’t an issue; more recently, airplane engines tend to take up as much space as they can get away with.
Recently, airlines have been putting pressure on Boeing: they need an airplane to fly 737-type routes, and they need it to have better fuel economy. Boeing has two options here. It can either design a new airplane from scratch, or it can continue to release new versions of the 737 that have incremental improvements to its fuel economy and other things. From both the airlines’ perspective and Boeing’s, upgrading the 737 is quite a bit more attractive. Boeing knows how to make 737s reliably and profitably, and airlines already have pilots who are trained to fly 737s and mechanics who know how to keep them running safely. So Boeing set out to do yet another version of the 737, with upgraded components and better gas mileage.
If you want to improve the fuel economy of an airplane, there are some little tweaks you can do (adding little aerodynamic “winglets” to the tips of the wings is one way to squeeze out one or two percentage points of gas mileage), but by far the most effective way to improve fuel economy is to upgrade the engine. As mentioned before, you may have noticed that airplane engines seem to be getting bigger and bigger, and that’s not an accident: larger engines that can draw in more air will run more quietly and burn less fuel, both of which are good things. On other airplanes, it’s pretty easy to make a “new engine option” version of the airplane, where you simply swap in a new engine and make necessary adjustments. (That’s why you see “Airbus A320-NEO” and the like on newer planes.) But on the 737, you can’t do that: the plane is too low to the ground! So Boeing shifted the engine into a new spot along the wing, in order to give the engine more ground clearance and accommodate the airplane’s increased length. Problem solved.
As you might expect, this created a new problem. Engines are heavy, so shifting them from one place to another along the wing will have a pretty meaningful impact on how the plane flies. Now here’s where the human element of the system comes in: pilots and pilot training. One of the biggest concerns that pilots must continually watch for when flying a plane is “stalling”. Stalling is not an engine problem like in a car, but rather something that happens when the plane hits a particular angle against the oncoming rush of air, usually while climbing, where the air stops flowing over the top of the wing’s surface. Given that this rush of air around the wing is where lift comes from, you’ve got a real problem: the plane will start to fall quickly. That’s bad. As you’d expect, airplane manufacturers and pilots have evolved numerous, redundant defences against this happening, both in technology and training.
Now, Boeing seems to have faced a dilemma here: airlines wanted a new airplane with better fuel economy, and the appeal of sticking with Boeing’s 737 was partially due to their pilots already knowing how to fly the plane safely. But in order for Boeing to make a new 737 with bigger engines and more passengers, they needed to meaningfully shift the weight distribution of where the engine sits on the wings. That shift in weight distribution didn’t interact nicely with pilots’ training and instincts around a few aspects of flying, particularly around climbing safely without stalling. But aside from that, the plane flew pretty similarly. To make pilots go through a complete re-training as if it were a new aircraft seemed both pointless and possibly dangerous: pilots already knew how to fly the existing 737 quite safely. So Boeing decided to do something a little too clever: it wrote some software that made the new 737s feel like the old ones. Specifically, it included a safety system designed to continually nudge the nose of the plane downwards if it sensed that the plane was losing lift during takeoff. Viewed on its own, this isn’t all that out of the ordinary. Planes are mostly flown by software these days, through “Fly by Wire” systems designed with multiple layers of safety and redundancy. So what happened?
Five months ago, in the tragic Lion Air crash that marked the MAX 8’s first catastrophe, what appears to have happened is this: a faulty sensor was apparently giving incorrect readings about the plane’s Angle of Attack, which is the crucial variable that matters in terms of lift and stalling for the airplane. So the plane should never have taken off to begin with. But it did anyway; and while climbing, the incorrect readings led the airplane’s automated anti-stall system to (incorrectly) think it was about to stall, prompting the system to respond by automatically pushing the plane’s nose down. But what happened then? The pilots, not wanting the plane’s nose to drop, fought back against this maneuver by pulling the plane back up, which was interpreted by the system as “the intervention wasn’t successful; try harder.” So the system forced the plane back lower, leading to a nightmare tug-of-war scenario between the pilots and the plane which the plane eventually won: it entered an uncontrollable nosedive and plunged into the ocean, leaving no survivors.
Tragically, this should have been entirely preventable: had the pilots simply switched the automated system off, they could’ve manually taken over and quickly recovered the airplane. Had they been trained better on the new system, they would’ve known to override it from the start. And had the airline replaced the faulty component, the tragedy would similarly have been avoided. The catastrophic failure came out of the interaction of three broken system components: one a failure of technology, another a failure of operation, and a third a failure of training. Most of all, one question loomed over all the other ones: how could a crucial component like this sensor not have any redundant backups? Well, there was a backup. The problem is that the backup was supposed to be the pilot throwing a kill switch. And the pilots didn’t know.
Following this crash, other pilots weighed in, saying that they too were concerned — both by this new unfamiliar behaviour their planes were exhibiting, and also by the sparse documentation and awareness of what to do in such circumstances. As one pilot wrote in a flight report: “I am wondering if any other crews have experienced similar incidents with the auto throttle system on the MAX? Or I may have made a possible flying mistake which is more likely. The [First Officer] was still on his first month and was not able to identify whether it was the aircraft or me that was in error. … The fact that this airplane requires such jury rigging to fly is a red flag. Now we know the systems employed are error-prone — even if the pilots aren’t sure what those systems are, what redundancies are in place and failure modes. I am left to wonder: what else don’t I know?”
So we can start to understand the real problem, which is actually far more counterintuitive than what has been widely discussed in public. The story that’s been going around in newspapers and the media has largely been variants of “These planes use too much technology, rather than trusting pilots to fly the plane like in the old days.” In fact, the opposite may be more true: “These new planes rely too much on the pilots for a critical element of their safety!” How so? Well, the critical relationship here is between the angle of attack sensor and the automatic pitch adjustment of the aircraft: in a true fly-by-wire system, there would be layers and layers of redundant, electronic safety guards to prevent malfunctioning. In this case, the redundant safety element is the pilots themselves: if they feel the system fighting them or malfunctioning in any way, they just flip a switch and turn it off. Boeing, it appears, committed a grave error: not in trusting technology too much, but in trusting pilots without telling the pilots that they were such a critical part of the system. So if we were coming at it from a pure safety point of view, the correct move might not be to ground the planes; it might be to ground the pilots, at least until they got retrained and re-certified as a safe part of 737-MAX operations. (It also doesn’t help that Boeing had a planned technical fix to address some of these issues in the works, but it got delayed 5 weeks due to the government shutdown. Worse, Boeing did in fact build an indicator to alert pilots when the angle of attack sensor was potentially malfunctioning, but it didn’t come standard. It came as a part of a paid upgrade package. This really pissed people off.)
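To make the redundancy point concrete, here is a toy sketch of the difference between acting on a single angle of attack reading and cross-checking two readings before acting. To be clear, this is not Boeing's software or anything close to it: the function names, the command strings, and both threshold values are made up purely for illustration.

```python
from statistics import median
from typing import Sequence

# Hypothetical values, chosen only for illustration.
STALL_AOA_DEG = 14.0     # angle of attack above which we assume a stall risk
MAX_DISAGREE_DEG = 5.0   # how far apart two sensors may read before we distrust them


def single_sensor_command(aoa_deg: float) -> str:
    """Trust one sensor: a bad reading directly triggers a nose-down command."""
    return "NOSE_DOWN" if aoa_deg > STALL_AOA_DEG else "NONE"


def cross_checked_command(readings: Sequence[float]) -> str:
    """Act only when redundant sensors agree; otherwise stand down and alert."""
    if len(readings) < 2:
        return "DISABLED_ALERT_PILOT"  # no redundancy available
    if max(readings) - min(readings) > MAX_DISAGREE_DEG:
        return "DISABLED_ALERT_PILOT"  # sensors disagree, so trust neither
    return "NOSE_DOWN" if median(readings) > STALL_AOA_DEG else "NONE"


if __name__ == "__main__":
    # One sensor is stuck at a wildly high value; the other reads normally.
    print(single_sensor_command(22.0))         # acts on the faulty reading
    print(cross_checked_command([22.0, 4.0]))  # detects the disagreement
```

In the first design the only thing standing between a faulty sensor and a nose-down command is a human who knows where the off switch is. In the second, the disagreement itself becomes a signal, and the pilot is told about it rather than left to infer it.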
As a general lesson for the tech industry, and for anyone in charge of developing and running software that operates in complex environments for human operators: safety is a property of systems and the people who operate them, not a property of technology. In the case of the 737-MAX, the problem was that new technology made the human operators less safe while simultaneously relying on them for redundant safety, a cardinal sin in system design. On a personal note, the most recent flight I took just a couple weeks ago was on a 737 MAX: my flight home from the Bay Area to Toronto is usually on a Dreamliner, but if I take the afternoon flight instead it’s on a MAX aircraft. It’s anybody’s guess what plane I’ll be on next time.
Can Boeing trust pilots? | Mac McClellan, Air Facts Journal (This post is particularly helpful. Read pieces written by actual pilots!)
You may have noticed a trend in many of these recent must-read featured articles, which is that a lot of them are by Taylor Lorenz. Her beat is honestly so much fun: understanding the way that kids and teenagers use social media, and how their rapidly morphing social media lives and personalities are in many ways a kind of ‘adaptive defence mechanism’ that we just totally lack as adults. This week, a pretty awesome piece about how necessity is the mother of invention, and how Google Docs became a stealth chat app. (Imagine how frustrated the Google+ team must feel!)
Also, some fascinating updates about Amazon, particularly in light of last week’s proposal from Elizabeth Warren to go break them up. Be sure to check out Zack Kanter’s very long but excellent piece (written by someone who has used Amazon as a customer more thoroughly than just about anyone I know) about where exactly Amazon’s future as a retailer sits today:
You’ve probably already seen this, but the college admissions cheating story that got blown open this week really has everything: graft, celebrities, and gross behaviour all around.
A pretty neat development out of BU: a new material, when shaped in a particular way, cancels sound to a nearly absolute degree.
Hudson Yards, the new mega-development nearing completion in New York City, has its share of critics:
In this week’s news and notes from Social Capital, the environment has been on our minds a lot lately, with climate change being a pressing issue that many portfolio companies, like Droneseed, Aclima, and others, are working on. One company that’s perpetually breaking new ground in our understanding of climate science and actionable options is Saildrone, whose Antarctic circumnavigation, which we talked about the other week, is making a real impact on the way we understand our oceans and how they absorb and release carbon dioxide. They recently wrote up a helpful primer on this relationship and how Saildrone is helping, which we’re going to repost here:
The 2019 Saildrone Antarctic Circumnavigation, the first autonomous circumnavigation of the Southern Ocean, endeavors to accomplish a significant list of science objectives, in collaboration with leading research agencies in the US, Europe, and Australia. The Southern Ocean accounts for approximately 40% of the total ocean carbon uptake, but only 20% of the surface area. Vast areas of the Southern Ocean remain unsampled, especially during the stormy autumn and winter seasons when ship-based observations are particularly difficult. Shifts in winds and circulation around Antarctica have already been shown to alter the amount of carbon dioxide uptake from the atmosphere. A full year of observations made with Saildrone unmanned surface vehicles (USVs) could provide critical data about how the region is changing, as well as the biological and physical processes driving those changes.
Scientists from the National Oceanic and Atmospheric Administration (NOAA) and Commonwealth Scientific and Industrial Research Organisation (CSIRO), among others, will use the data collected by Saildrone to study carbon uptake in the Southern Ocean.
Wind- and solar-powered saildrones are equipped with GPS and navigational instruments that make them capable of autonomously sailing a set course of waypoints, as prescribed by Saildrone Mission Control in Alameda, CA. Saildrones are designed for deployments of up to one year and return to port on their own. Minute-level data is transmitted in real time; second-by-second data is downloaded upon mission completion.
“Over the past 20 years of making ship-based measurements, we’ve learned that there’s a lot more variability in the amount of carbon the Southern Ocean can take up than we’d previously realized. We need more information to understand the regional changes, and how carbon uptake is changing year to year, but we can’t get that with ships alone,” said Dr. Bronte Tilbrook, a biogeochemist studying ocean acidification and the global carbon cycle at CSIRO. “The advantage of the saildrones is that they can go to areas where there have been very few ship observations. We’re sending the saildrones into regions we just couldn’t before. It’s quite significant.”
Saildrone USVs carry a suite of science sensors to collect in-situ data above and below the surface of the water including air and skin temperature, relative humidity, pressure, Chl-a, salinity, and pH. The ASVCO2 developed by NOAA allows us to measure the difference in the partial pressure of CO2 in the atmosphere and surface ocean, and this is used to calculate the amount of CO2 being absorbed or released by the surface ocean.
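For the curious, here is a rough sketch of how a partial pressure difference can be turned into a flux estimate. The primer does not spell out the calculation, so this uses the standard bulk formula, flux = k × K0 × (pCO2 in seawater − pCO2 in air), as an assumption. The function name and the default values for the gas transfer velocity (k) and solubility (K0) are illustrative placeholders, not Saildrone or NOAA numbers; in practice both depend on wind speed, temperature, and salinity.

```python
def co2_flux_mol_per_m2_day(
    pco2_sea_uatm: float,
    pco2_air_uatm: float,
    k_m_per_day: float = 4.0,     # assumed gas transfer velocity (illustrative)
    k0_mol_m3_atm: float = 50.0,  # assumed CO2 solubility in cold seawater (illustrative)
) -> float:
    """Bulk air-sea CO2 flux estimate.

    Positive values mean the ocean is releasing CO2 to the atmosphere;
    negative values mean the ocean is absorbing it.
    """
    delta_atm = (pco2_sea_uatm - pco2_air_uatm) * 1e-6  # convert microatm to atm
    return k_m_per_day * k0_mol_m3_atm * delta_atm


if __name__ == "__main__":
    # Surface water at a lower partial pressure than the air above it: a carbon sink.
    flux = co2_flux_mol_per_m2_day(pco2_sea_uatm=380.0, pco2_air_uatm=410.0)
    print("uptake" if flux < 0 else "release", abs(flux))
```

The sign of the difference is what tells you whether a patch of ocean is acting as a sink or a source, which is why measuring both sides of the air-sea boundary at the same time and place matters so much.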
Over the course of the 270-day Antarctic mission, saildrones will periodically rendezvous with surfacing SOCCOM floats. SOCCOM floats are deployed from a ship and active for approximately three years. The saildrone will perform cross-validation sampling as close to the SOCCOM float as possible in terms of time and distance.
“The floats are suggesting that the wintertime conditions of CO2 uptake are changing quite a bit more than we understood. There’s an interesting set of data starting to emerge, showing that the overall Southern Ocean sink is more variable in the winter. The floats provide an opportunity to get some data, but we really need to verify it, and we can only do that by making independent measurements when they come up to the surface,” said Tilbrook.
Saildrones and SOCCOM floats present two very different sampling strategies: The saildrone is focused on air-sea interaction at the surface, and the float is focused on sub-surface measurements of the water column. The floats measure the pH of the water and infer the partial pressure of carbon dioxide. The ASVCO2 on the saildrone measures atmospheric CO2; an equilibrator pumps air through the surface of the seawater to bring the air and water into equilibrium in terms of CO2 for a short period of time.
“This mission is a preview of the kind of multi-platform, long-term observing system we could envision for the Southern Ocean. The SOCCOM floats have given us an unprecedented amount of data in this region, which challenged a lot of our assumptions about the ocean CO2 sink. But have conditions in this region the last few years been an anomaly, or not? Continuous, long-term observing is one way to find out,” said Dr. Adrienne Sutton, an oceanographer with the NOAA Pacific Marine Environmental Laboratory (PMEL) Carbon Group. The PMEL Carbon Group has been involved in all Saildrone missions related to CO2 to date.
The Saildrone Antarctic Circumnavigation is one of several ongoing and recent missions related to carbon uptake and data validation. On January 30, Saildrone launched a USV in Newport, RI, on a 30-day mission to study heat transfer and carbon flux in the Gulf Stream. In June 2018, the Saildrone Baja Campaign studied upwelling and frontal region dynamics, air-sea interactions, and diurnal warming effects along the US/Mexico coast to Guadalupe Island, and assessed the Saildrone platform for satellite data accuracy and model assimilation.
Have a great week,
Alex & the team at Social Capital