Disaster Special: Jakarta EE, Boeing, Equifax, Firefox, and more
CodeFX Occasionally #71 — 11th of May 2019
Three newsletters in a row, I’m on a roll. And this one isn’t even egomaniacal! Well, if it isn’t about me, it at least discusses a related topic: disasters.
- Jakarta EE in much detail
- Boeing 737 MAX in detail
- Facebook, Equifax, Volkswagen from 10'000 feet
- Firefox, Amazon as TL;DR
I send this newsletter out some Fridays. Or other days. Sometimes not for weeks. But as an actual email. So, subscribe!
I published the part about Jakarta EE on my blog.
On my way to JEEConf I finally had time to read this:
How the Boeing 737 Max Disaster Looks to a Software Developer, by Gregory Travis (IEEE Spectrum)
I tweeted my thoughts and unrolled that thread here. Note that, unlike Travis’ thoughts, mine are unencumbered by subject-matter expertise. 😋
If you want to read more about this topic, check out “What can software organizations learn from the Boeing 737 MAX saga?” by Phillip Johnston (the link points to a particularly interesting section).
One might look at the Boeing disaster and think it’s a one-off. I disagree. While the details are of course specific to this incident, the themes involved are ubiquitous in our economic system, and because software is eating the world, we developers often play a role in these tragedies.
In an industry that relies more than anything on the appearance of total control, total safety, these two crashes pose as close to an existential risk as you can get.
Other industries that rely on that appearance: banking, medicine, energy production/distribution, …
Airlines […] loved [the Boeing 737] because of its simplicity, reliability, and flexibility. […] Over the years, market and technological forces pushed the 737 into ever-larger versions with increasing electronic and mechanical complexity. This is not, by any means, unique to the 737.
Ubiquitous story: Simplicity begets success begets complexity.
Most of those market and technical forces are on the side of economics, not safety.
Just 10% into the article, I already can’t tell anymore whether we’re talking about
- physical safety of hardware
- e-safety of data collections
- privacy of big tech products
Anyway, these market forces demanded bigger engines (more fuel efficiency) without changing the plane too much (to avoid recertification and having to train pilots). That wasn’t easy, though, and here we come to the technical underpinnings.
The solution was to extend the engine up and well in front of the wing. However, doing so also meant that the centerline of the engine’s thrust changed. Now, when the pilots applied power to the engine, the aircraft would have a significant propensity to “pitch up,” or raise its nose.
In the 737 Max, the engine nacelles themselves can, at high angles of attack [i.e. a high angle between the wings and the airflow, e.g. during takeoff], work as a wing and produce lift. And the lift they produce is well ahead of the wing’s center of lift, meaning the nacelles will cause the 737 Max at a high angle of attack to go to a higher angle of attack. This is aerodynamic malpractice of the worst kind. [emphasis mine]
- cause a problem
- fix it with a hack
- deploy & work on the next feature; roll a D20:
  - on 2–20, go back to the first step
  - on 1, you’re fucked
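To put a number on that loop: if every release independently rolls a fair D20 and only a 1 ends in disaster (an assumption that exists purely for the joke), the chance of having been hit after n releases is 1 - (19/20)^n. A minimal Python sketch of that model, where all names are made up for illustration:

```python
from fractions import Fraction

# Hypothetical model of the loop above: every deploy rolls a D20,
# and only a 1 ends in disaster. Rolls are assumed to be independent.
P_DISASTER_PER_ROLL = Fraction(1, 20)


def chance_of_disaster(releases: int) -> float:
    """Probability that at least one of `releases` rolls comes up 1."""
    if releases < 0:
        raise ValueError("releases must be non-negative")
    survive_all = (1 - P_DISASTER_PER_ROLL) ** releases
    return float(1 - survive_all)


def expected_releases_until_disaster() -> float:
    """Mean of a geometric distribution with success probability p: 1 / p."""
    return float(1 / P_DISASTER_PER_ROLL)


if __name__ == "__main__":
    for n in (1, 10, 20, 50):
        print(f"{n:>3} releases: {chance_of_disaster(n):.1%}")
    print("expected releases until disaster:", expected_releases_until_disaster())
```

With these made-up odds, 20 releases already put you at roughly 64%, and the expected wait is 20 releases. The exact numbers don’t matter; the point is that “go back to the first step” turns the 1 into a question of when, not if.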
An airplane approaching an aerodynamic stall cannot, under any circumstances, have a tendency to go further into the stall. This is called “dynamic instability,” and the only airplanes that exhibit that characteristic — fighter jets — are also fitted with ejection seats.
That reminds me a lot of the exchange “That’s not a good solution.” “But Netflix does it, too!”, but I admit that’s more of an association than a real similarity.
Enter MCAS — but on the down low
Boeing’s solution to its hardware problem was software [called “MCAS”].
Let’s review what the MCAS does: It pushes the nose of the plane down when the system thinks the plane might exceed its angle-of-attack limits; it does so to avoid an aerodynamic stall.
[T]here’s the need to keep the very existence of the MCAS system on the hush-hush lest someone say, “Hey, this isn’t your father’s 737,” and bank accounts start to suffer.
😨😰😱 But why?!
If the 737 MAX isn’t built like other 737s or doesn’t behave like them, the FAA requires recertification as a new aircraft:
- that takes years and costs Boeing $$$
- it makes airlines using Boeing’s planes wait longer for a competitive plane ($$$)
- it requires pilot training and makes pilots less fungible (yep, more $$$)
Boeing and the airlines using its planes desperately want to avoid that, so they share a common goal: downplay MCAS and its effects on flying the plane.
Everything about the design and manufacture of the Max was done to preserve the myth that ‘it’s just a 737.’ Recertifying it as a new aircraft would have taken years and millions of dollars. In fact, the pilot licensed to fly the 737 in 1967 is still licensed to fly all subsequent versions of the 737.
Now we get to how fragile the software is:
In the 737 Max, only one of the flight management computers is active at a time — either the pilot’s computer or the copilot’s computer. And the active computer takes inputs only from the sensors on its own side of the aircraft.
Pilots can cross-check several instruments, including just looking out the window, but MCAS doesn’t do that.
This means that if a particular angle-of-attack sensor goes haywire — which happens all the time in a machine that alternates from one extreme environment to another, vibrating and shaking all the way — the flight management computer just believes it.
Err… Worse, it takes over the plane without giving the pilots a proper way to override it.
“Raise the nose, HAL.”
“I’m sorry, Dave, I’m afraid I can’t do that.”
(Yes, that’s a quote from the article.)
What the actual fuck?! 😵
So: the 737 Max’s aerodynamic design sucks. Code was supposed to fix it, but the code sucks, too.
I do not know why those two basic aviation design considerations [multiple inputs, human intervention], bedrocks of a mind-set that has served the industry so well until now, were not part of the original MCAS design.
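The two considerations Travis names, multiple inputs and human intervention, fit in a few lines of code. The following is a deliberately simplified Python illustration under made-up assumptions (two angle-of-attack readings, an arbitrary limit, a pilot-override flag); it is not how MCAS or any real flight control software is implemented, and every name and number in it is hypothetical:

```python
from dataclasses import dataclass

# All names and numbers here are hypothetical illustrations,
# not values from the real MCAS or any flight control software.
AOA_LIMIT_DEG = 15.0            # made-up angle-of-attack limit
MAX_SENSOR_DISAGREEMENT = 5.0   # made-up tolerance between the two sensors


@dataclass
class Inputs:
    aoa_left_deg: float
    aoa_right_deg: float
    pilot_override: bool  # e.g. the pilots have switched the system off


def single_sensor_nose_down(inputs: Inputs) -> bool:
    """The fragile variant: trust one sensor and ignore the pilots."""
    return inputs.aoa_left_deg > AOA_LIMIT_DEG


def cross_checked_nose_down(inputs: Inputs) -> bool:
    """Push the nose down only if both sensors agree and the pilots haven't intervened."""
    if inputs.pilot_override:
        return False  # human intervention always wins
    if abs(inputs.aoa_left_deg - inputs.aoa_right_deg) > MAX_SENSOR_DISAGREEMENT:
        return False  # sensors disagree: don't act on data we can't trust
    return min(inputs.aoa_left_deg, inputs.aoa_right_deg) > AOA_LIMIT_DEG


if __name__ == "__main__":
    haywire = Inputs(aoa_left_deg=70.0, aoa_right_deg=5.0, pilot_override=False)
    print(single_sensor_nose_down(haywire))   # one bad sensor is enough to act
    print(cross_checked_nose_down(haywire))   # the disagreement blocks the action
```

With one sensor gone haywire, the single-sensor variant pushes the nose down, while the cross-checked one refuses to act. Neither design choice is exotic, which is exactly why their absence is so baffling.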
Regarding the reasons, Travis theorizes:
I believe the relative ease — not to mention the lack of tangible cost — of software updates has created a cultural laziness within the software engineering community. Moreover, because more and more of the hardware that we create is monitored and controlled by software, that cultural laziness is now creeping into hardware engineering — like building airliners.
Move fast and crash things?! Makes sense. But it wasn’t just development that was borked; so was QA (apparently called the DER process in aircraft certification, after the Designated Engineering Representatives who sign off on designs):
And, when [the two considerations] were not [part of the design], I do not know or understand what part of the DER process failed to catch the fundamental design defect.
Here, it crosses from stupidity to corruption/maliciousness. It’s possible that, technically speaking, no rules were broken, though:
The rules said you couldn’t have a large pitch-up on power change and that an employee of the manufacturer, a DER, could sign off on whatever you came up with to prevent a pitch change on power change. The rules didn’t say that the DER couldn’t take the business considerations into the decision-making process. And 346 people are dead.
Ouch. To the point.
Keep it simple
Travis closes with a remedy. Spoiler: it’s KISS.
[In the book ‘Normal Accidents: Living With High-Risk Technologies’] Perrow argues that system failure is a normal outcome in any system that is very complex and whose components are “tightly bound” — meaning that the behavior of one component immediately controls the behavior of another. Though such failures may seem to stem from one or another faulty part or practice, they must be seen as inherent in the system itself. They are “normal” failures.
Meaning, with a complex system, you must accept failures as the norm, not the exception. Applied to safety systems:
Every increment, every increase in complexity, ultimately leads to decreasing rates of return and, finally, to negative returns. Trying to patch and then repatch such a system in an attempt to make it safer can end up making it less safe.
And more specifically to MCAS:
It is likely that MCAS, originally added in the spirit of increasing safety, has now killed more people than it could have ever saved. It doesn’t need to be “fixed” with more complexity, more software. It needs to be removed altogether.
Let’s hope this particular problem gets fixed without more loss of life. As I mentioned in the beginning, though, I think this is not a special case but an instance of a larger pattern, so I’m not holding my breath that such tragedies will be avoided in the future.
Facebook, Equifax, Volkswagen
Here’s my glib summary of the Boeing debacle: Boeing wanted to make $$$, so it developed unsafe hardware and then added bad software as a fix. People died. The problem is that this pattern repeats itself across various industries:
- Facebook wanted to make $$$, so it created transparent people and then half-assed some privacy settings as a fix. Elections were bought.
- Equifax wanted to make $$$, so it created transparent customers and then added bad security practices as a safeguard. Millions of identities were stolen.
- Volkswagen wanted to make $$$, so it created diesel cars that polluted beyond legal limits and then added software to evade detection. People were defrauded, other people died (statistically speaking).
There are several problems at play here. The obvious one is that these companies not only prioritize revenue (which is their job), but do so to a point where they become unfit to take other concerns into account. Facebook is probably the most egregious example: it has a privacy scandal about every year and is obviously incapable of preventing them. In my opinion, that’s not by accident but by design: privacy runs counter to its financial interests.
You might say that the companies don’t really have a choice. If they don’t do it, their competitors will. Or, worse, their competitors already do but got lucky so far. Now we’re in the weeds. Maybe capitalism needs regulation (gasp!) to make sure unalienable rights to privacy, identity, health, and safety are not weighed against revenue?!
Another problem is software developers’ willful participation in these rights violations. Where are the devs standing up against the destruction of privacy and identity (by Facebook, Google, Equifax et al.), against risking people’s health (Volkswagen is not the only corp doing this), and against risking their lives (Boeing)? You can’t tell me that not a single one of them ever wondered how their software is gonna be used and what effect that has on the wider population. We’re often myopic and stupid, but not to that degree. No, people noticed, but decided not to speak up to keep their paychecks.
And here’s me pointing fingers at myself: I give in-house trainings at every company that asks me. Even at Bayer, even when they were already in the process of buying Monsanto (I mean, talking about the frying pan and the fire…). I’m not proud of myself and should reconsider my priorities.
Firefox and Amazon
Wow, this newsletter got so long! I hope reading it was as much fun as writing it and that you’re still with me. But I’m not gonna press my luck much further and will leave you with TL;DRs of what happened with Firefox and Amazon, plus some links.
Firefox accidentally disabled all add-ons because Mozilla let a certificate expire that was used to sign them, the signature being what marks an add-on as safe. Without a signature from a valid certificate, the browser considered add-ons unsafe, so it deactivated them. All of them. Oops.
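Why did one expired certificate take out every add-on at once? Because all add-on signatures depended on the same certificate, so the moment the clock passed its expiry date, every signature check failed simultaneously. A minimal Python sketch of that failure mode; the class and functions are hypothetical simplifications, not Firefox’s actual signature verification, and the dates in it are made up:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SigningCertificate:
    # Hypothetical, minimal stand-in for an X.509 certificate's validity window.
    not_before: datetime
    not_after: datetime


def signature_is_trusted(cert: SigningCertificate, now: datetime) -> bool:
    """A signature only counts if its certificate is valid right now."""
    return cert.not_before <= now <= cert.not_after


def enabled_addons(addons: dict[str, SigningCertificate], now: datetime) -> list[str]:
    """Keep only the add-ons whose signing certificate is still valid."""
    return [name for name, cert in addons.items() if signature_is_trusted(cert, now)]


if __name__ == "__main__":
    # Made-up dates: every add-on depends on the same certificate.
    shared = SigningCertificate(
        not_before=datetime(2015, 1, 1, tzinfo=timezone.utc),
        not_after=datetime(2019, 1, 1, tzinfo=timezone.utc),
    )
    addons = {"addon-a": shared, "addon-b": shared, "addon-c": shared}
    print(enabled_addons(addons, datetime(2018, 12, 31, tzinfo=timezone.utc)))  # all of them
    print(enabled_addons(addons, datetime(2019, 1, 2, tzinfo=timezone.utc)))    # none of them
```

Nothing about the add-ons changed between the two calls; only the clock did.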
Amazon deprecated path-style (aka V1) request URIs of the form //s3.amazonaws.com/<bucketname>/key in favor of virtual-hosted style (aka V2) and wanted to stop supporting them after September 30th, 2020. The revised plan is to keep serving path-style requests for buckets created before that date and enforce V2 only for new buckets.
- Amazon S3 Path Deprecation Plan — The Rest of the Story
- discussion on Hacker News
- Amazon to Disable S3 Path-Style Access Used to Bypass Censorship presents an interesting angle re censorship
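The difference between the two styles is only where the bucket name goes: into the path (V1) or into the hostname (V2). A minimal Python sketch of that rewrite, assuming the global s3.amazonaws.com endpoint shown above (regional endpoints are not handled); the function name is made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

GLOBAL_ENDPOINT = "s3.amazonaws.com"


def path_style_to_virtual_hosted(url: str) -> str:
    """Rewrite //s3.amazonaws.com/<bucket>/<key> as //<bucket>.s3.amazonaws.com/<key>."""
    parts = urlsplit(url)
    if parts.hostname != GLOBAL_ENDPOINT:
        raise ValueError(f"not a global path-style S3 URL: {url}")
    # The first path segment is the bucket, the rest is the object key.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"no bucket name in URL: {url}")
    return urlunsplit(
        (parts.scheme, f"{bucket}.{GLOBAL_ENDPOINT}", "/" + key, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(path_style_to_virtual_hosted("https://s3.amazonaws.com/my-bucket/photos/cat.jpg"))
```

The scheme, query, and fragment are carried over unchanged, so the scheme-less //s3.amazonaws.com/... form from above works, too.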
And that’s it for now. I hope you’re having a great week! 🌞