The Chernobyl miniseries on HBO was gripping. It was an account of one of the worst disasters in the history of humankind. The series dealt with themes common to people in all walks of life. Yet there was something about this portrayal that resonated especially with those of us in the software engineering community. Parallels from our professional experience hitch themselves to characters or situations in the series. So, with ‘Chernobyl’ fresh in my mind, I decided to organize my observations in a series of blog posts.
Out of respect for the people touched by these events, I must tread carefully. The events that began in Pripyat in the spring of 1986, and that continue even to this day, are real. The lives lost are real. The tragedy is real. The lessons are also real. To capture these realities, I put two simple ground rules in place for my posts on ‘Chernobyl’:
- Don’t oversimplify. Applying lessons from historical events to modern circumstances, despite the best intentions of the author, can stretch credulity. Systems thinking tells us that there is rarely a simple explanation for any failure. Oftentimes it’s a combination of people, process, and technology. This is no different.
- Don’t disrespect. There is too much tragedy tied into this story to be anything but respectful. Even extracting parallels to software engineering puts the comparisons on shaky ground. But thoughtful sobriety can mine important professional lessons, however niche. We must be careful not to drive into the ditch of triviality.
The most dangerous management is that which is formerly technical. — Bryan Cantrill
The above quote materialized in my mind during the final episode of ‘Chernobyl’. The actions of Anatoly Dyatlov echoed that sentiment. Dyatlov was the deputy chief engineer at the Chernobyl Nuclear Power Plant. The fifth episode opens with Dyatlov, Nikolai Fomin (the chief engineer at Chernobyl), and Viktor Bryukhanov (the manager of the entire Chernobyl plant) discussing a planned test of reactor four.
Time management considerations saturated the meeting. For reasons not completely clear, there was time pressure to run a successful test on reactor four. (I’ll write more about this in a later post, but it appears this test should have been completed years earlier, before the reactor went live.) There were also scheduling considerations to ensure that Soviet factories “hit their numbers.” The reactors in Pripyat supplied a significant amount of power for the Kiev region. A dip in power output would impact the critical end-of-month push. Finally, the scheduling of the test itself, whether during the day shift or the night shift, came into play.
The result of the meeting was the worst of all possible outcomes. An incompetent, disengaged management trio determined the fate of a critical infrastructure test. Sound familiar? Do you have similar experiences in the software engineering world? Consider the test as a deployment; they were shipping a big release of the product. The go/no-go decision did not consider the input of the engineers working in the plant. Recall the night shift’s surprise when Dyatlov shows up for the test that evening. Why is the boss coming in for the graveyard shift? Many plant employees did not even know the test was taking place — even after the explosion occurred.
Dyatlov’s admonitions during the test carry with them the sting of time management gone bad. Dyatlov berated plant engineers with accusations of procrastination. Over and over, to different engineers, we hear the same word mixed in with other insults. We see the urgency of arbitrary deadlines top of mind for Dyatlov. He played the hero in the meeting earlier that day by promising to personally see to the test’s success. Now he drives an unprepared team to disaster.
Arrogance is the root of nearly all date-driven hysteria. It is why Cantrill said formerly technical management is the most dangerous. Dyatlov lost empathy for the engineering team he managed. He lost appreciation for the complexity and severity of the systems he managed. The recklessness woven into the decisions leading to the Chernobyl disaster could only have been the product of someone with significant experience in the field.
Do our own experiences in software engineering reflect this? At its best, technical leadership balances the tensions of people management, business objectives (deadlines), and technical considerations by staying engaged in all three. At its worst, it neglects the hard work of remaining competent and allows the vacuum once occupied by technical proficiency to fill with distorted, bygone hubris. The result is a decision-making capability that is far worse off than if someone with much less experience approached the same situation. It wasn’t the engineers (Dyatlov or Fomin) in the scheduling meeting who questioned the test’s propriety. It was Bryukhanov, the plant manager and the least technical of the three, who called out the concern that the RBMK reactor was unstable at low power output.
System complexity is a catalyst for destructive arrogance. There have been many times in my career when I observed a system failure that surprised me by its novelty and dismayed me by its magnitude. To quote Rumsfeld, these failures are the “unknown unknowns.” Building and operating complex systems requires a recognition of our own limitations. We must rest on tried-and-true design principles. Humility, the very opposite of arrogance, is necessary.
Examining Dyatlov’s history, one can infer something of the root of this arrogance. This wasn’t the first radiation accident he had been part of. The show’s creator, Craig Mazin, learned during his research that some believed Dyatlov was delusional in thinking he had “mastered radiation.” It was as if radiation, personified, had taken its best shot at the engineer and left him still standing. His sensitivity to the danger was skewed like the 3.6 roentgen reading taken from the maxed-out dosimeter in the initial moments after the accident.
Obviously, the peril born of overconfidence in the control room at Chernobyl far exceeds anything we experience in our own professional histories. However, we have all been victims of our own pretension. We have all underestimated risk and overestimated our knowledge of our systems. The end result is bugs and production incidents.
Since arrogance is ingrained in the human condition, is there anything we can do to guard ourselves and our organizations? My takeaway is that we should remember the impact of culture. The most compelling time pressures alluded to in the scheduling meeting were blunt and implied instead of direct and explicit. We see the three men in plant leadership respond to what their leadership chain prioritized. No one was brash enough to spell out that metrics mattered more than lives. Culture permeated the fuzzy gray area that occupied the space of the “judgment call” in their decision making. Historical precedent and the murmured values of the civil bureaucrats applied a numbing pressure in the direction of expediency. This game is one of inches. Leaders must apply pressure back in the direction of quality and humility. Values cure like concrete, hardened by years of reinforcement. Expect change to be slow.
Hints during the scheduling meeting indicated Dyatlov was expecting a promotion. A successful test was a means to an end. Dyatlov wanted Fomin’s job; out of the plant and into an office. Fomin wanted Bryukhanov’s job; moving from engineer to manager. Bryukhanov wanted to ascend even higher into the Soviet bureaucracy. There is nothing wrong with desiring a promotion. Yet these promotions share a trend toward disengagement. If you are in engineering leadership, stay engaged. Stay engaged technically and stay engaged personally. Technical engagement will keep you humble. It will remind you daily of how difficult engineering can be. Personal engagement will help you remain empathetic to your team. It will help you assume best intentions instead of treating team members as a means to an end.