Systems blindness and how we deal with it
In what must have been an impulse purchase, I bought a paperback copy of Daniel Goleman’s book “Focus: The Hidden Driver of Excellence” a couple of weeks ago. I’m still only halfway through this hard-to-follow mishmash of ideas, all supposedly related to the overall theme of focus in a distracted world. There’s this one chapter in the book, however, that stands out to me. The chapter is titled “System Blindness” and it provides enough insights — food for thought — that I don’t regret reading the book. Here’s what I learned.
Mental models and outages
Systems are invisible to our eyes. We try to understand them indirectly through mental models and then perform actions based on these models.
As I wrote before, simplicity is a prerequisite for reliability. The more complex a system, the more difficult it is to build a mental model of the system, and the harder it becomes to operate and debug it.
As a matter of fact, the majority of outages are self-inflicted. We’re thinking about a change we’re going to make, but we don’t necessarily anticipate the negative consequences it might have on the system as a whole. We push a bad configuration or deploy a buggy Docker image and all of a sudden the website goes down. It has happened to all of us. I, for one, have certainly caused my fair share of outages.
For lack of a better word, I always used to refer to “unintended consequences” whenever I talked about these unexpected drawbacks that go along with complexity.
Then came Goleman’s book, which introduced me to the term systems blindness.
What is systems blindness?
Systems blindness is the main thing we struggle with in our work. What we think of as “side effects” are misnamed. […] In a system there are no side effects — just effects, anticipated or not. What we see as “side effects” simply reflects our flawed understanding of the system. In a complex system […] cause and effect may be more distant in time and space than we realize.
According to Goleman, one of the worst results of systems blindness occurs when we implement a strategy to fix a problem but ignore the involved system dynamics. Many problems are, unfortunately, too macro or micro for us to notice directly.
One example he gives is that of building more and wider roads to relieve traffic jams, which eventually leads to even more traffic as people take advantage of the better travel connections and move further away from urban areas. Another example is global warming: energy and climate form a single system, and everything we do feeds back into that system. Because the problem is systemic, climate meetings and agreements can only do so much.
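The road-building example is a classic feedback loop, and it can be sketched in a few lines of code. This is a toy model with made-up numbers (the 20% capacity increase and 10% demand growth are my own illustrative assumptions, not figures from the book):

```python
# Toy model of induced demand: congestion triggers road building,
# but demand grows to absorb the new capacity, so congestion returns.
# All numbers are invented for illustration.

def simulate(years: int, capacity: float, demand: float) -> list[float]:
    congestion = []
    for _ in range(years):
        load = demand / capacity      # how full the roads are
        congestion.append(load)
        if load > 0.9:                # heavy traffic prompts road building
            capacity *= 1.2           # 20% more lanes
        demand *= 1.1                 # demand chases the extra capacity
    return congestion

history = simulate(years=10, capacity=100.0, demand=95.0)
# Congestion dips after each expansion, then climbs right back.
print(min(history), max(history))
```

Each intervention (more lanes) looks like a fix in isolation, but the system's feedback loop cancels it out within a few iterations.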
Likewise, everything that we’re doing affects the success and failure of our software systems. When something doesn’t work as expected, I want you to remember this lesson: There are no side effects, just effects that result from our flawed understanding of the system.
Illusion of explanatory depth
Another phenomenon related to systems blindness is the “illusion of explanatory depth”.
We often believe we understand how something works when in reality our understanding is superficial at best. In our industry, this illusion becomes apparent when trying to explain in depth how technology X works, where X might be: Kubernetes, the TCP/IP stack, Linux syscalls, AES encryption, consensus over Raft, the list goes on and on.
And even if someone has comprehensive knowledge of certain technologies, there’s still the challenge of grasping the dynamics — the feedback loops — of the larger system in which they’re embedded.
Distributed systems are hard for a reason.
Patterns and rules
To some degree, biology is to blame for our imperfect systems understanding:
[In contrast to self-awareness and empathy], there seems to be no dedicated network or circuitry in the brain that gives us a natural inclination toward systems understanding. We learn how to read and navigate systems through the remarkable general learning talents of the neocortex.
In other words, systems thinking — the cure for systems blindness — is a skill that must be learned, just like reading or programming. (Goleman notes that computer games can teach us how to experiment with complex systems.)
At the same time, we humans excel at “detecting and mapping the patterns and order that lie hidden within the chaos of the natural world” because our very survival depends on it.
We live within extremely complex systems, but engage them lacking the cognitive capacity to understand or manage them completely. Our brain has solved this problem by finding means to sort through what’s complicated via simple decision rules [e.g. trusting other people].
Our built-in pattern detector is able to simplify complexity into manageable decision rules.
This is a major reason why we, especially in IT, are obsessed with data (Big Data, anyone?). We strive to make the workings of our production systems visible by gathering and curating enough data points, like metrics and logs, that the dynamics of these systems become palpable. Armed with the myriad of supporting tools available today, we look for meaningful patterns within that data — knowing where to focus in a system is key — and take actions based on these patterns.
It is this observability that, in the absence of perfect understanding, helps us deal with the complexity of our systems.
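As a sketch of what such a simple decision rule might look like in practice, the snippet below reduces a stream of latency measurements to a single pattern check. The function name, window size, and threshold are my own illustrative choices:

```python
from statistics import mean, stdev

def is_anomalous(samples: list[float], window: int = 30, z: float = 3.0) -> bool:
    """Flag the latest sample if it sits more than z standard
    deviations above the recent baseline -- a simple decision
    rule standing in for full understanding of the system."""
    if len(samples) <= window:
        return False                       # not enough history yet
    baseline = samples[-window - 1:-1]     # recent history, excluding latest
    mu, sigma = mean(baseline), stdev(baseline)
    return samples[-1] > mu + z * max(sigma, 1e-9)

# A steady stream of ~100 ms responses, then a sudden 500 ms spike:
latencies = [100.0 + (i % 5) for i in range(40)] + [500.0]
print(is_anomalous(latencies))
```

The rule knows nothing about why latency spiked; like the brain's heuristics, it trades complete understanding for a cheap signal about where to focus.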
I would like to thank Goleman for making me realize this connection.
P.S. This article first appeared on my Production Ready mailing list.