Boundaries of failure — Rasmussen’s model of how accidents happen.
Systems always operate at capacity, but how do we avoid pushing them too far?
The law of systems development is that every system operates at its capacity and that people are always looking to get more out of it than was previously thought possible (Cook). Whilst this has done wonders in progressing human development beyond limits our ancestors would ever have thought possible, it inevitably comes with a downside. Accidents occur as systems are stretched beyond their operating capacity, sometimes with disastrous results.
The negative impacts of constantly stretching and stressing a system and running it at capacity are highlighted in Lean thinking, Little’s Law and the Theory of Constraints, which demonstrate that a resilient system is one with some slack built into it. When something unexpected inevitably occurs, there is enough buffer or capability in other parts of the system to cover the exposure. When a system is tightly coupled and interactively complex, the process can lose control (for further reading, see Charles Perrow’s normal accident theory).
Rasmussen developed a model that helps conceptualise this. It has three boundaries: an economic boundary, a workload boundary and a boundary of acceptable performance.
The economic boundary describes the operating envelope in which a business is profitable. If the lights are on then you are inside the economic boundary, but there is always pressure to cut costs and do more here.
The workload boundary reflects the human capacity in the system. The human tendency is always to minimise the amount of work we do — this means taking shortcuts. Often this is required purely to survive, as there is always more work than can be completed.
Both of these push the operating point towards the boundary of acceptable performance. This might be a personal or process safety boundary; or equipment performance.
If you cross the boundary of acceptable performance then the natural response is to push back, often aggressively, against the other boundaries. Rules are brought in; workers are told to work harder and not take shortcuts. Resources are thrown at the problem whatever the cost, meaning that in the short term the business could start losing money.
An important additional boundary on the model is a marginal boundary inside the boundary of acceptable performance. This serves as a guide or alarm when flirting too closely with disaster. The safety margin helps provide slack in the system, and the larger the margin the lower the probability of an accident occurring. These margins take the form of concrete, easy-to-measure KPIs such as speed limits, acting as a proxy for the accident boundary.
Margins, however, impose a cost as they artificially constrain a system. Over time the tendency is to challenge and push against the margin. Frequently nothing bad happens. So the marginal boundary is challenged again and again, and as long as nothing bad occurs we feel we can move the margin permanently. Importantly, because accidents are not happening we don’t know how far from the accident boundary we are until, inevitably, a sequence of low-probability events lines up to push the system across the accident boundary and an accident occurs.
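This drift dynamic can be sketched as a toy simulation (a minimal sketch with illustrative parameters of my own choosing — none of the numbers or function names come from Rasmussen’s model): steady pressure pushes the operating point toward the accident boundary, day-to-day variability occasionally carries it past the marginal boundary, and eventually an excursion crosses the accident boundary itself.

```python
import random

def simulate_drift(steps=1000, accident_boundary=1.0, margin=0.8,
                   drift=0.002, noise=0.05, seed=42):
    """Toy model of drift toward failure. Economic and workload
    pressure (drift) steadily moves the operating point toward the
    accident boundary, while random variation (noise) around that
    trend first trips the marginal boundary, then the accident one."""
    rng = random.Random(seed)
    position = 0.0        # 0 = comfortably safe, 1.0 = accident boundary
    margin_alarms = 0
    for step in range(steps):
        position += drift                          # constant pressure toward the boundary
        excursion = position + rng.gauss(0, noise)  # day-to-day variability
        if excursion >= accident_boundary:
            return {"accident_at": step, "margin_alarms": margin_alarms}
        if excursion >= margin:
            margin_alarms += 1                     # marginal boundary fires first
    return {"accident_at": None, "margin_alarms": margin_alarms}

print(simulate_drift())
```

Run with these assumed parameters, the marginal boundary raises many alarms before the accident occurs — each one an excursion after which “nothing bad happened”, which is exactly why the margin keeps getting challenged.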
In the video below, Richard Cook presents a terrific summary of this concept.
Workplaces rely on an intimate knowledge of accidents and failure to actually know where the safe operating boundary lies. If you don’t have accidents you don’t know where this is and inevitably you will push the system until you find the point of failure. As organisations turn over people, especially in leadership roles, it is very hard for conservative boundaries to be maintained, even if there are strong historical precedents for their location. Given strong incentives (especially to increase profitability) leaders will be able to rationalise why this time will be different.
Recognizing hazards and successfully manipulating system operations to remain inside the tolerable performance boundaries requires intimate contact with failure... In intrinsically hazardous systems, operators are expected to encounter and appreciate hazards in ways that lead to overall performance that is desirable. Improved safety depends on providing operators with calibrated views of the hazards. It also depends on providing calibration about how their actions move system performance towards or away from the edge of the envelope. (Cook)
With hindsight the boundary becomes obvious, but it is important to recognise that it is not at all obvious before failure occurs, so investigations need to account for this hindsight bias.
- Richard Cook — How Complex Systems Fail
- Velocity NY 2013: Richard Cook, “Resilience In Complex Adaptive Systems”
- Jens Rasmussen — Risk Management in a Dynamic Society: A Modelling Problem
- Sidney Dekker — Drifting into Failure
- SINTEF — Organisational Accidents and Resilient Organisations: Six Perspectives
Let me know what you think. I’d love your feedback. If you haven’t already, sign up for a weekly dose just like this.
Get in touch… — linktr.ee/Tomconnor
More like this from 10x Curiosity
- Worst to Best — Lessons from NUMMI — The story of how the Toyota Production System turned around the worst car plant in North America
- Just because… How are you helping to develop people around you?
- The Flywheel Effect — Exploring the power of simple reinforcing loops executed over time
- Positive Deviance and “Bright Spot” Analysis — When solving complex problems, it sometimes pays to start with what is working rather than figure out what is not…
- Looking in the rear view mirror… — Are you aware of the hindsight bias you are applying to your reaction to events that happen in life?