Today I want to talk about what is probably the most misunderstood concept in DevOps terminology: feedback loops.
As originally defined in “The Phoenix Project” and further detailed in “The DevOps Handbook”, amplifying feedback loops is “the second way of DevOps”. Gene Kim explains in this post that “The Second Way is about creating the right to left feedback loops. The goal of almost any process improvement initiative is to shorten and amplify feedback loops so necessary corrections can be continually made.”
Process improvement is exactly what I’ve been doing for the last decade of my career. And what I’ve noticed is that whenever I start talking to teams about feedback loops I get 4 types of misunderstanding:
- Alert and notification systems are mistaken for feedback loops
- The importance of feedback loops is completely overlooked
- Only the feedback part is noticed, while the loop part of the concept is overlooked
- Feedback loops are seen as necessarily good
I believe I’ve always been a systems thinker, even before I learned what systems thinking is. As a matter of fact, DevOps starts with systems thinking. It is the first lens in Deming’s “System of Profound Knowledge”, and it is also the first way of DevOps according to Gene Kim.
And the thing is, feedback loops are one of the basic concepts of systems thinking. So if one fails to understand feedback loops, their chances of getting DevOps right are close to none. It is therefore crucial to explain what the loops are and how they influence your organization’s ability to deliver value with speed and quality.
What are feedback loops?
Let’s start with a very basic definition: feedback loops are sets of relationships between entities in which a change in one entity causes a change in another entity, and that change eventually leads to a change in the first entity. A classic example of a feedback loop is compound interest: a change in the principal causes a change in interest, which in turn affects the principal. And the loop goes on and on.
There are Two Types of Loops
In fact there are 2 basic types of feedback loops. The loop in our compound interest example is a so-called ‘reinforcing loop’, also known as an ‘amplifying loop’. The larger the principal sum, the higher the resulting interest, which leads to a higher increase in the principal, leading to still higher interest. That’s how the rich get richer.
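The reinforcing loop in the compound interest example can be sketched in a few lines of Python. This is only an illustration: the function name `compound`, the starting principal of 1000 and the 5% yearly rate are assumptions made for the sketch, not figures from any real account.

```python
def compound(principal: float, rate: float, years: int) -> list[float]:
    """Return the balance at year 0 and at the end of each following year."""
    balances = [principal]
    for _ in range(years):
        interest = balances[-1] * rate            # the principal drives the interest...
        balances.append(balances[-1] + interest)  # ...and the interest feeds back into the principal
    return balances


# Hypothetical numbers, chosen only to make the loop visible.
balances = compound(principal=1000.0, rate=0.05, years=3)

# Interest earned in each year: every value is larger than the one before it.
yearly_interest = [later - earlier for earlier, later in zip(balances, balances[1:])]

print([round(b, 2) for b in balances])
print([round(i, 2) for i in yearly_interest])
```

The point to notice is that the yearly interest itself keeps growing even though the rate never changes: more leads to more.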
Note this also works in the other direction: if the principal sum shrinks (because I’m burnt out, need rest and love traveling, so I start withdrawing my savings), the interest earned becomes lower, which slows down the growth of the principal.
In this example my burnout combined with my love of travel leads to the second kind of causal relationship: the balancing feedback loop. The more money I withdraw, the slower the principal sum grows; the slower it grows, the less money is left, until I have no more savings, can’t withdraw anymore, have to stop traveling and start working again.
So more withdrawals today lead to fewer, or no, withdrawals in the future.
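The withdrawal story can be sketched the same way. The function `years_until_empty` and all the amounts below are hypothetical, and the model is deliberately crude (one fixed withdrawal per year, interest added once a year on whatever is left).

```python
def years_until_empty(savings: float, rate: float, withdrawal: float,
                      max_years: int = 100) -> int:
    """Count the years in which the full withdrawal can still be made."""
    years = 0
    # max_years stops the loop when the withdrawal is smaller than the
    # interest earned, in which case the savings never run out.
    while savings >= withdrawal and years < max_years:
        savings -= withdrawal      # withdrawing shrinks the savings...
        savings += savings * rate  # ...which shrinks the interest earned on what is left
        years += 1
    return years


# Hypothetical amounts: 10,000 in savings at 5% yearly interest.
print(years_until_empty(10_000, 0.05, 3_000))  # modest yearly withdrawals
print(years_until_empty(10_000, 0.05, 6_000))  # larger yearly withdrawals
```

Doubling the yearly withdrawal does not double the total amount of money taken out; it cuts the number of years in which a withdrawal is possible at all. That is the balancing effect at work.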
To summarize: there are 2 types of feedback loops:
Reinforcing Feedback Loop
Increased value of some variable in the first entity causes an increase in the second entity which causes the first entity to increase even more. We can see this as a loop of accelerating change. It’s important to note that such change can be beneficial (as in the compound interest example) but also destructive and destabilizing.
More leads to more -> leads to ever more. Less leads to less -> leads to ever less.
Balancing Feedback Loop
Increased value in the first entity leads to a decrease in the second entity, and that leads back to a decrease in the first entity. This type of loop causes the system to stabilize at a point where neither entity can change any further. For example, I can’t withdraw any more money because my savings account is empty.
More leads to less -> leads to less -> eventually leads to balance. (Hence the name)
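One way to tell the two types apart is to look at the direction of each link in the loop. The sketch below uses a convention borrowed from causal loop diagrams, which the text above does not spell out, so treat it as an assumption: each link is marked +1 when the two entities move in the same direction and -1 when they move in opposite directions, and the loop is balancing when the product of the signs is negative. The function name `classify_loop` is made up for this illustration.

```python
from math import prod


def classify_loop(link_signs: list[int]) -> str:
    """Classify a closed loop from the signs of its causal links.

    +1: the two ends of the link move in the same direction.
    -1: the two ends of the link move in opposite directions.
    """
    if not link_signs or any(sign not in (1, -1) for sign in link_signs):
        raise ValueError("each link sign must be +1 or -1")
    # An odd number of opposite-direction links makes the loop push back on itself.
    return "reinforcing" if prod(link_signs) > 0 else "balancing"


# Compound interest: principal -> interest (+1), interest -> principal (+1)
print(classify_loop([+1, +1]))  # prints "reinforcing"

# Withdrawals: withdrawals -> savings (-1), savings -> future withdrawals (+1)
print(classify_loop([-1, +1]))  # prints "balancing"
```

The same check works for loops with more than two entities: list one sign per link, in order, all the way around the loop.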
Notification Systems vs. Feedback Loops
The first type of misunderstanding is the most obvious one. When asked about feedback loops we start listing various types of notifications our systems send: “Developers get an email on every failed build.”, “We get an alert when free disk space goes below 80%”. And we also usually acknowledge our alert system needs improvement. It always needs improvement! Getting the correct signal-to-noise ratio for effective alerting is more of an art than a science. (As a side note: if you’re looking for some tips on improving your alerting systems, this document by Rob Ewaschuk is a great start.)
Of course this focus on alerting is totally understandable. Software systems are both invisible and increasingly complex. Small errors going unnoticed lead to catastrophic failures in the future. Moreover — emotionally it’s much easier to get negative feedback from a machine than from another human. So it’s quite natural we want to get alerted on the smallest issue before it comes to the attention of our users or downstream work departments.
But we must realize two things:
- The most important feedback loops in our organization aren’t happening within our systems. So we usually don’t get any automated alerts on issues in them.
- Alerts and notifications are just pieces of information. This information may or may not lead to feedback loop generation (or amplification). It’s what we, humans, do with this information that’s important.
Feedback Loops? We have more important stuff to think about!
One of the greatest (and arguably the most prominent) mistakes of any DevOps initiative is not assessing the current situation. This assessment should always happen before you start talking about specific tools, practices or architectural changes. And it should definitely involve value stream mapping exercises. It’s only by understanding our flow of work and the actual bottlenecks that we can arrive at effective solutions.
When mapping the value stream we frequently see 2 mistakes:
- the flow is seen as always going only downstream
- the bottlenecks are seen as properties of specific work stations
That is, the feedback loops and their balancing or accelerating effects are initially ignored. “We need to push the work through! Our IT team isn’t responsive enough! Let’s hire more IT folks!” “Bollocks! Let’s move to the cloud and fire all IT folks!”
It’s only by careful questioning and re-questioning that the subtleties of interactions between different work stations come to light and feedback loops become visible.
Focusing on Feedback, Ignoring the Loops
So we’ve got our flow of work automated, we have alerts set up, our signal-to-noise ratio is close to perfection. We’re “bringing the pain forward”, as Humble and Farley have taught us.
And we’re even acting on the pain: we stop all we’re doing to fix the issues we’re alerted on, to make sure no defects seep through downstream. Feedback is happening! But is this a loop? We’ve learnt there are 2 types of loops. So which type is that? Is this loop accelerating our speed of development or just balancing the number of defects?
What I’ve observed more than once is the following pattern — folks get continuous integration in place, they build, test and deploy on every commit. Suddenly a lot more builds or deployments start failing. But instead of researching the root cause (which could be anything, from badly written tests, to unreliable infrastructure, to non-existing refactoring practices) one of the following happens:
- Developers lower the frequency of commits. Commit batches grow in size, the amount of feedback gets reduced, disasters get delayed, releases stay as painful as before, if not worse.
- Tests get turned off. “They always fail. We need to ship now, we’ll look at the tests later.”
- CI gets turned off. “We don’t really need to build/deploy on every commit. Once per day is good enough.”
So we’ve enabled feedback, but ignored the fact that the resulting loop is a balancing one. It causes our system to stabilize at the same point it was before. We may have achieved more visibility but it led to us being slower and less agile. Which makes the whole effort of establishing CI/CD a wasted investment.
We’ll Amplify Feedback Loops To Streamline Delivery
This misunderstanding has to do with the definition of the Second Way itself. It totally omits the fact that not all feedback loops are good!
In the previous section we saw how a feedback loop resulting from automated alerts can make your continuous integration ineffective or even kill it altogether. But even worse are the feedback loops that don’t originate in our CI system, because they are often much less obvious and more ingrained in the organizational culture.
As an example take the following situation:
A team initiates an attempt at implementing TDD. In TDD you write tests first so they initially always fail. The amount of red on development dashboards goes up. The R&D manager is measured by the number of green builds, so the team gets brought in for questioning. They have to explain why they need this and how this will benefit the product in the long run. Somewhat reluctantly the manager lets them continue trying this for another month.
More effort gets invested in tests and the team needs to get proficient at test-writing. All this takes time, so productivity drops. Most probably the drop is temporary, but the management has already committed to a release timeline. So the team leader gets a clear message: that month we promised you will have to happen later. Let’s release the version first.
Most probably this happens again and again: every time, the release schedule wins over professional mastery and the striving for quality, until all the teams realize quality is not a priority and stop trying. More quality leads to less productivity (in the short run), which leads to less quality.
A balancing feedback loop has just killed your attempts to build quality into the project.
Effective Feedback Loops
We’ve looked at 4 types of typical misunderstandings regarding the role of feedback loops in DevOps. We’ve seen that not all feedback loops are good. But how do we understand them correctly? How do we make sure we amplify the correct ones, those that give us more quality, more speed and agility? And how do we identify and neutralize the bad ones, those that hold us back and interfere with the flow?
Here are some recommendations that we’ve found to work well:
1. Make sure all your automated notifications are:
- Acted upon
If notifications are sent but no change in process occurs — you need to review the notifications. Either there are too many of them, or they don’t convey sufficient useful information or they aren’t addressed correctly.
2. Uncover already existing feedback loops by conducting regular value stream mapping exercises and dialogue sessions where you discuss what you currently do instead of what needs to be done.
3. Check if your feedback system is actually creating positive change or just stabilizing the status quo and preventing disasters.
Some systems need stability more than they need change. If your systems are unstable, focus on amplifying the balancing loops until you’ve reached the desired stability. As an example, if your delivery pipeline is constantly failing because of unreliable scripts, insufficient disk space or a flaky network, definitely add alerts for all of these and fix the issues uncompromisingly.
But once stability is achieved, start looking at implementing reinforcing loops, e.g. building a staging pipeline each time there’s an infrastructure change. This reinforces managing infra as code and making a lot of small modifications instead of going for large-scale, big-bang migrations.
4. Focus on human communication-based loops rather than on automated ones.
Some good questions to ask:
- Is there something that works fine even without automation? A good example of this can be pair coding practices or code review procedures that actually lead to better code without any automation. This is a great reinforcing feedback loop for code quality. Maybe it can be enhanced even more?
- Are there tasks stuck in the backlog or in a postponed state for ages? What is holding them there? There’s a good chance that an evil balancing loop is in action.
Feedback loops are in fact a powerful tool for analyzing and optimizing the processes of software delivery. But as with any tool, it’s only as good as our ability to use it.
And if you’d like to really understand how to apply systems thinking to DevOps (or anything else) — you should probably start from the source — Dana Meadows’ primer.
Originally published on Otomato blog at https://otomato.link