Tests and physical state: an analogy
DISCLAIMER: this is an ultra geeky, trippy post. Well, trippier…
Having tests that break all the time (vapor) is bad, but having tests that never break (solid) is usually wasteful. Tests should be fluid enough (liquid) to break on occasion, as a result of system evolution.
Dynamical systems and Chaos Theory
Mathematics, this distant cousin (or antecedent, take your pick) of IT, is a frequent provider of insights into ways to model software solutions. And sometimes, it can offer quite compelling analogies.
A particularly interesting area in that space is called Dynamical Systems and has been the plaything for founders of our field, such as von Neumann (if you don’t know him, check him out — dude was impressive). Oh, and it spawned the whole trendy line of study named Chaos Theory. Yeah, that talk about butterflies and stuff.
At the core of it (and pardon my oversimplification of the mathematics) is the idea that you represent a system’s state as M and define a function f(M) → M’ that you can iterate over time. In other words, given a discrete state, there is a function that, applied to it, outputs the next step in the system’s evolution.
Many interesting implementations exist out there. Edward Lorenz, for example, created a simplified simulation of atmospheric events and discovered that small precision errors could lead the system to a completely different state within a few iterations. The mind-blowing conclusion is that a small measurement error in a chaotic system can lead to a complete inability to predict behavior over a medium or large time window. This is what leads many to claim that precise weather forecasting over more than a few days is a no-go.
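To make the "iterate f over a state" idea and the sensitivity to tiny errors a bit more concrete, here is a minimal sketch. Note that it does not use Lorenz's atmospheric model: it swaps in the logistic map, a much simpler textbook chaotic system, and the helper names (`logistic`, `trajectory`) are made up for this illustration.

```python
def logistic(x, r=4.0):
    """One step of the logistic map, a simple system that is chaotic for r = 4."""
    return r * x * (1.0 - x)


def trajectory(f, state, steps):
    """Iterate f starting from `state`, returning every state visited."""
    states = [state]
    for _ in range(steps):
        state = f(state)
        states.append(state)
    return states


a = trajectory(logistic, 0.2, 100)
b = trajectory(logistic, 0.2 + 1e-10, 100)  # same start plus a tiny "measurement error"
gaps = [abs(x - y) for x, y in zip(a, b)]

print(f"gap after 5 steps:   {gaps[5]:.2e}")
print(f"largest gap overall: {max(gaps):.2f}")
```

The two runs start practically on top of each other, yet well before the 100th step they have nothing to do with one another anymore.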
Another interesting case is John Conway’s Game of Life, where each cell in a matrix can be either dead (0) or alive (1) and may change its state based on the statuses of neighboring cells. The animation below shows a bit of that system in action.
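For the curious, one step of the Game of Life fits in a few lines. This is a sketch using the standard rules (a dead cell with exactly three live neighbors is born, a live cell with two or three live neighbors survives) on an unbounded grid stored as a set of live coordinates; the function name `life_step` is just a choice for this example.

```python
from collections import Counter


def life_step(alive):
    """Apply one Game of Life step to a set of live (row, col) cells."""
    # Count, for every cell next to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in alive
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors, survival on 2 or 3.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in alive)
    }


block = {(0, 0), (0, 1), (1, 0), (1, 1)}  # a "still life": never changes
blinker = {(1, 0), (1, 1), (1, 2)}        # flips between horizontal and vertical

print(life_step(block) == block)                 # True
print(life_step(life_step(blinker)) == blinker)  # True
```

Here `life_step` plays the role of f, and the set of live cells plays the role of M.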
Dynamical systems and physical state
As dynamical systems go, folks came up with an interesting analogy in the physical state of matter:
- solid: the system evolves quite quickly to a static state and freezes there. In a Game of Life example, this would mean all cells quickly lock into a dead or alive status.
- liquid: in this state, the system will eventually reach equilibrium, but a) it will take a while and b) the equilibrium will be cyclic. These cases represent interesting behavior, as complex patterns emerge from the interaction of simple components.
- vapor: in this state small oscillations generate unpredictable behavior within a few steps. Bounded systems will still eventually reach some kind of equilibrium (as the state space is finite), but it will be such a long loop it will be almost indistinguishable from no stabilization at all.
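The "finite state space means it eventually loops" argument can be checked mechanically: iterate until a state repeats, then report how long it took to enter the loop and how long the loop is. The sketch below is only an illustration; `find_cycle` and the three toy maps are invented for this example and merely stand in for the three regimes.

```python
def find_cycle(f, state, max_steps=10_000):
    """Iterate f until a state repeats.

    Returns (transient_length, cycle_length), or None if no state
    repeats within max_steps. States must be hashable.
    """
    first_seen = {}
    for step in range(max_steps):
        if state in first_seen:
            return first_seen[state], step - first_seen[state]
        first_seen[state] = step
        state = f(state)
    return None


def halve(x):
    return x // 2                 # collapses to the fixed point 0


def rotate(x):
    return (x + 1) % 4            # short, regular cycle


def scramble(x):
    return (5 * x + 3) % 2**16    # visits all 65536 states before repeating


print(find_cycle(halve, 20))                         # (5, 1)
print(find_cycle(rotate, 0))                         # (0, 4)
print(find_cycle(scramble, 0, max_steps=100_000))    # (0, 65536)
print(find_cycle(scramble, 0))                       # None: loop too long to observe
```

With the default step budget, the last map looks like it never stabilizes at all, which is the point made about vapor.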
Tests and physical state
So yeah, this is where we come back to talk about software development. The very same analogy can be made when talking about automated tests. And similar conclusions can be reached…
Ever had those cases where a test fails on occasion, and a simple action such as re-running the build takes it back to green? Yeah, we’ve all had them, and those flaky tests are vapor. And quite nasty at that.
The big problem behind these cases is that they encourage a dangerous, slippery slope: since a test failure may not indicate a problem, people tend to be lenient about it. And this can easily drive teams to miss issues hidden by a broken test or, even worse, to start accumulating test failures, even for robust cases, in a classic broken window situation.
Seriously, if there is one thing to keep from this post, it is that teams should adopt a zero-tolerance policy for flaky tests.
At the end of the day, most of these frailties come from bad practices such as:
- depending on race conditions: here’s a classic example — you are testing a component with asynchronous behavior. An ever-recurring idea is “I know, let’s put a timer and wait for the action to process!”. The problem is, you will either need to specify a large timer (or bunches of them, in case you are testing a bunch of operations), which will slow the execution, or fine-tune it, which will open the risk of failures in some environments (“Hey, it works on my computer, so why does it fail on the build server?”). The solution to that is the implementation of any variation of an observer pattern: wait for the real async outcome to happen before moving forward.
- lots of shared state: happens on both socialized and isolated tests. In isolated cases, reliance on shared global values (static values or state hidden behind annotations in the Java world, for example) tends to complicate mocking. In socialized tests, reliance on shared, persisted data (one test checks the read of a specific record/document, another one updates it) leads to these situations. The solution is to reset global state during test setup, to guarantee no mutable state is shared, or both.
- reliance on weak assumptions: these are frequently one of the cases above, but they also happen often with over-reliance on external dependencies. Your tests may assume, for example, that another service is always up and will always contain the fixed set of data they are based on. The solution to these cases is the creation of fake services for test execution.
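The three fixes above can be sketched together in one small test. Everything here is hypothetical and exists only for illustration: `FakePriceService` stands in for a remote dependency, `AsyncOrderProcessor` is an invented component that works on a background thread, and the completion signal is a plain `threading.Event`, which is just one of many ways to observe the real async outcome.

```python
import threading
import unittest


class FakePriceService:
    """Hypothetical in-memory stand-in for an external price service."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def get_price(self, sku):
        return self._prices[sku]


class AsyncOrderProcessor:
    """Hypothetical component that computes an order total in the background."""

    def __init__(self, price_service):
        self._price_service = price_service
        self.done = threading.Event()  # set once the real work has finished
        self.total = None

    def submit(self, skus):
        def work():
            self.total = sum(self._price_service.get_price(s) for s in skus)
            self.done.set()

        threading.Thread(target=work, daemon=True).start()


class OrderProcessorTest(unittest.TestCase):
    def setUp(self):
        # Fresh fake and fresh processor per test: nothing mutable is shared.
        self.service = FakePriceService({"apple": 2, "pear": 3})
        self.processor = AsyncOrderProcessor(self.service)

    def test_total_is_computed(self):
        self.processor.submit(["apple", "pear", "apple"])
        # Wait for the actual completion signal instead of sleeping for a
        # guessed duration; the timeout is only a safety net against hangs.
        self.assertTrue(self.processor.done.wait(timeout=5))
        self.assertEqual(self.processor.total, 7)
```

Saved in a file, this can be run with `python -m unittest`. The test finishes as soon as the work does, no matter how slow or fast the machine is.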
“Wait, solid is good, right? There’s even an acronym that we’ve learned as a must-have!”. Well, SOLID principles are great, but if a test literally never, ever breaks, then a question is begging to be asked — what value does it provide?
The major downside of a stagnant test is that it is a performance detractor: each second you add to test execution is an age in today’s development sense of time and an encouragement to reduce its frequency.
Solid tests may be born this way: if they are exercising code without really validating its quality, they may be just checking availability, instead of full functionality. Increasing your asserts makes these cases more relevant and dynamic.
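As a rough illustration of the difference, compare a test that only proves the code can be called with one that pins down behavior. The function `apply_discount` and both test classes are made up for this sketch.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


class WeakDiscountTest(unittest.TestCase):
    def test_runs(self):
        # Only checks that a value comes back; almost no change can break it.
        self.assertIsNotNone(apply_discount(100, 10))


class StrongerDiscountTest(unittest.TestCase):
    def test_computes_discounted_price(self):
        self.assertEqual(apply_discount(100, 10), 90.0)
        self.assertEqual(apply_discount(80, 25), 60.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100, 150)
```

If someone accidentally changes the formula, the first test stays green forever, while the second group breaks right away.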
A more justifiable case is when a test freezes over time. This is usually an indication that its target (AKA subject under test, or SUT) has matured into a stable (perhaps even stagnant) state. In such cases, a good approach is to perform an architectural refactoring, moving the SUT into a separate component/microservice.
Liquid tests are the ideal middle ground: eventual test breakage is an indicator that things are evolving and that yeah, eventually we screw up.
How do I detect these cases?
A pretty good question by now would be “hey, my team does a great job at not breaking the build — in this case, all of our tests would be considered solid”. Fair statement, but the point is: this is not something you should assess on a build server, but on local dev boxes.
This will probably require some level of manual analysis (unfortunately a common need these days), but with the advent of tools such as Industrial Logic’s TDD dashboard, one can hope to see a new generation of development metrics tools coming to our aid.
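In the meantime, a very rough sketch of what such an analysis could look like: assume each local test run is logged as a (test name, code revision, passed) record. That record format, the function name `classify_tests`, and the classification rules are all assumptions made up for this example, not something an existing tool provides.

```python
from collections import defaultdict


def classify_tests(runs):
    """Label each test 'vapor', 'liquid' or 'solid' from local run history.

    `runs` is an iterable of (test_name, revision, passed) tuples
    (a hypothetical log format assumed for this sketch).
    """
    outcomes = defaultdict(lambda: defaultdict(set))
    for test_name, revision, passed in runs:
        outcomes[test_name][revision].add(passed)

    labels = {}
    for test_name, by_revision in outcomes.items():
        if any(len(results) > 1 for results in by_revision.values()):
            labels[test_name] = "vapor"   # same code, different outcomes: flaky
        elif any(results == {False} for results in by_revision.values()):
            labels[test_name] = "liquid"  # breaks now and then, but consistently
        else:
            labels[test_name] = "solid"   # never failed in the recorded history
    return labels


history = [
    ("test_login", "rev1", True),
    ("test_login", "rev1", False),
    ("test_cart", "rev1", True),
    ("test_cart", "rev2", False),
    ("test_cart", "rev3", True),
    ("test_ping", "rev1", True),
    ("test_ping", "rev2", True),
]
labels = classify_tests(history)
print(labels)
```

A real version would need a much longer history before calling anything solid, but the shape of the idea is the same.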