In this article I look at the relationship between TDD and complexity. I suggest that there is a correlation between the complexity of a system and the return we get from investing in tests. I also suggest that tests are often used to test code which is free of complexity and that the RoI of those tests is much lower. Finally I describe how we can use this knowledge to build test suites which give us the most value.
The primary value of TDD
There are many benefits of test-driven development, but for me its primary benefit is regression testing. It is about writing code in such a way that when the requirements grow we can be confident that the code we have just changed supports the old requirements as well as the new ones.
In TDD circles there is a saying: “As the tests get more specific, the code gets more generic”. At the beginning of the development cycle the tests and the production code look very similar. The test says ‘it should do this’ and the code says ‘do this’. It might feel like wasted effort, like writing the code twice.
We then add more test scenarios which execute the same code as before but with different expected outcomes. As we do this the tests and the production code start to diverge. They no longer mimic each other. The tests remain a list of expected behaviours which are very specific, but the code starts to become more general, making use of conditionals, loops, recursion, concatenation, mathematics and so on to get the job done. They start to look nothing like each other. An algorithm emerges.
When code grows like this, that suite of tests really starts to come through for us. It’s difficult to reason about algorithms. It’s difficult to keep a mental map of all the expected outcomes and how our changes will affect them. When we have a trusted suite of tests for our code, we can change and refactor it without fear because those tests are there to protect us. They also provide us with good documentation. Just when the code starts to get difficult to understand, they provide the extra clarity we need.
Consider the following example. Let’s say we’re writing software which converts numbers to English words. We might start test driving the algorithm with the case: 0 -> ‘zero’. That’s easy enough to get passing. We just return ‘zero’. The next obvious test case is 1 -> ‘one’. How do we get this to pass? We could write something like this:
return number == 0 ? "zero" : "one";
Of course this won’t work for the case of 2 -> ‘two’, or 3 -> ‘three’. At some point we realise that rather than continuing to add conditionals, we need to generalise. We determine that the ‘easiest’ way to get these first cases passing is to build a dictionary which maps each number to its word, and to have the code simply look the answer up.
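The dictionary approach might look something like the sketch below. I have assumed TypeScript here, and the names `WORDS` and `numberToWords` are hypothetical, chosen only for illustration.

```typescript
// Hypothetical sketch: one dictionary entry per number we have a test for.
const WORDS: Record<number, string> = {
  0: "zero",
  1: "one",
  2: "two",
  3: "three",
};

// A single path of execution for every supported input: look the word up.
function numberToWords(n: number): string {
  const word = WORDS[n];
  if (word === undefined) {
    // No test has asked for this number yet.
    throw new RangeError("no word for " + n + " yet");
  }
  return word;
}
```

Every new test case at this stage only adds another entry to `WORDS`. The shape of the code does not change.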
Now I could just run this code and know it’s working, but anyway let’s crack on. What’s the next test case? The TDD process would suggest that we should implement all test cases for 0 up to 20 since each requires a new entry in the dictionary. Are those tests worthwhile? We’re just writing the result twice, once in the test, and once in the code. Why? To check our spelling? To me it feels like waste.
The next most interesting test case is 21 -> ‘twenty one’. This test case seems more worth writing since it shows that our code can recognise when it should be combining multiple word names together to form the larger numbers. The code branches at this point. We now have multiple paths of execution depending on the inputs. The tests that guide this kind of intelligence in the algorithm feel more worthy somehow.
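To make the branching concrete, here is a rough sketch of where the code might end up once 21 -> ‘twenty one’ passes. It is only one possible shape, assuming TypeScript and a supported range of 0 to 99. The names `SMALL`, `TENS` and `numberToWords` are hypothetical.

```typescript
// Hypothetical sketch: the lookup tables are data, the function is where
// the paths of execution appear.
const SMALL: string[] = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
  "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
  "sixteen", "seventeen", "eighteen", "nineteen",
];
const TENS: string[] = [
  "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
  "eighty", "ninety",
];

function numberToWords(n: number): string {
  if (Math.floor(n) !== n || n < 0 || n > 99) {
    throw new RangeError("unsupported number: " + n);
  }
  if (n < 20) {
    return SMALL[n]; // path 1: direct lookup
  }
  const tens = TENS[Math.floor(n / 10)];
  const units = n % 10;
  if (units === 0) {
    return tens; // path 2: an exact multiple of ten
  }
  return tens + " " + SMALL[units]; // path 3: combine two word names
}

// One test case per path, plus the edges of each range.
const cases: Array<[number, string]> = [
  [0, "zero"],
  [19, "nineteen"],
  [20, "twenty"],
  [21, "twenty one"],
  [99, "ninety nine"],
];
cases.forEach(([input, expected]) => {
  const actual = numberToWords(input);
  if (actual !== expected) {
    throw new Error(input + ": expected " + expected + " but got " + actual);
  }
});
```

Notice that the five cases at the bottom each exercise a different path or boundary. A test for 7 -> ‘seven’ would exercise nothing that 0 -> ‘zero’ does not.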
I’d recommend you give this exercise a go yourself if you haven’t already. You’ll notice how those first few tests seem a little wasteful, but before the end you’ll have built up a suite of more helpful tests that you rely on. The algorithm is quite complex, and you won’t be able to keep all those paths in your head as you make changes. Even if you can you’ll never really know unless you manually go through each scenario. You’ll be glad you have those tests.
Why is it that there seems such a difference in value between those early tests and the later ones? The answer, I believe, is that the value increases with complexity. Cyclomatic complexity measures the number of independent paths through a piece of code between the point where execution starts and the point where it finishes. It grows with each conditional that is evaluated, and it is directly linked to the number of test cases needed for a particular operation. Those early tests were of code without any complexity at all: a single path of execution. We could just run it and see it work. The later ones drove the need for branching and multiple paths of execution as the complexity increased.
The most apt analogy for TDD I have ever heard is that of double-entry book-keeping in financial accounting. Why does this practice have value? It has value because at any given time we need to know if there’s a discrepancy between the two sides of the balance sheet. Each side contains a lot of information. Adding the numbers on each side and comparing the totals is a relatively quick process in comparison to matching up each of the appropriate debits and credits individually.
Imagine each side of the balance sheet contained only one number. Would the process have value? No. Imagine that each side had a thousand numbers but they were entered in exactly the same order, and were added on exactly the same row as their counterpart on the other side. Would you ever need to total the numbers? No. You wouldn’t even need two sheets. You would spot any problems as you entered the details.
Now imagine the real-world case in which thousands of numbers are being added to each side of the balance sheet in completely random orders at different times, with varying numbers of outgoing items for each incoming item. Now do you want the practice? Of course you do.
What’s the difference? It’s the scope of the work required to consolidate everything. It’s the same as having hundreds of paths through your code and having to manually test everything periodically. Whether it’s code with high complexity or a mass of numbers, it’s just too time-consuming. The scope of the information you’re dealing with is too large to be able to simply see errors. You’re blind to what’s going on and you need a quick way to know if something’s going wrong. If it is, then you investigate. The cheaper the test is, the more frequently you can check, and the easier it will be to narrow down the source of the problem.
Complexity and Scope
There is a correlation between the complexity and the scope of the system under test. A single function might have a couple of paths of execution, which alone is not very many. However if that function calls another which has two or more paths, then the complexity of the higher level function increases. Some paths are mutually exclusive, but a lot of the time the paths combine.
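As a small illustration of how paths combine, consider the sketch below. The functions `discountPercent` and `totalCents` and the pricing rules are hypothetical and exist only to show the counting. Each function has two paths on its own, but through the higher level function there are four combinations to cover.

```typescript
// Hypothetical pricing rules, used only to count paths.

// Two paths: member or not.
function discountPercent(isMember: boolean): number {
  return isMember ? 10 : 0;
}

// Two paths of its own (express or not), combined with the two paths
// inside discountPercent: four combinations in total.
function totalCents(priceCents: number, isMember: boolean, express: boolean): number {
  const discounted = (priceCents * (100 - discountPercent(isMember))) / 100;
  const shipping = express ? 1000 : 0;
  return discounted + shipping;
}

// One test case per combination of paths.
const combinations: Array<[boolean, boolean, number]> = [
  [false, false, 10000],
  [true, false, 9000],
  [false, true, 11000],
  [true, true, 10000],
];
combinations.forEach(([isMember, express, expected]) => {
  const actual = totalCents(10000, isMember, express);
  if (actual !== expected) {
    throw new Error("expected " + expected + " but got " + actual);
  }
});
```

Add one more two-way decision anywhere in the call chain and the table above could double in length.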
You cannot remove complexity from a system by breaking that system down into smaller components. You can make it easier to manage, but the complexity is still there. The larger the scope, the higher the complexity. The higher the complexity, the more value we find from the tests.
Designing test suites with knowledge of scope and complexity also has interesting benefits. Many who are new to TDD make the mistake of writing a new suite of tests for every class in their software, and then find it difficult to refactor their designs. The tests, being coupled to each class, lock in those classes and the overall design. If you later realise a different arrangement of classes would be preferable, you’re out of luck. The tests won’t help you make that change. In fact they’ll make it more difficult to change than if you had no tests in the first place. The scope of these tests was too small.
The other problem with shallow tests like these is that each time we test a component in isolation we make assumptions about the way in which classes should work with each other. The problem there is that if those assumptions are wrong, the tests will pass, and the system will be broken.
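Here is a contrived sketch of that failure mode. Everything in it is hypothetical: `loadPriceInCents` stands in for a real collaborator, and `formatPrice` is the component under test, written by someone who assumed prices arrive in dollars.

```typescript
// Hypothetical collaborator: it really returns a price in cents.
function loadPriceInCents(): number {
  return 1250;
}

// Component under test: it wrongly assumes the price is in dollars.
function formatPrice(getPrice: () => number): string {
  return "$" + getPrice().toFixed(2);
}

// Shallow test: the stub encodes the same wrong assumption as the code,
// so the result looks correct and the test passes.
const stubbed = formatPrice(() => 12.5);
if (stubbed !== "$12.50") {
  throw new Error("shallow test failed");
}

// Wider scope: run the real collaborator through the same component.
// The result is "$1250.00", which is not the price anyone intended.
const integrated = formatPrice(loadPriceInCents);
```

The shallow test can only ever confirm what the stub was told to say. Only the wider test has a chance of exposing the mismatch between the two components.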
Once we learn about the shallow test problem we correct our approach, and write tests which call higher level code. We also allow the execution of collaborating and lower level classes in each test. We increase the scope of the tests.
Some have seen this solution to the shallow test problem and suggested that the ideal target for our tests is the user interface, with each test executing all the code, including framework and external components like the UI and the database.
Unfortunately the RoI of system testing is too low to do much of this. System tests are slow and brittle, they require careful environment setup and teardown, and they’re much more difficult to write and maintain than unit tests. Moreover, testing through the UI for every scenario in your code is just unnecessary. Things like button click events and SQL queries are tested fully in the first few system tests. Running through that part of the system for subsequent tests is just waste.
Regardless of whether we’re writing system tests or unit tests, eventually, as the scope increases, the test set-up becomes difficult to manage. It becomes more and more difficult to set up the state and inputs required to execute new branch points which are buried deep in the code.
The trick of course is to find a balance between testing at too high a level, and too low a level. We start high and move in to smaller units only when we need to, and only once we’re sure of the design of that smaller component. In other words we should write tests only against architecturally significant boundaries within our software. Boundaries which we are confident will exist for some time, and which cover a suitable scope.
Over the last year or so I have been using the thoughts above as a heuristic to determine if, when and how I should be writing my tests. When I start the TDD process I follow the three rules of TDD. Always! But I don’t usually start test driving my code until I see a conditional or two. Even then, if those paths result in a UI interaction, I will usually prefer either manual or automated system testing over unit testing. I find mocking external systems results in tests which ultimately prove nothing, and which lead to design damage, so I like to avoid mocks.
I have noticed many benefits since taking this approach.
I write fewer tests, and the tests I do write are easier to read and so much easier to write without all those mocks. I have the opportunity to write unit tests which bypass much of the boilerplate code which is used to communicate with the framework, or the database, and focus on scenarios and logic. I find that my tests are not coupled to frameworks or libraries any more since I never leave any logic in framework modules (like MVC controllers). I find that, since I’m testing at a much more suitable level of granularity, I’m no longer doing dependency inversion just for the sake of testing. My components are much more cohesive. And of course, I’m finding a significant boost in speed of development.
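As a rough sketch of what I mean by keeping logic out of framework modules, assume a hypothetical shipping rule and a hypothetical handler. Neither `shippingCents` nor `shippingHandler` belongs to any real framework, and the free shipping threshold is invented for the example.

```typescript
// Hypothetical order shape and shipping rule.
interface Order {
  totalCents: number;
  itemCount: number;
}

// All of the branching lives here, with no framework or database in sight.
function shippingCents(order: Order): number {
  if (order.itemCount === 0) {
    return 0; // nothing to ship
  }
  return order.totalCents >= 5000 ? 0 : 499; // free shipping over the threshold
}

// Hypothetical thin handler: parse, delegate, serialise. A single path,
// so I would leave it to a handful of system tests.
function shippingHandler(requestBody: string): string {
  const order = JSON.parse(requestBody) as Order;
  return JSON.stringify({ shippingCents: shippingCents(order) });
}

// Unit tests target the logic directly, with no mocks.
const checks: Array<[Order, number]> = [
  [{ totalCents: 0, itemCount: 0 }, 0],
  [{ totalCents: 4999, itemCount: 1 }, 499],
  [{ totalCents: 5000, itemCount: 1 }, 0],
];
checks.forEach(([order, expected]) => {
  if (shippingCents(order) !== expected) {
    throw new Error("unexpected shipping for total " + order.totalCents);
  }
});
```

If the handler is later replaced by a different framework, the tests above do not need to change.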
More importantly, I do not seem to be experiencing any negative impacts. I do no more debugging than I used to do, and don’t find regressions any more frequently than before. I still have confidence in my changes, and actually, most of the changes I make seem much easier because I’m not re-writing my tests as my design changes.
I only started experimenting with this after watching Gary Bernhardt’s Boundaries talk. I found it inspiring and I highly recommend you watch it.