Zone’s head of .NET development, Andy Butland, looks at how the issues of unit testing play out in a software development agency environment…
Unit testing — the method of writing automated, code-based, fast and reliable tests for software — is a technique we make comprehensive use of in .NET development at Zone, both to ensure the correctness of the code we write and to protect against regression issues caused by code we'll write in the future. Most check-ins of new features come with a set of tests attached, and it's a focus of code reviews to ensure that tests have been created and updated where appropriate.
Unless specific steps are taken to publish metrics on the number of tests and code coverage percentages, the unit test suite for a solution can go relatively unnoticed by the wider team, as it's generally only visible on developers' machines and continuous integration servers. Its absence will almost certainly be noticed indirectly though, through an increased number of failures in manual QA testing and more regression issues.
Despite the approach being widely considered a demonstrably "good thing", I've realised, in discussions with colleagues and through my own use of the technique, that there remains a fair bit of nuance around testing in practice, and I've been writing about this on my sporadically updated personal blog in a series on "practical decisions in testing". You can follow the link if you're interested in reading further, but in this article I'll summarise those posts, reflecting particularly on how these issues play out in a software development agency environment.
Trading off value with difficulty
A first, and perhaps obvious, point to consider is that unit tests don't come for free. They take effort to write, refactor and maintain as changes to the production code are made. While I strongly believe they should be considered simply part of the craft of professional software development, and as such we should (and do) avoid asking stakeholders if they'd "like tests with that", we also can't avoid the fact that the effort and value of testing vary in different contexts.
This diagram, specifically considering unit testing though also applicable at higher levels of testing, shows how we might ensure we focus effort in the most valuable areas. The horizontal axis shows the value of the tests and the vertical axis how difficult they are to write:
Tests around code that's complex and encapsulates business logic would be considered high value, while testing very trivial code won't add much benefit, if any. Code that's easy to test will have few or no dependencies that require mocking or stubbing to keep our tests fast and reliable. Sometimes code can be particularly difficult to test, often when it's tied to platform components that prove troublesome or even impossible to treat in this way.
We can then consider four quadrants:
- Bottom right — easy to test and high value. This is the sweet-spot where we should be looking to focus and get close to 100% code coverage.
- Top right — high value, but the effort in testing is higher. We should aim to do the best we can here, but sometimes we might run into something that just proves too difficult to test to justify the effort.
- Bottom left — easy to test but not much value. We still might look to add tests here in many cases, unless the value is almost zero, given it's easy to do and so doesn't take much effort.
- Top left — difficult to test and not much value. Here’s where we’ll justifiably spend the least focus.
An important additional point though is that the four quadrants aren’t static. Over time, based on choices we make in architecting the solution, there are two directions where we have influence — shown on the diagram with the large blue arrows.
- We can improve code coverage by pushing up along the diagonal. Areas of code that sit just over the border, in the white area above the diagonal line, are where we'll get the most value for our effort.
- We can increase the testability of our code-base, making the parts that are difficult to test less so. Even when tied to platforms that don't make testing easy in places, we can still use techniques like introducing wrapping classes, and drawing out the logic that is platform agnostic into more testable constructs.
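As a sketch of the wrapping technique (the names here are hypothetical, not taken from a real project), a troublesome platform call such as DateTime.UtcNow can be hidden behind an interface, with the platform-agnostic logic drawn out into a class that tests can drive with a simple stub:

```csharp
using System;

// A wrapper interface around a platform API (here DateTime.UtcNow),
// so that logic depending on it becomes testable.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Production implementation, delegating to the real platform call.
public class SystemClock : IClock
{
    public DateTime UtcNow => DateTime.UtcNow;
}

// The platform-agnostic logic, drawn out into a testable construct.
public class LicenceChecker
{
    private readonly IClock _clock;

    public LicenceChecker(IClock clock) => _clock = clock;

    public bool IsExpired(DateTime expiryDate) => _clock.UtcNow > expiryDate;
}

// A trivial stub clock that a unit test can control.
public class FixedClock : IClock
{
    public FixedClock(DateTime now) => UtcNow = now;
    public DateTime UtcNow { get; }
}
```

A test can now construct LicenceChecker with a FixedClock set to any date it likes, making the behaviour fast and repeatable, while production code uses SystemClock.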
Code Coverage Metrics
Via the use of tools that analyse the code of a solution, metrics can be calculated that give an idea of the "health" of a code-base from an ongoing maintenance perspective. One of these is test code coverage, which calculates the percentage of lines of code, considering all the branching and routes through the program, that are covered by an automated unit test.
A healthy metric doesn’t completely diagnose a healthy code base but, all things being equal, it’s a good sign. The most important thing, at least for any application that’s planned to be around for some time, is the trend, with the line either moving up, or with an existing high value being maintained.
We might ask the question though: shouldn't this value be 100%?
In the most straightforward sense of this question, I'd argue no — as illustrated in the previous section, there are clearly areas of the code, property "getters and setters" being the classic example, that you could unit test but wouldn't get much value, if any, in doing so, and hence the effort is unlikely to be worthwhile. Similarly, there may be areas where the difficulty of testing makes the cost of doing so prohibitively high.
There’s a more subtle point here though too, in that as developers we have the ability to mark certain areas of the code as removed from consideration from the code coverage metrics. Where we know we simply aren’t going to write a test, we can apply an attribute to mark that class or function as off-limits to the analyser.
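In .NET this is done with the ExcludeFromCodeCoverage attribute from System.Diagnostics.CodeAnalysis; the class below is a hypothetical example of applying it, with the justifying comment alongside:

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Deliberately excluded from coverage analysis: a simple DTO with no
// logic worth testing. The comment records why the decision was made.
[ExcludeFromCodeCoverage] // Trivial data holder; nothing to assert.
public class AddressDto
{
    public string Street { get; set; }
    public string City { get; set; }
}
```

Coverage analysers such as Coverlet respect this attribute, so the class no longer drags down the reported percentage.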
However, when deciding what code should be covered by unit tests, the decision isn’t black and white. Some, probably most, code would be better coupled with tests and so the tests should be written along with the feature. On the other hand, as mentioned, testing some code clearly gives little value, and hence this can immediately be marked for removal from analysis (ideally with a comment justifying the decision).
The grey area in between though, is code that is hard to test, where there's some, but not so much, value in doing so — the upper left quadrant as shown in the diagram above. In these cases, developers and stakeholders might well decide there are more important things to be spending time on right now, such as new feature development or paying off other technical debt. As time goes by though, that cost/benefit trade-off may shift, as will the set of competing priorities. Given that, it seems best to keep the decision on "should we add tests for this" open: not closing it off by excluding the code from coverage, but keeping it under review and reconsidering when time and resources allow.
Considering a Unit
One of the potential costs of testing, above and beyond the immediate effort of writing them, is maintaining them moving forward. If tests are closely tied to a given class structure, rewriting them to match a planned refactoring may be quite a task. And if the effort in doing so actually deters or prevents a team from making a refactoring that they would otherwise want to do, then there's a problem: the tests are getting in the way of maintaining a healthy code-base.
A way to avoid this is to give careful consideration to what we define as a “unit”.
For many developers, it’s the class, and the mark of good coverage is to ensure that all public methods — and via them, the private ones — are covered by tests. In doing so, there will certainly be a well-tested code base.
There can be a downside here though, in that, as described, this approach serves as a deterrent to refactoring at the inter-class level.
I got to thinking about this after watching a talk by Ian Cooper, shared by a colleague. In it, he promotes considering a unit as something that might be higher level than a single class. In doing so, you keep the same level of code coverage but achieve it by testing the "outer" public methods of the class aggregation, retaining your ability to refactor quickly without so much burden in updating tests.
To illustrate, say you have a class “A”, with various public methods that are tested. “A” may well have various dependencies, which in the case of those that are either brittle or slow, are mocked.
On reflection, we decide class "A" is too big and an extract class refactoring looks useful, creating class "B", which then becomes a dependency of "A". "A"'s methods that use the functionality of "B" now make calls to it via a level of indirection. And as we had good test coverage on class "A", we've been able to do this refactoring with the backup of the tests, giving us confidence we haven't introduced any new issues.
Following the “unit as a class” approach, we might then look to write tests for “B”, but actually, we don’t need to do this. Class “A” is considered the public API that needs testing, whereas class “B” is an implementation detail, to be considered from a testing perspective no different from private methods in “A”. Should we refactor further — perhaps collapsing “B” back into “A”, or splitting its responsibilities further — there’s again no need for changes to tests.
Similarly, there’s no need to create an interface for “B” and mock it when testing “A” — we can, and should, just call into its methods.
Essentially we’ve expanded the idea of “a unit = a class (with all dependencies mocked)” into a unit being a “unit of runnable code without slow/brittle dependencies”.
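A minimal sketch of the idea, with hypothetical classes standing in for "A" and "B":

```csharp
using System;

// Class "B": extracted from "A" as a refactoring. It's fast and
// reliable, so it's neither mocked nor tested directly; it's an
// implementation detail of "A".
public class DiscountCalculator
{
    public decimal Apply(decimal total, int itemCount) =>
        itemCount >= 10 ? total * 0.9m : total;
}

// Class "A": the public API that the unit tests target. Its tests
// exercise DiscountCalculator for real, via this level of indirection.
public class OrderProcessor
{
    private readonly DiscountCalculator _discounts = new DiscountCalculator();

    public decimal TotalFor(int itemCount, decimal unitPrice) =>
        _discounts.Apply(itemCount * unitPrice, itemCount);
}
```

Because the tests only know about OrderProcessor, we could collapse DiscountCalculator back in, or split it further, without touching a single test.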
That’s not quite the end of it though, as whilst it’s justified and actually beneficial to avoid writing unit tests for class B’s public methods, depending on how this class gets used in the future, that may not always be the case. Let’s say for example class “C” is developed and also takes a dependency on “B”, calling into its methods. Class “C” has public methods that are unit tested, so we’re still all good from a code coverage perspective. We can even extend “B”, adding some feature that “C” needs, and with the tests on “A” and “C” we remain fully covered.
There comes a point though when the importance of "B" to the code-base has increased beyond its original "helper" status. With more classes referencing and depending on it, it's been upgraded to more of an "application service", and somewhere along this path we'll likely hit a point where we do want tests dedicated to "B".
Layers of Testing
Developer-written unit tests are only the first layer of the testing pyramid, of course. Above that we have other automated testing, such as browser-driven, end-to-end tests, and then manual, exploratory testing by QA personnel. A focus is to ensure these tests are arranged in a true pyramid — rather than an ice-cream cone — with the majority of testing going on at the lowest, unit test level, where tests can be fast, repeatable and non-brittle.
In terms of then determining what to test at the higher levels, it’s important there’s good communication between developers and QA, to ensure that the things that can’t feasibly be covered in unit testing are handled at the higher levels, but also that there’s not unnecessary duplication of effort.
A good example of this came up recently with form validation, which in this case was rather complex, including, for example, fields that were required only when other fields were completed with certain values. At the unit test level we had good coverage of this, ensuring that the "form model", created in code following a form post, was valid or invalid in all the expected scenarios. What's missing here for full test coverage though are the parts at either end of the form post life-cycle when running on the website for real:
In particular, we’re not actually submitting a form from the browser, and we’re not saving the result to a database at the other end (as we’ve mocked this part). So it’s important those parts are covered via higher-level, automated or manual tests.
We likely though don’t need to cover all the form validation scenarios in those higher level tests — as the value versus effort trade-off quickly tails off after the first one. Rather we can rely on the comprehensive set of unit tests to ensure the validation rules are set up correctly, and a smaller set of higher-level tests to check the outer layers and confirm we’ve got all the components wired up correctly.
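As an illustrative sketch (the form model and the rule on it are hypothetical), .NET's DataAnnotations validation lets us exercise exactly this kind of conditional rule in a fast unit test, with no browser or database involved:

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// A form model with a conditional rule: DeliveryAddress is required
// only when DeliveryMethod is "Post".
public class ContactFormModel : IValidatableObject
{
    [Required]
    public string Name { get; set; }

    public string DeliveryMethod { get; set; }

    public string DeliveryAddress { get; set; }

    // Object-level validation for the cross-field rule.
    public IEnumerable<ValidationResult> Validate(ValidationContext context)
    {
        if (DeliveryMethod == "Post" && string.IsNullOrEmpty(DeliveryAddress))
            yield return new ValidationResult(
                "Address is required for postal delivery.",
                new[] { nameof(DeliveryAddress) });
    }

    // Helper to run the framework's validation directly in a unit test.
    public bool IsValid()
    {
        var results = new List<ValidationResult>();
        return Validator.TryValidateObject(
            this, new ValidationContext(this), results, validateAllProperties: true);
    }
}
```

A suite of unit tests can sweep every combination of field values through IsValid(), leaving the higher-level tests to confirm only that a real form post flows through to the model and onwards to storage.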
Test Driven Development
Finally, I wanted to consider the topic of test driven development (TDD). It's a technique that tends to divide developers into two camps: those who swear by it, and those who won't touch it (even if they're on board with unit testing itself). Even within Zone, whilst I expect we're all in agreement on the value of unit testing as an approach, I'm sure there are differences of opinion and practice when it comes to working in the TDD style.
Speaking personally, I've used it on occasion and found some value, but on balance I fall mostly into the "tests after", or "spike and stabilise", camp; the method of using tests to drive the design of the code hasn't really become part of my usual programming workflow. Rather, I'll write some code that looks like it'll do the job, and then write tests to verify that it's correct. Then, with the tests in place, the red-green-refactor approach can be adopted to improve the code quality whilst maintaining working functionality.
Even for those that aren’t TDD adherents though, I think there are at least a couple of situations where we can adopt it and I find myself doing so.
- Even with automated testing, our QAs still raise bugs! Many raised in web development projects are things that server-side unit testing won't help with — rendering issues on different browsers and devices, for example. A true logic error is rare, as it should be with a solid unit testing approach. If one does appear though, we should take it as an affront to our test coverage — we've clearly missed something — and the first thing to do is rectify the missing (or incorrect) test case by creating a failing test for it. Then we can amend the code to fix the bug, ensuring that test (and all the others) go green, and that the bug won't reoccur.
- When working with an application feature that’s close to pure logic, like an algorithm, getting into a TDD approach is easier, as there’s no need for mocking dependencies or awkwardly writing much test code that won’t even initially compile. Here, creating tests and then code to make the tests pass iteratively, building up the logic and considering the edge cases as we go, can certainly be a productive way to work.
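For instance (a hypothetical example, not taken from a real project), a pure-logic converter like the one below suits the TDD rhythm well: each red-green cycle simply adds the next failing case, from From(1) upwards, with nothing to mock along the way.

```csharp
using System;

// A pure-logic unit, easily grown test-first: convert an integer
// (1-3999) to its Roman numeral representation.
public static class RomanNumerals
{
    // Value/symbol pairs in descending order, including the
    // subtractive forms (CM, CD, XC, XL, IX, IV).
    private static readonly (int Value, string Symbol)[] Map =
    {
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")
    };

    public static string From(int number)
    {
        if (number < 1 || number > 3999)
            throw new ArgumentOutOfRangeException(nameof(number));

        var result = string.Empty;
        foreach (var (value, symbol) in Map)
        {
            // Greedily take the largest symbol that still fits.
            while (number >= value)
            {
                result += symbol;
                number -= value;
            }
        }
        return result;
    }
}
```

Starting from a test asserting From(1) == "I" and building up through the subtractive cases, each edge case earns its own test before any code is written to satisfy it.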
As is often the case with software development best practices and principles, we've seen with these topics around unit testing that there's often not a black and white rule to follow. As usual, judgement and experience are called upon in making practical decisions around when and where to test.