My team works on one application. It has decent (80–90%) code coverage, and yet every month regressions that slipped through the automated tests are detected during manual QA. Why is that?
Well, because code coverage doesn’t guarantee good tests. Coverage information is only helpful for unit tests and almost useless for everything else. Moreover, the mere presence of a coverage number can be harmful!
What was wrong with our code coverage?
Our SDET team measured code coverage for end-to-end UI tests (the ones that click buttons in the app on users’ behalf). Some time ago management asked the SDETs to measure coverage and to reach a certain number, so they did. In fact, to maintain a high number they had to update the tests every time the UI changed (and it changes with every release of a mobile app), so they spent more time (and money) fixing tests and reacting to false positives than on increasing the actual test coverage. Management even went so far as to recommend measuring coverage for manual tests, which fortunately never happened, and we have different guidelines now, so I can safely tell this story.
Why exactly don’t high-coverage tests detect regressions?
Let’s take a small library I’m working on as an example. Without the test for the root class CadabraImpl, the coverage report looks like this:
Something is definitely not covered in the VariantKt class, but let’s add the integration test for the root class CadabraImpl back and see…
VariantKt is a simple class that happens to be used in the larger one for which an integration test is required.
Integration tests always invoke constructors of different classes, call some methods, etc. to prove that the classes work well together, but they do nothing to prove the correctness of each individual class. Yet all of those calls are still detected by the code-coverage tool. The same happens with UI tests: application startup alone can produce significant coverage numbers, and every sub-module initialized at startup will look like it’s covered. So when code coverage is interpreted as a measure of completeness, integration and UI tests essentially become assertion-free tests in disguise, because such tests never have (nor should they have) assertions assessing the behavior of every inner class.
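A minimal Kotlin sketch of the effect (with hypothetical Parser and Pipeline classes standing in for the library’s real ones): the integration test exercises every line of both classes, so a coverage tool reports them as covered, yet a bug in the inner class goes undetected because nothing asserts on it directly.

```kotlin
// Hypothetical inner class. Bug: parse() should trim the input, but doesn't,
// so Parser("  41 ") would crash at runtime.
class Parser {
    fun parse(raw: String): Int = raw.toInt()
}

// Hypothetical root class that composes the inner one.
class Pipeline(private val parser: Parser) {
    fun run(input: String): Int = parser.parse(input) + 1
}

fun main() {
    // "Integration test": both Parser.parse and Pipeline.run execute,
    // so the coverage tool marks them 100% covered...
    val result = Pipeline(Parser()).run("41")
    // ...but there is no assertion on Parser's own behavior, so the
    // missing trim() slips through despite the green coverage report.
    println(result) // prints 42
}
```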
What is coverage all about, then?
A lot has been written on the topic (e.g.). In short, similarly to Dijkstra’s famous phrase “Testing shows the presence, not the absence of bugs,” we can say that code coverage shows the absence, not the presence of tests, and it definitely says nothing about their quality.
In the example above, a TDD-style or bottom-up approach would do a better job: while constructing the smaller class we can test it properly, make sure it works, and only after that write some integration tests. The final coverage number would be the same, but our confidence in the tests would be much higher.
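The bottom-up approach can be sketched like this (again with hypothetical Variant and Cadabra classes, not the library’s actual API): the small class is pinned down by real assertions first, so the integration test afterwards only needs to prove the wiring.

```kotlin
// Hypothetical small class, tested in isolation first.
class Variant(private val value: String) {
    fun normalized(): String = value.trim().lowercase()
}

// Hypothetical root class that merely composes it.
class Cadabra(private val variant: Variant) {
    fun describe(): String = "variant=" + variant.normalized()
}

fun main() {
    // Bottom-up: unit tests assert the small class's actual behavior...
    check(Variant("  ABC ").normalized() == "abc")
    check(Variant("").normalized() == "")
    // ...then a thin integration test only proves the classes work together.
    check(Cadabra(Variant("X")).describe() == "variant=x")
    println("all checks passed")
}
```

The coverage number ends up identical to the assertion-free version, but now every covered line is backed by an assertion somewhere.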
Focusing on unit tests and covering small pieces at a time helps ensure that every single part works well and that there is no unused code (if code isn’t covered, maybe we should delete it instead of adding a test). And not measuring coverage for integration and UI tests lets us focus on the actual purpose of such tests: integration aspects and observed behavior.
Can we still have coverage?
There may still be a reason or two to measure code coverage for non-unit tests, or even to strive for a high coverage number, and modern tools can alleviate the problem by showing not only the number and the covered lines/branches but also the “source” of the coverage (e.g.). But measuring a single number without understanding it, and without properly communicating what it means, will most likely produce yet another “dashboard promoting ignorance.”