Why Most Unit Testing is Waste — Tests Don’t Improve Quality: Developers Do

“Why Most Unit Testing is Waste” by James O. Coplien really caught my eye when I found it on the internet while I was on-boarding to Test-Driven Development. Everyone was talking about its benefits and how good things were going to be if we adopted it.

I believe that nothing in this world is ever completely one-sided. There is black and white, good and bad, and TDD is no exception. I had already gathered a lot of information on its good sides. I was looking for how to do it right, and for its bad sides, so that I would know what I need to be careful about.

The author is a consultant, probably in the Java world, since he mentions JUnit and Maven. He also mentions that he was raised in a punch-card programming culture, so I presume that he is quite experienced in software development.

When I started reading his article, I thought he might be an old man who didn’t know about new things like TDD, Agile, DevOps, etc. But I was wrong. He knows about TDD, Feedback Loop, Exploratory Testing, Regression Tests, XP, Continuous Integration, Lean, “Fail Fast”, Scrum, and Product Owner: all of these things are in his article.

Since the original article is pretty long (21 pages), I have put my key takeaways below to make it easier to digest (hopefully). But before you start reading my summary, make sure you understand the difference between Unit, Integration, and Functional Tests.

1.1 Into Modern Times

  • In the old days (he calls them the “FORTRAN days”), unit testing was very helpful because programs were procedural and modular, split into multi-layered smaller chunks.
  • In the OOP era, it is impossible to reason about the run-time behavior of code by inspection alone.
  • Classes encapsulate data and actions, and an object can behave differently depending on its current state and its context environment.
  • We don’t test classes and we don’t test objects; the unit of functional test in OOP is the method. Mocks provide the context of the environmental
    state on which the method under test depends.

1.2 The Cure is Worse than the Disease

  • Unit testing is not just an issue in OOP, but the combination of OOP, agile software development, and a rise in tools and computing power
    has made it de rigueur (fashionable).
  • Here is what he got from his recent client, Richard Jacobs at Sogeti (Sogeti Nederland B.V.):
my team told me the tests are more complex than the actual code.
I did make code for testability purposes but I hardly wrote
any unit tests. However I was renowned for my code quality and
my nearly bug free software.
  • To be complete, unit tests would have to be orders of magnitude larger than the unit under test.
  • Few developers admit that they do only random or partial testing and many will tell you that they do complete testing for some assumed vision of complete.
  • Such visions include notions like “Every line of code has been reached,” which, from the perspective of the theory of computation, is pure nonsense in terms of knowing whether the code does what it should.
Be humble about what your unit tests can achieve, unless you have an extrinsic requirements oracle for the unit under test.
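
To make the point about “every line of code has been reached” concrete, here is a minimal Python sketch. Python and the `average` function are my own illustration, not something from the original article. A single test executes every line of the function, yet the function still fails on an input the test never tried.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    total = 0
    for v in values:
        total += v
    return total / len(values)  # breaks when values is empty


def test_average():
    # This one call executes every line of average(): 100% line coverage.
    assert average([2, 4]) == 3.0


test_average()

# Full line coverage, but the empty-list case was never exercised.
try:
    average([])
    crashed = False
except ZeroDivisionError:
    crashed = True
```

The coverage report would be green here, while `crashed` ends up `True`: coverage says which lines ran, not whether the code does what it should.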

1.3 Tests for their Own Sake and Designed Tests

  • At a client in northern Europe, developers were required to have 40% code coverage for Level 1 Software Maturity, 60% for Level 2 and 80% for Level 3, while some were aspiring to 100% code coverage.
  • Large functions for which 80% coverage was impossible to reach were broken down into many small functions for which 80% coverage was trivial.
  • This also meant that functions no longer encapsulated algorithms, since the surrounding lines of code were no longer adjacent to the one you are concerned about. That sequence transition now took place across a polymorphic function call: a hyper-galactic GOTO.
If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it. Test at a coarser level of granularity.
  • People confuse automated tests with unit tests. Remember, though, that automated crap is still crap.
  • If you’re going to automate, automate something of value. You’ll probably get better return on your investment by automating integration tests, bug regression tests, and system tests than by automating unit tests.
  • A smarter approach would reduce the test code through formal test design: that is, to do formal boundary-condition checking, more white-box testing, and so forth. That requires the unit under test to be designed for testability.
  • I advocate doing this at the system level where the testing focus should lie; I have never seen anyone achieve this at the unit level.
Tests should be designed with great care. Business people, rather than programmers, should design most functional tests. Unit tests should be limited to those that can be held up against some “third-party” success criteria.
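
As a rough illustration of what formal boundary-condition checking might look like, the Python sketch below derives the test inputs from a stated valid range instead of guessing them. The helper `boundary_values`, the function `in_range`, and the range 1..10 are all hypothetical, made up for this summary.

```python
def boundary_values(low, high):
    """Return the classic boundary-value inputs for an inclusive integer range.

    Assumes low + 1 < high - 1 so that the six values are distinct.
    """
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def in_range(x, low, high):
    """Example unit under test (hypothetical): is x inside [low, high]?"""
    return low <= x <= high


# Design the test from the specification (valid range 1..10), not from the code.
cases = boundary_values(1, 10)
results = [in_range(x, 1, 10) for x in cases]
```

Six designed cases probe exactly the places where an off-by-one mistake would hide, which is the kind of reduction in test code that the article argues for.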

1.4 The Belief that Tests are Smarter than Code

  • Programmers have a belief that they can think more clearly (or guess better) when writing tests than when writing code, i.e. that there is more information in a test than in code. That is just nonsense.
  • If your coders have more lines of unit tests than of code then they may be lacking in analytical mental tools or in a discipline of thinking, and they want the machine to do their thinking for them.
If you have a large unit test suite, evaluate the feedback loops in your development process. Integrate code more frequently; reduce the build and integration times; cut the unit tests and go more for integration testing.
  • It’s much easier to avoid putting bugs in than to take them out. But let’s be clear that there will always be bugs. Testing will not go away.
If you have comprehensive unit tests but still have a high failure rate in system tests or low quality in the field, carefully investigate your requirements and design regimen and its tie to integration tests and system tests.

1.5 Low-Risk Tests Have Low (even potentially negative) Payoff

  • Good testing is based on careful thought and on basic principles of risk management. If the testers don’t have rudimentary skills in this area, then you are likely to do a lot of useless tests.
  • Testing does not increase quality; programming and design do. Testing just provides the insights that the team lacked to do a correct design and implementation.
  • If the probability of the test passing is 100%, then there is no information — by definition, from information theory.
  • If we can’t predict at the outset whether a test will pass or fail then each test run contains a full bit of information. The information comes from failed tests.
Look at the tests that have never failed in a year and consider throwing them away. They are producing no information for you. The value of the information they produce may not be worth the expense of maintaining and running the tests.
  • Having too many unit tests will decrease a team’s velocity, because every change to a function should require a coordinated change to the test.
  • If you wrote the tests in such a way that you didn’t have to change them when the functionality changed, then the tests weren’t testing the functionality, so they have no value. Throw them away.
  • The only tests that have business value are those that are derived from business requirements. So one question to ask about every test is: If this test fails, what business requirement is compromised?
  • If you don’t know the value of the test, then the test theoretically could have zero business value.
  • Tests do have a cost: maintenance, computing time, administration, and so forth.
If you cannot tell how a unit test failure contributes to product risk, you should evaluate whether to throw the test away. There are better techniques such as exploratory testing and Monte Carlo techniques. Don’t use unit tests for such validation.
  • There are some units and some tests for which there is a clear answer to the business value question. One such set of tests is regression tests; however, those rarely are written at the unit level but rather at the system level.
Test key algorithms for which there is a “third party” oracle for success, rather than one created by the same team that writes the code. “Success” here should reflect a business mandate rather than the opinion of a team member called “tester”.
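
The information-theory argument in this section can be checked with a few lines of Python. This is my own sketch, assuming each test run is treated as a simple pass/fail event with a known pass probability; the function name `information_bits` is made up for illustration.

```python
import math


def information_bits(p_pass):
    """Shannon entropy (in bits) of a test whose pass probability is p_pass."""
    if p_pass in (0.0, 1.0):
        return 0.0  # the outcome is certain, so a run tells us nothing new
    p_fail = 1.0 - p_pass
    return -(p_pass * math.log2(p_pass) + p_fail * math.log2(p_fail))


always_green = information_bits(1.0)   # a test that can never fail
coin_flip = information_bits(0.5)      # a test we cannot predict at all
rarely_red = information_bits(0.999)   # a test that fails once in a thousand runs
```

A test that always passes carries 0 bits, an unpredictable one carries a full bit, and a test that almost never fails sits very close to zero, which is the reasoning behind throwing away tests that have not failed in a year.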

1.6 Complex Things are Complicated

  • Statistics may tell you the wrong things. A test may pass 99.99% of the time, but the one run in ten thousand that fails might kill you.
  • One client wondered why tests weren’t working for his team, since they had worked for him in an earlier job. The author sent him an early version of the paper, and he replied:
I am an avionics engineer whose career started as an embedded software developer with one foot in the hardware development. We were highly disciplined while we were working on security systems for banks, penitentiaries, fire houses, police stations, emergency services, chemical plants, etc. It had to be right the first time all the time.
  • It is insufficient to reproduce the state for just the module or class containing the function or method under test: generally, any change anywhere can show up anywhere else in a program and requires that the entire program be retested.

Reference: Perry and Kaiser, “Adequate Testing and Object-Oriented Programming,” Journal of Object-Oriented Programming 2(5), Jan. 1990, p. 13.

  • My definition of code coverage is the percent of all possible pairs, {Program Counter, System State} that your test suite reproduces; anything else is a heuristic.
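
To get a feel for how demanding that definition of coverage is, here is a small back-of-the-envelope calculation in Python. The numbers are assumptions I picked for illustration (a unit with 10 program points whose only state is two 32-bit integers, and a suite of 1,000 tests); they do not come from the article.

```python
def total_pairs(program_points, state_bits):
    """Number of possible {program counter, system state} pairs."""
    return program_points * 2 ** state_bits


def coverage_percent(pairs_reproduced, program_points, state_bits):
    """Percent of all possible pairs that a test suite reproduces."""
    return 100.0 * pairs_reproduced / total_pairs(program_points, state_bits)


# Assumed toy unit: 10 program points, state made of two 32-bit integers.
pairs = total_pairs(10, 64)

# Assumed suite: 1,000 tests, each visiting all 10 points with one state.
percent = coverage_percent(1_000 * 10, 10, 64)
```

Even for this tiny unit, the suite reproduces a vanishingly small fraction of the possible pairs, although every line has been reached a thousand times.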

1.7 Less is More

  • The classes he was testing are code. The tests are code. Developers write code. When developers write code they insert about three bugs per thousand lines of code — which includes the tests.
  • With such bugs, we find that the tests will hold the code to an incorrect result more often than a genuine bug will cause the code to fail!
  • Watch what your developers do when running a test suite: they’re doing, not thinking (like most of the Agile Manifesto, by the way).
  • If you have 200 tests — or 2000, or 10,000 — you’re not going to take time to carefully investigate and re-factor each one every time it fails.
  • The most common practice is to just overwrite the old test golds (the expected output or computational results on completion of a given test) with the new results.
  • Today’s fast machines give the illusion of being able to supplant the programmer’s thinking; their speed means I don’t take the time to think.
  • If a client reports a fault, and I hypothesize where the actual bug lies and I change it so the system behavior is now right, I can easily be led to believe that the function where I made the fix is now right. But that’s just bad science. It’s necessary to re-run all the regressions and system tests as well.
If the tests were higher quality than the code because of a better process, I would advise the team to improve their process so they take the smart pills when they write their code instead of when they write their tests.
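
The three-bugs-per-thousand-lines figure is easy to turn into arithmetic. In the Python sketch below, the rate is the one quoted above, while the code sizes are assumptions of mine for illustration (a test suite twice as large as the code it tests).

```python
BUGS_PER_KLOC = 3  # the figure quoted in the article


def expected_bugs(lines_of_code):
    """Expected number of inserted bugs for a given amount of code."""
    return lines_of_code * BUGS_PER_KLOC / 1000


# Assumed sizes, for illustration only.
production_bugs = expected_bugs(20_000)  # bugs expected in the product code
test_bugs = expected_bugs(40_000)        # bugs expected in the test code
```

Under these assumptions the tests are expected to contain twice as many bugs as the code they check, which is why a failing test is not automatically evidence of a bug in the product.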

1.8 You Pay for Tests in Maintenance — and Quality!

  • One technique commonly confused with unit testing, and which uses unit tests as a technique, is Test-Driven Development.
  • With TDD, you’ve introduced coupling — coordinated change — between
    each module and the tests that go along with it.
  • Removing them before you ship doesn’t change their maintenance behavior. (And removing them before shipping may even be a bad idea.)
  • An assertion failure meant that something in the program was very wrong and that it was likely that the program would produce the wrong result.
Turn unit tests into assertions. Assess execution and check for correct behavior; that’s one half of a unit test. The other half is the driver that executes the code: count on your stress tests, integration tests, and system tests to do that.
  • There are some unit tests that just reproduce system tests, integration tests, or other tests. Get rid of unit tests that duplicate what system tests already do. If the system testing level is too expensive, then create subunit integration tests.
  • Unit tests gave the developer more immediate feedback about whether a change broke the code instead of waiting for system tests to run. Today, with cheaper and more powerful computers, that argument is less persuasive.
Check your test inventory for replication. Create system tests with good feature coverage (not code coverage) — remembering that proper response to bad inputs or other unanticipated conditions is part of your feature set.
  • You might think you need a unit test because it is impossible to exercise that code unit from any external testing interface. But if your testing interfaces are well-designed and can reproduce the kinds of system behaviors you see in the real world, then... delete the code!
  • Reasoning about your code in light of system tests can be a great way to find dead code. That’s even more valuable than finding unneeded tests.
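
Here is a minimal sketch of what “turn unit tests into assertions” could look like in Python. The function `apply_discount` and its rules are hypothetical, invented for this summary; the idea is only that the checks a unit test would make move into the code itself, and any higher-level test acts as the driver.

```python
def apply_discount(price, percent):
    """Return price reduced by percent (hypothetical example function)."""
    # Preconditions: what a unit test would have treated as valid input.
    assert price >= 0, "price must be non-negative"
    assert 0 <= percent <= 100, "percent must be within 0..100"

    discounted = price * (1 - percent / 100)

    # Postcondition: the check a unit test would have made on the result.
    assert 0 <= discounted <= price, "discount must never raise the price"
    return discounted


# Any driver (system test, integration test, real traffic) now exercises the checks.
final_price = apply_discount(100.0, 25.0)
```

One caveat specific to Python: `assert` statements are skipped when the interpreter runs with the `-O` flag, so keeping these checks active in shipped code would need an explicit `if ...: raise ...` instead.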

1.9 “It’s the process, stupid,” or: Green Bar Fever

  • The most serious problem with unit tests is their focus on fixing bugs rather than on system-level improvement.
  • There are two potential goals in testing. One is to use it as a learning tool: to learn more about the program and how it works. The other is as an oracle.
  • This is why it works to walk away from the terminal for a while.
  • Debugging is the use of tools and devices to help isolate a bug. Debugging is not testing.
  • Unit tests can be a useful debugging tool.

1.10 Wrap-up

  • Go back to Richard’s words:
I did make code for testability purposes but I hardly wrote
any unit tests. However I was renowned for my code quality and
my nearly bug free software.
  • Maybe Richard is one of those rare people who know how to think instead of letting the computer do his thinking for him.
  • Lack of widely available computing equipment forced people to think.
  • The author was raised in a programming culture where code was on punch cards that you delivered to the operator for queuing up to the machine, and you gathered your output 24 hours later. That forced you to think, or fail.
  • They had to do it right the first time.
  • What worries him most about the fail-fast culture is much less in the fail than the fast.
  • Debugging isn’t what you do sitting in front of your program with a debugger; it’s what you do leaning back in your chair staring at the ceiling, or discussing the bug with the team.
  • There’s something really sloppy about this ‘fail fast’ culture in that it encourages throwing a bunch of pasta at the wall without thinking much.
  • The fail-fast culture can work well with very high discipline, supported by healthy skepticism, but it’s rare to find these attitudes surviving in a dynamic software business.
  • Sometimes failure requires thinking, and that requires more time than would be afforded by failing fast. No one wants a failure to take a long time.
PO should conceive (and at best design) the system tests as an input to, or during, Sprint Planning. __ Jeff Sutherland
  • There’s a lot of advice, but very little of it is backed either by theory, data, or even a model of why you should believe a given piece of advice.

Author’s Summary

  • Keep regression tests around for up to a year — but most of those will be system-level tests rather than unit tests.
  • Keep unit tests that test key algorithms for which there is ascribable business value.
  • If you can do either a system test or a unit test, use a system test — context is everything.
  • Design a test with more care than you design the code.
  • Turn most unit tests into assertions.
  • Throw away tests that haven’t failed in a year.
  • Testing can’t replace good development: a high test failure rate suggests you should shorten development intervals.
  • Rewarding coverage or other meaningless metrics can lead to rapid architecture decay.
  • Be humble about what tests can achieve. Tests don’t improve quality: developers do.

My Closing

I tend to agree with some of his points, but I still believe that unit tests can provide really good feedback to developers during development. Their maintenance cost is still a concern for me, though, especially for legacy code.

Lastly, I tend to agree with this quote:

Don’t blindly follow process for process’s sake. Use your experience and keep an open mind about what’s important to your product’s needs. __ Elliot Chance
