The Economics of Testing and Types

I’ve seen several never-ending debates on the effectiveness of testing vs. types. I’m a bit surprised how often these end up as a false dichotomy between the two choices. You can of course test with types or without types, or use typing with or without tests. If the goal is simply to find the most defects in a given system, just use both. If you ask a child whether they want chocolate cake or ice cream, the smart ones will say both, but sometimes you can get them to fall for the false dichotomy and choose just one.

Most of the arguments are about whether one approach is more efficient. That question is less clear-cut, since there are lots of variables involved that are likely project-specific, so generalizations about what’s better seem hard to nail down. Are “types worth the trouble,” or are “tests worth the trouble”? The answer, of course, is “it depends…”

So this post is somewhat motivated by this talk from John Harrison that made it into my Reddit feed this week, and this other Reddit post about TypeScript. At the end of the talk, 37 minutes into the video, someone asks why John is hand-wringing about the level of confidence in results produced by several automated theorem-proving systems versus just finding lots of bugs. He gives a very interesting answer about how counting bugs is, at minimum, easier for most managers to understand. He also says that bugs like the Pentium FDIV bug cost Intel 500 million dollars, so as long as there is even a reasonably low probability that his hand-wringing over mathematical foundations will stop a repeat, Intel can easily justify his salary.

This may sound a bit tongue-in-cheek, but it really is the essence of the “is X worth the trouble” debate. You need to weigh the cost of not finding a bug with a given method against the cost of the effort put into finding it.

I can completely believe that there are some cases where both typing and testing are not worth the effort. I’ve written tons of throwaway scripts that just need to work once or twice; I don’t bother with unit tests and hack them up in the easiest language I can find. Sometimes the scripts don’t have to work for all inputs! They just need to work well enough that I can fix the exceptional cases by hand. Other times, I know the code is just a stopgap that is going to be (or ought to be) deleted in a week or so. In these cases, I’ll get a code review from a colleague, manually test the primary scenarios, and move on.

There are cases where types are the only viable mechanism: specifically, publicly exported interfaces. If I make a change to a publicly exposed interface that 10+ teams depend on, there is no way I can reasonably expect all those teams to have written enough tests to catch the fact that I renamed some method. Some would say maybe I shouldn’t be making breaking changes to public interfaces; that’s a topic for a later post, but let’s assume not everyone lives in a world where everything is a micro-service talking over a well-versioned REST protocol.
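A minimal sketch of that point in Python (all names here are hypothetical): if callers are checked against a structural interface, renaming a method is caught by the interface check itself, without any downstream team having written a test for it.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Uploader(Protocol):
    """The publicly exported interface downstream teams code against."""

    def upload(self, path: str) -> None: ...


class S3Uploader:
    # Suppose a refactor renamed `upload` to `put_object`. Every caller
    # annotated against `Uploader` now fails type checking (and this
    # structural isinstance check) with no tests required.
    def put_object(self, path: str) -> None:
        print(f"uploading {path}")


print(isinstance(S3Uploader(), Uploader))  # False: the rename broke the interface
```

A static checker like mypy would flag the mismatch before the code ever runs; the `runtime_checkable` check above is just a way to demonstrate the same breakage in a running script.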

There are other cases where types simply aren’t up to the task, and tests are the only way to verify things. For example, if I need to ensure that my component logs diagnostic data so I can debug it when it starts failing for reasons beyond its control, I can’t think of anything better than writing a test that simulates a failure and scrapes the logs to ensure the output is what I expect. Perhaps this could be captured with a dependently typed IO monad.
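Here’s a sketch of that kind of test, with hypothetical names (`fetch_with_logging`, the logger name, the URL): inject a failing dependency, then scrape the captured log output.

```python
import io
import logging

# Capture this component's log output in a buffer, the way a test
# harness would scrape it.
logger = logging.getLogger("mycomponent")
logger.setLevel(logging.ERROR)
buf = io.StringIO()
logger.addHandler(logging.StreamHandler(buf))


def fetch_with_logging(url, fetch):
    """Run the injected `fetch`; on failure, log diagnostics and return None."""
    try:
        return fetch(url)
    except OSError as exc:
        logger.error("fetch failed for %s: %s", url, exc)
        return None


# Simulate a failure beyond the component's control...
def broken_fetch(url):
    raise OSError("connection refused")


result = fetch_with_logging("http://example.invalid", broken_fetch)

# ...and scrape the logs to verify the diagnostic output is there.
assert result is None
assert "connection refused" in buf.getvalue()
```

In a real suite you’d use your framework’s log-capture helper (e.g. `unittest`’s `assertLogs` or pytest’s `caplog`) instead of wiring up the handler by hand.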

I also sometimes need to verify that a system sticks to hard real-time deadlines. A full system test with integrated hardware and software is often the only way to verify this sort of thing with any degree of confidence. Verification folks will point out that the absence of a failure doesn’t guarantee correctness. I won’t disagree, but often this is the only economically viable approach, as I don’t have the ability to hire an army of PhDs to fully model the system and do a timing analysis. Also, if the system is designed correctly at a high level, you can often convince yourself that the variance between worst-case and average-case timings is within your limits, so an average-case timing measurement that meets the deadline is both necessary and sufficient. Getting this design right takes some doing. The good thing about real-time systems is that they have to be simple, or else you have very little hope of them actually working.
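The measurement side of that argument looks roughly like this. Everything here is illustrative, not a real system: the deadline, the stand-in workload, and the sample count are all made up.

```python
import time

DEADLINE_S = 0.010  # hypothetical 10 ms deadline


def task() -> None:
    sum(range(1000))  # stand-in for the actual real-time work


# Sample latency repeatedly, then check both the average and the
# observed worst case against the deadline.
samples = []
for _ in range(1000):
    start = time.perf_counter()
    task()
    samples.append(time.perf_counter() - start)

avg = sum(samples) / len(samples)
worst = max(samples)
print(f"avg={avg * 1e6:.1f}us worst={worst * 1e6:.1f}us deadline={DEADLINE_S * 1e3}ms")
```

The author’s point is that when the design bounds the variance between `avg` and `worst`, an average-case measurement inside the deadline is enough; without that design argument, the worst observed sample proves very little.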

Recently, in my day job, I had a choice of rewriting a bit of code in C++, Bash, or Python. The script enumerates some directories, compresses them, and then uploads them to S3. I started with Bash because the original script, which almost did what was needed, was written in Bash. I then discovered my newer version needed to parse some JSON output from a REST API to avoid the need to store persistent S3 credentials. I was tempted to rewrite the whole thing in C++, since JSON parsing in Bash, while doable, seemed like more trouble than using something with better JSON support.
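The JSON bit that pushed me past Bash is a one-liner in Python. The response shape below is made up for illustration; a real credential service (e.g. AWS STS) has its own documented format.

```python
import json

# Hypothetical REST response carrying temporary S3 credentials.
response_body = """
{
  "AccessKeyId": "ASIAEXAMPLE",
  "SecretAccessKey": "not-a-real-secret",
  "SessionToken": "not-a-real-token",
  "Expiration": "2030-01-01T00:00:00Z"
}
"""

creds = json.loads(response_body)
access_key = creds["AccessKeyId"]
session_token = creds["SessionToken"]
```

Doing the equivalent in Bash means either pulling in `jq` or writing fragile `sed`/`grep` against a format that isn’t line-oriented.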

I decided to do something “horrible” and rewrite just the bit I needed in Python first, before eventually migrating the whole thing to a single Python script with good unit tests. I suspect that if I could pull a Go dependency into our system, I would have just done that instead. However, that’s a non-starter, and Python is perfectly up to replacing my one-page Bash script.

The real question is how much effort I’ll put into writing unit tests. This code will likely be used almost daily, and there aren’t that many code paths, so normal usage will get high code coverage. Likely, I’ll ignore writing tests until someone reports an issue late in our release cycle that I’ll feel guilty about not finding earlier. Until then, I’ll go do the next burning thing on my infinite queue of things to do.

The Crystal Ball

I will say one thing: the cost of using types has been steadily going down as we get more advanced type systems. The cost of test execution has been going down as we do more and more automated testing; however, I haven’t seen much technology that makes writing tests that much easier. So, all things being equal, using types will make more and more sense tomorrow if you are strapped for resources and have to choose between testing and types. You should of course do what works for you, but we should all be so lucky as to have both chocolate cake and ice cream. In fact, as we get better at types and tests, we’ll have more opportunities to do both.
