I’m a big fan of unit tests. Really. Big fan. Super big. Massive. Like, “make unit tests great again” massive. That.
So when I was working on my FB project (https://cgsql.dev) of course I invested in a lot of tests. In fact there is about as much test code as there is code code. And I wouldn’t have it any other way. That stuff saves my life on a daily basis because when you’re working on any code, but maybe especially on a compiler, it’s really easy to hork things up. So the code has 100% line coverage and lots of things covered with an assortment of inputs to get interesting permutations of that. It is awesome, it helps a ton. The proof is that I get about 2 bug reports a month on 23k lines of code that turn out to be actual defects rather than pilot error. Of course, being an average, sometimes there’s more, like say, this month…
The usual objection I hear about doing high levels of code coverage is that “it’s not worth it to get the last few percent” there’s a “diminishing returns” argument. In my experience that’s bunk because:
- the 1,000th unit test is not any harder to write than the 100th, in fact it’s often EASIER to write because the scaffolding is already there
- the 1,000th unit test is not any less likely to find problems than the 100th, in fact it’s often MORE likely to find bugs because it’s probably exploring code that rarely runs, but still needs to be correct
- and frankly it’s just not that hard to get to 100% coverage of at least lines (what I did) or blocks (what I wish I could do, long story)
The other argument is that you just don’t get much value because you still aren’t fully protected even at 100%. And that is very true, but not being fully protected doesn’t mean you should then just give up. Most safety rails can’t prevent every possible accident but they are still a great idea. But let’s get back to this month, and it’s not even over yet, and what’s gone wrong. It’s the two things that unit tests can’t ever save you from. [Forgive me for ending a sentence with a preposition.]
1: Correct isn’t what you think it is
Yeah, sometimes the problem is that the code does exactly what you intended it to do and yet, that is just wrong. Maybe you didn’t quite understand the problem properly, maybe you had some incorrect notion of how an API is to be used. In a unit test you could easily mock out the API that you’re calling because unit test, and that’s what you do in a unit test. But then you’ll never know you did it wrong. If you want an example of that, you need look no further than this beauty “LIKE expression escape doesn’t mean what we think it means”.
In this particular example, the problem was that in the SQL
x LIKE y ESCAPE z the meaning of that escape string isn’t that it escapes the just the ‘%’ character and therefore ‘%’ as an escape is a good idea so you can do ‘%%’ … nope… it escapes any character and so making ‘%’ the escape breaks everything because now ‘%’ doesn’t mean wildcard ever. Not that I understood this… at least not before that bug report came in. It used to sort of work but when the underlying API became compliant the whole thing broke.
Now lucky for us there was also a mini-integration test that actually tried the thing on real SQLite and so an experiment run by a Good Samaritan on SQLite trunk found it. So not a total loss. But a great example of how you need other kinds of testing help you find the inherent weaknesses in your unit tests.
2: Your Imagination Fails You
Sometimes the problem is that there just isn’t enough variety in your test cases. You think you’ve covered all the important patterns. You hit 100% coverage. You maybe even went “beyond 100%” to add more variations that you thought were interesting and challenging, and important to get right… But you just forgot one, or seven, or a whole class of them.
If you want an example of this problem, look no further than “Right side of left associative operators at equal binding strength got wrong parens!!” When I first saw this report I was gobsmacked. How could I possibly have missed this? It’s been broken like this for nearly 3 years. What the hell?
There are lots of tests for order of operations and so forth but clearlly we never tried this pattern
x - (y - z) in which the parens must not be lost even though everythign is l to
+ in binding strength. Compare with
(x + y) + z where it’s ok to lose the parens yielding prettier codegen
x + y + z but
x - y - z is a disaster… Getting the parens right in general is very important but especially so because this thing is generating C code from SQL code and SQL has different order of operations than C, so in general you can need extra parens. We had to know exactly where parens go in order to get SQL expressions like
x | y & z correct because SQL does those left to right, so the generated C must be
(x | y ) & z. Perfect parens were ante for us.
Why didn’t we catch this problem sooner? Well, every single one of the order of operations tests had the delicate bit on the left, and the code is symmetric, so no extra test cases were needed to cover both sides and there you go. 100% not good enough. We needed some cases where the delicate bit was on the right.
But again, after making this change, which while being quite small, is potentially quite invasive, I was pretty confident nothing was broken by this because there was good coverage of both sides now and no other tests broke. And there were a lot of them.
Feeling humble this week. But still glad I have the tests that I have. And still trying to think of ways to find more that would help. Having fewer tests never crossed my mind.