From Types to Tests — What We Lost in the Move to Dynamic Languages
I have a confession to make: I kind of hate writing tests. That’s not an easy thing to say because tests are universally considered to be a Very Good Thing. Maybe even The Best Thing. Employers and colleagues like to be reassured that you write lots of tests and are thus a good engineering sort of person.
Why the aversion? Maybe because that’s not how I learned to code. Thinking about a bunch of tests/descriptions before I’m even sure exactly what I want to do or how to do it just doesn’t feel natural to me. I’ll probably always feel more comfortable diving into the “actual code” first or thinking about how that code might work.
Discovering Dynamically Typed Languages
Long before I hated writing tests I also wasn’t very fond of writing out type annotations and wrestling with compilers. Maybe not as much as I didn’t like writing tests but enough that when I first saw/used Python it was a magical thing.
foo = 666
foo = "I am foo!"
What’s not to like about that? You don’t have to waste precious keystrokes typing out a bunch of annoying types. These were some kind of smart variables and this was obviously a very smart language that was clearly more advanced than the C/C++, Java and Haskell I’d thus far seen.
Some small part of the endless burden of telling the machine exactly what to do had been lessened. As a far more experienced friend at the time said to me, “When the Gods coded reality, they did it in Python.”
Naturally I wasn’t thinking about whether this sort of thing might “scale” to large programs. I’d never really written a really large program. I’d never really worked on one. I don’t think I’d even heard anyone discuss the concept. Though I knew enough to know you eventually forgot every piece of code you wrote I generally never had to maintain anything either — uni assignments were handed in and that was that.
In time people began to use scripting languages for more and more ambitious applications. Frameworks such as Rails exploited the meta-programming potential of languages like Ruby to reduce boilerplate and do magical things. Around the same time TDD was becoming a thing, led by people like Kent Beck (who rediscovered it in an “ancient” textbook). TDD was an approach to software development that promised to lead to better code and also offered some of the same sense of security that type safe languages had:
- greater ability to add features and make changes to existing code without breaking things that previously worked — “regressions”
- a form of documentation
- a way to convince other developers that their code worked.
Tests weren’t just about developing working code, they were about maintaining, refactoring and extending it. Test suites ran continuously to reassure everyone that things were going fine.
From Types to Tests — What Were the Consequences?
This is where things are going to get a little more subjective. Time to cover my ass with a bunch of disclaimers.
Tests Are Good — Testing Is Good
Let me be clear about that up front. I guess there is the inherent potential for a post like this to piss people off and I don’t want it to be a “types vs tests” thing. Though admittedly I will skirt awfully close at points. In general though I’m of the opinion that both type checking and automated testing improve software quality.
I’m Going to Be Waving My Hands around a Fair Bit
- There are a lot of subtleties that I’m not going to consider — strong/weak typing, nuts and bolts stuff about different types of tests (beyond a very general unit/feature distinction) etc. There are dependent type systems in which types can be dependent on values — e.g. one can specify the type of all Integers less than 5 etc — but we won’t deal more with that here.
- I’m not going to go particularly deep into tests or types so if you are a guru, aficionado or enthusiast of either camp you may feel somewhat shortchanged, but hopefully not too misrepresented.
- For simplicity’s sake you can assume I’m talking in the context of some vaguely defined functional language. I’m more trying to make a point about real world practices I suppose.
Basically what I’m trying to say is — I’m trying not to write a book here.
Types: What They Are & What They are Good At
In type theory, every “term” has a “type” and operations are restricted to terms of a certain type.
Back in the day mathematicians were trying to establish a solid philosophical foundation for mathematics. They knew Maths was super-useful but distressingly it rested on kind of vague foundations. The challenge was to boil down mathematics to its essential principles and the holy grail was to use those principles to prove that mathematics itself was consistent. A system for writing mathematical proofs that could write a mathematical proof that its proofs were correct.
The idea of types was introduced to avoid a distressingly common type of mathematical paradox that repeatedly cropped up in early attempts to formalise mathematics by philosophers such as Bertrand Russell. Basically the liar’s paradox embodied by “This sentence is false” and its various forms. The same idea (“the diagonal argument”) was later used by Gödel as part of his Incompleteness theorems that made people feel sad about the whole idea.
By restricting logical statements so that they could only take certain types as arguments, these paradoxes could be avoided: if the statement “(x) is false” is unable to take other statements as arguments, it can’t make paradoxical statements about itself.
So basically, in order to make these formalisations correct, mathematicians added restrictions on how logical propositions could be used by creating types and insisting functions could only accept certain types. These mathematical formulations formed the basis of our modern typed languages.
Types in Programming Languages
In programming languages, a type system is a collection of rules that assign a property called type to various constructs a computer program consists of, such as variables, expressions, functions or modules. The main purpose of a type system is to reduce possibilities for bugs in computer programs by defining interfaces between different parts of a computer program, and then checking that the parts have been connected in a consistent way.
It’s not hard to see that types do similar things in modern programming languages. With types we can declare that functions only take certain values as input. In low-level languages such as C this chiefly serves to prevent errors in memory allocation and the like, but in languages with more sophisticated type systems these restrictions can be much more meaningful and mostly serve to prevent errors from creeping into our code.
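To make “functions only take certain values as input” concrete, here is a minimal sketch using Python’s optional type hints. The function total_length is a made-up example, and the claim about what a checker reports assumes an external tool such as mypy, since Python itself ignores annotations at run time.

```python
from typing import List


def total_length(words: List[str]) -> int:
    """Sum the lengths of the given words."""
    return sum(len(word) for word in words)


# A static checker such as mypy would accept this call...
print(total_length(["types", "tests"]))  # 10

# ...and would reject the next one before the program ever runs, because an
# int is not a List[str]. Plain Python only finds out when the line executes.
try:
    total_length(42)  # type: ignore[arg-type]
except TypeError as error:
    print("runtime failure:", error)
```

The annotation costs a few characters, and the mistake on the last call becomes something a tool can find by reading rather than running.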
It’s difficult to talk about this without invoking a circular definition, but types are generalisations of values. For example, we could list “1, 34, 87, 4” and so on (though of course the inductive definition of natural numbers would be more concise), but instead we can just say “integers”.
It’s perhaps helpful to consider what type checking would involve if it wasn’t something done for us automatically. A type checker reads through your codebase, keeps track of which types are associated with which functions/variables and ensures that none of these language components are being assembled and used in a way that we have declared incompatible. In many ways it’s kind of like an automated code review.
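As a rough illustration of “reading” code rather than running it, here is a toy checker for an invented mini-language of nested tuples. The constructs (int, str, len, add) and the function names check and expect are all assumptions made up for this sketch; real type checkers are far more involved.

```python
def check(expr):
    """Return the type name of expr ('int' or 'str') or raise TypeError.

    Nothing is evaluated here: only the structure of the expression is read.
    """
    tag = expr[0]
    if tag in ("int", "str"):   # literals are tagged with their own type
        return tag
    if tag == "len":            # len takes a str and gives back an int
        expect(expr[1], "str")
        return "int"
    if tag == "add":            # add takes two ints and gives back an int
        expect(expr[1], "int")
        expect(expr[2], "int")
        return "int"
    raise ValueError(f"unknown construct: {tag!r}")


def expect(expr, wanted):
    """Check a sub-expression and complain if its type is not the wanted one."""
    found = check(expr)
    if found != wanted:
        raise TypeError(f"expected {wanted}, found {found} in {expr!r}")


ok = ("add", ("int", 1), ("len", ("str", "foo")))
bad = ("add", ("int", 1), ("str", "foo"))

print(check(ok))  # int
try:
    check(bad)
except TypeError as error:
    print("rejected:", error)
```

Note that the checker never computes 1 + 3. It only confirms that the pieces have been connected in a way the rules allow.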
As we build functions, modules, libraries and applications out of these smaller, typed components this information accrues to describe the system from the micro to the macro scale. Types therefore are fundamentally suited to describing and defining programs and how they are constructed.
Tests: What They Are & What They are Good At
Usually a testing library involves one or more declarations in the form of strings coupled to one or more assertions concerning the output for one or more inputs. More often than not these inputs involve a few edge cases, a couple of expected values and a couple of unexpected values to ensure that our code returns or does not return these values as the case may be.
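A minimal sketch of that shape, using Python’s built-in unittest module. The function slugify and its three cases are hypothetical, chosen only to show a description string coupled to an input and an expected output.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lower-case and hyphenate a title."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    # Each case: a description, an input and the output we consider correct.
    CASES = [
        ("joins words with hyphens", "From Types to Tests", "from-types-to-tests"),
        ("collapses repeated whitespace", "a   b", "a-b"),
        ("returns an empty slug for an empty title", "", ""),
    ]

    def test_cases(self):
        for description, given, expected in self.CASES:
            with self.subTest(description):
                self.assertEqual(slugify(given), expected)


# Run the suite programmatically so the outcome is available as a value.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

An expected value, a whitespace edge case and an empty input: a handful of hand-picked points, which is typical of what a test actually pins down.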
Again it’s instructive to consider what testing automates by imagining what we would otherwise have to do manually. Tests automate the process of repeatedly executing a piece of code after every change and comparing its output (relative to a few selected values) to some values we consider to be correct or incorrect. They don’t “read the code” like our type checker does. They run it.
Tests, therefore, are good at checking that our code produces desired output for a few cases and doesn’t produce undesirable output in other cases. Tests excel at constraining what a program does.
So far so good.
If we accept that types and tests have fundamental differences and are useful in different situations, what then might be the consequences of doing away with one and going all in on the other?
First let’s just see if we can extract and enumerate some fundamental differences from our earlier descriptions.
Types Describe Source Code, Tests Check Behaviour
This is the big difference IMO — types and tests live in two fundamentally different worlds.
Types fundamentally describe/define code. From the smallest building blocks through to full-blown applications. Type annotations allow type checkers to get to know something about our code.
Tests check for desired/undesired behaviour. They do so by comparing expected and actual outputs for a number of inputs. They “live outside” our code, they don’t have any knowledge regarding how it was put together — they only know whether it produces the right answers or not.
General vs Specific
Types “talk about” groups of values while tests consider specific values. I think this is a less profound distinction than the code/behaviour distinction previously mentioned but it’s worth mentioning, especially when we seek to maximize code coverage.
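A small sketch of the difference, assuming a deliberately buggy function called double invented for this example. The isinstance loop stands in for the general claim an annotation makes, and the asserts below it are the specific claims a test makes.

```python
def double(n: int) -> int:
    """Type-correct for every int, but deliberately wrong for negatives."""
    return n + n if n >= 0 else n


# The annotation makes a general claim: any int in, some int out.
# That claim holds even for the buggy branch.
assert all(isinstance(double(n), int) for n in range(-100, 100))

# Tests make specific claims about chosen values. These happen to pass...
assert double(0) == 0
assert double(21) == 42

# ...and only a test that picks a negative input exposes the bug.
print(double(-3) == -6)  # False
```

The general claim covers every input but says little about each one. The specific claims say a lot, but only about the inputs someone thought to try.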
When All You Have Is a Hammer — the Consequences of Using One Tool Everywhere
OK, so much for the differences. What are the consequences of doing away with types and emphasizing tests?
Tests are Expensive
Tests take a relatively long time to write. In terms of raw LOC they also pack a punch — test suites can rival or exceed the codebase they were built for. All this “test code” has to be reviewed by other developers, just like the “code that actually does stuff”. More code to look through, maintain, and wonder what the hell that person meant when they wrote it etc.
More costs: It’s often pretty easy to write tests that aren’t particularly fast, since the nature of a test is that it has to run code, not just read it. Developers have to wait for these tests to run and pass on CI servers (not just on their machine) before their code can be merged, or a failure sends them to the back of the queue again. Tests then spend the life of your application being run on some CI server hundreds, thousands, millions of times, which is of course another cost.
Again, tests have benefits. The most objectionable part about this for me is that the development/maintenance cost of tests is something that is almost never discussed or acknowledged, no matter how onerous it may become. It’s just the price of doing business.
Types are Pretty Cheap
Type annotations are relatively terse. Languages with type inference can even work many of them out without you adding anything in terms of annotations.
Type checking itself is fast: feedback is more or less instantaneous. Type checking is also environment independent (saying “it type checks on my machine” won’t get you slapped upside the head).
Tests & Behaviour Coupled to Implementation
This one touches on an often debated question in TDD. Testing public APIs vs private methods/functions. Testing interfaces rather than implementations is encouraged by most but in practice it’s a complicated question.
Part of that may be attributable to a desire to maximise correctness in the absence of types. This is achieved by maximizing code coverage — the attempt to cover every code path, however that may be defined, with a test. If this goal is to be achieved with any degree of realism then it becomes unavoidable that our implementation and our tests become tightly coupled. Again, I think this temptation is exacerbated by an absence of type checking, where any piece of code without some sort of test is the source of anxiety.
This is less of a problem with third party code that has its own tests. But in the case of our own code it’s pretty standard to just run all tests, from units to features and everything in between, for every change. So while some tests are independent from implementation, in practice people don’t often make the distinction and thus it becomes meaningless in regards to test suites.
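Here is one way that coupling can look in practice, sketched with Python’s unittest.mock. The calculator classes and the two test functions are hypothetical. The second class is the same behaviour after a refactor that inlines a private helper.

```python
from unittest import mock


class PriceCalculator:
    """Original implementation: tax lives in a private helper."""

    def _tax(self, amount):
        return amount // 10

    def total(self, amount):
        return amount + self._tax(amount)


class InlinedCalculator:
    """Refactored implementation: same behaviour, helper inlined away."""

    def total(self, amount):
        return amount + amount // 10


def interface_test(calculator_class):
    # Only checks what comes out of the public method.
    assert calculator_class().total(100) == 110


def coupled_test(calculator_class):
    # Also insists that a particular private helper is called.
    with mock.patch.object(calculator_class, "_tax", return_value=10) as tax:
        assert calculator_class().total(100) == 110
        tax.assert_called_once_with(100)


interface_test(PriceCalculator)
coupled_test(PriceCalculator)
interface_test(InlinedCalculator)  # still passes after the refactor

try:
    coupled_test(InlinedCalculator)
except AttributeError as error:
    # The code is fine; the test broke because the helper it patched is gone.
    print("coupled test broke:", error)
```

Both classes do exactly the same thing for callers, yet one of the two tests has to be rewritten. That is the kind of failure that says nothing about whether the app still works.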
Types & Implementation
Types naturally define interfaces: the inputs and outputs of functions, modules, libraries. There’s no confusion in terms of accidentally type checking an implementation that’s no longer used. If we swap out a piece of code it’s not like the wrong types will be checked because we didn’t update things. If types don’t match up then code needs to be changed so that they do.
There’s no undue coupling between interface and implementation.
This leads on to the next idea…
Tests are Fragile in the face of Changing Code
There are a few reasons for this IMO:
- They are inherently specific. When the desired behaviour of your code changes, so must your tests.
- Specific aspects of behaviour, the thing that tests are good at describing, change relatively frequently.
- The proliferation of testing throughout the codebase means changing any piece of code almost always means changing a test.
It’s the last point that is the problem IMO, in that it makes tests more fragile than they need to be (one could label any fragility regarding points one and two as necessary and unavoidable).
If we restricted ourselves to feature tests, this wouldn’t matter. A breaking test would only mean that our app no longer did what we intended. But with code coverage maximized a test almost certainly has to change whenever code changes. And the semantic meaning of a failing test is diluted: does it mean that your code is broken or does it mean that a test is broken?
Types are robust in the face of changing code
They make more general assertions. Changes to your app that involve changing what sort of things your app accepts as input or returns as values are relatively infrequent. e.g. It’s entirely possible that you will want to recommend people wear pink polo shirts with popped collars next year but it’s unlikely you would recommend that they wear bacon or garbage bags. I imagine.
Conversely it’s for this exact reason that they are less suited to nailing down specific aspects of behaviour, such as ensuring that the collars-popped message is still getting out there.
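The polo shirt example can be sketched in a few lines. Everything here is invented for illustration: the Garment type, the two recommend functions standing for this year’s and next year’s behaviour, and general_check as a run-time stand-in for what a return annotation promises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Garment:
    item: str
    colour: str


def recommend_this_year() -> Garment:
    return Garment("polo shirt", "white")


def recommend_next_year() -> Garment:
    # The behaviour changes; the signature does not.
    return Garment("polo shirt", "pink")


def specific_test(recommend):
    # Pins down one exact answer, so it must be rewritten when fashion moves on.
    return recommend() == Garment("polo shirt", "white")


def general_check(recommend):
    # Stand-in for what the return annotation promises: some Garment comes back.
    return isinstance(recommend(), Garment)


print(specific_test(recommend_this_year), general_check(recommend_this_year))  # True True
print(specific_test(recommend_next_year), general_check(recommend_next_year))  # False True
```

The general claim survives the change untouched, and by the same token it has nothing to say about whether the collar advice is still right.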
Tests aren’t Actually that Much Help When Refactoring Code
If you accept the preceding points then this isn’t very surprising.
- tests break both due to changed-but-correct-code and due to actually breaking code
Types & Refactoring
Type checking shines when it comes to things like code refactoring. When reshuffling functions/components you can have confidence that the guts of your resulting code still make basic sense.
You only have to look at the sort of tooling that is available for statically typed languages to see how much help type information can provide here. When dealing with languages like Java/Objective-C/C# etc. it’s possible to use IDEs with vastly superior auto-completion, documentation etc. It’s much, much harder to recreate this sort of thing for dynamically typed languages.
Program testing can be used to show the presence of bugs, but never to show their absence!
Edsger Dijkstra EWD249 “Notes on Structured Programming”
One could make the argument that one of the biggest attractions of both type checking and testing is the psychological sense of security they offer: the ability to go home at the end of the day with the sense that we haven’t just pushed some catastrophic error into the codebase. Sadly neither type checking nor automated testing is a guarantee against this happening, as the experience of pretty much any human being who has used modern technology can attest.
However, I can’t resist pointing out another way that tests can fail in this respect. Again — the tight coupling between code and tests is involved.
At the end of the day, whatever misconceptions or misplaced assumptions might be built into your codebase are, in all likelihood, also built into your tests, which in all likelihood were written by the same people who wrote the code. That’s why it’s so common that some particularly annoying/competent UA Tester is able to break a feature whose code, along with its impressive-looking collection of accompanying tests, has been reviewed and approved by your whole dev team. Your tester is bringing a wholly different set of assumptions to the table, whereas you may have never anticipated that anyone would want to enter their email address backwards under a full moon.
It is a bit unfair to pick on tests for stuff like this but it illustrates their limitations — they are only testing problems you can conceive of. Your tests are only as smart as you are.
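A tiny sketch of shared assumptions, using a hypothetical validator named looks_like_email whose author assumed that containing an “@” is enough:

```python
def looks_like_email(text: str) -> bool:
    """Hypothetical validator built on one assumption: an '@' is enough."""
    return "@" in text


# Tests written by the same author encode the same assumption, so they pass.
assert looks_like_email("bob@example.com")
assert not looks_like_email("bob.example.com")

# A tester with different assumptions tries input the author never imagined.
print(looks_like_email("@"))  # True, although "@" is clearly not an address
```

The tests are green and the code is still wrong, because both were drawn from the same mental model.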
Types & Confidence
“Statically typed programs and programmers can certainly be as correct about the outside world as any other program/programmers can, but the type system is irrelevant in that regard. The connection between the outside world and the premises will always be outside the scope of any type system.”
Basically Hickey is emphasizing that, while types may be able to help prove that a program is internally consistent with respect to a given set of rules, they can’t really say anything about whether it is actually correct with respect to the outside world. Only tests, and a good understanding of what correct behaviour actually is, can do that.
Still, when you can be confident that large parts of your application are doing what they are supposed to do that obviously frees up a lot of time to concentrate on getting the other stuff right.
As Another Form of Documentation
Tests are good at a certain type of documentation: again, what the code does. If you wanted to tell a map function from a filter function you’d have more luck looking at the functions’ tests than at their type signatures.
Types document a different use case: how code relates to other code. If you accept the idea that types talk about code then this seems pretty natural. Even in a dynamic language one has to be aware of whether a function accepts an object or a string as an argument and what it returns as a value.
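To see both kinds of documentation side by side, here is a sketch of hand-written map and filter functions with Python type hints. The names my_map, my_filter and is_even are made up for the example, and the generic annotations only mean something to an external checker such as mypy.

```python
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def my_map(f: Callable[[A], B], xs: Iterable[A]) -> List[B]:
    """Apply f to every element."""
    return [f(x) for x in xs]


def my_filter(keep: Callable[[A], bool], xs: Iterable[A]) -> List[A]:
    """Keep only the elements for which keep returns True."""
    return [x for x in xs if keep(x)]


def is_even(n: int) -> bool:
    return n % 2 == 0


# The signatures say how the pieces plug together; the outputs say what they do.
print(my_map(is_even, [1, 2, 3, 4]))     # [False, True, False, True]
print(my_filter(is_even, [1, 2, 3, 4]))  # [2, 4]
```

The signatures tell you which function you may pass in and what kind of list comes back. The two printed lines are what actually tell you which function transforms and which one selects.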
Both have their strengths and weaknesses here and in a lot of cases both are going to be poor substitutes for documentation written by humans for other humans purely with the intention of explaining what something does.
- Types tell type checkers about code and are most helpful for things related to knowing about code and how it relates to other code.
- Tests make assertions about the behaviour of code and are best suited for this purpose.
The ideal situation IMO is that type checking is used to check the internal consistency of your code and that automated testing ensures that it is spitting out the expected answers. i.e. Types “patrol” the internal interfaces between code while tests police the interaction with the outside world more or less.
Just to briefly touch on the expressiveness of different type systems: when discussing this subject it was recently put to me that your tests should begin where your type system stops. In other words, when your type system can no longer say anything about the correctness of your program, that’s where testing should pick up the slack.
OK…But Seriously — Types vs Tests? Who Wins?
For those who want a Batman vs Superman type answer
We do! If we use both. 1 million points to House Gryffindor! Yay!
…Hmmm. While again I stress that both are invaluable additions to the modern development process I do think that types seem to be more useful in a wider variety of applications whereas tests tend to do only one thing well.
What about downsides to types? Meta-programming (code that generates code) is much more difficult in a typed system. A sophisticated, expressive type system also takes a bit of effort to learn and master.
I think that the existence of a test is almost always better than the converse. However, as I’ve argued, tests are only really useful to check behaviour, and due to their dynamic nature and the developer time they absorb, they are costly. In short, IMO, tests these days are used to compensate for the absence of static typing, with unsatisfactory consequences that many of us have simply become accustomed to.
At this point in time I honestly believe many languages/ecosystems are test-heavy, if you will, in their approach to code quality control. You may not agree that ill-considered test coverage can have net negative effects on development (you may hold that there’s no such thing as too much testing), but I think many developers who haven’t had types to make use of would be forced to agree that type checking adds significant value if that were to change.
It’s inevitable IMO that people will increasingly rediscover the value of an explicit type system as Front End/JS code becomes more complex. Add type checking to immutable data, unidirectional data flow and other functional techniques that will be increasingly relied upon to minimise developer error. Add well-chosen, automated tests to the above and we are doing about everything we can to write bug-free, maintainable code.
These blog posts either express similar thoughts or came to mind during the writing of this:
While there are no silver bullets, there’s an awful lot of low-hanging fruit just lying around. Let’s pluck it! We can make major improvements in our software quality, even with minor adjustments to our coding style. Code can be easier to reason about, with vastly less ways to fail, at a very low cost.
Testing in Ruby requires a great deal of effort that doesn’t seem to get acknowledged much…Basically what I’m getting at is that Ruby testing requires both a big upfront and continuing mental investment, the knowledge is not very portable, and the test code is subject to just as much bit rot as the rest of your code…It’s simply unrealistic to rely on developers to always shoulder the burden of controlling Ruby’s dynamic typing, silly putty flexibility with tests. It’s like having to rely on someone having to rebuild the guard every time they take out the chainsaw.
Big thanks to Rob Howard for an early proof read — with much help on the functional/typing side of things. Also thanks to my Mum for telling me “this is way too long” and finally teaching me something about web development.
Of course if you thought this whole thing was terrible it’s mostly their fault.
Still, thanks team.