I Need Tests

The Green Bar. My best friend

I’m not a tester, I don’t claim to be one, and I don’t think I want to be one, but I need tests.

What I am is a developer who loves to deliver code and wants to be fully confident that what he delivers is high quality work that follows specifications with no defects. In order to do this, I need to write tests.

I have seen really complex pieces of software without any tests and not exactly following what I’d consider “clean code”, and I always walk away impressed. After all, they mostly work (although I usually only look at them to fix bugs) and have been in use for quite a while. It might just be a limitation of mine, but unfortunately, without tests, I would really struggle to get an application of that size out the door with any confidence.

So, what tests am I talking about, and why do I think I need them? Quite a few of them, and of different types.

To begin with, I need many, many Unit Tests. To be completely honest, I get an uneasy feeling when writing production code (by that I mean the code I actually deliver) for which I’ve not seen a test fail. I feel even worse when I have to retrofit tests to some code and I see them go green right away.

I started with Test-Driven Development a couple of years back. I remember that when I first tried it, I honestly hated it. It seemed crazy to write tests for something when I didn’t yet know how it would work, tests that I would probably end up throwing away once I changed my design. It felt wasteful, and I didn’t feel any benefit from doing it.

After doing enough of it, it finally clicked, and now I find it hard to go back. I even write Learning tests to understand how to use libraries and applications that I’ve never used before (a post on this sometime soon).
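
To make the idea of a Learning test concrete, here is a minimal sketch. It assumes Python and its standard unittest module, and it uses urllib.parse as a stand-in for "a library I’ve never used before"; the class and test names are made up for illustration. Each test pins down one thing I would otherwise only half-remember from the documentation.

```python
import unittest
from urllib.parse import parse_qs, urlsplit


class UrlParsingLearningTest(unittest.TestCase):
    """Learning tests: they document how the library behaves, not our own code."""

    def test_splits_host_port_and_path(self):
        parts = urlsplit("http://example.com:8080/orders?id=1&id=2")
        self.assertEqual(parts.hostname, "example.com")
        self.assertEqual(parts.port, 8080)  # the port comes back as an int
        self.assertEqual(parts.path, "/orders")

    def test_repeated_query_keys_become_lists(self):
        self.assertEqual(parse_qs("id=1&id=2"), {"id": ["1", "2"]})

    def test_blank_values_are_dropped_unless_asked_for(self):
        self.assertEqual(parse_qs("a=&b=2"), {"b": ["2"]})
        self.assertEqual(
            parse_qs("a=&b=2", keep_blank_values=True),
            {"a": [""], "b": ["2"]},
        )
```

Saved to a file, these can be run with `python -m unittest`. If a library upgrade ever changes one of these behaviours, the Learning test fails before my own code does.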

So why do I need them? I love Unit Tests because they give me confidence that I’m writing something that behaves exactly like I want it to, and because they give me so much control. Using test doubles, I can return bizarre edge-case data to my classes and check that they react appropriately. I can throw exceptions from libraries that shouldn’t throw them, and make sure that our app is prepared for it. Moreover, they make it simple for me to communicate to anyone looking at my code how I think functions should behave, and why. Unit Tests justify why I have error handling code around calls to other classes. They justify why I put in a bit of extra logic to handle some edge case.
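
As a rough sketch of what this looks like in practice, assuming Python with unittest and unittest.mock: the StockChecker class, its inventory client and the stock_count method are all hypothetical names invented for this example. The test double lets each test decide exactly what the collaborator returns or throws.

```python
import unittest
from unittest.mock import Mock


class StockChecker:
    """Hypothetical production class that depends on an inventory client."""

    def __init__(self, inventory_client):
        self.inventory_client = inventory_client

    def is_available(self, item_id):
        try:
            count = self.inventory_client.stock_count(item_id)
        except Exception:
            # The client "shouldn't" throw, but we are prepared if it does.
            return False
        if not isinstance(count, int) or count < 0:
            # Bizarre data from the collaborator is treated as "not available".
            return False
        return count > 0


class StockCheckerTest(unittest.TestCase):
    def test_positive_count_means_available(self):
        client = Mock()
        client.stock_count.return_value = 3
        self.assertTrue(StockChecker(client).is_available("sku-1"))
        client.stock_count.assert_called_once_with("sku-1")

    def test_negative_count_is_treated_as_unavailable(self):
        client = Mock()
        client.stock_count.return_value = -7  # edge case fed in by the double
        self.assertFalse(StockChecker(client).is_available("sku-1"))

    def test_client_exception_is_handled(self):
        client = Mock()
        client.stock_count.side_effect = TimeoutError("inventory is down")
        self.assertFalse(StockChecker(client).is_available("sku-1"))
```

The last two tests are the ones that justify the try/except and the extra check in is_available: delete either piece of production code and a test goes red.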

What’s more, tests allow me to start implementing code by writing whatever first pops into my head, as long as it passes the tests. It might be disgusting and complex or stupid, but it passes the tests. Once that’s done, I can then merrily refactor my code to make it look fantastic, knowing that I won’t break anything. It’s quite the load off my shoulders to be able to focus on getting code to work first, and only then on making it clear to other people. I think that code is all about communication, so being able to treat those as two separate steps is extremely convenient.

Now I have a whole lot of passing Unit Tests. I’ve written what I hope is nice, readable and communicative code. Unfortunately, I’m not done.

I’ve yet to see our application start up and actually do anything. Also, for example, I’ve likely left some error logging at various levels in our application, but I still don’t know whether the sequence of logs produced in an error case makes it obvious where to find the problem - and if it’s not obvious from the logs, it’s not good enough.

What I need now are Component Tests (or Application Tests, or Functional Tests). These are tests that usually run in an entirely separate process from your application, and test it from the outside, via its natural ports. I work with backend services that communicate over HTTP, so for me, these tests are a set of HTTP requests hitting our application and checking what happens. The fundamental thing about these tests is that the application is up and running just as it will be once it gets to production. There’s no injected code that fakes the database, and no test library that makes all HTTP requests resolve instantly. The only difference between the app now and when it’s in production is that, through configuration, I’m pointing it to a stub HTTP server, and that the database I’m using for testing is under my complete control, so I’m free to wipe it or fill it with garbage as my tests require.

One thing to note is that I have far fewer of these than I have Unit Tests. It’s not even close. For a given feature I might have a few dozen Unit Tests and maybe five or six Component Tests. These are much higher level checks. They usually verify the happy path for a feature and a few unhappy cases. For example, say we need to retrieve some data from service A and combine it with data from service B. I’ll probably have a test for when both A and B return the correct data, one for when A responds but B fails, and one for when both fail, to exercise the error handling paths.
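
The three scenarios for services A and B could be sketched roughly as follows, assuming Python and only its standard library. Everything named here is hypothetical: the stub and app helpers, the payloads, and the choice of a 502 response when a dependency fails. In a real setup the application would be the actual service running in its own process and pointed at the stubs through configuration; to keep the sketch self-contained, a tiny stand-in application runs on a thread instead.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_server(handler_cls):
    """Start an HTTP server on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), handler_cls)
    threading.Thread(target=server.serve_forever,
                     kwargs={"poll_interval": 0.05}, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/"


def respond(handler, status, body):
    """Write a JSON response on the given request handler."""
    payload = json.dumps(body).encode()
    handler.send_response(status)
    handler.send_header("Content-Type", "application/json")
    handler.send_header("Content-Length", str(len(payload)))
    handler.end_headers()
    handler.wfile.write(payload)


def stub(status, body):
    """Handler for a stub dependency that always answers the same way."""
    class Stub(BaseHTTPRequestHandler):
        def do_GET(self):
            respond(self, status, body)

        def log_message(self, *args):  # keep the test output quiet
            pass
    return Stub


def app(url_a, url_b):
    """Stand-in application: merges data from A and B, 502 if either fails."""
    class App(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                combined = {}
                for url in (url_a, url_b):
                    with urllib.request.urlopen(url, timeout=2) as resp:
                        combined.update(json.load(resp))
                respond(self, 200, combined)
            except OSError:  # URLError and HTTPError are OSError subclasses
                respond(self, 502, {"error": "dependency failed"})

        def log_message(self, *args):
            pass
    return App


def call_app(status_a, status_b):
    """Run one scenario: set up the stubs, start the app, hit it over HTTP."""
    a, url_a = start_server(stub(status_a, {"name": "widget"}))
    b, url_b = start_server(stub(status_b, {"stock": 3}))
    application, app_url = start_server(app(url_a, url_b))
    try:
        with urllib.request.urlopen(app_url, timeout=5) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, json.load(err)
    finally:
        for server in (a, b, application):
            server.shutdown()
            server.server_close()
```

With this in place, the three Component Tests reduce to calling call_app(200, 200), call_app(200, 500) and call_app(500, 500) and asserting on the status code and body that come back. The test never reaches inside the application; it only sees what a real client would see.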

I run these on my machine before I commit any code, but they obviously also run as a step in the delivery pipeline. I’ve seen two approaches for this: deploy to a Dev environment and run the tests over HTTP, or just spin up the app within Jenkins and test it there. While running these in Jenkins is probably the fastest approach, it introduces some issues. For one, you’ll probably see entries in the job logs from Jenkins itself, your test framework, your HTTP stub server, and potentially your database (if you run all of these locally), all mixed together. Worse, you might not get logs from these at all, and then finding bugs at this stage will be a pain. Moreover, you’ll have no confidence that the infrastructure around your application works when you deploy to some environment. You’ll likely want to ensure this all works before deploying your app to any environment where it is expected to be up and running all the time. If you decide to deploy to a development environment, you’ll have to take care to deploy your application and your HTTP stub server as if they were two different applications, to ensure that your setup matches production as closely as possible.

With these tests done, I now have enough confidence to commit my code and have our pipeline deploy it to some test environment. This is also where I’m happy to hand matters over to the capable tester in our team. The next step in testing our application is to run a subset of the Component Tests in an environment where our app is deployed alongside all its dependencies, each deployed by the team that owns it. This step is fundamental to find out whether the contracts between the various applications are working as expected. I call these either Contract tests or Integration tests. These are usually a subset of our Component tests, plus maybe a few different scenarios added for exploratory testing. Ideally we’d run all Component tests as Integration tests too, but some cases are unlikely to be feasible. Take our previous example: it might not be possible to force our downstream services to return errors.

I’ve been insisting a lot on writing my own tests, but I am happy to leave these to the testers in our team. Why? Simply because writing and maintaining these is really hard. They require a lot of coordination between teams to figure out how to test scenarios and how to generate and maintain all the test data needed to cover them. They require a lot of specialist skills that I simply do not possess, and enough effort to justify a dedicated role. I could do it, if needed, and as a matter of fact I did in one of my previous teams, but I feel like it’s not my speciality, so I think that it’s not the best use of my time. Of course, it doesn’t mean that I simply hand off all responsibility to our tester and just walk away. Creating and curating these tests is still a team effort. Discussing what cases can and should be tested is part of preparing the tickets we will pick up during development. It shouldn’t be uncommon to pair with the tester when issues arise and need fixing. Moreover, if our tester is on holiday, we should know how to look after the tests and understand and investigate failures.

With these tests out the door, the team should now be confident that our app does what it should and cooperates well with all its neighbours. Also, we’ve already deployed to some environment, which means that we also have confidence that our pipeline works well.

The only thing missing is checking that our app does all of this quickly enough. The last step is Performance tests (or Non-Functional tests). As the name suggests, these tests send a whole lot of load to my application, to check that it responds quickly enough and does not get overloaded, start returning errors, or fall over. Another useful application of these tests is to have some load running on our application while we deploy it. This gives us confidence to deploy our application to production without having to introduce downtime. If we know that our deployment process ensures that traffic is always routed to live and healthy instances, we can deploy at any time, knowing there won’t be errors or outages.
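
A real Performance test would use a dedicated load-testing tool, but the core idea fits in a few lines. The sketch below assumes Python and its standard library; run_load, timed and the default numbers are hypothetical names and values chosen for illustration. It fires a batch of concurrent calls and reports how many failed and what the 95th percentile latency was.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def timed(call):
    """Invoke `call` once; return (latency_in_seconds, succeeded)."""
    start = time.perf_counter()
    try:
        call()
        succeeded = True
    except Exception:
        succeeded = False
    return time.perf_counter() - start, succeeded


def run_load(call, total_requests=200, concurrency=20):
    """Fire `total_requests` calls using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed(call), range(total_requests)))
    latencies = [latency for latency, _ in results]
    errors = sum(1 for _, succeeded in results if not succeeded)
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = quantiles(latencies, n=20)[-1]
    return {"errors": errors, "p95_seconds": p95}
```

Here `call` would be a small function that makes one HTTP request to the application and raises on a non-success response. The test then asserts that the error count is zero and that p95_seconds stays under whatever threshold the team has agreed on. Keeping the same loop running during a deployment is a cheap way to check the "no downtime" claim: any dropped request shows up in the error count.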

Ideally, I’d have two separate runs of these. One is just for our application and for our team. We deploy it to some environment of the same size as production and point our instances to some stub HTTP servers. This is useful to test how our application responds under load when our dependencies get slower than usual (which is easy to do using things like WireMock or Saboteur).

The other run of these tests is to verify the performance of the system as a whole. This is done in an environment which is the same size as Production, where all teams deploy their applications. The tests are run against the entry points of the system, and load is really distributed throughout. Like for Contract tests, this requires a lot of coordination, so it makes sense to have a dedicated person taking care of this. It might be the same tester taking care of the Contract tests, or a dedicated person. Just as before, these tests are primarily looked after by the tester, but are owned by the whole team. Everyone should know how they work, and be able to maintain them and investigate problems if the tester is unavailable.

With all this done, I can finally sleep easy at night. There are a whole lot of other considerations to be made, such as security and things like ensuring your infrastructure works as expected (e.g. logs are aggregated correctly and in good time, there’s connectivity everywhere it’s expected to be, we can get reliable metrics from our services) and more.

I think there’s a lot more to producing good software than writing code, and a whole team of people is needed to do it. These are the things that I need day to day to be able to say that I’m “done”.