Pipelines and tests

A while back I received an email from a friend that went something like this:

We are implementing automated tests and want to put them in our CI/CD pipelines. How do you suggest we proceed?

Since automated tests and CI/CD pipelines are fairly broad terms, I’m taking a step back here and will talk about different ways these two can be combined.

Build pipelines

Besides doing the obvious (building the code), a build pipeline is supposed to run sanity checks on your application, giving you quick feedback on issues.

A classic build pipeline will:

  • Build your code. Scripting languages will usually still have a build step, focused on minification;
  • Run static code analysis. This can be before, during or after a build, depending on your stack;
  • Run unit tests (more on that below);
  • Tag your repo, so you can trace your build/commit history;
  • Push the build artifacts (eg: binaries, images, documents) to a registry;
  • Report relevant metrics (eg: test results, warnings) to a tracking system;
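
To make that flow less abstract, here's a minimal sketch of a build pipeline as a plain script. Every command in it is an assumption (make for the build, flake8 for static analysis, pytest for the unit tests, docker for the artifacts), so swap in whatever your stack uses; the point is the fail-fast sequence, not the specific tools.

```python
#!/usr/bin/env python3
"""Minimal build pipeline sketch: run each step in order, stop at the first failure."""
import subprocess
import sys

BUILD_ID = "123"  # hypothetical build number, normally injected by your CI server

STEPS = [
    ["make", "build"],                                           # build (or minify) the code
    ["flake8", "."],                                             # static code analysis
    ["pytest", "--junitxml=results.xml"],                        # unit tests, results exported for the tracking system
    ["git", "tag", f"build-{BUILD_ID}"],                         # tag the repo for traceability
    ["docker", "push", f"registry.example.com/app:{BUILD_ID}"],  # push artifacts to the registry
]

for step in STEPS:
    print(">>", " ".join(step))
    if subprocess.run(step).returncode != 0:
        sys.exit(1)  # fail fast: developers are waiting for this feedback
```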

Since the question is about tests, what should we run at this step then? The answer is everything you can stomach. But not more than that :)

In a couple of previous posts I argued that there are cases in which you can make functional/system tests the base of your strategy, which is great as long as they are fast.

So, the tests to execute in this pipeline should be:

  • Fast: developers are waiting on the result of those tests. And we don’t like waiting :)
  • Locally runnable: a subtle criterion that is easy to overlook. If developers cannot run the tests from their own machines, there is no way for them to do proper TDD. Even if that is not your cup of tea, having to push code and wait for build results is a considerable distraction.
  • Relevant: if a test is not fully relevant (see this post for more), then it is detracting from your dev speed (you need to maintain it, you need to wait for it to execute). Consider deleting these tests or, if you can’t, moving them to a slower pipeline (more on that below).
  • Consistent: flaky tests are the bane of the CI world and end up being ignored by devs (“oh, that thing broke again? just re-run the build and it will eventually pass”). While flakiness is a problem in any automated test, it is an absolute no-no at this stage.

But what if you have tests that fail to meet one of the criteria above? If you absolutely need them (eg: your architecture forces a really heavy setup on you, which kills part of your speed), then consider multiple build pipelines: one focused on CI (fast builds for fast feedback), the other focused on completeness (full test suite, build artifacts, etc). In this case, your deployment pipelines will use the output of this secondary pipeline, which is usually implemented as a nightly build.
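
One cheap way to implement that split, assuming pytest is your test runner, is to mark the heavy tests and filter on the marker: the fast pipeline runs pytest -m "not slow", while the nightly one runs plain pytest. The tests below are placeholders just to show the mechanics.

```python
import time

import pytest


def test_slug_generation():
    # Pure in-memory logic: cheap enough for the fast CI pipeline.
    def slugify(title):  # hypothetical function under test, inlined to keep the sketch self-contained
        return title.lower().replace(" ", "-")

    assert slugify("Pipelines and Tests") == "pipelines-and-tests"


@pytest.mark.slow  # register the "slow" marker (eg: in pytest.ini) to avoid warnings
def test_checkout_end_to_end():
    # Stand-in for a test that needs the really heavy setup: nightly pipeline only.
    time.sleep(5)  # placeholder for the expensive part
    assert True
```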

Deployment/release pipelines

From a high-level perspective, this is what a deployment/release pipeline usually looks like:

  • Pull the build artifacts from the registry. Yes, this is where we get the output of the build pipeline.
  • Deploy them to the target environment. By far the most complex step in this flow, and luckily for me not the focus of today’s post.
  • Tag the code again. This tag lets us know what got deployed where with a simple look at our source code.
  • Run health checks and tests. More about this below.
  • Monitor it. Like, monitor it real up close;
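
The health check part can start really small: hit an endpoint, retry for a while as the new version warms up, and fail the pipeline if it never comes up. A sketch, assuming a hypothetical /healthz endpoint that returns {"status": "ok"}:

```python
"""Post-deployment health check sketch (the URL and the payload are assumptions)."""
import json
import sys
import time
import urllib.request

HEALTH_URL = "https://staging.example.com/healthz"  # hypothetical endpoint of the freshly deployed environment


def healthy() -> bool:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200 and json.loads(resp.read()).get("status") == "ok"
    except (OSError, ValueError):
        return False


for attempt in range(10):  # retry: the new version may still be warming up
    if healthy():
        print("deployment looks healthy")
        sys.exit(0)
    time.sleep(6)

print("health check failed; halt (or roll back) the release")
sys.exit(1)
```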

What sort of tests should a team run here?

  • Non data-impacting tests: tests that leave debris behind or that modify/delete existing data are not good candidates here. This restriction can put your test relevance at risk (eg: how can you check your upload functionality is working well?), but there are ways to mitigate that: you may be able to use a sample data set, even in prod (as in “here’s a sample blog post for you to play with”); you may have a test rollback strategy (risky business, so best used as a last resort); or you can keep some prod data saved just for this occasion.
  • Functional tests: if you are still doing a lot of mocking (seriously, dude, WTF?), that is not an option at this stage, as you now want to assert your system is up and running for real. You may still stub some external services, in case you are deploying to a dev environment…
  • Front-end tests: asserting your back-end is all nice and fine is great, but that isn’t usually what the user sees, is it? So while some back-end tests may still make it into the final mix, the focus should be on tests that start at the UI level.
  • Contract tests: if you are exposing an API, you should consider using consumer-driven contracts. And while those tests can and should run in the build pipeline, this is also a good moment to assert your API is not breaking any consumer expectations, and block the release in case it does (there’s a small sketch of the idea right after this list).
  • Exploratory tests: your system is up and running, for realz, and now is the moment to pull off the gloves and run the nastiest tests you can: those formulated by someone who is really looking to break your system. Nope, this is not a suggestion for a quality-last approach, but the recognition that some creative testing can go a long way in catching unthought-of scenarios.
  • UX tests: ok, things are running, but do they fit together nicely? Did all those design decisions your team made work out? This is the moment to find out, and practices such as private betas can go a long way toward gathering real-world feedback on the experience your system provides.
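
For the contract part, tooling such as Pact can generate and verify consumer expectations for you; the hand-rolled sketch below just shows the idea. The endpoint, the sample post and the expected fields are all assumptions, and using the prod-safe sample data keeps it non data-impacting too.

```python
"""Post-release contract check sketch: assert the fields a consumer relies on are still there."""
import json
import urllib.request

BASE_URL = "https://www.example.com"  # the environment we just deployed to (assumption)
SAMPLE_POST = "sample-post"           # prod-safe sample data, kept around just for this

# Field name -> type that our (hypothetical) mobile client depends on.
CONSUMER_EXPECTATIONS = {"id": str, "title": str, "body": str, "published_at": str}


def test_post_api_keeps_consumer_contract():
    with urllib.request.urlopen(f"{BASE_URL}/api/posts/{SAMPLE_POST}", timeout=10) as resp:
        assert resp.status == 200
        payload = json.loads(resp.read())

    # Extra fields are fine: the contract only pins down what consumers actually use.
    for field, expected_type in CONSUMER_EXPECTATIONS.items():
        assert isinstance(payload.get(field), expected_type), f"consumer expects '{field}' to be a {expected_type.__name__}"
```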

Please notice that contract, exploratory and UX tests are better done before a full-blown release (or be prepared to face the backlash). This can be achieved either by deploying first to a non-prod environment or by combining techniques such as rolling deployments, A/B testing and feature toggles.
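
Feature toggles in particular are a cheap way to get that “deployed, but not yet released to everyone” state. A minimal sketch, with the flag and the rollout percentage coming from environment variables (a real setup would more likely use a config store or a flag service) and hypothetical names throughout:

```python
import hashlib
import os


def new_checkout_enabled(user_id: str) -> bool:
    """Hypothetical toggle: expose the new checkout flow to a slice of users."""
    if os.environ.get("NEW_CHECKOUT", "off") == "on":
        return True  # fully released
    # Percentage rollout: hashing the user id keeps each user's experience stable
    # across requests, instead of flipping back and forth at random.
    rollout_percent = int(os.environ.get("NEW_CHECKOUT_PERCENT", "0"))
    bucket = int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent
```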

For the sake of simplicity, I merged the deployment and release pipelines, as many teams tend to do. We might explore the difference between the two in a later post.