Reducing Serverless Integration Test Runtime on CI — From 40 minutes to 7

Rob Cronin
Apr 10 · 6 min read

Some napkin maths: if each developer on a 3-dev team ran the tests 5 times per ticket for 2 tickets a day, every minute of CI runtime would cost about 130 dev hours per year.
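That estimate can be reproduced in a few lines. This is a rough sketch: the function name is made up for illustration, and the figure of 260 working days a year is an assumption (it is the value that makes the numbers come out at 130 hours).

```python
def dev_hours_per_ci_minute(devs, runs_per_ticket, tickets_per_day, working_days=260):
    """Yearly developer hours spent waiting for each minute of CI runtime."""
    # Every test run costs one minute of waiting per minute of CI runtime.
    runs_per_day = devs * runs_per_ticket * tickets_per_day
    minutes_per_year = runs_per_day * working_days
    return minutes_per_year / 60


# 3 devs x 5 runs x 2 tickets = 30 runs a day, over an assumed 260 working days
print(dev_hours_per_ci_minute(devs=3, runs_per_ticket=5, tickets_per_day=2))  # 130.0
```

Plugging in your own team size and habits gives a quick sense of how much a minute of CI time is worth to you.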

Recently I was working on a Serverless project which ran Integration tests on its microservices in its CI pipeline.

In particular, we ran BDD (Behaviour Driven Development) tests which, for those unfamiliar, might look like this:

Scenario: I update the job of a user
  Given a set of users
    | name  | job       |
    | Rob   | Developer |
    | Simba | Outcast   |
    | Woody | Friend    |
  When I update the "job" of the user named "Simba" to "King"
  And I fetch the list of users
  Then I will see a list with the following attributes
    | name  | job       |
    | Rob   | Developer |
    | Simba | King      |
    | Woody | Friend    |

These scenarios were written in a business-friendly language, where each instruction was tied to a function that carried out the logic it expressed.
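To illustrate how an instruction can be tied to a function, here is a minimal sketch using only the Python standard library. It is not the API of any particular BDD framework: the `step` decorator, the `STEPS` registry and `run_step` are hypothetical names, and the step functions edit an in-memory dict where a real suite would call the deployed service. The leading keyword (Given/When/And/Then) is assumed to be stripped before matching.

```python
import re

# Hypothetical registry of (compiled pattern, function) pairs.
STEPS = []


def step(pattern):
    """Register a function as the implementation of a step phrase."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r'I update the "(\w+)" of the user named "(\w+)" to "(\w+)"')
def update_user(context, field, name, value):
    # A real test would make an HTTP call to the service here.
    context["users"][name][field] = value


@step(r"I fetch the list of users")
def fetch_users(context):
    context["result"] = [{"name": n, **attrs} for n, attrs in context["users"].items()]


def run_step(context, text):
    """Call the first registered function whose pattern matches the step text."""
    for pattern, func in STEPS:
        match = pattern.fullmatch(text)
        if match:
            return func(context, *match.groups())
    raise LookupError(f"No step definition for: {text}")


context = {"users": {"Rob": {"job": "Developer"}, "Simba": {"job": "Outcast"}}}
run_step(context, 'I update the "job" of the user named "Simba" to "King"')
run_step(context, "I fetch the list of users")
print(context["result"])
```

The quoted values in the scenario become arguments to the function, which is what lets one step definition serve many scenarios.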

The tests ran against an actual environment and made real calls to our service.

Therefore, we needed a new environment every time we ran our tests to avoid concurrent tests making conflicting calls (e.g. one test deletes a user while another expects to see that user).

Luckily, with a Serverless approach, creating new environments is straightforward.

While these tests are valuable, the common counterargument is that they take too much time to set up, seed and run.

Before and After

We ran into this issue on our project where our Integration tests were taking over 40 minutes to run on our CI.

The negatives of this are obvious: backend features took significantly longer to deliver, a flaky test could cost us hours, and code reviews became a bit laxer to avoid re-running the tests.

So we tackled this problem and managed to reduce our BDD Integration test time from roughly 40 minutes to under 7.

Figure: CI flow before optimisation
Figure: CI flow post optimisation

We used a few different tactics to achieve this:

  • Reusing existing environments (~17 mins saved in deploy/teardown)
  • Parallelising the BDD tests (~15 mins saved in test)
  • Config Tweaks (~2 mins saved overall)
Figure: Breakdown of CI time before and after

Reusing existing environments/stacks (~17 mins saved in deploy/teardown)

The creation of our stacks was taking up to 18 mins per CI build. Granted, this is an above-average create time, as we were assigning a custom domain name to each of our stacks (which has other benefits).

Regardless, the creation of a new stack takes significantly longer than an update to an existing stack, indicating we could save time by reusing stacks.

Lock Table of Available Stacks

Figure: DynamoDB table of available stacks

The number of builds running at any one time varies, so to stay flexible we created a “lock” table of our stacks. In essence, it allows a build to claim a stack that is currently available and lock it, preventing another build from using it at the same time. This ended up saving us roughly 17 minutes per CI build.
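Since the table lives in DynamoDB and the claim is made with a transactional write, the locking can be expressed as a conditional update: the write only succeeds if the stack is still marked as available. Below is a rough sketch of what such a request payload could look like. The table name, key schema (`repo`, `stackName`) and the `available` attribute are assumptions for illustration, not the actual schema of the bdd-lock-table repo.

```python
def build_claim_request(table_name, repo, stack_name):
    """Build a TransactWriteItems payload that claims a stack only if it is still available."""
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": table_name,
                    # Hypothetical key schema: one item per (repo, stack name).
                    "Key": {"repo": {"S": repo}, "stackName": {"S": stack_name}},
                    "UpdateExpression": "SET #available = :false",
                    # Rejected if another build has already claimed this stack.
                    "ConditionExpression": "#available = :true",
                    "ExpressionAttributeNames": {"#available": "available"},
                    "ExpressionAttributeValues": {
                        ":true": {"BOOL": True},
                        ":false": {"BOOL": False},
                    },
                }
            }
        ]
    }


request = build_claim_request("stacks", "my-repo", "stack-abc123")
print(request["TransactItems"][0]["Update"]["ConditionExpression"])
```

With boto3, a payload of this shape would be passed to the DynamoDB client as `client.transact_write_items(**request)`. If two builds race for the same stack, the condition fails for the second one and it has to pick another stack.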

How it works

https://github.com/robcronin/bdd-lock-table

You deploy the Serverless microservice (4 lambdas and a DynamoDB table) from the bdd-lock-table repo. The table stores your available stacks.

When your CI runs and needs a stack, it checks whether any stacks are available for its repo by calling a lambda on the /get-available route.

If there is:

  • It claims a stack (/claim-stack) and marks it as not available (using a transact write to ensure only one job can claim a stack at a time)
  • It then updates the stack with its code and runs its tests
  • After (success or fail) it marks the stack as available again (/release-stack)

If not:

  • It creates a new entry in the table with a random stack name (/create-stack)
  • It creates a new stack with this name and runs its tests
  • After (success or fail) it marks the stack as available so that future builds can also use this new stack (/release-stack)

After your CI has run for a few days/weeks with normal development, it should have experienced a busy period and therefore have enough stacks created (which are cheap with Serverless) that any new job can always claim an existing stack.

The deploy required for any Integration test will now nearly always be an update deploy rather than a create deploy.

Implementing it for your project

I’ve created two repos with more information about how to set this up.

  • The code to create your own lock table service can be found in the bdd-lock-table repo. It contains a Postman collection and a sample script to run in your CI build, which executes the logic above.
  • An example setup of a repo that uses the table in its CI can be found in the sls-bdd-python-optimised-ci repo.

Parallelising the BDD tests (~15 mins saved in test)

As our service grew over time, so did the time the tests themselves took to run, so we looked at parallelising them.

Parallelisation of any job running on CI can be a quick win, not just for long running integration tests.

Implementing in CircleCI

We were using CircleCI, which provides a good toolset to enable parallelisation. The “parallelism” key runs the same job on the number of containers you specify.

Note: it will depend on your plan whether this is available to you (see my napkin maths at the top of this article to see if the upgraded plan is worth it to you).

BDD-Tests:
  docker:
    - image: circleci/python:3.7.4-node
  parallelism: 10
  steps:
    - run:
        name: Run BDD tests
        command: ./ci-bdd.sh

CircleCI provides a “tests” command which will deal with ensuring each container runs different tests.

First, it allows you to specify a glob pattern to determine which files are relevant for testing.

There are a number of provided methods for splitting the files among the containers. We chose the “--split-by=filesize” method since file size is generally a good indicator of how long a BDD test will take.
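To give an intuition for why splitting by file size balances the containers, here is a small sketch of one possible approach: hand out files largest first, always to the container with the least work so far. This is an illustration of the idea only, not CircleCI's actual algorithm, and the file names and sizes are made up.

```python
import heapq


def split_by_size(file_sizes, containers):
    """Greedily assign files (largest first) to the container with the least bytes so far."""
    # Heap entries are (total bytes assigned, container index).
    heap = [(0, index) for index in range(containers)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(containers)]
    # Sort by size descending, then by path so the result is deterministic.
    for path, size in sorted(file_sizes.items(), key=lambda item: (-item[1], item[0])):
        total, index = heapq.heappop(heap)
        buckets[index].append(path)
        heapq.heappush(heap, (total + size, index))
    return buckets


sizes = {
    "features/users.feature": 900,
    "features/jobs.feature": 500,
    "features/auth.feature": 400,
    "features/health.feature": 100,
}
print(split_by_size(sizes, 2))
```

With these sizes the two containers end up with 1000 and 900 bytes of feature files respectively, so neither one becomes the bottleneck for the whole job.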

If you had a “yarn test:bdd” command, you could run the following in your “ci-bdd.sh” script. Every container runs the same script, but each one receives a different subset of the feature files (passed to the command as arguments via xargs):

circleci tests glob "features/**/*.feature" | circleci tests split --split-by=filesize | xargs yarn test:bdd

In our case we had 10 parallel jobs which would each claim an existing stack, do an update deploy, run 10% of the tests and then release the stack.

Figure: Time taken for each parallel container

This cut the deploy and test part of our pipeline from 20 minutes to about 6.

Conclusion

We have cut the time to run tests on our Pull Requests from about 40 minutes to 7 minutes.

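If each developer did one backend ticket a day, this would save each of them about 15–20 working days a year! (Saving 33 minutes per run, at even one run per ticket, adds up to roughly 140 hours over a working year.)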

Ours was an extreme case, but these two methods can shave valuable minutes off most CI pipelines.

Try them out to see how they help you.

Serverless Transformation