System under control — how to automate integration tests

Kseniya Yakil
Bumble Tech

--

Nobody wants to discover that a production service is down, or that it is misbehaving, returning errors and processing data incorrectly. At some point we at Bumble Inc, the parent company operating the Badoo and Bumble apps, decided to greatly reduce this kind of misbehaviour through integration tests.

In this article, I share not only our experience of developing a framework for integration testing of a distributed system, but also the general principles and nuances involved in building one. I will highlight the tasks to be solved, describe our solutions and weigh up their pros and cons. I hope you will come away with a clear picture of the main steps of integration testing, see how they fit together into a common scheme, and realise that integration testing is not as complicated as it might seem at first sight. My overall aim is for this article to help you create your own framework for integration tests!

I will sometimes refer to the implementation in Go, as this was the language we used, but I don’t think this will stop anyone from following along.

System design

Every growing system reaches the point where it can no longer consist of a single service. A monolith is heavy and cannot be scaled vertically forever, so it is usually split into a number of services. As the number of services increases, so do the connections and the complexity of the interactions between the parts of the system. A distributed system can suffer if a particular service is unavailable, and a bug in one service can affect several others. To prevent this, errors need to be identified before they reach production, for example during integration testing.

These are the stages our department went through. Not long ago, our system became distributed, and we had to guarantee that it would work as expected.

Below is a brief overview of our system; in this article it serves as an example of a system to be covered by integration tests. The approaches and actions described are general and will suit any distributed system.

One of these important services, M, was under our jurisdiction. It was the primary source of information about our users and their interactions with one another. M was self-sufficient, extensive and reliable, and was even covered by unit and functional tests.

The service comprised the Front and a few shards (S1…SN):

However, as the number of tasks increased, M could no longer cope on its own as well as it had done previously. So it acquired some friends, i.e. other services. We separated out parts of M’s logic and wrapped them into Go-based services (Search and Supervisor), complementing them with Kafka and Consul.

This is what it looked like:

In a short time, a simple C-based service turned into quite a sophisticated structure of five components and many more instances. We ended up with a distributed system and a list of open questions:

  • Does functionality that spans several services work?
  • Will the system operate with a specified configuration?
  • What happens if one of the services returns an error?
  • What will the system do if one of the services is unavailable? Will it return the expected error, retry, send the query to another instance, or return cached data?

We knew how the system should work, but were our expectations realistic?

Testing everything manually takes a lot of time and money, but testing can’t simply be abandoned: errors that reach production come at a very high price, and investigating them takes a long time. In a nutshell, what we needed was automation.

At the same time, there was another bottleneck: we kept developing new components, which was keeping our testing department extremely busy. This meant that new features were taking longer to reach production.

So, we decided to create automated integration tests for M and Co and kill two birds with one stone: detect integration errors automatically before they reach production, and cut the time it takes to deliver features to production. To do this, we set about developing the integration tests in collaboration with the testing department.

Note: of course, we knew that integration tests came not only with all the obvious benefits but with some equally obvious disadvantages like longer test running times, instability, and writing complexity… But this didn’t put us off.

Requirements

As the decision to develop a “brand new” framework had been made, the next step was to determine its requirements. We compiled the following list:

  • It needed to be lightweight, with minimal abstractions, and adding new tests had to be easy.
  • Test runs needed to complete in a predictable (and, where possible, short) time. The infrastructure needed to support quick deployment and quick test runs.
  • The system needed to start in various configurations. The framework had to support the configuration of every service, the starting of various service sets (subsystems) and the running of tests on each of them separately. Going from simple to complicated, once we had made sure a minor subsystem worked as expected, we then had to complicate it, test it again, and so on.
  • It needed to be Go-based, as this was the language our development team used. We really love it, and our testers mastered it in no time; they now use it to write both the framework and the integration tests.
  • It needed to run third-party services (like Kafka and Consul) itself. If we used shared third-party instances from the pre-production environment, integration testing could affect their state. The system would then behave in an unstable way that our colleagues wouldn’t expect; other departments’ actions would likewise affect the results of our integration tests, and investigating failures would take longer. Isolating the two environments improves the stability and reproducibility of the tests, so we wanted independent running instances in our testing environment. As a bonus, this made it easier for us to use any service versions and configurations, to test hypotheses faster, and to avoid having to coordinate modifications with other departments.
  • It needed to let us operate on this infrastructure: stopping Kafka, Consul or our own services, and removing them from or adding them to the network. We needed that flexibility.
  • It needed to run on different machines, including those of developers, QA engineers and CI.
  • Test failures needed to be reproducible. If a tester sees a test fail on their machine, a developer should be able to reproduce the error on theirs with minimal effort. We were keen to avoid mismatches in libraries and dependencies across machines (including CI agents).

We decided to use Docker and wrapped our services in containers. For every run, the tests create their own Docker network and add the containers to it, which gives the test environment good isolation.

Build the infrastructure

Design layout (bird’s-eye view)

Goals have been set and requirements established, so now it’s time to design the framework. First, we’ll sketch the architecture from a broad perspective before diving deeper into the details. The framework for integration tests (unlike unit tests) deploys the infrastructure before the tests, gives access to it while they run, and cleans up its data afterwards. We needed to be able to work with the infrastructure during testing, to implement multiple scenarios and to check the system’s behaviour in different configurations.

The modules of our framework provide everything necessary for data generation, waiting for query execution, checking responses, working with the infrastructure, and much more. As the framework is written in Go, we decided to use the standard Go testing package and keep the tests in separate files.

To set up and clean the environment, we use the testify module. It supports creating a suite that defines the following functions (a minimal suite skeleton is sketched after this list):

  • SetupSuite. Called prior to the run of all tests for this suite. It is here that we arrange the environment.
  • TearDownSuite. Called after completion of all tests for this suite. It is here that we clean up the infrastructure.
  • SetupTest. Called prior to every test for the suite. It is here that we can make some local arrangements for the test.
  • TearDownTest. Called after completion of every test in the suite. Because a test can deploy additional services or change the configuration of existing ones, this function is handy for resetting the environment to its default state for the current suite.
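Here is a minimal sketch of such a suite, assuming testify’s suite package; the suite name and the placeholder bodies are illustrative rather than our actual implementation.

package testsuite

import (
    "testing"

    "github.com/stretchr/testify/suite"
)

// MSuite is a hypothetical suite; the method bodies are placeholders for the
// framework calls described above.
type MSuite struct {
    suite.Suite
}

// SetupSuite deploys the infrastructure once, before all tests in the suite.
func (s *MSuite) SetupSuite() { /* start containers, wait for the services to load */ }

// TearDownSuite cleans up after the last test in the suite.
func (s *MSuite) TearDownSuite() { /* stop the services, remove the Docker network */ }

// SetupTest makes per-test arrangements (test data, log markers, etc.).
func (s *MSuite) SetupTest() {}

// TearDownTest resets anything the test changed back to the suite defaults.
func (s *MSuite) TearDownTest() {}

// TestMSuite hooks the suite into the standard go test runner.
func TestMSuite(t *testing.T) {
    suite.Run(t, new(MSuite))
}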

Starting in the container

Basically, our test environment consists of services, so let’s see how we can run a service in our infrastructure. As per our requirements, we decided to start each service in a container. We use the testcontainers-go module, which is essentially a bridge between Docker and our Go-based tests.

We pass this module a request describing our service’s characteristics. What we get back is a container structure with a full range of options: we can start or stop the container, check its status, connect it to or disconnect it from the network, and so on. All of this is handled under the hood by testcontainers-go.
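A minimal sketch of such a request, assuming testcontainers-go; the image name, container name, port and network are illustrative, not our real settings.

package infra

import (
    "context"

    "github.com/testcontainers/testcontainers-go"
)

// startContainer starts one service container and returns a handle that can
// later be stopped, inspected, connected to or disconnected from the network.
func startContainer(ctx context.Context, network string) (testcontainers.Container, error) {
    req := testcontainers.ContainerRequest{
        Image:        "registry.local/m-search:latest", // hypothetical image
        Name:         "m-search-test-1",                // unique name, also used as the hostname
        ExposedPorts: []string{"8080/tcp"},             // assumed service port
        Networks:     []string{network},                // the per-run Docker network
    }
    return testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: req,
        Started:          true, // create and start in one call
    })
}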

Other programming languages have their own modules, too, and they operate on much the same principle.

Operational environment

It’s not enough to run a service in a container. We also need to arrange a testing environment for it.

  • Firstly, we create a directory hierarchy on the host.
  • Then we copy all the data that our service needs (scripts, files, snapshots, etc.) into the corresponding directories.
  • Next, we create a default configuration file and add it to this hierarchy.
  • And finally, we mount the root of the hierarchy from the host into the Docker container.

This way, our service has access to all the prepared data. When it starts, it has a default configuration file and all the necessary scripts (if it has any and needs them to operate).
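A small sketch of the host-side preparation described in the steps above; the directory names and config contents are purely illustrative.

package infra

import (
    "os"
    "path/filepath"
)

// prepareWorkdir builds the host-side directory layout that will later be
// mounted into the container.
func prepareWorkdir(root string) error {
    for _, dir := range []string{"conf", "scripts", "snapshots"} { // hypothetical layout
        if err := os.MkdirAll(filepath.Join(root, dir), 0o755); err != nil {
            return err
        }
    }
    // Write a default configuration; a real service would get its full config here.
    conf := []byte("listen_port = 8080\nlog_level = debug\n")
    return os.WriteFile(filepath.Join(root, "conf", "service.conf"), conf, 0o644)
}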

Configuration

In production, a service often runs with different settings on different servers, so testing only the default configuration would leave some scenarios uncovered. Moreover, a service must be tunable for the tests in order to reduce regression-testing time (for example, its data-fetch period or timeouts).

We used quite a simple solution here.

In the container’s Entrypoint we set environment variables, arguments and the path to the prepared configuration file. As soon as the container starts, it executes whatever is provided in the Entrypoint.

After that, the service can be considered configured.

Example:
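A minimal sketch of how these settings might be passed via testcontainers-go; the image, environment variable, binary path and config path are all illustrative assumptions.

package infra

import "github.com/testcontainers/testcontainers-go"

// searchRequest shows environment variables, arguments and the config path
// being handed to the container at start-up.
func searchRequest() testcontainers.ContainerRequest {
    return testcontainers.ContainerRequest{
        Image: "registry.local/m-search:latest", // hypothetical image
        Env: map[string]string{
            "SERVICE_ENV": "integration", // hypothetical environment variable
        },
        // The Entrypoint starts the binary with the config file mounted from the host.
        Entrypoint:   []string{"/service/bin/search"},
        Cmd:          []string{"--config", "/service/conf/service.conf"},
        ExposedPorts: []string{"8080/tcp"},
    }
}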

Service address

So, a service is running in the container. It has a working environment and a certain configuration for testing. How can other services be found?

It’s really easy in a Docker network.

  • When creating a container, we generate a unique name for it. We use the container name as a hostname.
  • The ports are known in advance, as we already prepared the configuration files at the previous stage and specified the ports for our services there.

We run the tests outside the containers to reduce overhead, but they could also be run in containers; in that case the tests would address the services exactly as described above.

If the tests are run on a local machine, they can’t reach a service by its container name, because container names only resolve inside the Docker network. Instead, we need the port on the local host that corresponds to the service’s port inside the Docker network. When the container starts, Docker maps the service’s internal port to an external port on the local host, and that is the one we use in the tests.
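A small sketch of resolving that host-side address with testcontainers-go; the internal port "8080" is an assumption taken from the prepared config.

package infra

import (
    "context"
    "net"

    "github.com/testcontainers/testcontainers-go"
)

// hostAddr returns the address the tests on the local machine should dial:
// the Docker host plus the external port mapped to the service's internal one.
func hostAddr(ctx context.Context, c testcontainers.Container) (string, error) {
    host, err := c.Host(ctx)
    if err != nil {
        return "", err
    }
    mapped, err := c.MappedPort(ctx, "8080")
    if err != nil {
        return "", err
    }
    return net.JoinHostPort(host, mapped.Port()), nil
}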

External services

Your infrastructure will most likely include some third-party services, for example databases and service discovery. Ideally, their configuration should match the one used in production. A simple service like Consul, with its single-process setup, can be started with testcontainers-go. However, there is no need to agonise over a multi-component service like Kafka, with its several brokers and its dependency on ZooKeeper: you can simply use Docker Compose.

Usually, integration tests don’t need fine-grained control over external services, which makes Docker Compose a convenient choice.

Loading stage

The container is running. Does that mean the service is ready to accept our queries? Generally, no. Many services have an initialization stage, which can take some time. If we start testing without waiting for the service to load, the results will be unstable.

What can we do here?

  1. The easiest thing is to use sleep. After starting the container, we wait for a fixed period of time, and once it has elapsed we assume the services are ready to operate. This is not a good method: the tests run on different machines, and the loading speed of a service varies from machine to machine and from run to run.
  2. Open the service ports when ready. As soon as the service passes the loading stage and is ready to accept client queries, it opens its ports. For the testing environment, this is the signal that tests can be run. There is a nuance, however: when creating a container, Docker immediately opens an external port for the service even if the service hasn’t yet started listening on the corresponding internal port inside the container. So all the tests will connect at once, and their attempts to read from the connection will end in EOF. Only when the service opens its internal port can the test framework send a query, and only then can the service be considered ready to operate.
  3. Request the service status. With this approach the service opens its ports straight away, and in reply to a status request returns “Ready” once it has loaded and “Not ready” otherwise. In the tests, we request the service status from time to time, and as soon as it returns “Ready” we can move straight on to the testing stage.
  4. Register in a third-party service or database. We register our services in Consul, and two things help here:
  • As soon as the service appears in Consul, it is ready. The service status can be followed by means of a blocking query with a timeout: as soon as the service registers, Consul returns information about the change in its status.
  • The service status can be tracked via its health checks. The framework for integration testing learns about the new service from Consul (as just described) and starts following its check statuses. As soon as all the service’s checks switch to “passing”, the service can be considered ready to operate. A sketch using the Consul API follows this list.
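Here is a sketch of such a wait using the Consul API (github.com/hashicorp/consul/api), assuming the service registers itself under a name like "m-search"; the details are illustrative.

package infra

import (
    "context"
    "fmt"
    "time"

    consulapi "github.com/hashicorp/consul/api"
)

// waitRegistered blocks until the named service has at least one instance in
// Consul whose health checks are all passing.
func waitRegistered(ctx context.Context, consul *consulapi.Client, name string) error {
    var waitIndex uint64
    for {
        if err := ctx.Err(); err != nil {
            return fmt.Errorf("waiting for %s in Consul: %w", name, err)
        }
        // A blocking query: Consul replies as soon as the service list changes,
        // or after WaitTime if nothing has changed.
        entries, meta, err := consul.Health().Service(name, "", true, &consulapi.QueryOptions{
            WaitIndex: waitIndex,
            WaitTime:  5 * time.Second,
        })
        if err != nil {
            time.Sleep(time.Second) // Consul itself may still be starting
            continue
        }
        if len(entries) > 0 { // passingOnly=true, so all checks are already "passing"
            return nil
        }
        waitIndex = meta.LastIndex
    }
}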

Approaches 2 and 3 imply repeating an operation until the condition is fulfilled, with a short pause between attempts. The pause is much shorter than the sleep used in Approach 1, so we no longer depend on the speed of a specific machine or on how quickly the service happens to load (a sketch of such a polling loop is shown below).

In all four approaches, however, the time we are prepared to wait for a service to become ready is capped by the maximum permissible start-up time in any environment.
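Here is a minimal sketch of the polling from Approach 3, assuming the service exposes a hypothetical HTTP /status endpoint that answers “Ready” once loaded.

package infra

import (
    "context"
    "fmt"
    "io"
    "net/http"
    "strings"
    "time"
)

// waitReady polls the service's status endpoint until it reports ready or the
// overall deadline passes.
func waitReady(ctx context.Context, addr string, timeout time.Duration) error {
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()
    ticker := time.NewTicker(200 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return fmt.Errorf("service %s not ready: %w", addr, ctx.Err())
        case <-ticker.C:
            resp, err := http.Get("http://" + addr + "/status") // hypothetical endpoint
            if err != nil {
                continue // the port is not open yet
            }
            body, _ := io.ReadAll(resp.Body)
            resp.Body.Close()
            if strings.TrimSpace(string(body)) == "Ready" { // hypothetical reply
                return nil
            }
        }
    }
}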

Starting all services

We’ve now covered how to start a service and how to check that it is ready to operate. We know how to run both our own and third-party services, and we know the service addresses both inside the test environment and from the tests.

In which order should we start the services? Ideally, there should be no strict ordering at all. That way, we can start the services in parallel and significantly cut the time it takes to set up the infrastructure (container start-up time plus service loading time); a sketch of such a parallel start is shown below. And the fewer the dependencies, the easier it is to add a new service to the infrastructure.

For this to work, every service must be able to start even when the third-party services it depends on are not yet available in the test environment; it must know how to wait for them to appear. Of course, we need to rule out deadlocks where Service A and Service B each wait for the other to become available; in such cases, problems would also occur in production.
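A minimal sketch of a parallel start using golang.org/x/sync/errgroup; the per-service start functions are placeholders for whatever the framework provides.

package infra

import (
    "context"

    "golang.org/x/sync/errgroup"
)

// startAll launches every service concurrently and waits until each start
// function returns, i.e. until each service reports that it is ready.
func startAll(ctx context.Context, startFns []func(context.Context) error) error {
    g, ctx := errgroup.WithContext(ctx)
    for _, start := range startFns {
        start := start // capture the loop variable
        g.Go(func() error { return start(ctx) })
    }
    // Wait returns the first error, or nil once every service is up.
    return g.Wait()
}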

Infrastructure usage

During testing

When running tests, we really feel like getting into our infrastructure and taking some time to play with it. If not now, then when?

  • Modifying the service configuration

For this, we stop the service, configure it just as we did at the infrastructure setup stage, and start it again. Keep in mind that any configuration change slows things down because of the overhead of a double restart: once when the configuration is changed during the test, and again when it is rolled back to the previous configuration at the end of the test. So it is worth thinking twice about whether the service config really needs to change at that moment; it might be a better idea to group tests that need the same system configuration into a single suite.

  • Adding a new service

Adding a new service has become a piece of cake for us. We already learned how to create services at the infrastructure setup stage, and the scenario here is exactly the same: we prepare an environment for the new service, start its container and use it in the test.

  • Working with the network

Connecting containers to the network and disconnecting them from it, pausing and unpausing containers, and using iptables all help us to simulate network errors and to check how the system reacts to them; a sketch is shown below.
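As an illustration, here is a sketch that talks to the Docker daemon directly via the Docker SDK (our framework goes through testcontainers-go instead); the container and network IDs would come from the environment module.

package infra

import (
    "context"

    "github.com/docker/docker/client"
)

// simulateOutage freezes a container and detaches it from the test network so
// that other services lose connectivity to it.
func simulateOutage(ctx context.Context, containerID, networkID string) error {
    cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
    if err != nil {
        return err
    }
    defer cli.Close()

    // Pause the process: existing connections stay open but nothing is served.
    if err := cli.ContainerPause(ctx, containerID); err != nil {
        return err
    }
    // Drop the container from the Docker network.
    return cli.NetworkDisconnect(ctx, networkID, containerID, true)
}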

After testing

If we add a new service within a particular test, we shouldn’t pass it on to the next test: we have to be polite. The same goes for data. Tests can run in any order and must not affect each other, and test runs must be reproducible.

  • If the service config has been changed, we roll it back to the previous (default) config.
  • If a new service has been added, we remove it.
  • If any changes have been made in the network (iptables, container pausing, etc.), we delete them.
  • If any data has been added or modified, we run a cleaning procedure. Here we need a mechanism that confirms the clean-up has actually finished, just to make sure everything completed properly. For instance, if we need to clear data in a third-party database, sending a delete query is not enough: we have to make sure the deletion has actually been executed (and has not got stuck in a queue while the next test has already started) and that it affected exactly the data that was supposed to be deleted.

After a test suite completes, all the services in the infrastructure are stopped, their containers are killed, and the test network is torn down. We do the same if the suite doesn’t finish before the timeout expires or fails with an error. The infrastructure is kept only when the framework is explicitly told to retain the containers after the run (for debugging, for example).

Debugging

Yay! We’ve learned to set up the infrastructure and run the tests. Seeing the first results of the integration tests is nice, but they won’t always be green, so it’s time to deal with failures. The first thing that comes to a tester’s mind is to take a look at the service logs.

Let’s say that a suite contains an immense number of tests, and one of them hasn’t received the expected answer from a service. Somewhere in the bowels of the service log, we somehow have to find the section that matches the timing of the failed test. There is a handy and simple tool for doing that: markers.

  • Firstly, we add a “log_notice” command to the service; on receiving it, the service writes the message from the query into its log.
  • Before running a test, we send a “log_notice” containing the test name to all running services, and we do the same once the test is complete (a sketch follows below).
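A tiny sketch of how such markers might be sent from the tests; the client interface, the marker format and the LogNotice call are all assumptions, since the real protocol is service-specific.

package testsuite

// logNoticer is a stand-in for whatever client the framework uses to talk to a
// running service; only the log_notice call matters for this sketch.
type logNoticer interface {
    LogNotice(message string) error
}

// markTest sends a "start" marker with the test name to every running service
// and returns a function that sends the matching "finish" marker.
func markTest(testName string, services []logNoticer) func() {
    for _, s := range services {
        _ = s.LogNotice("=== start " + testName) // the marker format is illustrative
    }
    return func() {
        for _, s := range services {
            _ = s.LogNotice("=== finish " + testName)
        }
    }
}

In practice, the start call fits naturally into SetupTest and the returned finish call into TearDownTest.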

Now, we have markers within the log, and we can easily restore the course of events and reproduce the service’s behaviour as required.

What if a service wasn’t able to start and didn’t manage to write anything to its log? It will still have written something to stderr/stdout. The “docker logs” command retrieves data from the standard output streams, and this can help us see what has happened.

Let’s say that data from the log is insufficient for localising an error. The time has come to turn to some more serious methods!

If we set the framework configuration to save the infrastructure after running all the tests within the suite, we’ll get full access to the system. We will be able to check the server status, obtain data from it, send various queries, analyse service files on the disk, as well as use gdb/strace/tcpdump and profiling. Then, we’ll form a hypothesis, recompile the image, run the tests and identify the root of the problem iteratively.

For debugging not to turn into a stressful bug hunt, tests have to be as reproducible as possible. For instance, if test data is generated randomly, then on an error we need to record the seed and/or the data that was requested.

Acceleration

Nobody really feels like waiting for years for integration tests to complete. But, on the other hand, you can always have a cup of coffee and get on with other interesting things at that time.

What can be done to speed up the testing?

  • We can group read-only tests and start them in parallel under a single test (this is really easy to do in Go, thanks to goroutines; a sketch follows this list). These tests will work with an isolated data set.
  • We can make the service configuration more flexible, so that in the tests we can set smaller timeout values, which in turn cuts testing time.
  • Services can be run in a necessary-and-sufficient configuration. For example, if in production the service runs with 4 shards, but a certain test only needs the fact that there are multiple shards, then 2 shards will do.
  • Several testing infrastructures can be run at the same time (providing the resources permit). In fact, this is a parallel test suite running.
  • Containers can be reused.
  • We can ask ourselves whether a new service really needs a container, or whether a mock will suffice. The mocks here aren’t the interface mocks we use in unit testing, but independent servers. A mock pretends to be one of our services and knows how to follow its protocol; the other services running in the current test infrastructure can’t tell it apart from the original. A mock lets us define the behaviour of a real service without actually running it in a container.
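On the first point, one way to group independent read-only checks and run them in parallel is the standard testing package’s subtests with t.Parallel; the query names below are illustrative.

package testsuite

import "testing"

// TestReadOnlyQueries runs independent read-only checks in parallel against
// the shared infrastructure; each subtest works with its own isolated data set.
func TestReadOnlyQueries(t *testing.T) {
    queries := []string{"get_user", "get_stats", "search"} // hypothetical checks
    for _, q := range queries {
        q := q // capture the loop variable
        t.Run(q, func(t *testing.T) {
            t.Parallel() // subtests marked Parallel run concurrently
            _ = q        // the real query and assertions would go here
        })
    }
}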

Back to mocks. During testing, we start a mock at a certain address. The services already running in the current infrastructure learn this address from their config or from service discovery (in our case, Consul) and can send queries to it.

The mock receives the query and calls the handler we’ve set for it. This is how that piece of Go code looks in a test:

handler := func(request serviceProto.Message) mock.Result {
    // Checking the request: in a real test we would verify its type and contents here.
    statsRequest := request.(*serviceProto.StatsRequest)
    return mock.Result{
        Msg:    PrepareResponse(statsRequest),
        Action: mock.ActionWriteResponse,
    }
}

serviceMock.Start(listenAddr, serviceProto, handler)
defer serviceMock.Stop()

The handler in our example expects to receive a statistics query, processes it according to the test logic, prepares a response, and tells the server what to do with it: send the answer immediately or with a delay, not send it at all, or close the connection.

Controlling the server’s behaviour, whether that means dropping the connection or slowing down the sending, gives us an extra way to check how the tested services react to network misbehaviour. The server performs the requested actions, packs the answer from the handler and sends it to the client. As soon as the test is complete, the deferred call stops the mock server.

We use mocks for all our services because they really do save us time during testing.

Implementation

For those who are curious about implementation details, here they are!

Our framework lives in the same repository as the services it tests, in a separate “autotests” directory, which comprises several modules:

“Service” facilitates the necessary setting of every service — for its running, stopping, configuring, or obtaining information on its data.

“Mock” contains an implementation of a mock server for every non-third-party service.

“Suite” holds the shared suite implementation. It knows how to work with the services, how to wait for them to load, how to check that they are working, and much more.

“Environment” stores data on the current test environment (services running) and is responsible for the network.

There are also some auxiliary modules and ones helping with data generation.

Besides the framework modules, at the time this article was written we had 21 test suites for the M service, including a smoke-test suite. Each suite creates its own infrastructure with the necessary set of services, and the tests are stored in files within the suite. We have roughly 1,980 tests for the M service, and it takes about an hour to build the binaries, create the containers and run the tests (the test phase itself lasts about 54 minutes).

Starting a specific test suite looks something like this:

go test -count=1 -race -v ./testsuite $(TESTS_FLAGS) -autotests.timeout=15m

Since we wanted to move our colleagues’ services from other departments onto our framework as well, the core functionality of the framework was placed in the shared core repository.

QA

How do testers use the integration framework? They don’t have to deploy all the services manually: the integration tests do that for them and set up the necessary infrastructure. If no suite exists yet for the planned infrastructure, they can quickly add one themselves.

Once the testing environment is set up, QA engineers implement the most complex scenarios in a test. While working, they have access to all the service logs and files, which is handy for debugging and for understanding what the system is going through.

Apart from checking how the tests run against a particular version of the code, they can also specify particular service versions and run the integration tests against those.

To speed things up, our developers write the positive tests straight away, and then the testers take on the more complex cases. This is what collaborative development of the tests and the framework looks like.

Continuous integration

We wanted to run integration tests automatically every time a service is built.

Embedding the integration tests into the continuous integration (CI) process turned out to be a doddle. We use TeamCity, and the framework code lives in the same repository as the service code. First, the services are built and their images assembled; then the framework is built; and finally, it is run.

We’ve taught TeamCity to use the testing framework’s output to determine which tests have passed and which have failed. After the run is over, it shows the number and the list of failed tests. The data from all the services after each suite run is stored and published in TeamCity as artefacts of that particular build and run.

Summing up

Here are the results of all the work.

  • Life has got easier. Fewer integration issues now live to see the production stage, which results in more stable production.
  • We’ve learned to start diverse infrastructure and cover more scenarios in less time.
  • We work with the infrastructure during testing. This way, we get more opportunities for the implementation of various test cases.
  • We catch more bugs at the development stage. Developers themselves write the positive scenarios, immediately finding some of the bugs and eliminating them. A bug’s round trip has become shorter.
  • Testers no longer have to write the positive cases. QA engineers can focus on more complex scenarios.
  • There’s no more blocking at the testing stage when tasks for various services are developed in parallel and all land on the QA engineers at once.
  • We put together an MVP of the integration testing framework quickly, in a couple of weeks, because the task turned out not to be too labour-intensive.
  • We’ve been using this framework for more than a year.

In general, the framework saves us some time and we feel more confident. We keep enhancing it and expanding its scope, adding integration tests for the company’s other services.

However, integration testing comes with certain disadvantages which need to be taken into consideration, too.

  • Longer test runs. The systems are sophisticated, and queries are executed across several services.
  • Instability, as the system is built of asynchronous components. This can and should be dealt with, and we’re working on it: our share of unstable tests is approaching zero.
  • Test-writing complexity. You need to understand how the system operates as a whole, what its expected behaviour is and how it can be broken.
  • The test infrastructure and the production infrastructure can diverge. If not all services run in containers in production, the test environment can’t match production 100%. In fact, some of our services don’t run in containers in production, but so far this hasn’t caused any issues with testing them in containers.

The main question you have to ask is whether you really require a framework for integration tests.

If the number of services in your project keeps growing (or already has grown), if the connections between them keep multiplying, and if you need to automate testing, then implementing integration tests can be a good idea.

Hopefully, this article has given you an insight into the challenges encountered on this path, as well as into some methods of dealing with them.

Good luck!
