System under control — how to automate integration tests

Kseniya Yakil
Apr 1

Nobody wants to discover that a production service is down or failing to behave as expected, producing errors and processing data incorrectly. At one point we at Bumble (the parent company operating the Badoo and Bumble apps) decided to greatly reduce such undesirable behaviour through the use of integration tests.

In this article, I share not only our experience in developing a framework for integration testing of distributed systems, but also the general principles and nuances involved in building one. I will highlight the tasks to be solved, their pros and cons, and describe our solutions. I hope you will get a clear picture of the main steps needed for integration testing, see how to connect them into a common scheme, and realise that integration testing is not as complicated as it might seem at first sight. My overall aim is that you will find this article helpful for creating your own framework for integration tests!

I will sometimes refer to our implementation in Go, as this is the language we used, but I don't think this will stop anyone from following the ideas.

System design

Every growing system reaches a point where it can no longer consist of only one service. A monolith is heavy and cannot be scaled vertically indefinitely, so it usually gets divided into a number of services. The subsequent increase in the number of services results in more connections and more complicated interactions between the parts of the system. A distributed system can suffer if a particular service is unavailable, while a bug in any one service can affect several others. To prevent this, errors need to be identified before they reach production, for example during integration testing.

These were the stages we went through in our department. Not long ago our system became distributed, and we had to provide guarantees that it would work as expected.

Let me also give you a brief overview of our system. In this article, it serves as an example of a system to be covered by integration tests. The approaches and actions described are general and will suit any distributed system.

One of these important services, M, was under our jurisdiction. It was the primary source of information about our users and their interactions with each other. M was self-sufficient, extensive, reliable, and even covered by unit and functional tests.

The service comprised the Front and a few shards (S1…SN):

However, as the number of tasks increased, M could no longer cope on its own as well as it had done previously. This is how it got friends, i.e. other services. We separated out some parts of M's logic and wrapped them in Go-based services (Search and Supervisor), further enhancing them with Kafka and Consul.

This is what it looked like:

In a short time, a simple C-based service transformed into quite a sophisticated structure of five components and many more instances. We had ended up with a distributed system and a list of questions on top:

  • Does functionality that spans several services work?

We knew how the system should work, but were our expectations realistic?

Testing things manually takes a lot of time and money, but testing can't be abandoned: errors at the production stage come at a really high price, and investigating them takes a really long time. In a nutshell, what we needed was automation.

But at the same time, there was another bottleneck: we were constantly developing new components, which kept our testing department wildly busy. This meant that new features were taking longer to be delivered to production.

So, we decided to create automated integration tests for M and Co and thus kill two birds with one stone: detect integration errors automatically prior to the production stage and reduce the feature delivery time to production. For this, we set about developing integration tests in collaboration with the testing department.

Note: of course, we knew that integration tests came not only with all the obvious benefits but with some equally obvious disadvantages like longer test running times, instability, and writing complexity… But this didn’t put us off.

Requirements

As the decision to develop a “brand new” framework had been made, the next step was to determine its requirements. We compiled the following list:

  • It needs to be lightweight, i.e. with minimum abstractions and ease of adding new tests.

We decided to use Docker and wrap our services in containers. The tests were to create their own Docker network for every run and add containers to it, which gives good isolation for the test environment.
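To give you an idea of what this looks like in code, here is a minimal sketch of creating such a per-run network with testcontainers-go; the function name and the naming scheme are assumptions of mine, not the framework's actual code.

package autotests

import (
    "context"

    "github.com/testcontainers/testcontainers-go"
)

// createTestNetwork creates an isolated Docker network for a single test run.
// The name would normally be unique per run, e.g. "autotests-<suite>-<timestamp>".
func createTestNetwork(ctx context.Context, name string) (testcontainers.Network, error) {
    return testcontainers.GenericNetwork(ctx, testcontainers.GenericNetworkRequest{
        NetworkRequest: testcontainers.NetworkRequest{
            Name:           name,
            CheckDuplicate: true, // fail instead of silently reusing an existing network
        },
    })
}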

Build the infrastructure

Design layout (bird’s-eye view)

Goals have been set and requirements established; now to design the framework. First, we'll draw the architecture from a broad perspective before diving deeper into the details. The framework for integration tests (unlike unit tests) will deploy the infrastructure prior to every test, give access to it while testing, and clean up its data afterwards. We needed to be able to work with the infrastructure during testing, to implement multiple scenarios and to test the system's behaviour in different configurations.

The modules of our framework provide everything necessary for data generation, waiting for query execution, checking answers, working with the infrastructure, and many other things. As the framework is written in Go, we decided to use Go's standard testing package and put the tests into independent files.

To set up and clean the environment, we use the Testify package. It lets us create a suite that defines the following functions:

  • SetupSuite. Called before all the tests of the suite are run. This is where we arrange the environment.
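To make this concrete, here is a minimal sketch of such a suite. The Environment type, the NewEnvironment constructor and Cleanup are hypothetical placeholders for whatever deploys and tears down the infrastructure; only the Testify plumbing is real.

package testsuite

import (
    "context"
    "testing"

    "github.com/stretchr/testify/suite"
)

// SmokeSuite groups tests that share one infrastructure configuration.
type SmokeSuite struct {
    suite.Suite
    env *Environment // hypothetical helper that owns the containers and the Docker network
}

// SetupSuite runs once before all the tests of the suite: this is where the
// environment is arranged (network created, containers started, readiness awaited).
func (s *SmokeSuite) SetupSuite() {
    env, err := NewEnvironment(context.Background()) // hypothetical constructor
    s.Require().NoError(err)
    s.env = env
}

// TearDownSuite runs once after all the tests: stop the containers, remove the network.
func (s *SmokeSuite) TearDownSuite() {
    s.env.Cleanup(context.Background())
}

// TestSmokeSuite is the entry point that go test discovers.
func TestSmokeSuite(t *testing.T) {
    suite.Run(t, new(SmokeSuite))
}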

Starting in the container

Basically, our test environment consists of services. Let's see how we can run a service in our infrastructure. In line with the requirements, we decided to start services in containers. We use the testcontainers-go module, which is essentially a bridge between Docker and our Go-based tests.

We send a request to this module with our service's characteristics. What we get back is a container structure and a full range of options: we can start or stop the container, check its status, add it to or remove it from the network, and so on. All of this is handled under the hood by testcontainers-go.
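As a rough sketch, a request for one of our services might look like this; the image name, port and network name are made up for illustration.

package autotests

import (
    "context"
    "fmt"

    "github.com/testcontainers/testcontainers-go"
)

// startSearch starts a (hypothetical) Search service container inside the
// per-run Docker network and returns the container handle.
func startSearch(ctx context.Context, networkName string) (testcontainers.Container, error) {
    req := testcontainers.ContainerRequest{
        Image:        "registry.local/search:latest", // hypothetical image
        Name:         "search-1",                     // the container name doubles as a hostname in the network
        ExposedPorts: []string{"8080/tcp"},
        Networks:     []string{networkName},
    }
    container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: req,
        Started:          true, // create and start in one call
    })
    if err != nil {
        return nil, fmt.Errorf("start search container: %w", err)
    }
    return container, nil
}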

Other programming languages have similar modules that operate on the same principle.

Operational environment

It’s not enough to run a service in a container. We also need to arrange a testing environment for it.

  • Firstly, we create a directory hierarchy on the host.

Thus, our service has access to all the data arranged for it. When it starts, it will have a default configuration file and all the necessary scripts (provided it has them and needs them for operation).
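A rough sketch of what preparing such a host-side hierarchy might look like; the directory layout and file names are assumptions for illustration.

package autotests

import (
    "os"
    "path/filepath"
)

// prepareServiceDir creates the host-side directories for one service and
// writes the default configuration it will start with; the resulting
// directory is later made available to the container.
func prepareServiceDir(baseDir, serviceName string, defaultConfig []byte) (string, error) {
    root := filepath.Join(baseDir, serviceName)
    for _, dir := range []string{"conf", "logs", "scripts"} { // hypothetical layout
        if err := os.MkdirAll(filepath.Join(root, dir), 0o755); err != nil {
            return "", err
        }
    }
    // Write the default configuration the service will start with.
    if err := os.WriteFile(filepath.Join(root, "conf", "service.conf"), defaultConfig, 0o644); err != nil {
        return "", err
    }
    return root, nil
}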

Configuration

Often, a service is run with different settings on different servers, which is why testing only the default configuration will fail to cover some scenarios. Moreover, it must be possible to tune a service during tests to decrease regression testing time (e.g. the data fetch period or timeouts).

We used quite a simple solution here.

At Entrypoint, we set environment variables, arguments, and the prepared configuration file. As soon as the container starts, it runs everything provided at Entrypoint.

After that, the service can be considered configured.

Example:
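As a rough sketch of the kind of thing this boils down to with testcontainers-go (all values below are assumptions of mine):

package autotests

import "github.com/testcontainers/testcontainers-go"

// supervisorRequest builds a container request for a (hypothetical) Supervisor
// service configured for tests: environment variables, an entrypoint script
// and arguments pointing at a prepared config file.
func supervisorRequest() testcontainers.ContainerRequest {
    return testcontainers.ContainerRequest{
        Image: "registry.local/supervisor:latest", // hypothetical image
        Env: map[string]string{
            "SERVICE_LOG_LEVEL": "debug", // hypothetical environment variable
        },
        Entrypoint: []string{"/service/entrypoint.sh"},              // executed when the container starts
        Cmd:        []string{"--config", "/service/conf/test.conf"}, // arguments handed to the entrypoint
    }
}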

Service address

So, a service is running in the container. It has a working environment and a certain configuration for testing. How can other services be found?

It’s really easy in a Docker network.

  • When creating a container, we generate a unique name for it. We use the container name as a hostname.

We start our tests outside the containers to reduce the overhead, but they can be run in containers too; in that case, the tests find the services' addresses as described above.

If the tests are run on the local machine, they can't reach a service by its container name, as addressing by container name within the Docker network is an abstraction provided by Docker itself. We need to find the port on the local host that matches the service port inside the Docker network. When starting the container, we map the service's internal port to an external port on the local host, and it is the latter that we use in the tests.
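With testcontainers-go, resolving that external address from the tests might look roughly like this; the internal port 8080 is an assumption.

package autotests

import (
    "context"
    "fmt"

    "github.com/testcontainers/testcontainers-go"
)

// serviceAddr returns the address tests running on the local host should use:
// Docker tells us which host port was bound to the service's internal port.
func serviceAddr(ctx context.Context, c testcontainers.Container) (string, error) {
    host, err := c.Host(ctx) // typically "localhost" when tests run outside the Docker network
    if err != nil {
        return "", err
    }
    port, err := c.MappedPort(ctx, "8080/tcp") // the external port mapped to the internal one
    if err != nil {
        return "", err
    }
    return fmt.Sprintf("%s:%s", host, port.Port()), nil
}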

External services

Your infrastructure is likely to include some third-party services, for example databases and service discovery. Ideally, their configuration should match the one in production. A simple service like Consul in a single-process configuration can be started with testcontainers-go. However, there is no need to agonise over a multi-component service like Kafka, with its several brokers and a need for ZooKeeper: you can just use Docker Compose.

Usually, integration testing doesn't require extensive access to external services, and that makes Docker Compose a convenient option.

Loading stage

The container is running. Does that mean the service is ready to accept our queries? Generally, no. Many services have an initialisation stage, which can take some time. If we don't wait for the service to finish loading before we start testing, the results will be unstable.

What can we do here?

  1. The easiest thing is to use sleep: after starting the container, we wait for a fixed period of time, and once it has elapsed we assume the services are ready to operate. This is not a good method, as the tests are run on different machines and the service loading speed can vary.
  • As soon as the service appears in Consul, it is ready. The service status can be followed by means of a blocking query with a timeout. As soon as the service is registered, Consul returns information about the service status change.

Approaches 2 and 3 above involve repeating a certain operation until the condition is fulfilled, with a short standby period between attempts. This period is shorter than the wait in Approach 1, and this way we neither depend on a specific machine nor have to keep track of the service loading speed.

However, in all four approaches, the time we wait for a service to become ready is bounded by the maximum permissible start-up time in any environment.
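Here is a rough sketch of such a wait loop; the isReady predicate (e.g. a ping query or a Consul lookup) is an assumption for illustration.

package autotests

import (
    "context"
    "fmt"
    "time"
)

// waitUntilReady repeats a readiness check with a short pause between attempts,
// bounded by an overall timeout (approaches 2 and 3 above).
func waitUntilReady(ctx context.Context, isReady func(context.Context) bool, timeout time.Duration) error {
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    ticker := time.NewTicker(200 * time.Millisecond) // short standby between attempts
    defer ticker.Stop()

    for {
        if isReady(ctx) {
            return nil
        }
        select {
        case <-ctx.Done():
            return fmt.Errorf("service did not become ready within %s", timeout)
        case <-ticker.C:
        }
    }
}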

Starting all services

We've reviewed how to start a service and how to check whether it's ready to operate. We know how to run our own and third-party services, and we know the service addresses both within the test environment and from the tests.

In which sequence should we start our services? The perfect option is to avoid a strict sequence altogether. This way, we can start services in parallel and essentially cut the time needed to establish the infrastructure (container starting time plus service loading time). The fewer the dependencies, the easier it is to add a new service to the infrastructure.

Every service needs to be able to start even when the third-party services it requires are not yet available in the test environment. That's why every service must know how to wait for them to appear. Of course, we need to rule out deadlocks where Service A and Service B wait for each other to become available; such problems can also occur at the production stage.
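A minimal sketch of such a parallel start, assuming a hypothetical Service interface whose Start method waits for the service's own dependencies; errgroup is only one possible way to collect errors.

package autotests

import (
    "context"

    "golang.org/x/sync/errgroup"
)

// Service is a hypothetical interface for anything the framework can start.
type Service interface {
    Start(ctx context.Context) error
}

// startAll launches every service concurrently and returns the first error, if any.
func startAll(ctx context.Context, services []Service) error {
    g, ctx := errgroup.WithContext(ctx)
    for _, svc := range services {
        svc := svc // capture the loop variable for the goroutine
        g.Go(func() error {
            return svc.Start(ctx) // each service waits for its own dependencies internally
        })
    }
    return g.Wait()
}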

Infrastructure usage

During testing

When running tests, we really feel like getting into our infrastructure and taking some time to play with it. If not now, then when?

  • Modifying the service configuration

For this, we need to stop a service, configure it as we did at the infrastructure setup stage, and then run it again. Bear in mind that any configuration change makes the test longer because of the overhead of a double restart: once when the configuration is changed during the test and once when it is rolled back to the previous configuration at the end. So it is worth thinking twice about whether we really need to modify the service config at that moment. It might be a better idea to group the tests that share a system configuration under a single suite.

  • Adding a new service

Adding new services has become easy as pie for us. We learned how to create services at the infrastructure setup stage, and here the scenario is just the same: we arrange an environment for the new service, run a container and use it in testing.

  • Working with the network

Adding containers to the network or removing them from it, pausing and unpausing containers and using iptables all help us to simulate network errors and check how the system reacts to them.

Post testing

If we add a new service within one particular test, we don't need to pass it on to the next test: we have to be polite. It is the same with data. Tests can be run in any order and must not affect each other; the test runs must be reproducible.

  • If the service config has been changed, we roll it back to the previous (default) config.

After the test suite completes, all the services within the infrastructure are stopped, the containers are killed and the test network is removed. We do the same if the test suite fails to finish before the timeout expires or ends with an error. The infrastructure is kept only when the framework receives an explicit instruction to retain the containers after the test run (for debugging, for example).
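In code, the cleanup might look roughly like this, assuming the hypothetical Environment helper from the earlier suite sketch keeps track of the started containers and the per-run network.

package autotests

import (
    "context"

    "github.com/testcontainers/testcontainers-go"
)

// Environment is the hypothetical helper from the suite sketch above.
type Environment struct {
    containers []testcontainers.Container
    network    testcontainers.Network
}

// Cleanup stops every container and removes the test network. The framework
// skips this step only when explicitly asked to keep the infrastructure for debugging.
func (e *Environment) Cleanup(ctx context.Context) {
    for _, c := range e.containers {
        _ = c.Terminate(ctx) // stop and remove the container
    }
    _ = e.network.Remove(ctx) // dismantle the per-run Docker network
}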

Debugging

Yay! We've learned to arrange the infrastructure and run the tests. Seeing the first integration results is nice, but things don't always go smoothly. It's time to deal with failures. The first thing that comes to a tester's mind is to take a look at the service logs.

Let's say that a suite contains an immense number of tests and one of them hasn't received the expected answer from the service. Somewhere in the depths of the service log, we somehow have to find the piece that matches the timing of the failed test. There is a handy and easy tool for doing that: markers.

  • Firstly, we add a "log_notice" command to the service; on receiving it, the service records the message from the query in its log.

Now, we have markers within the log, and we can easily restore the course of events and reproduce the service’s behaviour as required.
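As a sketch of how such markers could be wired into the tests, Testify's per-test hooks are a natural place; s.client and its LogNotice method are hypothetical stand-ins for whatever sends the "log_notice" command to the service.

// SetupTest and TearDownTest run around every test of the suite, so each test
// is bracketed by markers in the service log. These methods extend the
// SmokeSuite sketched earlier.
func (s *SmokeSuite) SetupTest() {
    s.client.LogNotice(fmt.Sprintf("=== BEGIN %s ===", s.T().Name())) // hypothetical call
}

func (s *SmokeSuite) TearDownTest() {
    s.client.LogNotice(fmt.Sprintf("=== END %s ===", s.T().Name()))
}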

What if a service hasn't been able to start and hasn't managed to write anything to its log? Well, it will still have written some information to stderr/stdout. The "docker logs" command retrieves data from the standard output streams, and this can help us see what has happened.

Let’s say that data from the log is insufficient for localising an error. The time has come to turn to some more serious methods!

If we configure the framework to keep the infrastructure after running all the tests within the suite, we get full access to the system. We can check the status of the service, obtain data from it, send various queries, analyse the service's files on disk, and use gdb/strace/tcpdump and profiling. Then we form a hypothesis, recompile the image, rerun the tests and identify the root of the problem iteratively.

For debugging not to turn into a stressful bug hunt, tests have to be as reproducible as possible. For instance, if data is generated randomly, then on an error we need to report the seed and/or the data requested.

Acceleration

Nobody really feels like waiting forever for integration tests to complete. But, on the other hand, you can always have a cup of coffee and get on with other interesting things in the meantime.

What can be done to speed up the testing?

  • We can group read-only tests and run them in parallel within a single test (this is really easy to do in Go, thanks to goroutines); these tests will work with an isolated data set, as shown in the sketch below.
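A rough sketch of that grouping using Go subtests; the checks themselves are placeholders for real queries against the running services.

package testsuite

import "testing"

// TestReadOnlyScenarios groups read-only checks and runs them concurrently.
func TestReadOnlyScenarios(t *testing.T) {
    checks := map[string]func(t *testing.T){
        "profile is returned":   func(t *testing.T) { /* query the Front, verify the profile */ },
        "search finds the user": func(t *testing.T) { /* query Search, verify the result */ },
    }
    for name, check := range checks {
        check := check // capture the loop variable
        t.Run(name, func(t *testing.T) {
            t.Parallel() // subtests in this group run concurrently, each in its own goroutine
            check(t)
        })
    }
}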

During testing, we run a mock at a certain address. The services already running in the current infrastructure learn this address from the config or from service discovery (in our case, Consul) and can send queries to it.

The mock receives the query and calls the handler we've set for it. This is how this piece of Go code looks in a test:

handler := func(request *serviceProto.Message) mock.Result {
    statsRequest, ok := request.(*serviceProto.StatsRequest)
    // Checking request
    return mock.Result{
        Msg:    PrepareResponse(statsRequest),
        Action: mock.ActionWriteResponse,
    }
}

serviceMock.Start(listenAddr, serviceProto, handler)
defer serviceMock.Stop()

The handler in our example assumes it has received a statistics query and processes it in accordance with the test logic: it prepares an answer and instructs the server on the required action, i.e. whether the answer is to be sent instantly or with a delay, whether it is not to be sent at all, or whether the connection is to be closed.

Control over the server's activities, be it tearing the connection down or slowing down the sending, gives us an additional way of checking how the tested services react to network problems. The server performs the requested actions, packs the answer from the handler and sends it to the client. As soon as the test is completed, the mock (the server) is stopped by the deferred call.

We use mocks for all our services because they really help save us time during testing.

Implementation

For those who are curious about implementation details, here they are!

Our framework is located in the same repository as the tested services, in an independent "autotests" directory, which comprises several modules:

"Service" provides the necessary handling of every service: running it, stopping it, configuring it and obtaining information about its data.

“Mock” contains an implementation of a mock server for every non-third-party service.

"Suite" comprises the overall implementation. It knows how to work with services, how to wait for them to load, how to check service performance and much more.

“Environment” stores data on the current test environment (services running) and is responsible for the network.

There are also some auxiliary modules and ones helping with data generation.

Besides the framework modules, at the time this article was written we had 21 test suites for the M service, including the smoke test suite. Each of them is able to create its own infrastructure with the necessary set of services. Tests are stored in files within the test suite. We have around 1,980 tests for the M service, and it takes about 1 hour to build the binaries, create the containers and run the tests (the test phase itself lasts about 54 minutes).

Starting a specific test suite looks something like this:

go test -count=1 -race -v ./testsuite $(TESTS_FLAGS) -autotests.timeout=15m

As we wanted to transfer the services of our colleagues from other departments to our framework, the core functionality of the framework was placed in the shared core repository.

QA

How do testers use the integration framework? They don't have to deploy all the services manually: the integration tests do that for them and help establish the necessary infrastructure. If there is no existing suite for the planned infrastructure, they quickly add one themselves.

Once the testing environment is all set up, QA engineers implement the most complex scenarios in a test. While working, they have access to all the service logs and files, which is handy when making adjustments and understanding what the system is experiencing.

Apart from checking how the tests run with a certain piece of code, they can also specify particular service versions and run integration tests for them.

To speed things up, our developers write the positive tests straight away, and then the testers take on the more complex cases. This is what collaborative development of the tests and the framework looks like.

Continuous integration

We wanted to run integration tests automatically every time a service is built.

Embedding integration tests into the continuous integration (CI) process turned out to be a doddle. We use TeamCity, and the framework code is located in the same repository as the service code. First the services are built and the images compiled, then the framework is built, and finally it is run.

We've taught TeamCity to use the testing framework's output to determine which tests have passed and which have failed. After a run is over, it shows the number and the list of failed tests. Data from all the services after each suite run is stored and posted to TeamCity as artefacts of the particular build and run.

Summing up

Here are the results of all the work.

  • Life has got easier. Fewer integration issues now make it to the production stage, which results in more stable production.

In general, the framework saves us some time and we feel more confident. We keep enhancing it and expanding its scope, adding integration tests for the company’s other services.

However, integration testing comes with certain disadvantages which need to be taken into consideration, too.

  • Longer test runs. The systems are sophisticated, and queries are executed across several services.

The main question you have to ask is whether you really require a framework for integration tests.

If the number of services in your project is continuing to grow or has already done so, if connections between them keep multiplying and you need to automate the testing procedure, then it can be a good idea to implement integration tests.

Hopefully, this article has given you an insight into the challenges encountered on this path, as well as into some methods of dealing with them.

Good luck!
