A monolithic architecture for our clients’ hundreds of versions: how we write and support tests

Vladimir Yants
Bumble Tech
18 min read · May 15, 2019

--

Like any developer, I really enjoy attending conferences and reading articles. It keeps me up to date with what’s going on in software development and keeps me in touch with the tech community. So last December, when I was invited to speak at the Highload conference (Russia’s largest professional conference for high-load systems developers), I decided it was time to share my talk as an article too, and do my bit for the community.

This post will be useful to those who themselves write tests for the backend and who encounter problems when testing legacy code, as well as for those wanting to test complex business logic.

What’s this post about? First, I’ll give a brief overview of the Badoo development process and how it shapes our need for tests and our desire to write them. I’ll then work through the ‘pyramid’ of automated testing: the kinds of tests we use, the tools behind each, and the problems we solve with them. Finally, I’ll explore how we support and run all this stuff.

Our development process

Here’s an illustration of our development process:

The golfer represents a backend developer. At a given moment in time they receive a task for development, usually in the form of two documents, namely a brief from the business side and a technical document describing changes to our protocol around interaction between the backend and the clients (mobile apps and site).

The developer writes code and puts it into production, and they do so earlier than all the client applications. All functionality is protected by means of feature flags or A/B tests; this is covered in the technical document. Next, depending on current priorities and the product plans, the client applications are released. We backend developers have no idea when this or that feature is going to be implemented on the client side. The release cycle for client applications is somewhat more complicated and longer than ours, so our product managers are constantly juggling priorities.

The development culture accepted at a company is very important here: a backend developer is responsible for a feature from its implementation on the backend right through to its final integration on the last platform it is destined for.

The following scenario is entirely possible. Six months ago you rolled out a given feature. The client teams took a long time to deploy it because priorities at the company had changed. You’re already working on other tasks; you have fresh deadlines and priorities. And then a colleague comes up to you and says: “Do you remember that thing you came up with for us six months ago? It doesn’t work.” Now you have to set aside the new tasks, and fire-fight.

Our developers have one main concern: they want to make sure that as few problems as possible crop up at the integration stage.

What do you do first and foremost to be sure a given feature works?

Of course, the first thing that comes to mind is manual testing using the client application. You take the application, run the scenario and see whether it fails. But the feature is new, and the applications haven’t integrated it yet. And in any case, passing a manual test is no guarantee that nothing will break between the backend release and the start of integration.

This is where automated tests come in useful.

Unit tests

The simplest tests we write are unit tests. The main language we use for the backend is PHP, and the framework we use for unit testing is PHPUnit. Getting ahead of ourselves for a moment, let me say that all our backend tests are written on top of this framework.

We generally use unit tests on small, isolated bits of code. We check that methods or functions, that is to say tiny units of business logic, behave correctly. Our unit tests are designed not to interact with anything else; they don’t access databases or services.
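For illustration, a minimal test of this kind might look like the sketch below (the AgeCalculator class here is a made-up example rather than real Badoo code):

<?php

use PHPUnit\Framework\TestCase;

// A made-up, isolated piece of business logic.
final class AgeCalculator
{
    public function fullYearsBetween(\DateTimeImmutable $from, \DateTimeImmutable $to): int
    {
        return $from->diff($to)->y;
    }
}

final class AgeCalculatorTest extends TestCase
{
    public function testOnlyFullYearsAreCounted(): void
    {
        $calc = new AgeCalculator();

        $registered = new \DateTimeImmutable('2016-09-01');
        $now        = new \DateTimeImmutable('2018-08-31');

        // One day short of two years is still just one full year.
        $this->assertSame(1, $calc->fullYearsBetween($registered, $now));
    }
}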

SoftMocks

The main difficulty developers encounter when writing unit tests is untestable code, and usually this is legacy code.

Here is a simple example. As a company, Badoo is 12 years old; it was once a very small start-up in the hands of just a few people. The start-up got on perfectly well without any tests whatsoever. But once we grew quite big, we realised that tests were essential. However, by that time we’d already written a lot of code which worked well. We weren’t going to rewrite it just so that we could subject it to tests! That wouldn’t make much sense from a business point of view.

That is why we developed a little open-source library called SoftMocks, which makes writing tests cheaper and faster. It intercepts all PHP include/require calls and replaces the source file ‘on the fly’ with modified, rewritten content. This allows us to create mocks for any code.

This is more or less what it looks like for the developer:

// mock constants
\Badoo\SoftMocks::redefineConstant($constantName, $newValue);

// mock any method: static, private or final
\Badoo\SoftMocks::redefineMethod(
    $class,
    $method,
    $method_args,
    $fake_code
);

// mock functions
\Badoo\SoftMocks::redefineFunction(
    $function,
    $function_args,
    $fake_code
);

Using uncomplicated constructions like these, we can redefine more or less whatever we want and get around the limitations of the standard PHPUnit mocker: we can mock static and private methods, redefine constants and do lots of other things that plain PHPUnit does not allow.
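In a test, these calls might be used roughly as follows. This is a sketch: the \Legacy\Billing class is hypothetical, while the SoftMocks calls themselves are the real API:

public function testDiscountIsAppliedForLoyalUsers(): void
{
    // Redefine a static method deep in legacy code that we cannot easily inject.
    \Badoo\SoftMocks::redefineMethod(
        \Legacy\Billing::class,   // hypothetical legacy class
        'isLoyalUser',
        '$user_id',               // argument list of the mocked method
        'return true;'            // code that replaces its body
    );

    try {
        $this->assertSame(80, \Legacy\Billing::priceWithDiscount(1, 100));
    } finally {
        // Roll the global mock back so that other tests are not affected.
        \Badoo\SoftMocks::restoreAll();
    }
}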

However, we encountered a problem: developers began to think that with SoftMocks you don’t need to write testable code, because you can always paper over the code with our global mocks and everything will work fine. This approach, though, makes the code more complicated and leads to an accumulation of ‘crutches’. So we adopted several rules which allow us to keep the situation under control:

  1. All new code must be easy to test using standard PHPUnit mocks. Providing this condition is fulfilled, the code will be testable and it will be easy to single out and test a small section on its own.
  2. SoftMocks may be applied to old code that was written in a way unsuited to unit testing, and in cases where testing it any other way would be overly expensive, time-consuming or difficult.

Compliance with these rules is carefully monitored at the code review stage.

Mutation testing

Before moving on, the quality of unit tests merits discussion. I imagine many of you use the code coverage metric. Unfortunately, there’s one question this metric doesn’t answer, namely: “Have I written a good unit test?” It is entirely possible to write a test which doesn’t in actual fact test anything and doesn’t contain a single assert, yet generates excellent code coverage. Of course, this example is exaggerated, but it is not that far from the truth.

Recently we started to use mutation testing. This is quite an old but relatively little-known concept. Here is a simple step-by-step explanation of what it involves:

  • We take source code and code coverage;
  • We parse it and start to change the code: true to false, > to >=, + to - (basically, we break it);
  • For each change/mutation we run the set of tests which cover the line that has been changed;
  • If the tests fail, they are doing their job: they don’t let us break the logic unnoticed;
  • If, however, the tests pass, then despite the coverage they are most likely not effective enough, and it is worth taking a closer look at them and adding some asserts (or there may be a case the tests aren’t covering).

There are several ready-to-use frameworks for PHP, for example Humbug and Infection. Unfortunately, they are incompatible with SoftMocks, so they didn’t suit us. Instead we wrote our own little console utility: it does the same thing but uses our internal code coverage format and, importantly, is compatible with SoftMocks. Currently our developers run it manually to analyse the tests they have written, but we’re working on building this tool into our development process.
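To make the idea more concrete, here is a toy sketch of such a loop (not our internal utility; the file names are made up):

<?php

// A toy sketch of the mutation-testing loop: flip one operator at a time,
// re-run the covering tests and see whether they notice.

$path   = 'src/PromoEligibility.php';          // hypothetical file under test
$source = file_get_contents($path);

$mutations = ['>' => '>=', '+' => '-', 'true' => 'false'];

foreach ($mutations as $from => $to) {
    $pos = strpos($source, $from);
    if ($pos === false) {
        continue;
    }

    // Apply a single mutation in place...
    file_put_contents($path, substr_replace($source, $to, $pos, strlen($from)));

    // ...and re-run the tests that cover it (here, simply one test file).
    $output = [];
    exec('vendor/bin/phpunit tests/PromoEligibilityTest.php', $output, $exit_code);

    echo $exit_code === 0
        ? "Mutation '$from' -> '$to' survived: the tests look too weak\n"
        : "Mutation '$from' -> '$to' was killed: the tests caught it\n";

    // Restore the original source before trying the next mutation.
    file_put_contents($path, $source);
}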

Integration testing

We use integration tests to test interaction with various services and databases.

For you to understand better what I’m about to say, let’s develop a hypothetical promotional offer and cover it with tests. Let’s imagine that our product managers have decided to give our most loyal users tickets for a conference:

The promotional offer should display if:

  • User has specified ‘programmer’ under ‘work’
  • User participates in the HL18_promo A/B test
  • User registered two or more years ago.

When a user presses the “Get ticket” button we need to save their data to a list to be passed on to our managers who are distributing the tickets.

Even in this quite simple example there is something which cannot be tested using unit tests, namely interaction with the database. Integration tests are required.

Let’s consider a standard way of testing interaction with a database, as offered by PHPUnit:

  1. Pull up the test database
  2. Prepare DataTables and DataSets
  3. Run the test
  4. Clear out the test database.
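In code, this pattern looks roughly like the sketch below (plain PDO is used for brevity instead of the DbUnit extension, and the table and connection details are made up):

<?php

use PHPUnit\Framework\TestCase;

final class PromoSignupsDbTest extends TestCase
{
    private \PDO $db;

    protected function setUp(): void
    {
        // 1-2. Pull up the test database and prepare the tables and fixtures.
        $this->db = new \PDO('mysql:host=test-db;dbname=test', 'test_user', 'test_pass');
        $this->db->exec('CREATE TABLE promo_signups (user_id INT PRIMARY KEY, email VARCHAR(255))');
        $this->db->exec("INSERT INTO promo_signups VALUES (1, 'dev@example.com')");
    }

    public function testSignupIsStored(): void
    {
        // 3. Run the test against the prepared data.
        $count = (int)$this->db->query('SELECT COUNT(*) FROM promo_signups')->fetchColumn();
        $this->assertSame(1, $count);
    }

    protected function tearDown(): void
    {
        // 4. Clear out the test database so the next test starts from scratch.
        $this->db->exec('DROP TABLE promo_signups');
    }
}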

What potential complications might there be in this approach?

  • The DataTables and DataSets structures need maintaining. If we change the table structure, we need to reflect these changes in the test, which isn’t always convenient and requires additional time.
  • Preparing the database requires time. Each time we set up a test we need to add something to the database and create tables, and this takes a long time and a lot of effort, especially if there are a lot of tests.
  • The greatest drawback is that running these tests in parallel results in instability. Say we run test A, and it starts writing to the test table it created itself. Simultaneously we run test B, which wants to work with the same test table. The result is mutual blocking and other unforeseen issues.

To avoid these problems we’ve developed our own little DBMocks library.

DBMocks

The principle of how this works is as follows:

  1. Using SoftMocks we intercept all the wrappers we use to work with databases.
  2. When there is an incoming query via mock, we parse the SQL query and extract DB + TableName, and obtain the host from the connection.
  3. On the same host in tmpfs we create a temporary table with the same structure as the original (we copy the structure using SHOW CREATE TABLE).
  4. After this, we redirect all queries to this table that come in via the mocks to the freshly created temporary table.
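In simplified form, the redirection could be sketched like this (illustrative code rather than the actual library; the real SQL parsing is far more thorough):

<?php

// A simplified sketch of the DBMocks idea: every query that goes through the
// intercepted DB wrapper is rewritten to hit a per-test temporary copy.

function redirectToTempTable(\PDO $connection, string $sql, string $test_key): string
{
    // Extract `db.table` from the query (very naive; for illustration only).
    if (!preg_match('/(?:FROM|INTO|UPDATE)\s+`?(\w+)`?\.`?(\w+)`?/i', $sql, $m)) {
        return $sql;
    }
    [, $db, $table] = $m;

    // Unique per-test name, so parallel runs never collide.
    $tmp = "tmp_{$db}_{$table}_{$test_key}";

    // Copy the original structure via SHOW CREATE TABLE into a tmpfs-backed schema.
    $create = $connection->query("SHOW CREATE TABLE `$db`.`$table`")->fetchColumn(1);
    $connection->exec(str_replace(
        "CREATE TABLE `$table`",
        "CREATE TABLE IF NOT EXISTS `tmp`.`$tmp`",
        $create
    ));

    // Point the original query at the temporary copy.
    return str_replace("$db.$table", "tmp.$tmp", $sql);
}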

What does this give us?

  • Structures are no longer a worry;
  • Tests can no longer compromise data in the source tables, because we are redirecting them ‘on the fly’ to temporary tables;
  • As before, we are testing for compatibility with the MySQL version we are working with and if, all of a sudden, the query ceases to be compatible with the new version, then our test will see it and fail.
  • The most important thing is that the tests are now isolated and, even if we run them in parallel, the threads will separate out into the different temporary tables, since each test will have a unique key added to the relevant, temporary table names.

API testing

The difference between unit and API tests is well illustrated by the following gif:

The lock is excellent, only it is attached to the wrong door.

Our tests imitate a client session: they are able to send queries to the backend, observing our protocol, and the backend responds to them as if they were a real client.

Pool of test users

What do we need in order to be able to write these kinds of tests successfully? Let’s revisit the conditions necessary in order for our promotional offer to be displayed:

  • User has specified ‘programmer’ under ‘work’
  • User participates in the HL18_promo A/B test
  • User registered two or more years ago.

As you can see, this is all about the user. And, in reality, 99% of API tests do in fact require the presence of an authorised registered user, who is present on all the services and databases.

Where are we going to get such a user? We could try and register them at the time of testing, however:

  • This takes a long time and is resource-intensive;
  • Once the test is over, we somehow need to remove the user, which can be quite complex especially if we’re talking about large-scale projects;
  • Finally, as with many other high-load projects, we perform lots of operations in the background (adding the user to various services, replication to other data centres, etc.). The tests are not aware of these processes, but if they implicitly rely on the results of them having completed, there is a risk of instability.

We have developed a tool called ‘test user pool’. Underlying this tool are two ideas:

  1. We do not register users every time, but rather use them multiple times.
  2. After a test we restore user data to their initial state (to the point when they were registered). Fail to do this and, over time, the tests will become unstable, because users will be ‘polluted’ with information from other tests.

This is more or less how it works:
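In code, the acquire-and-restore cycle can be sketched roughly as follows (the class and method names are illustrative; the real pool is backed by a shared service rather than an in-memory array):

<?php

// An in-memory sketch of the test-user pool idea (illustrative names).

final class TestUserPool
{
    /** @var array<int, array> free users, keyed by id, in registration-time state */
    private array $free;

    /** @var array<int, array> pristine snapshots of users currently handed out */
    private array $snapshots = [];

    public function __construct(array $registered_users)
    {
        $this->free = $registered_users;   // users are registered once and then reused
    }

    /** Take an already registered test user instead of registering a new one. */
    public function acquire(): array
    {
        $id   = array_key_first($this->free);
        $user = $this->free[$id];
        unset($this->free[$id]);

        $this->snapshots[$id] = $user;     // remember the state we must roll back to
        return $user;
    }

    /** Return the user, rolling their data back to the registration-time state. */
    public function release(int $user_id): void
    {
        // Without this rollback, users get 'polluted' by previous tests.
        $this->free[$user_id] = $this->snapshots[$user_id];
        unset($this->snapshots[$user_id]);
    }
}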

At a certain point we decided we wanted to run our API tests in the production environment. Why? Because a development infrastructure is not the same thing as production.

Although we always try to reproduce a production infrastructure on a smaller scale, a development infrastructure is never going to be a fully-fledged copy. In order to be absolutely certain that a new build meets requirements and there aren’t any problems, we upload the new code to a pre-production cluster, which works with production data and services, and we run our API tests there.

In this case it is very important to decide how you’re going to isolate test users from real users.

What would happen if test users started to appear as real in our app?

How can we achieve isolation? Each of our users has an is_test_user flag. At the registration stage they are assigned yes or no, and that does not change. Based on this flag, we isolate users on all services. It is also important that we exclude test users from business analytics and A/B testing results, to avoid distorting our statistics.

There is a simpler way: just ‘relocate’ all test users to the Antarctic. Providing you have a geoservice this is entirely workable!

QA API

We don’t just need a user, they also need to comply with certain parameters: they need to be working as a programmer, to participate in a given A/B test and be registered for 2 years or more. It is easy to assign test users a profession using our backend API, but participation in A/B tests is a matter of probability. As for the condition that the user registered two or more years ago, this is very complicated to implement, because we don’t know when a given user appeared in the pool.

To solve these problems we have the QA API. Basically, this is a backdoor for testing: a set of well-documented API methods which allow you to quickly and easily manage user data and change user state without going through our standard protocol for interaction with clients. The methods are written by backend developers for QA engineers and are used in manual, UI and API tests.

QA API can only be used in the case of test users: if the relevant flag isn’t there, the test immediately fails. This is what one of our QA API methods looks like, which allows you to change the user’s registration date to a date of your choosing:
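As a rough sketch, such a handler might boil down to something like this (the table and column names here are assumptions rather than the real code):

<?php

// A rough sketch of a QA API handler (table and column names are assumptions).

function userCreatedChange(\PDO $db, int $user_id, string $created): array
{
    // The backdoor only ever works with test users.
    $stmt = $db->prepare('SELECT is_test_user FROM users WHERE user_id = ?');
    $stmt->execute([$user_id]);
    if (!$stmt->fetchColumn()) {
        return ['error' => 'not a test user'];
    }

    // Rewrite the registration date directly, bypassing the client protocol.
    $stmt = $db->prepare('UPDATE users SET created = ? WHERE user_id = ?');
    $stmt->execute([$created, $user_id]);

    return ['result' => 'ok'];
}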

And these are the three queries which quickly change the test user’s data to meet the conditions for the promo offer to be displayed:

  • Under ‘work’ it says ‘programmer’:
    addUserWorkEducation?user_id=ID&works[]=Badoo, programmer
  • User participates in the HL18_promo A/B test:
    forceSplitTest?user_id=ID&test=HL18_promo
  • Registered two or more years ago:
    userCreatedChange?user_id=ID&created=2016-09-01

Since this is a backdoor, it is extremely important to put security measures in place. We have protected our service in several ways:

  • We have isolated it at a network level: services can only be accessed from the office network;
  • Along with each query we send a secret key, without which you cannot gain access to QA API even from the office network;
  • The methods only work with test users.

RemoteMocks

When API tests work with a remote backend, we may also require mocks. Why is that? For example, if an API test running in the production environment starts writing to a database, we need to make sure that this test data gets cleared out. What’s more, mocks help us make the server’s response more predictable and easier to check.

We have three texts:

Badoo is a multilingual application; we have a complex localisation component which allows us to translate quickly and to serve translations appropriate to the user’s current location. Our localisation staff are constantly working on improving translations, running A/B tests on lexemes and looking out for better wordings. So, when running a test, we cannot know which text the server will return; it can change at any time. However, using RemoteMocks we can check that the localisation component has been accessed correctly.

How do RemoteMocks work? The test asks the backend to initialise RemoteMocks for its session; then, on each subsequent request, the backend checks whether there are mocks registered for the current session and, if there are, simply applies them using SoftMocks.

If we want to create a remote mock, we specify which class or method we need to replace and what to replace it with. All subsequent queries to the backend will be carried out taking account of this mock:

$this->remoteInterceptMethod(
    \Promo\HighLoadConference::class,
    'saveUserEmailToDb',
    true
);
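On the backend side, the per-request check described above boils down to something like the following sketch (the session-storage helper is an assumption, not real Badoo code):

<?php

// A sketch of the backend side of RemoteMocks: on each request, look up the
// mocks registered for this session and apply them via SoftMocks.

/** Assumed helper: in reality the mocks are read from the session storage. */
function getRemoteMocksForSession(string $session_id): array
{
    return [
        ['class' => \Promo\HighLoadConference::class, 'method' => 'saveUserEmailToDb', 'return' => true],
    ];
}

$session_id = $_COOKIE['session_id'] ?? '';   // however the backend identifies the client session

foreach (getRemoteMocksForSession($session_id) as $mock) {
    \Badoo\SoftMocks::redefineMethod(
        $mock['class'],
        $mock['method'],
        '',                                                  // mocked method's argument list
        'return ' . var_export($mock['return'], true) . ';'  // replacement body
    );
}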

Now, let’s assemble our API test:

// obtain a client emulator with an already authorised user
$app_startup = [
    'supported_promo_blocks' => [\Mobile\Proto\Enum\PromoBlockType::GENERIC_PROMO]
];
$Client = $this->getLoginedConnection(BmaFunctionalConfig::USER_TYPE_NEW, $app_startup);

// configure the user via QA API
$Client->getQaApiClient()->addUserWorkEducation(['Badoo, programmer']);
$Client->getQaApiClient()->forceSplitTest('HL18_promo');
$Client->getQaApiClient()->userCreatedChange('2016-09-01');

// mock the write to the database
$this->remoteInterceptMethod(\Promo\HighLoadConference::class, 'saveUserEmail', true);

// check that the promo block is returned as per the protocol
$Resp = $Client->ServerGetPromoBlocks([]);
$this->assertTrue($Resp->hasMessageType('CLIENT_NEXT_PROMO_BLOCKS'));
$PromoBlock = $Resp->CLIENT_NEXT_PROMO_BLOCKS;

// the user presses the CTA; check that the response is returned as per the protocol
$Resp = $Client->ServerPromoAccepted($PromoBlock->getPromoId());
$this->assertTrue($Resp->hasMessageType('CLIENT_ACKNOWLEDGE_COMMAND'));

We can use this straightforward way to test any functionality which arrives at the backend for development and requires changes to the mobile protocol.

Rules for using API tests

Everything seemed fine, but once again we encountered a problem: API tests turned out to be too easy to write, and there was a temptation to use them everywhere. As a result, we realised that we were starting to use API tests for tasks for which they were never intended.

Why is this bad? Because API tests are very slow. They travel across the network to the backend, which pulls up a session and goes off to databases and loads of services. So we developed a set of rules for using API tests:

  • API tests are to be used to check the protocol for interaction between the client and the server, and also to check that new code has been integrated correctly;
  • They can be used to cover complex processes, such as, for example, action chains;
  • They are not to be used to test minor variations in server response; that is the job of unit tests;
  • They can be used during code review.

UI tests

As we’re looking at the automation pyramid, let me talk a bit about UI tests.

Backend developers at Badoo don’t write UI tests; we have a dedicated team in the QA department which does that. We cover a feature with UI tests once it has already been finished and stabilised, because we don’t think it makes sense to expend resources on the rather expensive automation of a feature which may not even make it beyond an A/B test.

For mobile autotests we use Calabash, and for the web we use Selenium. You can read here about our automation and testing platform.

Running tests

We currently have 100,000 unit tests, 6,000 integration tests and 14,000 API tests. If we tried to run them in a single thread, even our most powerful machine would need around 40 minutes for the unit tests, 90 minutes for the integration tests and 10 hours for the API tests. This is too long.

Parallelisation

The first decision, which seems obvious, is to run tests in several threads. However, we went further and created a cloud for running tests in parallel, in order to be able to scale hardware resources. In simplified terms it looks like this:

The most interesting task here is that of distributing tests between threads, that is to say, breaking them into chunks.

You could divide them up equally, but the tests are all different, and this could lead to a major discrepancy in run time between threads: all the other threads might have finished, while one thread is still running for another half an hour because its tests happen to be very slow.

You could run several threads and ‘feed’ them tests one by one. In this case the drawback is less obvious: there is an overhead to initialising an environment for each run which, with a large number of tests, starts to become significant.

So, what did we do? We started to collect statistics on the run time of each test and then began to ‘pack’ chunks in such a way that, according to the statistics, a given chunk would complete within 30 seconds. At the same time we packed the tests into chunks quite tightly, so that there wouldn’t be too many chunks.
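In simplified form, the packing is a greedy pass over the timing statistics, roughly like this sketch:

<?php

// A simplified sketch of packing tests into ~30-second chunks based on the
// collected timing statistics.

/**
 * @param array<string, float> $durations  test name => average run time, seconds
 * @return string[][]                      chunks of test names
 */
function packIntoChunks(array $durations, float $budget = 30.0): array
{
    arsort($durations);                // consider the slowest tests first

    $chunks = [];
    $chunk = [];
    $chunk_time = 0.0;

    foreach ($durations as $test => $seconds) {
        if ($chunk && $chunk_time + $seconds > $budget) {
            $chunks[] = $chunk;        // about 30 seconds of tests: ship it
            $chunk = [];
            $chunk_time = 0.0;
        }
        $chunk[] = $test;
        $chunk_time += $seconds;
    }

    if ($chunk) {
        $chunks[] = $chunk;            // whatever is left over
    }

    return $chunks;
}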

However, our approach also has a drawback, and it is tied to the API tests: they are very slow, demand a lot of resources and get in the way of the fast tests.

That is why we split the cloud into two parts: the first part only runs fast tests, while the second runs a mix of fast and slow tests. Given this approach, we always have a bit of cloud available to process fast tests.

The result was that unit tests began completing within a minute, integration tests within five minutes and API tests within 15 minutes. That is to say, running all the tests, instead of taking about 12 hours, now takes no longer than 22 minutes.

Running tests based on code coverage

We have a large, complicated, monolithic architecture and best practice would be to constantly run all the tests, since a change in one place might break something in another. This is one of the main drawbacks of a monolithic architecture.

At a certain point we came to the conclusion that you don’t need to run all the tests every time; you can run tests based on code coverage:

  1. Take our branch diff.
  2. Form a list of altered files.
  3. For each file we obtain a list of tests covering it.
  4. From these tests we create a set of tests and run it in the test cloud.
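A sketch of this selection step might look as follows (the coverage-map file and its format are assumptions):

<?php

// A sketch of selecting tests by coverage: diff the branch, map changed files
// to the tests that cover them, and run only that set.

// 1-2. Take the branch diff and form the list of altered files.
exec('git diff --name-only master...HEAD', $changed_files);

// 3. For each file, obtain the list of tests covering it (collected nightly).
$coverage_map = json_decode(file_get_contents('coverage/tests_by_file.json'), true);

$tests_to_run = [];
foreach ($changed_files as $file) {
    foreach ($coverage_map[$file] ?? [] as $test) {
        $tests_to_run[$test] = true;          // de-duplicate across files
    }
}

// 4. Run the resulting set (here plain phpunit stands in for the test cloud).
if ($tests_to_run) {
    passthru('vendor/bin/phpunit ' . implode(' ', array_map('escapeshellarg', array_keys($tests_to_run))));
}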

Where do we get the coverage from? We collect it once a day, when the development infrastructure is idle. The number of tests run has been reduced, while the feedback speed, by contrast, has increased severalfold. Result!

An additional bonus was being able to run tests for patches. Although Badoo’s start-up days are long behind us, we are still able to introduce changes to production quickly, get hotfixes out, roll out features and alter configuration. As a rule, speed is very important when it comes to rolling out patches, and the new approach greatly increased the speed of test feedback, because we no longer have to wait a long time for all the tests to run.

But there’s no getting away from drawbacks. We release the backend twice a day, and the coverage data is only fully up to date for the first release; after that it lags one build behind. So when assembling a build we run all the tests. This guarantees that stale coverage isn’t hiding anything and that all the necessary tests have been run. The worst that can happen is that we catch a failing test at the build stage rather than earlier, but this happens rarely.

However, this approach is not very effective for API tests, since they generate very broad code coverage. While testing logic they pull in a bunch of different files, go into sessions, databases and so on. If you change something in one of the affected files, all the API tests end up in the test set and the advantages of the approach are lost.

Conclusion

  • All levels of the test automation pyramid are needed in order to be sure that functionality is working properly. Miss out a level, and the probability is that some problem will remain uncovered.
  • Number of tests ≠ quality. Set aside time for code review of tests and for mutation testing; this is a useful tool.
  • If you plan to work with test users, think about how to isolate them from real users. And don’t forget to exclude test users from statistics and analytics.
  • Don’t be afraid of backdoors. They make test writing much simpler and faster and are very helpful with manual testing.
  • Statistics, and more statistics! Having statistical data on your tests helps improve parallelisation and reduce the number of tests that need running.
