Load Testing — Best Practices & Key Takeaways

Christophe Escobar · Doctolib · May 27, 2020

Saying that Doctolib is a fast-growing company would be an understatement: we roughly double our traffic and users every year.

Although people know us mostly for helping them find and book medical appointments, our most important service today is the booking management system we provide to practitioners, our primary customers. As people’s healthcare may depend on it, we strive for the best possible SLA. Downtime is not an option.

When changing infrastructure or critical components, we need to be sure of the impact. For this, we started building load tests to validate such changes before they reach production.

Building load tests

Building and running a load test suite is a three-step process:

  • Target relevant endpoints
  • Build test scenarios
  • Measure actual performance

The load tests are executed against an isolated environment with the target infrastructure, so they have no impact on the live production infrastructure.

The new infrastructure must:

  • Host all required components
  • In our case, contain a fake dataset with the same volume as our production instance

Target relevant endpoints

Accurate simulation of the production load is very important. If we only simulate 10% of the production load, we may end up sizing a new environment that cannot handle 100% of production, and we cannot infer whether a change is dangerous or not.

Simulating 100% of the production load is hard and may require a lot of time. Moreover, we see no value in simulating 100% of the endpoints: many of them don’t even add up to 1% of resource consumption.

We decided to aim for at least 80% of the production load by targeting the most relevant endpoints.

We used New Relic’s Insights API to identify:

  • the top 20 most time-consuming endpoints
  • the top 20 highest throughput endpoints

With this list of endpoints, we can compare the throughput and performance on those endpoints between the target environment and the actual production environment.

The goal is to get as close as possible to the actual production throughput.

Building the scenarios

We use Gatling for our load tests. It is an open-source load testing framework that uses scenarios written in Scala and runs them on a dedicated cluster. You can check Gatling’s general presentation of the simulation setup to learn more about it.

A scenario is executed through a Session (each virtual user has its own), which chains different calls and stores temporary state that can be reused in later calls. On each call we can add Checks to assert the response status and content, and also to store some data in the session.

A basic scenario, adapted from the example on the Gatling website, might look like this (the base URL is Gatling’s public demo application, not a Doctolib endpoint):
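```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class BasicSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://computer-database.gatling.io") // Gatling's public demo application
    .acceptHeader("application/json")

  // Each virtual user runs this chain of calls with its own Session.
  val scn = scenario("BasicSimulation")
    .exec(
      http("home")
        .get("/")
        .check(status.is(200)) // a Check asserting the response status
    )
    .pause(5) // think time before the next call

  setUp(
    scn.inject(atOnceUsers(1))
  ).protocols(httpProtocol)
}
```

Checks can also capture data into the session (for example with jsonPath("$.id").saveAs("appointmentId")) so that a later call in the chain can reuse it.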

Gatling provides results on the number of executed scenarios, the number of failures, etc., which lets us know when a scenario is not working as expected (due to a data issue or an incorrect implementation).

Since we handle sensitive data, we cannot replay production logs. Instead, we use two different methods to build scenarios:

  • Real-life scenarios: This is the easiest method, and was used for our first scenarios: reproducing our main product activity, which is booking an appointment online as a patient.
  • Using endpoint statistics: By comparing the throughput of each endpoint on our target list with production, we check whether all required endpoints are correctly simulated. If some of them are missing, or their throughput is too low, we add or tweak virtual scenarios calling them.

To improve the scenarios, we can use logs that do not carry any personal health-related data. For instance, anonymous visitors searching for practitioners by speciality and location are a helpful source of metadata.

An endpoint can consume different amounts of resources depending on its input: displaying the profile page of a big hospital consumes more resources than displaying that of a practitioner working alone. That’s why the distribution of calls, even on a single endpoint, matters for simulating resource consumption accurately.
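For instance, a scenario can draw its input from a weighted feeder so that heavy and light profile pages appear in realistic proportions. A minimal sketch, with hypothetical profile ids and URL path, and a made-up 10/90 split:

```scala
import scala.util.Random

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Hypothetical split: 10% of calls hit a big hospital profile, 90% a solo
// practitioner profile. Real proportions would come from production stats.
val profileFeeder = Iterator.continually {
  val profileId =
    if (Random.nextInt(100) < 10) "big-hospital-id" else "solo-practitioner-id"
  Map("profileId" -> profileId)
}

val profileScenario = scenario("ProfilePages")
  .feed(profileFeeder) // puts "profileId" into the virtual user's session
  .exec(
    http("profile")
      .get("/profiles/${profileId}") // hypothetical path; Gatling EL reads the session
      .check(status.is(200))
  )
```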

Performance measurement

We proceed by iterations where we:

  • Build: Introduce a change to the architecture
  • Measure: Run the test suite
  • Learn: Deduce what we need to change to hit the mark

During an iteration, we want to compare the following between the simulation and production:

  • number of calls (accuracy of simulation)
  • global response time
  • calls distribution
  • request queuing
  • backend components’ behaviour (database, Redis, Elasticsearch, etc.)
  • low-level metrics (load average, I/O, memory, disk usage, etc.)

These checks are done through our two most important metrics tools, New Relic among them.
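Some of the Gatling-side checks can also be encoded as assertions, so that a run fails loudly when it drifts too far from the baseline. A sketch, reusing scn and httpProtocol from the basic example above; the thresholds are illustrative, not Doctolib’s actual numbers:

```scala
import scala.concurrent.duration._

// Thresholds are illustrative; real values should come from production baselines.
setUp(
  scn.inject(rampUsers(5000).during(10.minutes))
).protocols(httpProtocol)
  .assertions(
    global.successfulRequests.percent.gt(99), // failed calls stay under 1%
    global.responseTime.mean.lt(500)          // mean response time under 500 ms
  )
```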

Takeaways

Here are some recommendations based on what we went through with our load tests.

Use several small load injectors instead of a big one, to avoid JVM garbage collection freezing your performance tests: instead of 40K virtual users on a single injector, use 5K virtual users on each of 8 injectors.

Furthermore, you should go easy on the ramp-up (the time between starting your first virtual user and reaching your maximum number of virtual users). You don’t need to stress the target infrastructure from the start; it’s better to let it warm up first.
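Both recommendations end up in the injection profile. A minimal sketch, assuming each of the 8 injector machines runs the same simulation with its own share of the users (the property name and the numbers are illustrative; scn and httpProtocol come from the earlier example):

```scala
import scala.concurrent.duration._

// Hypothetical per-injector share: 8 injectors x 5K users = 40K total.
// Each machine is started with e.g. -DvirtualUsers=5000 (illustrative name).
val virtualUsers = sys.props.getOrElse("virtualUsers", "5000").toInt

setUp(
  scn.inject(
    rampUsers(virtualUsers).during(10.minutes) // gentle ramp-up instead of a burst
  )
).protocols(httpProtocol)
```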

Load tests can become irrelevant very easily as production evolves continuously, so it’s better to invest from the start in some “unit” tests around the load tests. In our case, we have a configuration which sends only 5 virtual users through all our scenarios. We launch this small and fast test suite every day to check that our load test suite is still relevant and not out of date.
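One possible shape for that daily check, as a sketch: a dedicated smoke simulation that sends 5 virtual users through every scenario. The scenarios and base URL here are hypothetical stand-ins.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Daily smoke run: 5 virtual users per scenario, only to verify that the
// scenarios still match the application, not to measure performance.
class SmokeSimulation extends Simulation {

  val httpProtocol = http.baseUrl("https://loadtest.example.internal") // hypothetical target

  // Stand-ins for the real scenarios defined in the load test suite
  val bookingScenario = scenario("Booking").exec(http("book").get("/booking"))
  val searchScenario  = scenario("Search").exec(http("search").get("/search"))

  setUp(
    List(bookingScenario, searchScenario).map(_.inject(atOnceUsers(5))): _*
  ).protocols(httpProtocol)
}
```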

Next steps

We still have many ways to improve our load tests.

We currently only simulate web traffic, but we have many background jobs that impact the load of our database, so we plan to simulate them as well.

We plan to run a load test that simulates twice the production load every month in order to anticipate scalability issues.

And ultimately, we are going to further automate the isolated infrastructure that handles the load testing, in order to run performance tests more efficiently.

There are other kinds of performance tests that we are thinking of: stress testing and volume testing.

If you want to learn more about our tech team we write a weekly newsletter, you can sign up here. And if you want to join us, we are hiring!
