Fast and stable MongoDB-based tests in Spring

Published in

nexocode

10 min readDec 7, 2020

If your Spring application uses MongoDB, there is one question you will have to answer at some point: how to set up a database instance for your tests. Until early 2020 there was not much choice and the default was to use ‘embedded Mongo’ — my team was no exception. At some point, however, we realised that embedded Mongo was turning our builds into a nightmare. This ultimately prompted us to migrate our code to a tool getting much hype recently — Testcontainers.

In this article I would like to share some of the problems connected with Flapdoodle Embedded Mongo and talk briefly about Spring tests to explain what causes those problems. I will also guide you through the setup process of MongoDB using Testcontainers, and offer some tricks to improve Testcontainers performance. Finally, I will talk about how sar and other Linux tools can be used to measure the load on your machine.

Introducing Flapdoodle Embedded Mongo

If you search the web for ‘test spring boot mongo’ or similar terms, you will very likely end up with instructions which may harm your codebase. Flapdoodle Embedded Mongo was immensely popular at some point and many tutorials were written about using it. Those articles still rank surprisingly high in search engines — things don’t just disappear from the Internet when they cease to be useful. In fact, Embedded Mongo is still recommended by the recent Spring Boot documentation.

At a first glance it looks fantastic: you don’t need to modify your production code, you don’t even need to configure anything, just add a dependency:

testRuntimeOnly('de.flapdoodle.embed:de.flapdoodle.embed.mongo:2.2.0')

…And voilà! Your tests connect to a database started just for the time of your tests being executed.

Under the hood, Spring Boot has logic detecting the presence of this library on classpath, and modifies MongoDB connections to point at an instance started by the library. The library itself handles downloading and caching MongoDB binaries suitable for your machine, and starting/stopping the server.

Our problems with Embedded Mongo

We have been following setup procedures as described in the Spring Documentation for some time. At some point, however, developers started complaining that when they executed unit tests, their computer fans started roaring and the machine became painfully slow to use. It wasn’t just a matter of perception: we found out that during tests multiple MongoDB instances were running, reaching a peak of 8 instances for a standard build or even 15 when gradle --parallel was used. Very often, after tests finished successfully, we found that embedded MongoDB processes were still running and consuming lots of CPU. It's not fun to work if 75% of your CPU is consumed by 'zombie' MongoDBs.

We started to experience random CI build failures: a pipeline that usually took around 15 minutes timed out after 40 minutes, it looked like unit tests never finished.

Finally, we found ourselves at a dead end. Adding a dependency on AWS S3 client library broke our tests. They never finished, but not just on CI server — they never finished on any machine. There was some kind of classpath conflict, preventing us from implementing a business feature. Everything broken by adding a single dependency:

implementation('software.amazon.awssdk:netty-nio-client:2.6.5')

and one Spring bean:

@Bean fun asyncHttpClient() = NettyNioAsyncHttpClient.builder().build()

What is wrong with Embedded Mongo

At the beginning of 2020 it became clear that the library is dead. You could not configure it to run a version of MongoDB newer than 4.0.2. Hardly anything happened with the library source code. Only recently, in October 2020, something started to change, but the newly released library version conflicts with all stable releases of Spring Boot, so still you are not able to test your code with MongoDB 4.2.

Another serious issue is that the library can make your tests extremely resource-hungry by starting a horde of MongoDB instances. And it’s a consequence of library design, not a mistake in the implementation.

The library integration with Spring Boot works by starting an embedded MongoDB instance when Spring context is started. Because it would take ages to execute tests with a context started for each test, Spring offers a caching mechanism. A started context is kept between tests and re-used. However, a cached context cannot be simply used in all situations — when a test expects a context with a different configuration, a fresh context has to be started. Now, the problem is that there are lots of situations where Spring considers a fresh context is needed, and these include things you do very often when writing tests:

you use a @MockBean
you enable a profile with @ActiveProfiles
you override some properties using @TestPropertySource

You can find a more comprehensive list in the Spring Framework documentation.

All in all, there will be many opportunities in your tests to start a fresh context, each holding its own MongoDB instance. What about ‘old’ contexts? They aren’t closed, Spring keeps them in case there is a test that will need them. If you don’t touch the spring.test.context.cache.maxSize property, Spring will keep up to 32 contexts. This means that potentially you can have 32 instances of MongoDB running simultaneously when you use Flapdoodle Embedded Mongo. Moreover, each test worker is an independent JVM, so if you have 2 subprojects and run gradle --parallel you can have up to 64 MongoDB instances, or 96 instances for 3 subprojects and so on.

MongoDB with Testcontainers

Testcontainers is a Java library allowing to start different services for use in tests. It uses Docker to fetch, start and stop those services. In our experience, this approach is way more stable than the one from Flapdoodle Embed Mongo — no more zombie processes eating CPU.

The library is a generic tool rather than a plug-in that will do anything for you under the hood, as it is the case with the Flapdoodle one. You will need a bit of code to connect Spring tests and Testcontainers. Here is how we approached this:

We have a context initializer that starts a Docker container with MongoDB when the context starts and replaces the connection URL with a one pointing at the container. To avoid starting the database for each test, we cache the container in a lazy Kotlin property (following advice from Best Practices for Unit Testing in Kotlin by Philipp Hauer).

We also defined an annotation:

which we use instead of @SpringBootTest in each test class:

Testcontainers performance

Simply replacing Flapdoodle Embed Mongo with Testcontainers makes tests execute much faster. In a small test project this makes a 30% difference.

We wanted to see if we can go even faster. One optimization opportunity appears in start time. By default, even if you cache the container between test runs, the container is stopped once all tests finish. So if you execute just one test from your IDE, each time you do so, you have to wait for the container to start. In this scenario Testcontainers become slower than Flapdoodle Embed Mongo by 2 seconds. It’s painful if you run your unit tests very often as you do TDD.

The situation can be improved by enabling container reuse. This mechanism marks a container as eligible to be picked by subsequent test runs and does not stop it after a test run finishes. If you go back to the code of MongoContainerSingleton, you will see the reuse feature is enabled there. This is however not enough: you have to additionally set testcontainers.reuse.enable=true in ~/.testcontainers.properties. We encourage our developers to enable it, but keep this option disabled on CI servers.

As you can see on the graph above, it improves overall execution time a bit (results with “-r” have container reuse enabled). Startup time of a single test becomes roughly the same as with Flapdoodle Embed Mongo. If you want to learn more about container reuse, check this article by Paweł Pluta.

One challenge when enabling container reuse is parallel build with multiple subprojects. If a test from project1 runs and a test from project2 is started, it will connect to the same reuse-allowing MongoDB container project1 uses. There is a danger those two tests will simultaneously write to the same collections. We solve this by making sure each subproject gets a separate database inside the same container.

We are very happy with the improvement in build stability. After moving the tests to Testcontainers, developers no longer have zombie MongoDB instances on their machines. CI builds fail much less often (well, they still do at times when Testcontainers fail to connect to Docker breaking the build). Additionally, the CI build time is much more consistent now.

Measuring the operating system load

When deciding whether to move to Testcontainers or not, we wanted to understand the consequences of switching. Some of the team members used Linux and others used MacOS. We knew that test execution time and the subjectively perceived machine load was different from person to person. There were multiple open questions:

Is the problem with too many Mongo instances repeatable?
What if Testcontainers are faster on Linux where Docker is a ‘native’ mechanism, but slower on MacOS?
What if Testcontainers make tests execute faster but eat all CPU and memory, so developers aren’t able to do anything while executing tests?

We decided to create a test script, assuring that performance is measured in the same way on different machines. Finding out how fast a build executes is easy: you can do it using the built-in time shell command.

However, it is not trivial to compare CPU and memory usage during test execution as there are multiple processes interacting with each other. Firstly, Gradle has a separate process executed from the command line, another one running tasks (Gradle daemon) and a separate test execution worker. Then there are processes for embedded Mongo. Testcontainers run not only MongoDB Docker containers, but also a separate container called ryuk, which is responsible for cleaning. We decided to measure the load of the whole operating system to have a big picture of the impact of the changes on the build process.

Linux has a great tool for such a purpose called sar available in package sysstat. It can collect a wide range of parameters on system performance in a reliable way, without adding much overhead. You can start it in background:

sar -o sar.binary 2 15 &

Here it will measure performance every 2 seconds, 15 times, and save results to a binary file. Then, after measurement is finished, you can extract data on CPU usage:

% sar -f sar.binary -u
12:36:05        CPU     %user     %nice   %system   %iowait   
12:36:07        all      0,19      0,00      0,25      0,00     
12:36:09        all     70,38      0,00      3,28      0,00     
12:36:11        all     85,47      0,00      4,09      0,57

or memory usage:

% sar -f sar.binary -r | awk '{print $1 "\t" $4}'
12:36:05        kbmemused
12:36:07        1587988
12:36:09        1990692

With built-in shell tools it’s easy to quickly summarize data, for example by counting min, max and average:

cat memory.dat | awk 'BEGIN { min=99999999 } { total += $2; count++; if($2<min) min=$2; if ($2>max) max=$2; } END { print "Max\t" max "\tMin\t" min "\tAverage\t" int(total/count) }'

However, it’s hard to understand the big picture looking at the numbers alone. Visualisation can be a great help. We wanted to make the whole benchmark process fully automated, without manual pasting of data to spreadsheets. A ‘good enough’ approach was to make use of the venerable gnuplot .

If you have plain text data as columns separated by whitespace (as you can see in the awk call, we used tabs), it’s easy to feed the data into gnuplot.

cat cpu.dat | gnuplot -e "set yrange [0:100]; set terminal png size 800,600; set output 'cpu.png'" base.gnuplot

To avoid repeating same gnuplot options when creating CPU graphs and memory graphs, we extracted the common part to a separate file (base.gnuplot):

set timefmt "%H:%M:%S"
set xdata time
set style data lines
plot "/dev/stdin" using 1:2 with lines notitle

Here we define time format, inform that x axis shows time , use a solid line for plotting and finally use columns 1 and 2 from the input as x and y coordinates of each point.

Unfortunately, sar makes use of Linux kernel features and is not available on MacOS. We were able to work around this by measuring memory usage by calling vm_stat command, but we think results are very inaccurate and only show general trends. We could not find a reliable way to measure global CPU usage on Mac.

Summary

Flapdoodle Embedded Mongo is a very popular library for running MongoDB for tests. While still recommended by many tutorials, it is known to cause various performance and maintainability problems. It prevents testing your code against the modern versions of MongoDB, wastes computer resources by starting too many database instances and leaving zombie processes. In some circumstances, it can block your build infinitely if you add dependencies that clash in some cryptic way with it.

Testcontainers is a new tool with a similar purpose that gains much popularity. It is actively maintained and can be used not just for MongoDB but for a wide range of systems, like different SQL and NoSQL databases and even message queues. We recommend learning it as its broad usage means it may be useful not only in your current project, but also in future ones, even if they use a different persistence mechanism. It manages test databases reliably and does not leave zombie processes.

You can check my repository comparing the performance of Flapdoodle Embedded Mongo and Testcontainers. The two projects written in Kotlin along with shell scripts automatically measuring performance are available on GitHub: https://github.com/pkubowicz/embed-vs-testcontainers.

In general, we found that Testcontainers are faster by 30% on Linux and even by 45% on MacOS. You can run tests on your own machine or see the visualization of results we had: https://pkubowicz.github.io/embed-vs-testcontainers/.

Originally published at https://nexocode.com on December 7, 2020.

Want more from the Nexocode team? Follow us on Medium, Twitter, and LinkedIn. Want to make magic together? We’re hiring!