Visual regression testing at Doctolib

Published in Doctolib · 7 min read · Mar 4, 2019

Here at Doctolib we run visual regression tests on our CI to be sure that our front end is never unintentionally modified.

Visual regression testing? Eh… What do you mean, mate?

It’s pretty straightforward: behind each build of our application, there is an automated test which highlights any visual difference between screenshots of a candidate branch and screenshots of the master branch.

The differential approach of visual tests

And that’s not all: we also, of course, have unit and integration tests. Altogether, we have roughly 10,000 tests running in our CI pipeline.

Ok ok, I see you seem to like testing your application, but if you already have unit and integration tests, why bother visually testing your front-end? Don’t all the scenarios you’ve written as integration tests already prove that your application works?

Why do we have visual tests?

As you may very well know, nothing is perfect, and visual testing has both pros and cons. We really do like having visual tests because they let us automate checks we would otherwise perform by hand; however, they do come with drawbacks. In this section we will try to be as honest as possible about visual testing at Doctolib and how satisfied we are with it.

Proactive monitoring of our front-stack

Owning our front stack and getting as close as possible to it.

By making sure we add snapshots to new views, we ensure that we do not unintentionally introduce unexpected visual elements. This is helpful in that it increases our team’s familiarity with the views we produce.

Reducing technical debt

This is rather self-explanatory: “Untested Code Is Broken Code”.

Reducing time spent on testing

There are around 450 snapshots taken for our visual test suite. This means we visually test around 450 different views or view states. By creating an automated script to run these tasks, you can run your test suite faster and more safely.

For example, let’s imagine you’re working on a new feature that involves three different views. Normally, you would have to manually check each of the three views after your development to ensure that everything has been rendered as it should be.

Let’s say that checking a view takes you three minutes. With a visual test you’ve already saved nine minutes thanks to the snapshots. However, don’t forget:

Your eyes will never catch the pixel differences that might occur
A Product Owner/QA will very likely perform the same checks as you just did
It’s impossible to know whether your new features have had an unexpected effect on other views
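The arithmetic above scales quickly. As a rough back-of-the-envelope sketch, assuming purely for illustration that each of the roughly 450 snapshots would take the same three minutes to check by hand:

```python
# Back-of-the-envelope estimate of manual checking time.
# Assumption: every view takes the same three minutes to check by hand.
MINUTES_PER_VIEW = 3


def manual_check_minutes(view_count: int, minutes_per_view: int = MINUTES_PER_VIEW) -> int:
    """Return the minutes needed to check `view_count` views manually."""
    return view_count * minutes_per_view


feature_minutes = manual_check_minutes(3)       # the three views of the example feature
full_suite_minutes = manual_check_minutes(450)  # every snapshot in the suite

print(feature_minutes)          # 9
print(full_suite_minutes / 60)  # 22.5 (hours)
```

Checking only the views you touched costs minutes; checking every view for side effects by hand would, under this assumption, cost more than twenty hours.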

Maintain a higher level of quality

Reaching pixel perfect interfaces

Thanks to our visual tests we are now able to detect almost 100% of the pixel differences we introduce and thus ensure that we ship stable views to production.

Visual tests are simpler than they were before

Visual testing has always been quite painful. With a classic end-to-end test, you would end up with a scenario like this one:

Trying to write a visual test the old-fashioned way

With a visual difference tool, a simple three-line test can do the same job.

But what’s even more interesting is that a visual test checks far more than just the presence of elements inside your DOM. It will also check things such as:

Color differences
Block positions
Correct viewport
…

Visual tests are not only easier to write, they also provide better coverage of your UI.
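To make the contrast concrete, here is a minimal sketch in Python. The `View` class, its fields, and the sample data are hypothetical stand-ins for a rendered page, not part of any real test framework. A DOM-style assertion only checks that an element exists, while a snapshot comparison also notices that its colour changed.

```python
from dataclasses import dataclass

# An RGB pixel and a screenshot modelled as rows of pixels (illustrative only).
Pixel = tuple[int, int, int]
Screenshot = list[list[Pixel]]

BLUE: Pixel = (0, 0, 255)
RED: Pixel = (255, 0, 0)


@dataclass
class View:
    """Hypothetical stand-in for a rendered page."""
    element_ids: set[str]   # what a DOM query could find
    screenshot: Screenshot  # what the user actually sees


def dom_test(view: View) -> bool:
    """Classic approach: only checks that the button is in the DOM."""
    return "booking-button" in view.element_ids


def visual_test(baseline: View, candidate: View) -> bool:
    """Snapshot approach: passes only if both screenshots are pixel-identical."""
    return baseline.screenshot == candidate.screenshot


master = View({"booking-button"}, [[BLUE, BLUE], [BLUE, BLUE]])
# The candidate branch accidentally turns the button red.
branch = View({"booking-button"}, [[RED, RED], [RED, RED]])

print(dom_test(branch))             # True: the DOM check misses the regression
print(visual_test(master, branch))  # False: the snapshot comparison catches it
```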

CI integration is a must

Visual testing is also easy to use because it is fully integrated into our build process thanks to Argos-ci (see below):

Visual tests run concurrently with unit and integration tests
No extra work is required from a developer to run them

Tooling

Argos-ci

As mentioned, at Doctolib we use a tool called Argos-ci (which I will refer to as Argos from now on). Argos was made by two former full-stack Doctolibers™, Olivier Tassinari and Greg Bergé, and its end goal is to ask you a simple question:

Would you like to accept the visual modifications you have introduced?

From there you have two options: either accept the differences or fix the problem. But before we discuss this process, let’s take a look at the tool itself.

Argos usage

Once this line has been added, Argos will automatically compare the two screenshots and report the differences after each build we trigger. Since the tool fully integrates into GitHub, it will directly alert you if any differences have been detected (see below).

When you review the Argos report, you can identify the differences and decide on your next step given the two options available:

Accept the differences that have been introduced by the candidate branch

Fix the problems that have been introduced

If you approve the differences (e.g. because you deliberately changed the user interface of a page), Argos will update the build status directly on GitHub:

Once every test suite has passed, you know it is now safe to merge your code into the master branch.
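The decision flow described above can be summarised in a few lines. This is only a sketch of the logic as described in this article, not Argos’ actual implementation; the function name and the status strings are assumptions made for illustration.

```python
def visual_check_status(diff_count: int, approved: bool = False) -> str:
    """Return a build status for the visual test suite.

    - no differences: the check passes on its own
    - differences that a reviewer accepted: the check passes
    - differences nobody accepted: the check fails until they are
      either fixed or approved
    """
    if diff_count < 0:
        raise ValueError("diff_count cannot be negative")
    if diff_count == 0 or approved:
        return "success"
    return "failure"


print(visual_check_status(0))                 # success
print(visual_check_status(3))                 # failure
print(visual_check_status(3, approved=True))  # success
```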

ImageMagick, quick dive

Argos itself is based on ImageMagick, the go-to reference for image manipulation.

According to the ImageMagick website, it is used:

“[…] to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats […] to resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves”.

What the website does not directly tell you is that it can also be used to detect differences between two images. The output of this comparison is a third image which highlights these differences by replacing the colors of the pixels that changed. If you want to learn more about this process, you can find more details here.
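To give an intuition for what such a comparison produces, here is a simplified sketch in pure Python. It is not ImageMagick’s algorithm and does not call ImageMagick: images are modelled as lists of rows of RGB tuples, and the function name `diff_image` and the highlight colour are assumptions chosen for illustration.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

HIGHLIGHT: Pixel = (255, 0, 0)  # assumed highlight colour for changed pixels


def diff_image(baseline: Image, candidate: Image) -> tuple[Image, int]:
    """Compare two same-sized images pixel by pixel.

    Returns a third image in which every differing pixel is replaced by
    HIGHLIGHT (unchanged pixels are copied from the baseline), together
    with the number of differing pixels.
    """
    if len(baseline) != len(candidate) or any(
        len(b_row) != len(c_row) for b_row, c_row in zip(baseline, candidate)
    ):
        raise ValueError("images must have the same dimensions")

    changed = 0
    output: Image = []
    for b_row, c_row in zip(baseline, candidate):
        out_row = []
        for b_px, c_px in zip(b_row, c_row):
            if b_px == c_px:
                out_row.append(b_px)
            else:
                out_row.append(HIGHLIGHT)
                changed += 1
        output.append(out_row)
    return output, changed


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
master = [[WHITE, BLACK, WHITE],
          [WHITE, BLACK, WHITE]]
branch = [[WHITE, BLACK, WHITE],
          [WHITE, WHITE, WHITE]]  # one dark pixel went missing

highlighted, changed = diff_image(master, branch)
print(changed)            # 1
print(highlighted[1][1])  # (255, 0, 0)
```

In a workflow like the one described here, any non-zero `changed` count would be surfaced for review.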

Concrete example of an added modification

We can see that ImageMagick has produced a third image highlighting the sentence missing in the middle of the page. This means that somehow during the development process on the candidate branch, we lost the sentence.

Okay, but how does the Doctolib team really feel about integrating visual tests into their workflow?

Visual testing is a challenge

Flakiness

Flakiness of our builds due to Argos testing is the most painful point we face at the moment.

99% of our flaky builds occur because the screenshots being compared differ slightly in positioning. This can be caused by a scrollbar appearing in one screenshot or by the resolution at which a screenshot was taken, and most of the time (56%, actually) it causes a build to appear red instead of green.

The inputs of ImageMagick have to be exactly the same or you’ll enter a flaky nightmare. That’s why our developers are creating strategies in order to tackle flaky tests.
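One simple guard against this kind of mismatch is to check that both screenshots have the same dimensions before comparing pixels. The sketch below is illustrative only and is not one of the strategies actually in use; the function names and the return labels are hypothetical.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def size_of(image: Image) -> tuple[int, int]:
    """Return (width, height) of an image stored as rows of pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    return width, height


def classify_inputs(baseline: Image, candidate: Image) -> str:
    """Classify a pair of screenshots before any pixel diff is attempted.

    "size-mismatch" most likely points at the capture environment
    (resolution, viewport) rather than at the code under test; only
    same-sized inputs are worth diffing pixel by pixel.
    """
    if size_of(baseline) != size_of(candidate):
        return "size-mismatch"
    return "identical" if baseline == candidate else "pixel-diff"


WHITE, GREY = (255, 255, 255), (200, 200, 200)
master = [[WHITE, WHITE, WHITE]]
same = [[WHITE, WHITE, WHITE]]
narrower = [[WHITE, WHITE]]          # e.g. captured at a different resolution
recoloured = [[WHITE, GREY, WHITE]]

print(classify_inputs(master, same))        # identical
print(classify_inputs(master, narrower))    # size-mismatch
print(classify_inputs(master, recoloured))  # pixel-diff
```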

If this is the case, why do you stick with visual tests?

Well, we think the tool is worth improving, seeing as we can tackle these issues on our own. This is why we invest time in obliterating flakies.

Build time

Our average Argos testing time is roughly 10 minutes, which means that each time we push something to a branch, a build is triggered and a developer has to wait ten minutes to get the results of the visual test suite.

This is too long. Hence, our goal in 2019 is to reach a build time well below 10 minutes. Perhaps we will write an article on this topic later this year, but the trend is already encouraging, as seen below:

Argos maintainability

Argos is open source, which is a great thing, but it also has its drawbacks:

It’s not very well known, so it has few maintainers and little activity.
Doctolib engineers often have to make changes to the tool to suit our needs as best as possible.

Argos is still our best option for the time being, but we might benchmark and evaluate other solutions in the near future.

Next steps

Visual testing is a great way to develop confidence in your product and your deliveries. It empowers you and your team to easily ship pixel-perfect interfaces and write tests in a matter of seconds. However, it comes with drawbacks which need to be addressed, and we have decided to take on this challenge. This means we have two objectives ahead of us:

Reduce the flakiness of the tool itself so that our ‘trust level’ in it keeps getting better and better
Reduce the build time so that our engineers do not have to wait quite so long each time they want a direct and impartial look at their work

Feel free to let us know in the comment section how you feel about running visual tests and to give us any feedback you may have from your own experiences using them!