Hunting flaky tests at Doctolib
Our test strategy
Testing is crucial because it prevents regressions. Tests let us change existing code with far less pressure. At Doctolib, we invest heavily in tests across the whole stack: we run Ruby model and controller tests, JavaScript tests, visual regression tests with Argos, and E2E tests.
How do we define a flaky test?
An essential property of an automated test, and of the test suite as a whole, is determinism. This means that a test should always have the same result when the tested code doesn’t change. A test that fails intermittently, with no change to the code, is commonly called a flaky test. Flaky tests hinder your development experience, slow down progress, hide real bugs and, in the end, cost money.
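To make the definition concrete, here is a minimal sketch of a non-deterministic check next to a deterministic one. The `greeting` helper and both checks are hypothetical and written only for illustration; they are not taken from our codebase.

```ruby
# Hypothetical helper: its result depends on the time it is given.
def greeting(now = Time.now)
  now.hour < 12 ? "Good morning" : "Good afternoon"
end

# Flaky: relies on the wall clock, so it only holds before noon
# even though the code under test never changes.
def flaky_check
  greeting == "Good morning"
end

# Deterministic: the time is fixed, so the result is always the same.
def deterministic_check
  greeting(Time.new(2020, 1, 1, 9, 0, 0)) == "Good morning"
end
```

The second check gives the same answer on every run because its only varying input, the current time, is passed in explicitly.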
So, what makes flakiness so harmful?
Flakiness erodes developers’ confidence in the Continuous Integration process. When a test fails, the developer’s first reaction is to either relaunch the build or dismiss the failure. Having lost confidence in the test suite, developers can also become less committed to writing extensive tests. The result is lost time and an overall drop in the quality of the application.
Big players like Facebook and Google are not exempt from these issues. Google even has a dedicated team to tackle the issue:
“Almost 16% of our tests have some level of flakiness associated with them! This is a staggering number; it means that more than 1 in 7 of the tests written by our world-class engineers occasionally fail in a way not caused by changes to the code or tests.”
Tracking flaky tests
In a nutshell, to track flaky tests we retry each failing test and send an event to Sentry if at least one retry is successful.
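The retry-and-report idea can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual implementation: `run_with_flaky_detection`, `FlakyReport`, and the `reporter` callable are hypothetical names, and the reporter here only collects reports in memory. In a real setup the reporter would forward the event to Sentry (for example with `Sentry.capture_message` from the sentry-ruby SDK), and the retry would hook into the test framework's own failure handling.

```ruby
# Hypothetical record describing a test that failed, then passed on retry.
FlakyReport = Struct.new(:test_name, :attempts, :first_error)

# Runs the given block as a "test". If it raises, retries it up to
# max_retries times. Returns :passed, :flaky, or :failed, and calls
# reporter only when a retry succeeds after an earlier failure.
def run_with_flaky_detection(test_name, reporter:, max_retries: 2)
  first_error = nil
  (1..max_retries + 1).each do |attempt|
    begin
      yield
      return :passed if first_error.nil?

      # An earlier attempt failed and this one passed: the test is flaky.
      reporter.call(FlakyReport.new(test_name, attempt, first_error.message))
      return :flaky
    rescue StandardError => e
      first_error ||= e # remember the first failure for the report
    end
  end
  :failed # every attempt failed: treat it as a genuine failure
end

# Usage with an in-memory reporter (a stand-in for a Sentry call).
reports = []
calls = 0
status = run_with_flaky_detection("booking test", reporter: ->(r) { reports << r }) do
  calls += 1
  raise "timeout" if calls == 1 # fails on the first attempt only
end
```

A test that fails on every attempt is not reported as flaky, which keeps real regressions separate from intermittent failures.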
You can read this blog post if you want to know more about how we hunt them.
Case list
During our investigation, we gathered many cases of flaky tests. Below are a few examples that we encountered recently at Doctolib, along with some applicable fixing patterns:
We will continue to add to this list as other instances of flakiness are detected, addressed and analysed.