Tester’s Dictionary: “F” for “Flaky Tests”

Published in TestAutonation, Jan 4, 2022

If you have ever built or worked on a testing framework, you have probably run into “flaky tests”. If not, you are one of the lucky few. Flaky tests are inconsistent tests that give different results (pass or fail) on different runs, without any change to the code. The primary challenge is determining the reason for these seemingly random failures. Flakiness is especially common in e2e tests built with Selenium.

Many people, from Developers to Project Managers and Release Managers, rely on test results. Based on these results, they decide whether to release a new version or merge the latest code changes into the main branch. Unstable tests cause unnecessary delays, which ultimately affect releases and the team’s efficiency.

What shall we do with these unstable, flaky tests?

Unfortunately, the most common response is to ignore them: “We will fix it later, let’s proceed with the release…”

I used to work on the 8th floor of a business tower. When I joined the company, we had frequent fire drills without any prior notice. Instead of getting up and following protocol, we learned to ignore the annoying sound of the fire alarm and kept on working. The warning, which was meant to alert us and save us from danger, no longer served its purpose. Just imagine the day when it is not just a drill. Have you noticed the similarities yet?

There’s no doubt that ignoring or disabling these tests is the fastest solution. But are we sure it is the best one? Those tests were created to check a particular piece of functionality in our system and to notify us when something is wrong with the code. Deactivating them decreases both the coverage and our confidence in the system. Flaky tests are still better than having no tests at all for that particular domain (feel free to leave us a comment if you do not agree 😉). Following up and investigating these failures may still cost less than performing the tests manually every time. And do I even need to mention the cost of ignoring flaky tests? The related manual checks are not executed either, and you end up having to roll back in production…

But what are the reasons?

You have to be an outstanding detective to find the cause of a flaky test. Thank God, there’s no silver bullet for the fix (I will explain later why this is a good thing). You have to explore and investigate many areas: network calls, 3rd party integrations, browser compatibility issues, etc. The next sections are just hints about where the cause of the flakiness can hide.

Test related reasons

Poor UI element locators: no, a long XPath does not look professional at all, and it breaks as soon as the page layout changes. Try to use IDs or CSS selectors wherever possible.
Non-deterministic/undefined behaviors: are you sure that your tests bring value when their outcome is non-deterministic? Think again.
Test Data: never, I mean NEVER, share your automated test users with anyone. Yes, your colleague “just has to check something quickly”. The next time you run the tests, the user suddenly has a negative balance, or the account is blocked. Many more issues exist with test data: it can be out of sync, its availability can depend on the time of execution, etc. You can consider using mock data, but don’t forget that real data can look different, so it depends on the case.
You may come across a situation where your tests pass on your Team Test System but start to fail on Staging. Why? Most probably because another team has merged new functionality. Well, communication issues can happen quickly.
Order of your tests: a test should not depend on another test. If one test is a precondition for another (e.g. it creates test data, registers a new user, etc.), you should consider reimplementing it.
Faker: yes, it is free and handy, but if you run it very often, there’s a chance that it will generate an email address that already exists when you register a user. Try to clean up your test data after your tests to avoid such issues.
Extensive tests are more likely to be flaky: if your test is too long and performs too many steps, there’s a higher chance that one of the steps will fail.
AJAX calls in your app: although Selenium and WebDriver are good at waiting for SYNCHRONOUS requests to finish, unfortunately this does not always apply to ASYNCHRONOUS requests.
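
To illustrate the Faker point from the list above, here is a minimal sketch in Python of collision-free test data with cleanup. Everything in it is an assumption made for illustration: UserRegistry is a hypothetical in-memory stand-in for your app’s registration, and unique_test_email and register_and_clean_up are made-up helper names, not part of Faker or any framework.

```python
import uuid


def unique_test_email(prefix="autotest", domain="example.com"):
    """Build an email address that is practically unique on every call."""
    # uuid4 carries 122 random bits, so a repeat across runs is extremely unlikely.
    return f"{prefix}+{uuid.uuid4().hex}@{domain}"


class UserRegistry:
    """Hypothetical in-memory stand-in for registration in the system under test."""

    def __init__(self):
        self._emails = set()

    def register(self, email):
        if email in self._emails:
            raise ValueError(f"email already registered: {email}")
        self._emails.add(email)

    def is_registered(self, email):
        return email in self._emails

    def delete(self, email):
        self._emails.discard(email)


def register_and_clean_up(registry):
    """Register a fresh user, run the checks, and always remove the user again."""
    email = unique_test_email()
    registry.register(email)
    try:
        assert registry.is_registered(email)  # the actual test steps would go here
    finally:
        registry.delete(email)  # cleanup runs even when a check above fails
    return email
```

Because the cleanup sits in a finally block, a failing check does not leave a stale user behind for the next run.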

Environment-related Reasons

Flaky 3rd party integration: whenever there are issues with the service or the network, or the service is just too slow…
Test System: imagine a scenario where only one payment provider sandbox is available for the tests on all the Test Systems. Right when your tests are running, someone disconnects it from your TS and connects it to another Test System.
Time sensitivity: sometimes it just takes a little longer for the response to arrive or for the page to load because the system is overloaded. Hard-coded Thread.sleep calls, which assume a fixed delay, belong here as well.
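
A fixed sleep either wastes time or is too short on a slow day. A common alternative is to poll for a condition with a timeout, which is the idea behind Selenium’s explicit waits (WebDriverWait). The sketch below shows that idea in plain Python without Selenium; wait_until, its parameters, and the simulated slow response are assumptions made for illustration.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.2):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Usage: a simulated response that becomes available after roughly 0.3 seconds.
started = time.monotonic()


def response_ready():
    return time.monotonic() - started >= 0.3


# Returns as soon as the response is ready instead of always sleeping 2 seconds.
wait_until(response_ready, timeout=2.0, interval=0.05)
```

The test waits only as long as it has to, and a genuine hang still fails with a clear timeout error.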

How to fix them?

QUICK-WIN

One of the quickest solutions is to re-execute the failed flaky tests. Most frameworks already support this feature, so why not use it? There are different approaches:

Marking the test as “Failed” only when it has failed three times in a row
Marking the test as “Passed” if it passed at least once in three tries

This approach is time-consuming, especially if the test execution takes long. However, it is a far cleaner solution than merely ignoring the test results or disabling the tests.
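
If your framework does not offer re-runs out of the box, the idea can be sketched in a few lines of Python. This is a minimal sketch, not any framework’s real API: run_with_retries and the “flaky” verdict are hypothetical. It stops at the first pass, so a test is “failed” only after three failures in a row, and a pass that needed a retry is reported separately so that the flakiness stays visible.

```python
def run_with_retries(test, attempts=3):
    """Run `test` until it passes or `attempts` runs have failed.

    Returns "passed" (first run passed), "flaky" (passed after at least
    one failure) or "failed" (every attempt failed).
    """
    for attempt in range(1, attempts + 1):
        try:
            test()
        except AssertionError:
            continue  # this attempt failed, try again if attempts remain
        return "passed" if attempt == 1 else "flaky"
    return "failed"


# Usage with a simulated check that fails twice and then passes.
calls = {"count": 0}


def sometimes_failing_check():
    calls["count"] += 1
    assert calls["count"] >= 3, "simulated intermittent failure"


result = run_with_retries(sometimes_failing_check)
print(result)  # prints "flaky"
```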

LONG TERM SOLUTIONS

Remember that extensive tests are more likely to be flaky, so keep your tests short and simple, and don’t try to test everything in one test. Please don’t forget that any interaction with the UI is very expensive, so whenever possible, move the test down to a lower layer of the Testing Pyramid (read more about the Testing Pyramid).
Add logging, or even screen capture, to your tests if you do not have them already. This makes your tests easier to debug when they fail and you need to investigate the cause of the failure.
Try adding a new web element to your page that is displayed once the AJAX call has returned its response, so that WebDriver can detect that element quickly (you can do this for your Test System only). This simple, tiny web element (it can be just one pixel) can reduce the flakiness.
Take it seriously. Even if it is “just test code” and not the production code of your app, you should still use version control and create Pull Requests so that your peers can review your tests. They can suggest better locators, methods, etc., which will make your tests more stable. Remember, this is not a competition; you are in the same boat and have a common goal: a better quality product.
It does no harm if your test code and the production code use the same language. We faced this technical gap as well: the production code was in JavaScript/AngularJS, and our test code was in Java. We did not have enough experience to implement tests in JavaScript, but we still managed! Switching to JS eventually sped up the implementation. We can use the same dependencies and package versions as the production code, the JS gurus can help us out with the automated tests, the Jenkins job is easier to configure, etc.
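
The logging and screen-capture advice from this list could look like the following sketch in Python. The decorator name log_failures and the capture callback are hypothetical; in a real Selenium suite the callback would typically save a screenshot, while here it only records the scenario name so that the example runs on its own.

```python
import functools
import logging

logger = logging.getLogger("e2e-tests")


def log_failures(capture=None):
    """Decorator factory: log every failure and call `capture` before re-raising."""

    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            logger.info("starting %s", test.__name__)
            try:
                return test(*args, **kwargs)
            except Exception:
                # logger.exception also records the traceback.
                logger.exception("%s failed", test.__name__)
                if capture is not None:
                    capture(test.__name__)
                raise  # the test must still be reported as failed

        return wrapper

    return decorator


# Usage: collect the names of failed scenarios instead of taking real screenshots.
captured = []


@log_failures(capture=captured.append)
def checkout_scenario():
    assert 1 + 1 == 3, "simulated failure"
```

Calling checkout_scenario() still raises the AssertionError, but the log now contains the traceback and captured holds the name of the failed scenario.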

Why are Flaky Tests good for you?

No, this is not a sarcastic question.

At first glance, these tests seem very annoying, and maybe you cannot think of a scenario where they could be useful. But! While trying to fix failing flaky tests, you can learn something new about the processes in the app. You might:

find out which call takes longer than the others,
find the limits of the selected tool,
find a new tool/plugin/package on the internet that helps you make your tests more stable.
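
For the first point in the list above, a small timing helper is often enough to see which step is the slow one. The sketch below is in Python; timed and the step names are made up for illustration, and the sleep only simulates a slow call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step, store):
    """Measure how long the wrapped block takes and save it under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[step] = time.perf_counter() - start


timings = {}
with timed("login", timings):
    pass  # a fast step
with timed("load_dashboard", timings):
    time.sleep(0.05)  # simulated slow call

slowest = max(timings, key=timings.get)
print(slowest)  # prints "load_dashboard"
```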

Do you have a different approach to flaky tests? What are your tips when it comes to fixing these tests? Share your ideas with the public 😉

Originally published at https://testautonation.com.