Gaining confidence in automated checks
This follows on from the previous blog post here.
As discussed in the previous blog post, we came to understand that the way we were working wasn’t the way we wanted to work, nor was it a way we could deliver software safely and confidently to production. We identified that we needed confidence in many areas. For example, our automated checks (you can read about testing and checking here) were failing regularly, and not just because the software was broken: they were flaky, and many of them were ignored or muted. Ignored or muted checks should simply be deleted, because they add noise to the information, and noise can cause you to miss things.
How many UI checks do you have?
The Content Platform team I was working with had more than 200 of these at the time. The checks had certainly served a purpose at one time: they navigated to various pages, checked for certain elements and verified some behaviours of the pages themselves.
This is fine if that’s what you need, but it didn’t seem right for us at the time. So many of the checks failed consistently, but didn’t everyone have failing checks? We also had ignored checks, but again, everyone ignores something, right? Another problem was that the checks took upwards of 45 minutes to run. So much for the quick feedback we wanted from our automated checks. As the team matured, we realised that there were better ways of doing things.
Why were the checks failing?
This was the first question we tackled. There was obviously more than one reason, so I’ll list them here and discuss how we overcame them later:
- By their very nature, our UI checks were flaky: they were doing too much, opening up a browser, performing multiple actions and then making some kind of assertion on what was displayed
- The checks ran on different browser versions: the build server was outside of our control, so the checks used whatever version of the browser happened to be installed, which was often incompatible with the latest ChromeDriver that the checks were using
- They weren’t coded with the relevant waits, and I’m not talking about Thread.Sleep() here either (see the sketch after this list)
- Timeouts: as mentioned, the UI checks took more than 45 minutes to run, which meant that things could go wrong during that time, and often did
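To illustrate the waits point: rather than hard-coded sleeps, the checks needed Selenium’s explicit waits, which poll until a condition is met or a timeout expires. Here’s a minimal sketch of the idea in C#; the URL, locator and timeout are illustrative rather than taken from our actual suite.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class WaitExample
{
    static void Main()
    {
        using IWebDriver driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://example.com"); // illustrative URL

        // Bad: a fixed sleep is either too long (wasted time) or too short (flaky).
        // System.Threading.Thread.Sleep(5000);

        // Better: an explicit wait that polls until the element is displayed,
        // or gives up after the timeout.
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        wait.IgnoreExceptionTypes(typeof(NoSuchElementException));
        IWebElement banner = wait.Until(d =>
        {
            var element = d.FindElement(By.CssSelector(".hero-banner")); // hypothetical locator
            return element.Displayed ? element : null;
        });

        Console.WriteLine($"Banner displayed: {banner.Displayed}");
    }
}
```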
Having looked at the automated UI checks, we realised that many of them were simply loading a web page and checking that an element was displayed. We didn’t need to load a browser for that: we could make an HTTP request to get the content and ensure that the response contained the correct element.
To help us with this, we found a package called FizzlerEx, which enables you to query HTML using locators (much the same way that Selenium does) and then interrogate the returned elements to make assertions on them, if that’s what you wish to do.
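As a rough sketch of the pattern (the URL, selector and exact namespaces here are illustrative; FizzlerEx builds on HtmlAgilityPack, so check its documentation for the precise API), an HTTP-level check can look something like this:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // QuerySelector extension methods (namespace may differ for FizzlerEx)

class HttpContentCheck
{
    static async Task Main()
    {
        using var client = new HttpClient();

        // Fetch the page over HTTP instead of driving a browser.
        string html = await client.GetStringAsync("https://example.com/some-page"); // illustrative URL

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Query the HTML with a CSS selector, much as we would with Selenium.
        var heading = document.DocumentNode.QuerySelector("h1.article-title"); // hypothetical selector

        if (heading == null || string.IsNullOrWhiteSpace(heading.InnerText))
        {
            throw new Exception("Expected article title was not present in the response");
        }

        Console.WriteLine($"Found heading: {heading.InnerText.Trim()}");
    }
}
```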
We moved as many of the UI checks down to the HTTP request layer as we could, which reduced the number of UI checks to just under 50, roughly 75% fewer than the original number. The checks also took less time to run, as most of them no longer had to fire up a browser. We also added in some parallelisation (a sketch follows below). The checks now ran in less than 15 minutes, which, again, was a massive improvement.
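For the parallelisation, assuming an NUnit-based suite (the attribute names below are NUnit’s; other frameworks have equivalents), a couple of assembly-level attributes are enough to let independent fixtures run side by side:

```csharp
using NUnit.Framework;

// Assuming an NUnit test project: these assembly-level attributes let
// independent fixtures run in parallel instead of strictly in sequence.
[assembly: Parallelizable(ParallelScope.Fixtures)]
[assembly: LevelOfParallelism(4)] // worker count is illustrative; tune it for your build agents

namespace ContentChecks
{
    [TestFixture]
    public class HomePageChecks
    {
        [Test]
        public void HomePageContainsTitle()
        {
            // ... an HTTP-level check like the one sketched above ...
            Assert.Pass();
        }
    }
}
```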
Next we had to think about the fact that the remaining UI checks were running on different versions of Chrome. We wanted a collection of machines under our control that we could run our UI checks on, effectively a Selenium Grid. We had two options:
- We could run and maintain our own Selenium Grid, meaning we could control and update the machines the checks ran on, with whatever browser(s) we wanted installed
- We could pay for access to a large number of browsers and devices to run our checks on, and leave the management to someone else
We chose the second option, and used SauceLabs to help us with it. SauceLabs (like other browser grid suppliers) gave us access to thousands of device/browser combinations, which, as well as giving us greater control over the browser configuration, also meant we could debug any issue a customer encountered on a particular device/browser combination.
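In practice this means pointing the checks at a remote grid rather than a local browser. The sketch below shows the general shape with Selenium’s RemoteWebDriver; the endpoint, option names and job metadata vary by Selenium version and SauceLabs region, so treat it as illustrative rather than our exact configuration.

```csharp
using System;
using System.Collections.Generic;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

class SauceLabsSession
{
    static void Main()
    {
        // Rough sketch only: option names and endpoint vary by Selenium
        // version and SauceLabs account/region, so check their docs.
        var options = new ChromeOptions
        {
            BrowserVersion = "latest",
            PlatformName = "Windows 10"
        };

        var sauceOptions = new Dictionary<string, object>
        {
            { "username", Environment.GetEnvironmentVariable("SAUCE_USERNAME") },
            { "accessKey", Environment.GetEnvironmentVariable("SAUCE_ACCESS_KEY") },
            { "name", "Content platform UI checks" } // hypothetical job name
        };
        options.AddAdditionalOption("sauce:options", sauceOptions);

        // Point the checks at the remote grid instead of a local browser.
        using var driver = new RemoteWebDriver(
            new Uri("https://ondemand.saucelabs.com/wd/hub"),
            options.ToCapabilities(),
            TimeSpan.FromSeconds(120));

        driver.Navigate().GoToUrl("https://example.com"); // illustrative URL
        Console.WriteLine(driver.Title);
    }
}
```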
We now knew what browsers and devices our UI checks were running on and would no longer encounter issues with browsers updating themselves, or with random timeouts occurring due to build server issues.
To separate flakiness in the checks from problems with the actual deployment, we needed to see them passing repeatedly against a deployment we trusted, so we left them running continuously over the weekend. This meant the checks ran against a known state, a number of times. If they failed, they failed because of genuine flakiness in the checks, not because of the data (the checks don’t change data states) or the deployment. We then identified the guilty checks and put in fixes to make them more robust.
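One way to flush out this kind of flakiness on a smaller scale, again assuming NUnit, is simply to repeat a suspect check many times against the same known state; the repeat count below is arbitrary:

```csharp
using NUnit.Framework;

[TestFixture]
public class FlakinessSoak
{
    // Run the same check many times against a known-good deployment so that
    // intermittent failures show up as check flakiness rather than data or
    // environment problems. The repeat count here is arbitrary.
    [Test, Repeat(50)]
    public void ArticlePageStillRendersHeading()
    {
        // ... the same HTTP-level or UI check used in the main suite ...
        Assert.Pass();
    }
}
```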
We were slowly building up confidence in our checks. We still had work to do to increase that confidence but, in terms of running our automated checks and knowing that they passed or failed for valid reasons, we were in a far better state. We did, however, still have some manual checks that we performed at the point of release.
To ensure that the assets had loaded and that the page looked right, we would manually navigate to the page and check it by eye. We investigated some visual regression tools and selected Applitools.
This meant we could keep a baseline image of the page; the automated check would go to the page, take a screenshot and compare it with the baseline. We had fine-grained control over the sensitivity of the comparison and over which areas to ignore. So if the assets hadn’t loaded, or the page looked incorrect, the check would fail.
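A visual check along those lines, using the Applitools Eyes Selenium SDK, looks roughly like the sketch below; the app and test names, viewport and URL are illustrative, and the exact API varies between SDK versions.

```csharp
using System;
using System.Drawing;
using Applitools.Selenium;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class VisualCheck
{
    static void Main()
    {
        IWebDriver driver = new ChromeDriver();
        var eyes = new Eyes
        {
            ApiKey = Environment.GetEnvironmentVariable("APPLITOOLS_API_KEY")
        };

        try
        {
            // Start a visual test; app name, test name and viewport are illustrative.
            var eyesDriver = eyes.Open(driver, "Content Platform", "Article page looks correct",
                new Size(1024, 768));

            eyesDriver.Navigate().GoToUrl("https://example.com/some-page"); // illustrative URL

            // Capture the page and compare it against the stored baseline,
            // honouring any configured ignore regions and match level.
            eyes.CheckWindow("Article page");

            // Close ends the test and fails it if differences were found.
            eyes.Close();
        }
        finally
        {
            eyes.AbortIfNotClosed();
            driver.Quit();
        }
    }
}
```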
The checks were now running quite a bit faster and were no longer failing because they were flaky. We were confident that when they failed, they failed for a valid reason, and we had far more control over the conditions the checks ran under. We were slowly getting to a point where a passing run gave us genuine confidence in our software.
There were still a number of things we needed to tackle, but we were slowly improving. We’ll look at the other challenges we faced in future blog posts, so check back.
Gareth is a Lead QA Engineer, helping to make ASOS Tech and ASOS QA rock 🎸🤘. He’s also a Sunderland AFC season ticket holder, a football coach and general sports fan.