QA testing in the cloud

The evolution of apps QA at Azimo

Published in AzimoLabs

Oct 5, 2021

This is the fifth in a series of blog posts in which we outline several years of experience with Android app testing at Azimo. Most of the principles, goals, and achievements also apply to our iOS app.

Table of contents

The evolution of apps QA, first days, and unit testing
QA engineers, functional and UI testing
Custom tooling for testing stack management
Removing and subtracting tests is part of development too
(This post) — QA testing in the cloud

We’ve come a long way since we started our mobile app development: from a duck-typed MVP with no tests, QA-ed manually every month, to a product released once a week, developed by independent mission teams, and tested automatically within hours. But the further you go, the more complicated it gets, and over time we have only faced more challenges. Here are the latest ones in our journey.

Limited scalability and increased demand for maintenance

After about two years of active development, it became clear that our custom-made solution, AutomationTestSupervisor, had gone from unlocking scalability to being an obstacle to further growth. The tool required constant maintenance, especially when new versions of the SDK, dev tools, or emulators came out. It took too much time from our relatively small engineering team (1 QA engineer and 2–4 software engineers per mobile platform). We also reached the limit of scalability. Because tests ran on QA engineers’ computers, it became almost impossible to increase the number of parallel emulators (five was our limit). With a custom-made tool, there was also no effortless way to migrate to the cloud.

Flakiness

While in previous years we were glad to have reduced flakiness from 20% to 10%, the results still weren’t satisfactory for us. A 10% failure rate meant about 20–30 functionalities that we needed to recheck manually. Very often, that remaining 10% of tests doubled or tripled the overall testing time.

Related post on medium.com: What is flakiness and how we deal with it (Difficulties of QA engineering excellence)

Pandemic, full remote world

Going fully remote also stretched our testing capabilities. Until then, reporting and discussing bugs or regressions meant approaching the colleague sitting next to you and debugging things together. Remote work became much more about copy/pasting logs, taking screenshots, and exchanging Slack messages.

Firebase Test Lab

With the limitations of the local machine, it became clear to us that the next step in scalability would be to run tests in the cloud. After some experimentation, we decided to sunset the AutomationTestSupervisor project and migrate all our tests to Firebase Test Lab (FTL). What was the trade-off for us? We lost control over ADB (which we used to reset the app or device state, among other things). It was also harder to get the data that we use for our reports (we couldn’t call webhooks when we wanted to, and we needed to scrape some data from the console output). But in the end, the gains outweighed the losses. Here are some of them:

Support from the community on https://firebase.community/. When you develop an internal tool, you don’t have access to thousands of engineers worldwide. If you use something publicly available, there is a big chance that someone is already facing the challenges you have.
Now that we run tests in the cloud, our QA engineer’s machine isn’t blocked for that time. We can keep working on the project without worrying that high CPU usage will affect our test suite stability.
We can share test results with software engineers by copy/pasting URLs. With video and logs easily accessible, you don’t have to worry about remote work and teammates sitting hundreds of kilometers away from you.
Theoretically unlimited scaling. We effortlessly increased from 5 emulators on the local machine to 20 of them on Firebase Test Lab.
More emulators mean more tests that we can run in parallel. This was a colossal time saving for us: from 1–2 hours to less than 25 minutes. It’s a 6–8x improvement! 🚀
Significantly faster testing made us much faster at identifying and fixing flakiness, which is currently about 1–2%.

A few words about costs 💰

Firebase Test Lab currently costs us around $400 per month. In most cases, we use emulators, not real devices (which are more expensive), and we pay around $3–4 per test suite run. Of course, for some developers, a few hundred dollars per month is nothing. But for others, it can be out of reach. So here is what helped us make this buy vs. build decision.

In the previous configuration, our QA engineer’s machine was blocked 1–3 times a week, for about 2 hours each time, to run tests. Additionally, at least every six months, we needed to bump emulator or SDK versions. That usually took us about two weeks, during which we needed to freeze release pipelines.

Testing time is also crucial for us. A reduction from 2 hours to 25 minutes means that software engineers get feedback about their work almost instantly. Changes can be released in smaller chunks. There is less competition for the QA engineer’s priorities (actually, we can move some of the responsibilities from a single QA engineer to all engineers).

Saving tens of hours of the QA engineer’s work, faster feedback for software engineers, and fewer unexpected maintenance shutdowns: these things were definitely worth $400/mo to us. It’s a no-brainer.
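To make the comparison concrete, here is a rough back-of-the-envelope sketch in Python that uses only the figures quoted above. Two things are assumptions for illustration and not from our billing data: that the whole monthly bill goes to emulator suite runs, and that a month has 52 / 12 weeks. The constant and function names are hypothetical.

```python
# Figures quoted in the post.
MONTHLY_BILL_USD = 400
COST_PER_RUN_USD = (3, 4)      # price range of one test suite run on emulators
LOCAL_RUNS_PER_WEEK = (1, 3)   # how often the QA machine used to be blocked
HOURS_PER_LOCAL_RUN = 2

# Assumption for illustration: an average month has 52 / 12 weeks.
WEEKS_PER_MONTH = 52 / 12


def cloud_runs_per_month():
    """Suite runs the monthly bill would cover, as a (low, high) range.

    Assumes the entire bill is spent on emulator suite runs.
    """
    cheapest, priciest = COST_PER_RUN_USD
    return MONTHLY_BILL_USD / priciest, MONTHLY_BILL_USD / cheapest


def blocked_hours_per_month():
    """Hours per month the QA machine was blocked, as a (low, high) range."""
    fewest, most = LOCAL_RUNS_PER_WEEK
    return (fewest * HOURS_PER_LOCAL_RUN * WEEKS_PER_MONTH,
            most * HOURS_PER_LOCAL_RUN * WEEKS_PER_MONTH)


runs_low, runs_high = cloud_runs_per_month()
hours_low, hours_high = blocked_hours_per_month()
print(f"Cloud suite runs covered per month: {runs_low:.0f} to {runs_high:.0f}")
print(f"QA machine hours blocked per month before: {hours_low:.1f} to {hours_high:.1f}")
```

Under those assumptions, the same budget covers far more suite runs per month than the 1–3 weekly local runs did, and that is before counting the two-week upgrade freezes.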

Implementation details

Migration to Firebase Test Lab didn’t come with zero work for us. Here are some challenges we needed to overcome during this process.

Test sharding

The mechanism for splitting tests in FTL is very basic (it’s the one we replaced with AutomationTestSupervisor years ago). Tests are split evenly by number, but because test lengths vary, the shards end up with wildly different total execution times. There is no way to control this, either from the CLI or from the configuration file. Fortunately for us, other engineers had already faced this problem and built a tool to solve it: the open-source project Flank, a massively parallel test runner for FTL. One of the features it provides is smart sharding, which optimizes test sharding based on historical runs. It gathers information on how long each test takes and then divides the tests into shards that should take about the same time. As a result, the tests run faster and we pay less. There are two ways to integrate Flank into a project: by using the jar file or by using the Fladle Gradle plugin.
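The difference between splitting by count and splitting by duration can be illustrated with a minimal Python sketch. This is not Flank's actual implementation: it uses a simple greedy heuristic (longest test first, always into the currently lightest shard), and the test names and durations are made up for the example.

```python
import heapq


def shard_by_count(tests, num_shards):
    """Split tests evenly by number (round-robin), ignoring how long they take."""
    shards = [[] for _ in range(num_shards)]
    for i, name in enumerate(tests):
        shards[i % num_shards].append(name)
    return shards


def shard_by_duration(durations, num_shards):
    """Greedy balancing: place the longest remaining test in the lightest shard."""
    heap = [(0.0, idx) for idx in range(num_shards)]  # (total seconds, shard index)
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return shards


def slowest_shard_seconds(shards, durations):
    """Wall-clock time of a parallel run is set by the slowest shard."""
    return max(sum(durations[t] for t in shard) for shard in shards)


# Hypothetical durations in seconds, as if collected from earlier runs.
durations = {
    "login": 120, "signup": 110, "transfer": 100,
    "settings": 20, "faq": 10, "about": 10,
}

by_count = shard_by_count(list(durations), 2)
by_time = shard_by_duration(durations, 2)
print("by count:", slowest_shard_seconds(by_count, durations), "s")
print("by time: ", slowest_shard_seconds(by_time, durations), "s")
```

Since all shards run in parallel, the slowest shard determines how long we wait, which is why balancing by historical duration rather than by count pays off.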

Test results analysis

For reading logs and test results, Firebase Test Lab was a step back compared to AutomationTestSupervisor’s capabilities. The biggest problem was that videos are recorded for the entire test run, not per test case. So if your shard has five tests and the third one fails, you need to scroll through the video until you find the issue.

Over time we realized that the real problem was in the logs, which didn’t give us enough information to reproduce the error. Video recording should only be a supplement, not the primary source of knowledge.

By default, the Espresso framework prints a dump of the view hierarchy with information about failing assertions. We added a few improvements to our test framework that help us debug tests faster (information about navigation state, which allows us to launch activities with all of the params that were applied during the failing test) or that point to the cause of the failure (printing encountered API errors). We will share more about these solutions in another article.

The future of testing at Azimo

Even with so many gains from cloud testing (6–8x testing time improvement, flakiness reduced from 10% to 2%), it’s not the end of our journey in making QA even better. We still have an appetite for “10x” improvements. Here are some initiatives and goals that are ahead of us:

Testing time and flakiness reduction allow us to move some testing responsibilities from QA to software engineers. Now that developers don’t have to wait long hours and dig through tens of failing tests, they can do initial testing independently. Thanks to that, QA engineers can move further into an advisory role, ensuring that good testing practices are spread as widely as possible.
Independent software engineers and better QA culture are critical factors in scaling up and making Azimo mission teams fully autonomous. Watch out for a separate article about that.
Flakiness, even at such a low level, is still a problem that we want to resolve. With tests being run so often, we are going to take a more data-driven approach, which will help us better understand the distribution of failing tests, their causes, and in the future also the likelihood of a test being flaky.
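As a rough idea of what such a data-driven approach could start from, here is a minimal Python sketch that turns historical results into a per-test failure rate. It is an illustration only, not our reporting pipeline: the record format, the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict


def failure_rates(records):
    """Return {test_name: share of executions that failed}.

    `records` is an iterable of (test_name, passed) tuples collected
    across many historical runs.
    """
    counts = defaultdict(lambda: [0, 0])  # name -> [failures, executions]
    for name, passed in records:
        counts[name][1] += 1
        if not passed:
            counts[name][0] += 1
    return {name: failed / total for name, (failed, total) in counts.items()}


def likely_flaky(rates):
    """Tests that sometimes pass and sometimes fail, worst offenders first.

    A test that fails every time is more likely broken than flaky,
    so it is left out here.
    """
    flaky = {name: rate for name, rate in rates.items() if 0 < rate < 1}
    return sorted(flaky.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical history: (test name, did it pass?)
history = [
    ("login", True), ("login", False), ("login", True), ("login", True),
    ("transfer", True), ("transfer", True),
    ("signup", False), ("signup", False),
]

rates = failure_rates(history)
for name, rate in likely_flaky(rates):
    print(f"{name}: failed in {rate:.0%} of runs")
```

With enough runs, the same counts can be broken down by device, shard, or failure cause to see where the remaining flakiness concentrates.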

We can’t wait to share more details with you in the following month! Stay tuned.

Towards financial services available to all

We’re working throughout the company to create faster, cheaper, and more available financial services all over the world, and here are some of the techniques that we’re utilizing. There’s still a long way ahead of us, and if you’d like to be part of that journey, check out our careers page.