Automating Integration Tests to Ensure Android App Quality

Using Espresso, UI Automator and BrowserStack to achieve continuous testing of our Thumbtack Android apps.

Brian Terczynski
Mar 16, 2021

Thumbtack Android engineers have been writing unit tests for years, but we tested full UI flows manually. Whenever we added a new feature, we would schedule time to “dogfood” it: loading the app on physical devices and manually running through test cases. This proved a valuable way to verify the quality of new features, and gave us a chance to fix bugs before launch.

What had been missing was regular regression testing of our basic, critical UI flows. To address that, we added weekly QA sessions. At these sessions, employees would meet and go through a set of “golden” test cases that represented the most important flows in our apps.

This increased the number of manual QA sessions we had to attend, cutting even further into our time. And because we only ran these sessions once a week, right before a release, we usually discovered regressions just before we intended to roll out to production. If we found any, we had to hold up the release. Finally, these manual testing sessions were only as good as the effort people put into them: some people tested more thoroughly than others, and over time participation was inconsistent. It was a costly, unreliable process.

We needed something better.

Automated Integration Tests

The solution was to automate these golden test cases. And Android provides testing frameworks to do that, such as Espresso and UI Automator. We decided to evaluate both to see if we could automate our QA process.

For this to be successful, we needed a way to run our tests on a regular basis. We had tried Espresso in the past, and while we found it useful, we had no way to run those tests automatically and repeatedly. In particular, our build machines were unable to run emulators. So the tests sat neglected, running only when an engineer took the time to run them locally. They ended up being ignored, and we eventually abandoned them.

But things changed in 2019. We found a suitable remote device testing platform called BrowserStack. Their App Automate feature could run Espresso tests on a wide variety of Android devices. It also had an API that our build machines could invoke. This was the key piece we needed to repeatedly run our integration tests.

Building Our Tests

We started by prototyping our tests with Espresso and UI Automator. We found that Espresso handled most of our testing needs, allowing for a nice declarative syntax for driving our applications. But some cases required interacting with elements outside of our app, such as the camera. For those we used UI Automator. We settled on Espresso for the majority of our tests, with UI Automator for interactions outside of our apps. We found that Espresso and UI Automator interoperated quite well.
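As a sketch of how the two frameworks can interleave in a single test, consider a flow that attaches a photo from the system camera. The view IDs, package name, and class names below are hypothetical, shown only to illustrate the pattern:

```kotlin
// Hypothetical sketch: Espresso drives in-app views, UI Automator handles
// the system camera UI, which lives outside our app's process.
@RunWith(AndroidJUnit4::class)
class AttachPhotoTest {

    private val device =
        UiDevice.getInstance(InstrumentationRegistry.getInstrumentation())

    @Test
    fun attachPhotoFromCamera() {
        // Espresso: tap the in-app "attach photo" button.
        onView(withId(R.id.attach_photo)).perform(click())

        // UI Automator: Espresso cannot see the system camera, so wait for
        // its shutter button and press it. (Resource name is illustrative.)
        val shutter = device.wait(
            Until.findObject(By.res("com.android.camera2", "shutter_button")),
            5_000
        )
        shutter.click()

        // Back in our app, Espresso can assert on the result.
        onView(withId(R.id.photo_thumbnail)).check(matches(isDisplayed()))
    }
}
```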

We then integrated those tests into our build system, and set them up to run on BrowserStack from Jenkins. Once those initial tests were working, we converted our remaining golden test cases into automated tests.

Backend Integration

Rather than using a mocked backend, we decided to use a live backend. In particular, we decided to use the backend in our pre-production environment. The code in pre-production matches the code that is in production, but uses test data. This allows us to test the same code that runs in production; and because pre-production is a separate mirror, our tests do not interfere with production.

By testing with this live backend, our tests would verify not only our Android code but also the interaction between our Android apps and the server. That means our tests can catch backend changes that would break Android (such as invalid network data or unintentional removal of endpoints). Conversely, we can catch changes in the Android code that would be incompatible with the backend. Ensuring that harmony is why we chose a live backend for these tests.

Test Fixtures in the Backend

One of the concerns we had was around setting up test fixtures. Since we were using a live backend, our test data could not be encapsulated within the test class itself. To overcome this, we wrote each automated test to inject test data into pre-production before it runs. Each test typically does the following:

  • Create a test user in pre-production.
  • Create the needed data for that test user.
  • Run the test as that user.

Having each test create a separate test user with its own data ensures that our tests are hermetic and repeatable.
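The per-test setup might look like the following sketch. The `PreprodClient` helper and its methods are hypothetical, shown only to illustrate the pattern of seeding pre-production before the UI flow runs:

```kotlin
// Hypothetical sketch of per-test fixture setup against pre-production.
// PreprodClient, TestUser, and launchAppAs are illustrative names.
class ContactProTest {

    private lateinit var user: TestUser

    @Before
    fun setUpFixtures() {
        // 1. Create a fresh, isolated test user in pre-production.
        user = PreprodClient.createTestUser()
        // 2. Seed the data this test needs, scoped to that user only.
        PreprodClient.createProjectFor(user, category = "House Cleaning")
    }

    @Test
    fun userCanContactPro() {
        // 3. Run the UI flow logged in as the user created above.
        launchAppAs(user)
        // ... Espresso interactions and assertions ...
    }
}
```

Because each run creates its own user and data, tests can run in parallel and repeatedly without stepping on each other's state.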

How We Run Our Tests

We have a Jenkins job that, at regular intervals throughout the day, builds the latest code for our apps and runs our integration tests against it. We run our tests on BrowserStack’s App Automate. When the tests finish, we generate a report that shows which tests succeeded and which failed. For each test in that report, we show the test logs, the logcat, a screenshot of how the app looked when the test finished, and a video of the test. The screenshot has been especially useful when a test fails. It allows us to see what was on the screen at the time and how it differed from what we expected (for example, if an error message appeared on the screen). It is often easier to parse than a stack trace.

A link to the results is also published to our team’s Slack channel. This ensures we are consistently notified of the health of our tests.

We do not run these tests on every commit like we do unit tests. That is because these tests take many minutes to run, and consume a lot of resources. Thus we only run them at certain hours of the day. But with those regular intervals we get frequent signals on the general health of our app (and backend) well before our weekly releases.

We also run these tests at the time we release. When we cut a new release, we run our integration tests on the release branch. If the tests are green, we proceed to shipping that release to Google Play. This ensures that we have run our critical tests on the actual code we ship to users.

Monitoring Test Failures

Every week we have an on-call engineer who monitors the health of our releases in production. The on-call also monitors the health of our integration tests. Whenever a test fails, the on-call triages the issue and either fixes it or reaches out to the relevant engineer.

Issues Caught by Our Tests

Our integration tests run on well-established user flows that tend not to change much. As such, most of the time our tests do not surface bugs, which is good. But there have been a few cases where they have. Most of them were due to backend code or configuration changes that broke our apps. But there have been a few cases where the bugs were in our Android code.

Our tests will also fail if we make an intentional product change but forget to update the test to match it. For example, we may turn on a new feature as an experiment, which is on for all users in pre-production but only some in production. When that happens, we will either update our test, or our test will turn off that experiment for the test user it creates (so we continue to test the old flow). If this new feature is launched, we update the test. These situations do not represent actual bugs, but they are still useful because they alert us to intentional changes in our products.


Flakiness

Anyone who writes UI tests knows they tend to be flaky. Our integration tests are no exception. In fact, flakiness has been our biggest challenge with them.

Some of it comes from how we write Espresso tests. It is tricky to ensure that tests perform UI events and assertions only when the right conditions exist. Espresso helps a lot by waiting for the main thread’s message queue to be idle before acting. Square’s RxIdler has also been a boon. But there have been some cases where even those have not helped, and random cases of flakiness pop up that we cannot explain.
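RxIdler wraps RxJava schedulers in Espresso idling resources so that tests wait for background work to finish. Registration is typically done once in a custom instrumentation runner; the sketch below shows the RxJava 2 variant (adjust for your RxJava version, and note the runner class name is illustrative):

```kotlin
// Sketch: register RxIdler in a custom test runner so Espresso idles
// while RxJava schedulers have pending work.
class RxTestRunner : AndroidJUnitRunner() {
    override fun onStart() {
        RxJavaPlugins.setInitComputationSchedulerHandler(
            Rx2Idler.create("RxJava 2.x Computation Scheduler")
        )
        RxJavaPlugins.setInitIoSchedulerHandler(
            Rx2Idler.create("RxJava 2.x IO Scheduler")
        )
        super.onStart()
    }
}
```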

But the largest source of flakiness comes from our use of a live backend. Any time there are network issues or other technical issues with pre-production, our tests fail. And because it is used by many other developers at the same time, any changes they may make to it can destabilize it, which can cause our tests to fail more frequently.

We have addressed some of these issues by hardening some of our tests (e.g., ensuring that we only interact with concrete UI elements). In cases where idling resources do not work as expected, we have added check-with-retry loops. We have also wrapped each of our tests in bounded retry loops to overcome transient failures (such as network glitches). Finally, we have at times disabled tests with a history of flakiness, so that they would not create noise in our test reports.
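A check-with-retry loop of the kind described above can be sketched in plain Kotlin (the names and backoff values are illustrative, not our production helper):

```kotlin
// Minimal check-with-retry sketch: re-run a flaky check a bounded number
// of times, sleeping between attempts, and rethrow the last failure.
fun <T> withRetry(attempts: Int = 3, delayMs: Long = 500, block: () -> T): T {
    var lastError: Throwable? = null
    repeat(attempts) { attempt ->
        try {
            return block()
        } catch (e: AssertionError) {
            lastError = e
            if (attempt < attempts - 1) Thread.sleep(delayMs)
        }
    }
    throw lastError ?: IllegalStateException("attempts must be >= 1")
}
```

In a test this wraps an assertion that may race with the UI, turning a transient failure into a short wait-and-recheck instead of a red build.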

Flakiness is still an issue we continue to address. But our efforts have significantly reduced these cases.

Where We Are Now

It has been months of effort. But we no longer do weekly manual QA. We have converted all our golden test cases into automated tests. They run several times a day. They also run whenever we cut a new release. This means we are testing our Android apps against our backend code and surfacing any bugs as a result. That, along with unit and screenshot tests, means that we are running a full pyramid of automated tests on our Android products.

Gone are the days in which our mobile engineering teams huddled around a box of donuts, clicking through repetitive test cases. Our integration tests provide consistency, repeatability and the top of the test pyramid. And all for zero carbohydrates.

If you also share a passion for writing high quality Android software, come join us! We would love to have you aboard.

About Thumbtack

Thumbtack is a local services marketplace where customers find and hire skilled professionals. Our app intelligently matches customers to electricians, landscapers, photographers and more with the right expertise, availability, and pricing. Headquartered in San Francisco, Thumbtack has raised more than $400 million from Baillie Gifford, CapitalG, Javelin Venture Partners, Sequoia Capital, and Tiger Global Management, among others.

Thumbtack Engineering

From the Engineering team at Thumbtack