Better Android Testing at Airbnb — Part 5: Test Architecture

Eli Hart
The Airbnb Tech Blog
8 min read · Dec 23, 2019

In the fifth part of our series on Android Testing at Airbnb, we take a close look at the architecture of our integration testing framework.

In previous installments of this series we laid out our state mocking system, and how we built UI integration testing on top of it. In this article we go into depth on how the system is architected, how idle detection is approached, and how errors are gracefully handled.

This includes a fair amount of implementation depth, with the goal of providing enough detail so that others could recreate a similar system while avoiding much of the trouble we encountered.

Architecture of the Test Framework

Our integration tests are run with Espresso, but have a fairly complex test harness built on top. This is because we need to be able to easily set up fragments, manipulate their views, and tear them down, which is difficult to do via direct Espresso APIs. We also don’t make normal JUnit or Espresso assertions in our tests, but instead take screenshots, programmatically click through the view hierarchy, and upload report files.

Leveraging a Base Activity

We use a custom Activity (named IntegrationTestActivity) to run the tests, which lives in a library module in our app. This gives it direct access to manipulate Fragments during the test. The module is included as a test dependency so it isn’t shipped to production.

On the JUnit side, a single test launches the IntegrationTestActivity with a Fragment name as a String extra. The activity uses the Fragment name to reflectively access mocks declared for that Fragment. The Activity then runs through all mocks, setting each one up, applying some action to it (such as screenshotting it), and then tearing it down and going on to the next. After all mocks are processed the Activity cleans itself up and marks its IdlingResource as idle.

IntegrationTestActivity is an abstract class that does the work of managing the mocks. It also overrides Activity functions, such as finish, so that calls from a Fragment under test have no effect on the test harness (as mentioned in the previous article about interaction testing). Subclasses simply implement a function to run validations on the mock, such as taking a screenshot or executing the interaction testing.
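The following is a simplified sketch of the shape of this base class; names like FragmentMock, mocksForFragment, awaitIdle, and validateMock are illustrative stand-ins rather than our exact APIs, and the real class handles many more details:

```kotlin
import android.os.Bundle
import androidx.fragment.app.Fragment
import androidx.fragment.app.FragmentActivity
import androidx.lifecycle.lifecycleScope
import androidx.test.espresso.idling.CountingIdlingResource
import kotlinx.coroutines.launch

// Hypothetical stand-in for a single mock declared for a Fragment (see Part 1).
interface FragmentMock {
    val name: String
    fun createFragment(): Fragment
}

abstract class IntegrationTestActivity : FragmentActivity() {

    // The JUnit side waits on this resource; it idles once every mock is processed.
    val idlingResource = CountingIdlingResource("IntegrationTestActivity")

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        idlingResource.increment()

        val fragmentName = requireNotNull(intent.getStringExtra(EXTRA_FRAGMENT_NAME))
        lifecycleScope.launch {
            for (mock in mocksForFragment(fragmentName)) {
                // Set the mocked Fragment up...
                val fragment = mock.createFragment()
                supportFragmentManager.beginTransaction()
                    .replace(android.R.id.content, fragment)
                    .commitNow()
                // ...wait for its view to settle (custom idle detection, covered later)...
                awaitIdle()
                // ...let the subclass apply its action, then tear it down.
                validateMock(fragment, mock)
                supportFragmentManager.beginTransaction().remove(fragment).commitNow()
            }
            idlingResource.decrement()
        }
    }

    /** Subclasses run their checks here: screenshotting, clicking views, etc. */
    protected abstract fun validateMock(fragment: Fragment, mock: FragmentMock)

    /** Swallow Activity calls a Fragment under test might make, such as finish(). */
    override fun finish() { /* no-op during tests */ }

    // Hypothetical: the real framework reflectively loads the mocks declared
    // for the given Fragment class name.
    protected open fun mocksForFragment(fragmentName: String): List<FragmentMock> = emptyList()

    // Hypothetical suspension point backed by the idle detection described later.
    protected open suspend fun awaitIdle() {}

    companion object {
        const val EXTRA_FRAGMENT_NAME = "fragment_name"
    }
}
```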

For example, our activity that takes screenshots of each Fragment mock needs only a few lines of code.
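A sketch of what such a subclass might look like, using the hypothetical validateMock hook from above (Screenshotter stands in for the Happo screenshot pipeline from Part 2):

```kotlin
import androidx.fragment.app.Fragment

class ScreenshotTestActivity : IntegrationTestActivity() {

    override fun validateMock(fragment: Fragment, mock: FragmentMock) {
        // Screenshotter is a hypothetical wrapper around the Happo pipeline from
        // Part 2; by this point the Fragment's view is laid out and idle.
        Screenshotter.capture(
            view = requireNotNull(fragment.view),
            name = "${fragment.javaClass.simpleName}_${mock.name}"
        )
    }
}
```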

Similarly, our interaction testing activity is another simple subclass.
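Again as a sketch, with clickAllViews standing in for the interaction testing entry point described in Part 3:

```kotlin
import androidx.fragment.app.Fragment

class InteractionTestActivity : IntegrationTestActivity() {

    override fun validateMock(fragment: Fragment, mock: FragmentMock) {
        // clickAllViews is a hypothetical entry point to the interaction testing
        // from Part 3: it walks the view hierarchy and clicks every clickable view.
        clickAllViews(requireNotNull(fragment.view))
    }
}
```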

Note how these leverage the mock framework to consume each Fragment in its mocked state, without having to worry about setting up the fragment, waiting for the view to be stable, tearing down the UI, or any other tedium that is normally inherent to integration tests.

Additionally, we can very easily create new subclasses to process these mocks with any other checks we may want to make.

Running the Test Activity

While one of these activities runs, the JUnit test waits for the Activity to signal that it is done so that the test can finish. This is achieved with a custom Espresso IdlingResource declared in the IntegrationTestActivity’s module. It is through this idler that the JUnit test and the Activity communicate.

The Idler is registered with Espresso in a custom JUnit TestRule, and we leverage Espresso to wait for it. Normally this is done by invoking a view assertion, but since we don’t use Espresso assertions we instead tell Espresso explicitly to wait for Idlers by calling Espresso.onIdle().
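A minimal sketch of such a rule, using the standard IdlingRegistry API (how the IdlingResource is shared between the Activity and the test is elided here):

```kotlin
import androidx.test.espresso.IdlingRegistry
import androidx.test.espresso.IdlingResource
import org.junit.rules.TestWatcher
import org.junit.runner.Description

// Registers the given IdlingResource for the duration of each test. The base
// test class applies this via @get:Rule.
class RegisterIdlerRule(private val idlingResource: IdlingResource) : TestWatcher() {

    override fun starting(description: Description) {
        IdlingRegistry.getInstance().register(idlingResource)
    }

    override fun finished(description: Description) {
        IdlingRegistry.getInstance().unregister(idlingResource)
    }
}
```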

On the JUnit side our test then looks roughly like this:
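The following is an approximation of that shape rather than our exact code; launchMocks, ScreenshotTestActivity, and BookingFragment are placeholders, and registering the IdlingResource (via the rule above) is omitted:

```kotlin
import android.content.Context
import android.content.Intent
import androidx.fragment.app.Fragment
import androidx.test.core.app.ActivityScenario
import androidx.test.core.app.ApplicationProvider
import androidx.test.espresso.Espresso
import org.junit.Test
import kotlin.reflect.KClass

class FragmentScreenshotTests {

    @Test
    fun bookingFragment() = launchMocks(BookingFragment::class)

    private fun launchMocks(fragmentClass: KClass<out Fragment>) {
        val intent = Intent(
            ApplicationProvider.getApplicationContext<Context>(),
            ScreenshotTestActivity::class.java
        ).putExtra(IntegrationTestActivity.EXTRA_FRAGMENT_NAME, fragmentClass.java.name)

        ActivityScenario.launch<ScreenshotTestActivity>(intent).use {
            // No view assertions are made; onIdle() simply blocks until the
            // Activity's IdlingResource reports that every mock was processed.
            Espresso.onIdle()
        }
    }
}
```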

Notice how the test declaration for each Fragment is a single line. You’ll see how we automatically generate this for each Fragment later.

In review, the benefits of this single Activity hosting approach are:

  • It can directly manipulate Fragments and views for screenshot and interaction tests
  • It blocks function calls to the Activity, such as finish, so Fragments under test can’t unwittingly affect the test framework
  • It provides mocked Fragments via a generic base class, which can be subclassed for specific test needs

However, there are also challenges with this approach, which require some complexity to work around.

Idle Detection

Normally Espresso handles idle detection automatically. For example, an Espresso ViewAssertion call waits until the UI is idle before making the assertion. In our custom test activity we aren’t making assertions via Espresso, and also don’t have access to Espresso’s underlying APIs to get callbacks on idle.

However, IntegrationTestActivity needs to show a Fragment and reliably know when it is fully laid out so that our tests can run on it. This means we needed to build our own implementation of idle detection. This is tricky for a few reasons:

  • Accounting for all sources of asynchronously running code is like playing whack-a-mole, and may not even be possible if a feature runs custom async code that the test harness doesn’t know about
  • Not accounting for all async code can lead to test flakiness
  • Being overly defensive, and waiting for too long, can unnecessarily extend test times, or even make tests time out

Fortunately, our task is simplified because our Fragment data is completely mocked out. We don’t need to worry about waiting for network requests or database queries. We simply need to make sure that we wait for anything that may affect the UI.

The most important aspect of this is waiting for the main thread to be idle, since that is what processes UI updates. Conveniently, the Handler’s MessageQueue exposes an API that lets us do just that:

Handler().looper.queue.isIdle

We could simply poll this function until the queue is idle. However, there is a problem with this approach: the Handler’s queue may report being idle while the last Runnable is still executing. That is, a Runnable is dequeued and then run, so we can tell when the queue is empty, but we can’t know whether the last Runnable has finished running.

Instead we post our own Runnable to the Handler, and when it is run we check whether the queue is empty. This lets us flush the queue before checking its status. A basic approach looks like this:
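Roughly, and as a sketch rather than our exact class (Looper.getQueue() and MessageQueue.isIdle() require API 23):

```kotlin
import android.os.Handler
import android.os.Looper

class HandlerIdleDetector(looper: Looper) {

    private val handler = Handler(looper)

    @Volatile
    var isIdle = false
        private set

    fun start() = postCheck()

    private fun postCheck() {
        // By the time this Runnable runs, everything queued ahead of it has fully
        // executed, so the queue's status reflects the flushed state.
        handler.post {
            isIdle = handler.looper.queue.isIdle // Looper.getQueue() requires API 23
            postCheck() // keep re-posting; the production version cancels this loop
        }
    }
}

// Simplest possible usage: poll from a non-Looper thread until idle is reported.
fun awaitLooperIdle(looper: Looper) {
    val detector = HandlerIdleDetector(looper).apply { start() }
    while (!detector.isIdle) {
        Thread.sleep(10)
    }
}
```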

This conveys the basic idea, but a few things should be added to make it production ready:

  • The post loop inside HandlerIdleDetector must be canceled once you are done with it to avoid it continuing indefinitely
  • Instead of polling, a callback system can be used along with coroutines to gracefully sleep while we wait for idleness
  • A timeout system can be added to provide a more robust error message if the Loopers never go idle
  • This cannot be run on a thread backed by one of the watched Loopers, since the check and the queued work would be competing for the same thread
  • It generally isn’t enough to just wait until all threads are idle. Notably, some view updates, like animations, are posted to the next animation frame. Make sure the loopers all stay idle for at least one frame (~16ms).

We also use Epoxy extensively in our UI, which does some processing on a background thread. In fact, it is quite common for RecyclerView setups to use a background thread to handle diffing of RecyclerView changes, so this situation will apply to many apps. We built our idle detection to take a list of Loopers to watch and return when they are all idle. This makes it easy to add new threads to the list.
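As a sketch of how that multi-Looper variant might be invoked (the accessor for the Epoxy/RecyclerView diffing Looper varies by setup, so none is named here):

```kotlin
import android.os.Looper

// Waits until every watched Looper reports idle. In our setup the list includes
// the main Looper plus the background Looper used for Epoxy/RecyclerView diffing
// (obtained however your RecyclerView adapters are configured).
fun awaitAllLoopersIdle(loopers: List<Looper>) {
    val detectors = loopers.map { looper -> HandlerIdleDetector(looper).apply { start() } }
    while (!detectors.all { it.isIdle }) {
        Thread.sleep(10)
    }
    // The production version also requires the Loopers to stay idle for at least
    // one animation frame (~16ms), as noted above.
}
```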

One caveat to this approach is that any Runnable posted with a delay via postDelayed is not accounted for (Espresso cannot account for these either). To prevent flakiness from these we don’t allow such calls in our Fragments. If a Fragment needs to delay a Runnable (mainly for animation purposes), we have a utility function to wrap the call, which forces the delay to zero in test builds.
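A sketch of the wrapper idea; the function name and the isRunningInTest check are illustrative, not our exact utility:

```kotlin
import android.os.Handler

// Hypothetical check for whether we are running under instrumentation; any
// reliable signal works (here, whether Espresso is on the classpath).
private val isRunningInTest: Boolean by lazy {
    runCatching { Class.forName("androidx.test.espresso.Espresso") }.isSuccess
}

// Feature code calls this instead of Handler.postDelayed directly, so delays
// collapse to zero under test and idle detection is never fooled by them.
fun Handler.postDelayedSafely(delayMs: Long, runnable: Runnable) {
    postDelayed(runnable, if (isRunningInTest) 0L else delayMs)
}
```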

Error Handling

If a test throws an exception, our goal (as creators of a test framework) is to present that exception to the end developer as clearly and easily as possible. Developers shouldn’t have to dig through logs, understand code in the test framework, or otherwise deal with any overhead in understanding the root cause of a crash. We’ve done a few things to make this easier for them.

First, if a crash happens off the main thread, say in some async task, then the test runner normally reports a generic message:

Test failed to run to completion. Reason: ‘Instrumentation run failed due to Process crashed.’. Check device logcat for details

This is unhelpful, and forces extra work on the developer. Instead, we use a test rule to register a default exception handler for all threads, including RxJava and coroutines. This handler passes the exception along to the main thread so that Espresso can handle it and properly display the error as the test failure message.
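A sketch of how such a rule might be structured, assuming RxJava 2; the exact wiring in our framework differs, but the idea is to funnel every uncaught Throwable onto the main Looper:

```kotlin
import android.os.Handler
import android.os.Looper
import io.reactivex.plugins.RxJavaPlugins
import org.junit.rules.TestRule
import org.junit.runner.Description
import org.junit.runners.model.Statement

class ForwardUncaughtExceptionsRule : TestRule {

    override fun apply(base: Statement, description: Description): Statement =
        object : Statement() {
            override fun evaluate() {
                val mainHandler = Handler(Looper.getMainLooper())
                // Rethrow on the main Looper so Espresso reports it as the failure
                // message instead of the generic "Process crashed" error.
                val forward: (Throwable) -> Unit = { e -> mainHandler.post { throw e } }

                val previousHandler = Thread.getDefaultUncaughtExceptionHandler()
                // Plain threads (this also covers most coroutine dispatchers, whose
                // uncaught exceptions fall through to the thread's handler).
                Thread.setDefaultUncaughtExceptionHandler { _, e -> forward(e) }
                // RxJava routes some errors to its own global handler.
                RxJavaPlugins.setErrorHandler { e -> forward(e) }
                try {
                    base.evaluate()
                } finally {
                    Thread.setDefaultUncaughtExceptionHandler(previousHandler)
                    RxJavaPlugins.setErrorHandler(null)
                }
            }
        }
}
```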

Even if the correct error message is provided, an engineer might have trouble debugging it without the context of what was happening when it was thrown. A single test can cover many mocks, so it is difficult to tell which mock was being tested. One could dig into the test logs, but we wanted to make this easier. For example, we can provide information about which mock was set on the Fragment, which View was being clicked, or any other pertinent information about what the test framework was doing when the exception was thrown.

To accomplish this we maintain a stack of Strings representing “test context”. The framework can push any String onto the stack it wants, and pop it when that phase is over. For example, the stack may have these two strings:

  • Loaded ‘failure state’ for BookingFragment
  • Clicking ‘book_button’ view

Then, since we already have a system in place to catch thrown exceptions, we can wrap the exceptions to provide our context to the developer.
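A sketch of how the context stack and the wrapping might fit together (TestContext and withTestContext are illustrative names, not our real API):

```kotlin
// Illustrative names: the real framework's API differs, but the idea is the same.
class TestContextException(contextStack: List<String>, cause: Throwable) : RuntimeException(
    "Test framework context when the failure occurred:\n" +
        contextStack.joinToString("\n") { " - $it" },
    cause
)

object TestContext {

    private val stack = ArrayDeque<String>()

    /** Pushes a description of the current phase, pops it when the phase ends. */
    fun <T> withTestContext(context: String, block: () -> T): T {
        stack.addLast(context)
        try {
            return block()
        } catch (e: Throwable) {
            // Wrap once, attaching everything the framework was doing at the time.
            throw if (e is TestContextException) e else TestContextException(stack.toList(), e)
        } finally {
            stack.removeLast()
        }
    }
}

// Usage, producing the two context entries from the example above:
// TestContext.withTestContext("Loaded 'failure state' for BookingFragment") {
//     TestContext.withTestContext("Clicking 'book_button' view") { clickView("book_button") }
// }
```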

These techniques have made it easier for developers to address errors in their code, and to be more independent. As a maintainer of the test framework this is important in reducing the number of times I get directly messaged to help somebody debug their test failure.

The last piece of our error handling approach relates to how failures are surfaced to a PR on GitHub. This is covered in detail in a later article about CI.

Next: Obstacles to Consistent Mocking

This article detailed the implementation of our test framework, issues we ran into, and design decisions we made to make the system robust and scalable.

In the next article we will look at common reasons the test framework may be flaky, and how we can solve them at the root.

Series Index

This is a seven part article series on testing at Airbnb.

Part 1 — Testing Philosophy and a Mocking System

Part 2 — Screenshot Testing with MvRx and Happo

Part 3 — Automated Interaction Testing

Part 4 — A Framework for Unit Testing ViewModels

Part 5 (this article) — Architecture of our Automated Testing Framework

Part 6 — Obstacles to Consistent Mocking

Part 7 — Test Generation and CI Configuration

We’re Hiring!

Want to work with us on these and other Android projects at scale? Airbnb is hiring for several Android engineer positions across the company! See https://careers.airbnb.com for current openings.
