
Better Android Testing at Airbnb — Part 3: Interaction Testing

In the third installment of our series on Android Testing at Airbnb, we explore an automated system we’ve built for testing user actions.

Interaction Testing

Interaction handling code (the logic that responds to clicks and other user actions) can contain complicated logic and be a common source of bugs. It also represents a large percentage of the code in a product feature, so testing it thoroughly is important for high code coverage.

Testing interactions is fairly straightforward with Espresso — clicks can be manually forced and assertions on results can then be made. However, these tests are brittle for a variety of reasons:

  • Views are manually identified by id or position, which commonly change across product updates
  • Views in scrollable lists must be scrolled to
  • Asynchronous results must be waited for, which can cause flakiness or require extra code to handle correctly
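For illustration, a hand-written Espresso test for a single interaction might look like the sketch below (the view ids, list position, and expected text are hypothetical). Every one of these hard-coded details is a place where the test can break when the product changes.

```kotlin
import androidx.recyclerview.widget.RecyclerView
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.contrib.RecyclerViewActions
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withId
import androidx.test.espresso.matcher.ViewMatchers.withText

// Sketch of a manual interaction test. The ids and position are hypothetical and
// must be updated by hand whenever the screen's layout or content changes.
fun clickSaveRowAndCheckConfirmation() {
    // The target row is identified by position and must be scrolled to first.
    onView(withId(R.id.recycler_view))
        .perform(RecyclerViewActions.scrollToPosition<RecyclerView.ViewHolder>(4))
    // The view to click is identified by a hard-coded id.
    onView(withId(R.id.save_button)).perform(click())
    // The asserted result must also be kept in sync with the product.
    onView(withText("Saved")).check(matches(isDisplayed()))
}
```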

Even if these are addressed, manually writing tests for all possible interactions on a page is tedious, and likely to exclude small details such as the arguments that are passed or network requests that are made.

Just as screenshot tests automatically detect UI changes, we built a similar system to detect changes in interaction handling. We then leverage Approval Testing techniques to automatically update our tests. This allows us to automatically verify each screen’s behavior without writing any traditional Espresso tests.

The philosophy behind this is based on the following:

  1. All changes that result from a click are measurable, and can be represented with a textual description.
  2. All views in the activity hierarchy can be programmatically clicked and the results measured, allowing us to generate a report that maps each view to its onClick behavior.
  3. We can test a screen in isolation, and define its interface as any actions that may affect other screens, such as starting a new screen or returning a result.
  4. We don’t need end-to-end tests that link screens as long as, for each screen, we test how it handles possible inputs (mock states and arguments) and validate correct outputs (actions that affect other screens).

Our implementation of this is as follows:

  • A mock is laid out and we wait for its view to stabilize.
  • We iterate through each view in the fragment’s view hierarchy and programmatically click it.
  • After each click, we record any actions that result, blocking them from actually occurring.
  • A JSON file is produced that defines the results for each view.
  • JSON files are diffed to detect changes in interactions, exactly as we do for screenshots.

This technique works surprisingly well, and has a lot of parallels with screenshot testing. In fact, we can reuse much of the same infrastructure we already built to run screenshot tests. Let’s look at each step in detail.

View Layout

A single base test Activity is responsible for laying out the mocked fragment and waiting for its view to stabilize. Subclasses of this base Activity handle the specifics of a test, such as screenshotting or performing clicks.
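As a rough sketch (the class and method names here are hypothetical, not Airbnb’s actual base class), such a base Activity might look like this:

```kotlin
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.core.view.doOnNextLayout
import androidx.fragment.app.Fragment

// Hypothetical sketch of a base test Activity that hosts a mocked fragment.
// Subclasses decide what to do once the fragment's views have been laid out.
abstract class BaseMockTestActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        showMockedFragment(createMockedFragment())
    }

    /** Replaces the content view with a fresh instance of the mocked fragment. */
    protected fun showMockedFragment(fragment: Fragment) {
        supportFragmentManager.beginTransaction()
            .replace(android.R.id.content, fragment)
            .commitNow()
        // Run the test step (screenshotting, clicking, ...) after the next layout pass.
        window.decorView.doOnNextLayout { onViewsLaidOut() }
    }

    /** Builds a new fragment instance configured with the mock state under test. */
    protected abstract fun createMockedFragment(): Fragment

    /** Called once the mocked fragment's views are on screen. */
    protected abstract fun onViewsLaidOut()
}
```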

Iterating the View Hierarchy

Iterating the view hierarchy has two main complications. First, supporting RecyclerViews means that we need to programmatically scroll the screen down to reach every item. This requires asynchronously waiting for the new views to be laid out before continuing with the depth-first search.
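A minimal sketch of that scrolling step, assuming a callback that resumes the traversal once the newly bound item views exist:

```kotlin
import androidx.core.view.doOnNextLayout
import androidx.recyclerview.widget.RecyclerView

// Hypothetical sketch: scroll the RecyclerView to an off-screen position, then wait
// for the resulting layout pass before resuming the depth-first search. The resume
// callback stands in for the traversal logic described in this section.
fun scrollToPositionThenResume(recyclerView: RecyclerView, position: Int, resume: () -> Unit) {
    recyclerView.scrollToPosition(position)
    // scrollToPosition does not lay out the new item views synchronously, so the
    // traversal must continue only after the next layout pass completes.
    recyclerView.doOnNextLayout { resume() }
}
```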

Second, each click can potentially change the view hierarchy, so we can’t immediately continue testing the next view. For example, the click could trigger a fragment transaction, show a dialog window, or expand some ViewGroup. Instead, we need to reset the view hierarchy to its initial state and then resume iteration from the previous view in the hierarchy.

The reliability of the test hinges on our ability to accurately reset the app to the original mock view after each click. This lets us smoothly continue on to test each subsequent view.

To support this resetting, we run the entire test in a single activity. After each click we remove all fragments and then add back a new instance of the mocked fragment.
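A minimal sketch of that reset, assuming a factory that builds a freshly mocked fragment (the helper name is hypothetical):

```kotlin
import androidx.fragment.app.Fragment
import androidx.fragment.app.FragmentActivity
import androidx.fragment.app.FragmentManager

// Hypothetical sketch of resetting the view hierarchy between clicks: remove any
// fragments the click may have added, then re-add a fresh mocked fragment instance.
fun FragmentActivity.resetToMockedFragment(createMockedFragment: () -> Fragment) {
    // Clear anything the previous click pushed onto the back stack.
    supportFragmentManager.popBackStackImmediate(null, FragmentManager.POP_BACK_STACK_INCLUSIVE)
    // Remove all remaining fragments.
    supportFragmentManager.fragments.forEach { fragment ->
        supportFragmentManager.beginTransaction().remove(fragment).commitNow()
    }
    // Add back a brand-new instance of the mocked fragment in its initial state.
    supportFragmentManager.beginTransaction()
        .replace(android.R.id.content, createMockedFragment())
        .commitNow()
}
```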

If an AlertDialog was shown as a result of a click, we programmatically close it. There is no clean way to do this, so we rely on reflection to access the global window manager.

Before each click we store our traversal location in the view hierarchy. After the hierarchy is reset, the test resumes from that point. This traversal path is represented as a list of the child indices of each view group in the hierarchy.
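A sketch of how such a traversal path could be recorded and replayed, using standard Android view APIs (illustrative, not Airbnb’s exact code):

```kotlin
import android.view.View
import android.view.ViewGroup

// Hypothetical sketch of the traversal path described above: the child index of
// each ViewGroup between the root and the target view.
fun pathTo(view: View, root: ViewGroup): List<Int> {
    val path = mutableListOf<Int>()
    var current: View = view
    while (current !== root) {
        val parent = current.parent as? ViewGroup ?: break
        path.add(0, parent.indexOfChild(current))
        current = parent
    }
    return path
}

// After the hierarchy is reset, the same path can be replayed to find the view
// where the traversal should resume.
fun findByPath(root: ViewGroup, path: List<Int>): View? {
    var current: View = root
    for (index in path) {
        val group = current as? ViewGroup ?: return null
        current = group.getChildAt(index) ?: return null
    }
    return current
}
```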

Recording Interactions

When a view is clicked, there are two categories of results we want to record:

  • Android framework-level results, such as Fragment transactions or starting/finishing an Activity
  • Airbnb app-specific events, such as submitting a change to MvRX state or executing a network request

Ideally the test framework can automatically record any results affecting the Android framework, but we needed a clean way to also detect changes to any of our internal systems.

Detecting Framework Results

We check whether a result was set on the Activity by waiting until the interaction is over and then using reflection to inspect the values of the Activity’s result code and result data. This allows our test to catch changes to the results returned on click.
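Since setResult() only stores its values in private fields on Activity, reading them back requires reflection. A sketch, assuming the internal field names mResultCode and mResultData (which are not public API and could change between Android versions):

```kotlin
import android.app.Activity
import android.content.Intent

// Hypothetical sketch: Activity keeps the values passed to setResult() in private
// fields, so reflection is needed to read them back from a test.
fun recordActivityResult(activity: Activity): Pair<Int, Intent?> {
    val resultCodeField = Activity::class.java.getDeclaredField("mResultCode").apply { isAccessible = true }
    val resultDataField = Activity::class.java.getDeclaredField("mResultData").apply { isAccessible = true }
    val resultCode = resultCodeField.getInt(activity)
    val resultData = resultDataField.get(activity) as? Intent
    return resultCode to resultData
}
```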

To detect Fragment transactions, a FragmentLifecycleCallbacks instance is registered recursively on the test activity’s FragmentManager to detect any changes to the Fragment stack. It records the ending state of the fragment stack after everything has stabilized. We also record the arguments that each fragment contains, so we have a record of which arguments each fragment was started with.
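A sketch of that registration (the reporting callback here is a hypothetical stand-in for our recording infrastructure):

```kotlin
import android.content.Context
import android.os.Bundle
import androidx.fragment.app.Fragment
import androidx.fragment.app.FragmentActivity
import androidx.fragment.app.FragmentManager

// Hypothetical sketch: register a recursive FragmentLifecycleCallbacks so that any
// fragment attached as a result of a click is recorded along with its arguments.
fun FragmentActivity.recordFragmentChanges(onFragmentAdded: (name: String, arguments: Bundle?) -> Unit) {
    supportFragmentManager.registerFragmentLifecycleCallbacks(
        object : FragmentManager.FragmentLifecycleCallbacks() {
            override fun onFragmentAttached(fm: FragmentManager, f: Fragment, context: Context) {
                // Record which fragment was added and which arguments it was started with.
                onFragmentAdded(f.javaClass.simpleName, f.arguments)
            }
        },
        /* recursive = */ true
    )
}
```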

Finally, we use reflection to access WindowManagerGlobal and check for windows added as a result of the click. If a new window is an AlertDialog or BottomSheetDialog, we can extract information about it such as the title, message, and button text. We also force close dialogs to prevent them from sticking around as the test progresses.
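A sketch of that reflection, assuming the internal WindowManagerGlobal class and its mViews field (internal names that can change between Android versions):

```kotlin
import android.view.View

// Hypothetical sketch: list the root views of every window currently attached
// (including dialog windows) by reflecting into WindowManagerGlobal.
// Comparing this list before and after a click reveals newly added windows.
@Suppress("UNCHECKED_CAST")
fun currentWindowRootViews(): List<View> {
    val clazz = Class.forName("android.view.WindowManagerGlobal")
    val instance = clazz.getMethod("getInstance").invoke(null)
    val viewsField = clazz.getDeclaredField("mViews").apply { isAccessible = true }
    return (viewsField.get(instance) as List<View>).toList()
}
```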

Detecting Changes to Custom Systems

Rather than coupling the test runner to each internal system, we leverage interfaces and dependency injection to decouple the interaction recording from the systems themselves. Here’s how we approach it:

  • Create an interface that knows how to report actions to our test runner.
  • Use a test Dagger module to override creation of each dependency, mocking it to instead invoke the interaction reporting interface.
  • Use Dagger multibinding to collect these reporter interfaces into a set that can be injected into the test runner.

A well-thought-out dependency injection graph, combined with multibinding, is crucial for this to work well. Once it is set up, it is extremely powerful because it allows us to measure and catch changes to how every click in the app interacts with our services.
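A condensed sketch of this setup (the interface, reporter, and module names are illustrative, not our actual classes):

```kotlin
import dagger.Binds
import dagger.Module
import dagger.multibindings.IntoSet
import javax.inject.Inject

// Hypothetical sketch of the reporter pattern: each mocked dependency forwards the
// calls it receives to an InteractionReporter, and Dagger multibinding collects all
// reporters into a Set that the test runner can inject.
interface InteractionReporter {
    /** Returns textual descriptions of actions recorded since the last click, then clears them. */
    fun drainRecordedActions(): List<String>
}

class NetworkRequestReporter @Inject constructor() : InteractionReporter {
    private val requests = mutableListOf<String>()

    // Called by the mocked network dependency instead of executing a real request.
    fun onRequestExecuted(description: String) { requests += description }

    override fun drainRecordedActions(): List<String> = requests.toList().also { requests.clear() }
}

@Module
abstract class TestReporterModule {
    // Multibinding collects every reporter into a single Set<InteractionReporter>.
    @Binds @IntoSet
    abstract fun bindNetworkReporter(reporter: NetworkRequestReporter): InteractionReporter
}

class InteractionTestRunner @Inject constructor(
    private val reporters: Set<@JvmSuppressWildcards InteractionReporter>
) {
    fun collectActionsAfterClick(): List<String> = reporters.flatMap { it.drainRecordedActions() }
}
```

Because the mocks report through a shared interface, recording a new internal system is just one more binding in the set; the test runner never has to know about it directly.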

Capturing Non-Visual View Data

Besides click behavior, we also capture non-visual data about each view, such as:

  • The contentDescription of a View, to check accessibility configuration
  • The url loaded in a WebView or ImageView
  • The configuration settings of a video view

To support this, the view iterator calls back with each view and gives us an opportunity to check its type and add arbitrary information about it to the report. This makes it extensible for any custom views or data about the view we want to capture.
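A sketch of such a callback (the captured keys and the map shape are illustrative):

```kotlin
import android.view.View
import android.webkit.WebView
import android.widget.ImageView

// Hypothetical sketch of the per-view callback: inspect the view's type and attach
// extra, non-visual data to its entry in the report.
fun captureViewData(view: View): Map<String, String?> {
    val data = mutableMapOf<String, String?>()
    view.contentDescription?.let { data["contentDescription"] = it.toString() }
    when (view) {
        is WebView -> data["url"] = view.url
        is ImageView -> data["drawable"] = view.drawable?.javaClass?.simpleName
        // Custom view types (e.g. a video view) can contribute their own details here.
    }
    return data
}
```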

Knowing When an Interaction is Over

After each click, the test must wait for any resulting asynchronous work to finish before recording results and resetting the screen. This idle detection is discussed in detail in Part 5 (coming in a few weeks).

JSON Report Output

Each entry in the report is a JSON object that declares the behavior of a single view on the screen; the full report has an entry like this for each clickable view.

The top-level JSON object key identifies the view in the hierarchy. We use the view ids of each of the view’s parents to construct a chain that uniquely identifies the view on screen, for example:

fragment_container->coordinator_layout->recycler_view->AccountDocumentMarqueeModel->link_text

We also note that this is a TextView in a RecyclerView. It’s within an AccountDocumentMarqueeModel, which is an Epoxy model representing the item view. Details like these allow developers to easily figure out which view on screen this JSON entry refers to.

Finally, the report notes what happens when the view is clicked. In this example, clicking the view opens a UserProfileFragment in an MvRxActivity, and the report also captures the arguments and request code that are passed with it.
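Purely for illustration, such an entry might look roughly like the following; the key names and nesting are a guess at the shape described above, not Airbnb’s exact format:

```json
{
  "fragment_container->coordinator_layout->recycler_view->AccountDocumentMarqueeModel->link_text": {
    "viewType": "TextView",
    "onClick": {
      "startFragment": {
        "activity": "MvRxActivity",
        "fragment": "UserProfileFragment",
        "arguments": "...",
        "requestCode": "..."
      }
    }
  }
}
```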

Through trial and error we arrived at this format with these points in mind:

Readability

While the report can contain metadata to help the user more easily identify which view has been affected, counterintuitively this metadata should be minimized because it can harm consistency.

For example, if the metadata includes a RecyclerView item’s index (to make it easy to see which item changed), then adding a new item shifts the indices of all other items and causes a large change in the report. In this way the goals of readability and consistency can be at odds.

While readability is important, it should not come at the cost of diff-ability or consistency. An item in the report should ideally only be shown as changed if its behavior was actually affected, otherwise reports become too flaky and burdensome to read.

Diff-ability

It is critical that each view has an identifier that is both unique within the view hierarchy and stable across branches. This identifier is the key to the set of changes associated with the view, and changes to the key result in a confusing diff. We build the identifier from the chain of parent view groups above a view, and we avoid using child indices when possible, because they change when other views are added; instead, we use view ids.

If a diff shows that something changed, we need it to be easy for engineers to read the diff and identify the difference. If this isn’t easy then they are more likely to ignore a diff when it may represent a real regression.

Consistency

JSON diffs are harder to read than screenshot diffs — screenshots are fairly obvious in indicating a visual difference, whereas JSON diffs can require some study to understand what has changed (which is why the report must have good diff-ability).

For these reasons, consistency is very important, and we have made some design decisions to optimize for it. For example, JSON object keys are sorted to avoid spurious diffs caused by changes in action order.

One consistency problem we ran into was that the text representing data (such as Bundles or Intents) may not be consistent across runs.

There are two main reasons this may happen.

  1. A class does not implement toString(), and instead uses the default implementation, where its hashcode representation is used — e.g. Person@372c7c43. To combat this, we use reflection to generate a consistent String representation based on the properties in the class, recursively. We do this if we see the hash pattern in the original toString(), or if the object is a Kotlin data class.
  2. If an object is an integer, it may represent an Android resource id. While these are constant for a single build, the integer values representing the same resource can change across builds as other resources are added or removed. To stabilize this, we use the reflection-based string representation from (1) to look up integers in the resource table, and if there is a match we use the resource name (e.g. R.string.title_text) instead of the integer value, as sketched after this list.
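A sketch of the resource lookup in point (2), using the standard Resources API (the function name is hypothetical):

```kotlin
import android.content.Context
import android.content.res.Resources

// Hypothetical sketch of stabilizing integers that are actually resource ids: if the
// integer resolves in the resource table, report its name instead of its raw value.
fun stableIntRepresentation(context: Context, value: Int): String {
    return try {
        // e.g. "com.example:string/title_text" instead of a build-specific integer
        context.resources.getResourceName(value)
    } catch (e: Resources.NotFoundException) {
        value.toString()
    }
}
```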

Kotlin data classes are targeted for a custom String representation because of point (2): the data class is commonly used to pass arguments and was the main place we saw String resources showing up. Additionally, since their toString() is already generated and unlikely to be custom, it is safer for us to replace it with our own generated representation.
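A minimal sketch of the reflection-based representation for data classes, assuming kotlin-reflect is available (this only covers the data class case; the hash-pattern fallback described in point (1) is omitted):

```kotlin
import kotlin.reflect.full.memberProperties
import kotlin.reflect.jvm.isAccessible

// Hypothetical sketch: build a stable string for Kotlin data classes from their
// property names and values, recursing so nested data classes are stable too.
fun stableToString(value: Any?): String {
    if (value == null) return "null"
    if (!value::class.isData) return value.toString()
    val props = value::class.memberProperties
        .sortedBy { it.name }
        .joinToString(", ") { prop ->
            prop.isAccessible = true
            "${prop.name}=${stableToString(prop.getter.call(value))}"
        }
    return "${value::class.simpleName}($props)"
}
```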

How Various Actions Are Represented

A few examples of how different kinds of actions appear in the report:

Finishing an activity

Emitting a log

View properties

We capture other properties too, such as urls set on images.

Selecting a Toolbar option

Fragment transactions

Network request executed

Updating ViewModel State

We can even explicitly call out additions or removals to Lists and Maps.

Overall, this JSON system allows us to record information as granularly as we want, which makes our tests extremely comprehensive. Manually writing Espresso tests to assert these same checks would be tedious to the point of being impractical. Instead, all of our data is generated automatically, viewed through a nice UI, and changes are approved and updated with a single click.

Diffing Reports to Find Changes

The JSON snapshots are combined with UI screenshots to create a single report representing the behavior of a branch. Happo’s web UI shows both JSON and UI diffs, and our JSON diffing leverages all of the existing tooling that Happo offers, such as change subscriptions and component history.

For example, here is a report for a PR that changed the behavior of a ToggleActionRow to make a GraphQL request for a ListingsQuery. We’ve automatically captured the behavior change and can present it clearly.

Interaction report JSON diff showing a new executed request that happened on click

Additionally, we didn’t have to make any changes to our CI setup because these are just additional JUnit tests added to our existing app instrumentation test suite. The JSON diffs are added to the existing Happo report that the screenshot tests create. This is explained further in a subsequent article on our CI setup, and shows how easy this system is to extend.

Possible Future Extensions

While we have first focused on capturing common actions and low-hanging fruit, we don’t yet capture all possible interaction behavior on a screen. We can continue to improve our test coverage by:

  • Finding EditTexts in the view hierarchy, programmatically changing text, and observing results
  • Capturing the behavior of onActivityResult callbacks
  • Recording what happens when the fragment is set up or torn down (such as network requests or logging) and including that in the final report

Next: Testing ViewModel Logic

In Part 4, we’ll look at how we use unit tests to manually test all logic in a ViewModel, as well as the DSL and framework we created to make this process easy!

Series Index

Part 1 — Testing Philosophy and a Mocking System

Part 2 — Screenshot Testing with MvRx and Happo

Part 3 (this article) — Automated Interaction Testing

Part 4 — A Framework for Unit Testing ViewModels

Part 5 — Architecture of our Automated Testing Framework

Part 6 — Obstacles to Consistent Mocking

Part 7 — Test Generation and CI Configuration

We’re Hiring!