Better Android Testing at Airbnb — Part 2: Screenshot Testing

Published in The Airbnb Tech Blog · 7 min read · Nov 30, 2019

In the second part of our series on Android testing at Airbnb, we look at how we use screenshot testing to automatically test our Fragments’ UI.

In our first article we looked at the MvRx mocking system, and the Launcher that opens any screen or mock in the app. While this is great for manual testing, the biggest benefit of mocks is the power they unlock in automated testing. To take advantage of this, we started by automating screenshot tests which detect visual changes across commits.

This helps to catch a slew of issues:

Padding, color, and styling changes
Logic issues with how data is presented
How edge cases, like null or empty data, are displayed
Right-to-left (RTL) layout regressions
Changes introduced by a library version change, such as a new RecyclerView or ConstraintLayout release

Testing all of these manually takes a long time, and we found that issues often slipped through our developer testing and QA process. On complex screens especially, it was easy to miss these kinds of regressions. Screenshot testing catches them, and it also provides a basic sanity check that the Fragment can run without crashing. This was the lowest-hanging fruit for us, and it was fairly easy to add once our mock infrastructure was in place.

We implemented this in a few steps:

Built an Android library to screenshot an activity and upload the bitmap to cloud storage
Leveraged Happo to provide a web UI to show bitmap differences across branches
Set up our CI to screenshot each mock, generate a diff report with Happo, and post the results back to a PR

Each of these steps provided a unique set of challenges.

Screenshot Library

Capturing a screenshot is fairly straightforward, and there are many libraries that do it; however, we built our own because we had a few important requirements.

First, we needed to capture the entire view hierarchy, not just what was visible on the device screen. We use RecyclerViews extensively, and these do not lay out content off of the screen, so that content is not captured in normal screenshots. To work around this, we manually measure the activity view and allow it to take as much height as it wants. Then, we force a layout. In order to simulate a real layout pass we also call registered layout listeners and pre-draw listeners. These may request another layout pass, in which case we loop until all views are laid out. Finally, we draw the entire view hierarchy to a Canvas and save it to a Bitmap.
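
The "loop until all views are laid out" step can be sketched in plain Java. This is only an illustration of the control flow, not the library's code: `LayoutTarget` is a hypothetical stand-in for the activity's root view, and the `maxPasses` cap is an added assumption so that a listener that always requests another pass cannot hang the test. On Android the three methods would presumably map to calls such as `View.measure` and `View.layout`, `ViewTreeObserver.dispatchOnGlobalLayout` and `dispatchOnPreDraw`, and `View.isLayoutRequested`.

```java
// Hypothetical stand-in for an Android root view; not a real Android type.
interface LayoutTarget {
    void measureAndLayout();        // measure with unbounded height, then lay out
    void dispatchLayoutListeners(); // notify layout and pre-draw listeners
    boolean isLayoutRequested();    // true if a listener asked for another pass
}

final class LayoutSettler {
    private LayoutSettler() {}

    /** Runs layout passes until none is requested; returns the number of passes run. */
    static int settle(LayoutTarget target, int maxPasses) {
        for (int pass = 1; pass <= maxPasses; pass++) {
            target.measureAndLayout();
            target.dispatchLayoutListeners();
            if (!target.isLayoutRequested()) {
                return pass; // stable: the hierarchy can now be drawn to a bitmap
            }
        }
        // Assumed safety net: give up rather than loop forever.
        throw new IllegalStateException("Layout did not settle after " + maxPasses + " passes");
    }
}
```

Only once `settle` returns would the hierarchy be drawn to the Canvas, so the bitmap reflects the final layout rather than an intermediate one.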

Another requirement was to minimize visual differences across test iterations (i.e., flakiness). The library does this in a few ways:

Disable EditText cursors, which otherwise blink on a timer
Clear focus of each view, which can be inconsistently set
Call invalidate and requestLayout on each view, which is necessary to clear measurement caches and drawable states and to ensure that each view completely redraws itself
Clear the Resource drawable cache, since shared drawables can cause unpredictable pixel aliasing when they are redrawn at different sizes

Our library uploads each bitmap to cloud storage and compiles all of the screenshot URLs into a report that is uploaded to Happo. The resulting Happo report is keyed to a git SHA, and Happo can compare it against the report for any other SHA to find changes across branches.

Finally, since we have to process thousands of screenshots, it was necessary that our library be as performant as possible. To achieve this, it uses coroutines to process and upload bitmaps concurrently.
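
As a rough sketch of the concurrency idea, the following uses a fixed thread pool from the Java standard library as a stand-in for the coroutines our library actually uses. The names `ConcurrentUploader` and `uploadAll`, and the `upload` function passed in, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ConcurrentUploader {
    private ConcurrentUploader() {}

    /** Uploads every item on a fixed-size pool and returns the results in input order. */
    static <T, R> List<R> uploadAll(List<T> items, Function<T, R> upload, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit everything first so the uploads overlap.
            List<Future<R>> pending = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> upload.apply(item);
                pending.add(pool.submit(task));
            }
            // Then collect results in the order the items were given.
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // blocks until that upload finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while uploading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Upload failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Results come back in input order, so each returned URL can be matched to the mock it belongs to.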

Because we are working with bitmaps, out-of-memory errors are a threat. This is exacerbated by our approach of laying out entire RecyclerViews, which can be arbitrarily long. To prevent issues we truncate lists in mock data to three items, but inherently long screens must still be supported. By efficiently reusing bitmaps and enabling a large heap, we are able to screenshot views up to 40,000 pixels long.
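
To get a feel for the memory involved, here is a small back-of-the-envelope helper. The 4 bytes per pixel matches Android's ARGB_8888 bitmap config; the screen width used in the note below is an assumption for illustration, and `BitmapMemory` is a hypothetical name.

```java
final class BitmapMemory {
    private BitmapMemory() {}

    /** Bytes needed to hold an uncompressed bitmap of the given size. */
    static long bytesFor(int widthPx, int heightPx, int bytesPerPixel) {
        // Widen to long before multiplying so very large sizes cannot overflow int.
        return (long) widthPx * heightPx * bytesPerPixel;
    }
}
```

With an assumed width of 1,080 pixels, `BitmapMemory.bytesFor(1080, 40_000, 4)` gives 172,800,000 bytes, roughly 165 MiB for a single screenshot. That is why bitmap reuse and a large heap matter here.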

We have had issues with flaky network requests to both AWS and Happo, which can time out or fail for other reasons outside of our control. Wrapping all of these requests in retry logic with exponential backoff greatly increased test stability.
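
A minimal sketch of retry with exponential backoff looks something like the following. The names and parameters (`Retry`, `withBackoff`, `maxAttempts`, `initialDelayMs`) are hypothetical. A production version would likely add jitter and retry only on errors known to be transient.

```java
import java.util.concurrent.Callable;

final class Retry {
    private Retry() {}

    /** Calls the request until it succeeds, doubling the wait after each failure. */
    static <T> T withBackoff(Callable<T> request, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                lastFailure = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", e);
                }
                delayMs *= 2; // exponential backoff: 1x, 2x, 4x, ...
            }
        }
        throw new IllegalStateException(
                "Request failed after " + maxAttempts + " attempts", lastFailure);
    }
}
```

A call site would wrap each network request, for example `Retry.withBackoff(() -> upload(bitmap), 5, 500L)`, where `upload` is whatever function performs the request.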

The result is that each mock has a corresponding image URL that visually represents it. An MD5 hash of the bitmap is used as the file name, which lets us easily check whether two images are the same.
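
Deriving a file name from the image content can be sketched with the standard `MessageDigest` class. The class and method names here are hypothetical, and the `.png` extension is an assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ScreenshotNames {
    private ScreenshotNames() {}

    /** Returns the lowercase hex MD5 digest of the encoded image bytes. */
    static String md5Hex(byte[] imageBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(imageBytes);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Content-addressed file name: identical images get identical names. */
    static String fileNameFor(byte[] imageBytes) {
        return md5Hex(imageBytes) + ".png"; // extension is an assumption
    }
}
```

Because the name depends only on the bytes, comparing two screenshots reduces to comparing two strings. An unchanged screen also maps to a file name that already exists in storage.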

Happo

Happo is an external service that we leverage to run bitmap comparisons for us. It offers a host of nice features, such as:

Viewing the bitmap history of a screen, to see how it has changed over time
Email notifications when screens you care about have changed
A UI to mark diffs as flaky, with that marking stored for future comparisons
Blocking GitHub PRs until a diff report is approved
A web UI for viewing all screenshots in a report, and identifying visual changes across reports

A Happo screenshot diff showing a change in price per night styling

Here Happo shows a change in how the price per night is displayed. The diff allows the engineer to check that their PR has the intended change, and code reviewers get a better sense of what the PR does. Any unintended changes are easily caught and fixed before merging.

This approach is called Approval testing, and there are many benefits:

Minimal effort needed to update tests. Engineers only need to view the diff and accept it if the changes are expected. The new report is automatically updated as the standard.
Complete coverage of the UI rendering flow. No manual tests for UI need to be written.
Easy testing of UI edge cases. This system scales to support as many mock variants as we need.

CI Setup

There was some nontrivial work to hook up all of these pieces into a cohesive testing experience for the end developer. This is described in detail in Parts 5–7 of this article series.

In short, the test framework we built automatically looks up each mock in the app, loads it on screen, and then allows us to run our test on it. This is done in a generic way that allows us to apply any test validations we want — in this case, screenshot testing.

The end result is that each PR runs a blocking job to generate screenshots and compare them. If any differences are found, a comment is posted to the PR with the differences and a link to the Happo report.

A GitHub PR comment indicating visual changes detected by Happo

This makes it clear to both author and code reviewer that the PR caused a UI change, and shows exactly what the change was. This has been a huge help in catching regressions and preventing unintended code changes.
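
Happo performs the real comparison, but the idea behind the comment can be illustrated with a small sketch. It assumes each report is a map from mock name to the image's hashed file name, as described earlier. `ReportDiff` and `changedMocks` are hypothetical names and not part of Happo's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class ReportDiff {
    private ReportDiff() {}

    /** Returns the sorted names of mocks whose image is new, missing, or different. */
    static List<String> changedMocks(Map<String, String> baseline, Map<String, String> current) {
        // Consider every mock that appears in either report.
        Set<String> names = new TreeSet<>(baseline.keySet());
        names.addAll(current.keySet());

        List<String> changed = new ArrayList<>();
        for (String name : names) {
            // A missing entry reads as null, so added and removed mocks count as changes.
            if (!Objects.equals(baseline.get(name), current.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```

If the returned list is empty, the PR made no visual changes. Otherwise the names tell the author and reviewer which screens to look at.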

Additionally, developers don’t need to do any extra work to set up screenshot tests for their Fragments. They simply add mock definitions to their Fragment class (as explained in Part 1), and the test framework automatically picks them up and generates the screenshots.

Additional Tests

Once we had this mock testing system set up, it was easy to tack on additional checks:

We set up LeakCanary instrumentation testing to run while each mock is screenshotted. This makes it easy to automatically detect and fail the test if the Fragment, View, or Activity is leaked after the test ends.
Once our Happo library lays out the entire Activity, we run the Espresso AccessibilityChecks assertions on it to catch common accessibility violations on the screen.
The Fragment Arguments and State are run through a routine that mimics process recreation, saving and then restoring them. This checks that they can be parceled and restored without crashing. We also screenshot the result of the recreation so we can see how the Fragment handles restoring saved state.

The fantastic thing about the test framework is that it lays a foundation for automatically showing every screen in the app and running dynamically generated test code on each one. Each of these additional tests is set up with just a few lines of code and instantly applies to all Fragments in the app, with no extra work required from developers. A product engineer’s initial effort to create Fragment mocks continues to pay off as we increase testability purely from the infrastructure side.

Next: Testing Event Handling

In this article we looked at how we test the static UI content of screens. However, much of a feature’s code deals with event handling, such as navigating between screens, updating state, or executing a request.

In Part 3 we’ll take the idea of UI screenshot comparisons and see how we’ve applied it to interaction testing to automatically test event handling code.

Series Index

This is a seven-part article series on testing at Airbnb.

Part 1: Testing Philosophy and a Mocking System

Part 2 (this article): Screenshot Testing with MvRx and Happo

Part 3: Automated Interaction Testing

Part 4: A Framework for Unit Testing ViewModels

Part 5: Architecture of our Automated Testing Framework

Part 6: Obstacles to Consistent Mocking

Part 7: Test Generation and CI Configuration

We’re Hiring!

Want to work with us on these and other Android projects at scale? Airbnb is hiring for several Android engineer positions across the company! See https://careers.airbnb.com for current openings.