EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Improving iOS UI Testing

How to make iOS UI Testing fast and reliable

Corbin Montague
Expedia Group Technology

--

Photo by Rémi Jacquaint on Unsplash

Introduction

In 2018 we started writing user interface (UI) tests for the Vrbo™️ iOS app. UI tests are incredibly valuable, but they are also very different from unit tests to write and maintain. If your code isn’t structured to support UI tests, and your contributors aren’t educated on best practices, they can quickly become unwieldy. This was all too evident at Vrbo, where after just a couple of years our iOS UI tests were a broken mess:

  • 64% of UI tests consistently failed, or 100 out of 157.
  • On average, it took 32.7s to run each UI test, or 85.5min to run all 157 UI tests.
  • On average, it took 381s to perform a clean build of our UITests scheme and 44s for a follow-up build.

By establishing better software patterns, leveraging 3rd party tools, and gaining a deeper understanding of Apple APIs, we’ve built a new UI testing approach that is faster, more stable, and easier to comprehend.

  • UI tests pass much more consistently. (Fewer than 10% of our UI test builds failed intermittently prior to Xcode 12.3. Unfortunately, that Xcode version introduced some issues we need to work through; I’ll try to update this blog when we address them.)
  • On average, it now takes 14.7s to run each UI test, or 353s to run 24 UI tests. This is 55% faster than before and without being parallelized yet, so they will only get faster!
  • Most UI tests now live within our experience modules: Xcode projects that encapsulate an isolated part of our app’s user experience and generate a framework for our top-level project to consume. In these smaller, more isolated projects, a clean build takes only 170s and a follow-up build only 25s: 55% faster clean builds and 43% faster follow-up builds.

Our initial approach was certainly flawed, but having the ability to write fast, stable UI tests is critical to scaling native apps at Vrbo, so let’s take a closer look at UI testing on iOS, what went wrong, and how we can do it better.

What is UI Testing?

Unlike unit tests, which verify the correctness of business logic, UI tests verify the user interface. These tests interact with an app’s UI the same way a user does, without access to internal functions and variables. There are two processes involved when UI testing on iOS: a test runner and the app being tested. XCUIApplication is a proxy for the app, and UI tests call XCTest APIs to tell the test runner how to interact with the app’s UI.

Like unit tests, UI tests increase code coverage and lead to more stable releases, but they also give you a level of confidence that unit tests can’t, because they actually tap your buttons, navigate between your views, and verify content the same way your users will. After all, users don’t care if your functions work, they care if your app works.
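In code, the smallest possible UI test is just a few lines. This is a generic sketch rather than Vrbo code, and the "Home" accessibility identifier is hypothetical:

```swift
import XCTest

final class SmokeUITests: XCTestCase {
    func testAppLaunchesToHome() {
        // XCUIApplication is a proxy for the app, which runs in its own process.
        let app = XCUIApplication()
        app.launch()

        // The test runner drives the app through XCTest APIs, just like a user.
        XCTAssertTrue(app.navigationBars["Home"].waitForExistence(timeout: 5.0))
    }
}
```

This class lives in a UI Testing Bundle target, not in the app target, which is why it can only see what a user sees.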

Why Write UI Tests?

Unit tests can verify the UI too, so if UI tests are slower, less stable, and more difficult to write, why even bother? I hear this question all the time, and it’s a valid one, so let’s discuss it. It’s true that unit tests can create views and reference code in those objects to verify the code paths executed when a user interacts with your UI, but they test those code paths in isolation. UI tests actually simulate a user interacting with your app in real time. So while a unit test can verify that func buttonTapped() behaves correctly when called, a UI test can verify that tapping that button behaves correctly while the app is running, as it will in the hands of your users. If the connection between the button and the buttonTapped() function was never made, the unit test wouldn’t notice, and that’s a bug. Unit tests also don’t set up a root window or have dedicated APIs for testing the UI, so trying to do so is usually clunky.

The biggest advantage to verifying your UI within unit tests is speed, so much so that popular iOS UI testing frameworks like KIF do exactly that. That said, writing traditional UI tests (using Apple’s UI Testing Bundle) has advantages:

  • You can record interactions to generate UI testing code. It’s far from perfect, but it can give you a good idea of how to get started.
  • Referencing Apple’s UI Testing APIs makes interacting with the UI much easier than from a unit test.
  • UI tests verify your code through the lens of a user, automating the verification of user flows in a way unit tests can never do because they only test in isolation.

UI Testing in 2018 — A Retrospective

When UI tests were first introduced at Vrbo in 2018, the goal was to automate the verification of common user flows to alleviate the workload on our QA engineers. We built a workflow on Jenkins to run them nightly, since we were uncertain whether their stability and speed would degrade our PR builds. That concern turned out to be justified, but the nightly build was also their ultimate demise. It wasn’t watched closely, and developers rarely ran UI tests locally before pushing up a PR since they took over an hour to run. As a result, PRs commonly broke UI tests unintentionally. We would eventually recognize the breakage and fix it, but over time, as the app grew along with the number of contributors, our UI tests degraded significantly. Let’s look at the key issues that plagued our initial UI testing approach:

Continuous Integration and Continuous Delivery (CI/CD)

  • UI tests were never added to a CI/CD process tied to PR builds (because they were too slow and unstable). Without a feedback loop in place to notify contributors that their PR broke a UI test, code was commonly merged into master that broke UI tests.

Stability

  • UI tests ran against Stage endpoints meaning their stability was always tied to the stability of those endpoints. We needed a way to mock the data, but UI tests don’t have access to internal functions and properties so they can’t easily pass around mock objects like unit tests.
  • When writing UI tests, developers commonly forgot to wait on the existence of a UI element before interacting with it.
  • Running tests on CI/CD is slower than running them locally, and some code paths are not synchronous. Contributors need to understand what code paths their test executes and choose appropriate timeouts when waiting on the existence of UI elements before interacting with them.
  • Sometimes tests just fail. This happens more frequently with UI tests compared to unit tests due to the asynchronous nature of having a test runner interact with your app’s UI. Having a solution in place to retry failed tests can dramatically reduce the number of false negatives. We had no such solution in place.

Speed

  • Poorly written assertions can force UI tests to wait for long timeouts to expire before continuing to execute a test. Many cases like this existed in our original UI tests.
  • UI tests are inherently slower than unit tests. There are two processes instead of one, and between each test run the app has to be torn down and launched again. Taking actions on an app’s UI is also much slower than executing code line by line as a unit test does. Due to the inherently slow nature of UI tests, they often need to be parallelized. Ours were not.
  • Our UITests scheme was slow to compile, which in turn made writing UI tests a slow process. We modularized our codebase, but never set up our modules to support UI testing against their demos in a more isolated environment. This meant all UI tests continued to run against our top-level project and never gained the compile-time benefits of modularization.

Code Structure and Ease of Comprehension

  • Most of our UI testing code was undocumented.
  • There was a lack of internal documentation to educate contributors on best practices and debugging techniques when writing UI tests.
  • There was a lack of SOLID software patterns in place to better structure our UI testing code.

UI Tests and CI/CD

There’s not much to say here. UI tests need to be part of CI/CD just like unit tests. They need to be run on PR builds otherwise contributors won’t know if they broke something. The key is really ensuring that your UI tests are stable and fast enough to be run on PR builds without making that process a major pain point for contributors, so let’s focus on those topics.

Writing Stable UI Tests

Stub Network Requests

Having UI tests hit live endpoints can be a useful tactic for writing end-to-end (E2E) tests that verify backend services in addition to the UI, but these tests are hard to automate as part of PR builds because their stability is always tied to the stability of the endpoints being hit. We wanted the flexibility to continue writing these kinds of E2E tests, while also having the option to stub network requests so we could write more stable UI tests that run on PR builds. Enter Swifter!

Swifter is a tiny HTTP server engine that runs on a local port and synchronizes network requests so we can stub them while testing. We wrote a MockServer class that wraps around Swifter and exposes APIs for easily stubbing network requests. The end result is code like this:

/// Tests happy path for logging into the app.
func testLogin() {
    let stubs = [
        HTTPStubInfo(urlPath: "/path/to/authEndpoint", filename: "authenticate_success", method: .POST),
        HTTPStubInfo(urlPath: "/path/to/pull/profile/data", filename: "userProfile_success", method: .POST)
    ]
    stubs.forEach { server.addJSONStub($0) }

    login(emailAddress: "foo@gmail.com", password: "password")
}

In this example, we create stubs for authentication and profile requests, tell our MockServer instance to add those stubs, then call a helper function that navigates to our login view, enters an email and password, and attempts to login.
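Under the hood, a MockServer wrapper like this can be sketched in a few dozen lines. This is a hypothetical reconstruction, not Vrbo’s actual implementation: the port, the bundle lookup, and the Method enum are assumptions; only the addJSONStub name mirrors the example above.

```swift
import Foundation
import Swifter

/// Describes one stubbed request: the path to intercept, the JSON fixture
/// to serve, and the HTTP method. Mirrors the usage shown above.
struct HTTPStubInfo {
    enum Method { case GET, POST }
    let urlPath: String
    let filename: String
    let method: Method
}

/// A thin, hypothetical wrapper around Swifter's HttpServer.
final class MockServer {
    private let server = HttpServer()

    func start(port: UInt16 = 8080) throws {
        try server.start(port)
    }

    func stop() {
        server.stop()
    }

    func addJSONStub(_ stub: HTTPStubInfo) {
        // Load the canned JSON fixture from the test bundle and serve it
        // whenever the app hits the stubbed path.
        let handler: (HttpRequest) -> HttpResponse = { _ in
            guard let url = Bundle(for: MockServer.self)
                    .url(forResource: stub.filename, withExtension: "json"),
                  let data = try? Data(contentsOf: url) else {
                return .notFound
            }
            return .raw(200, "OK", ["Content-Type": "application/json"]) { writer in
                try writer.write(data)
            }
        }
        switch stub.method {
        case .GET: server.GET[stub.urlPath] = handler
        case .POST: server.POST[stub.urlPath] = handler
        }
    }
}
```

For this to work, the app under test must be launched with its base URL pointed at localhost on the mock server’s port, typically via a launch argument read in the app’s networking setup.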

Wait For Existence

When writing UI tests, you should always call waitForExistence(timeout:) before interacting with a UI element.

Consider the code below which grabs a reference to a button, but does not wait on that element to exist before interacting with it. If we have just navigated to a view containing this button, the test runner may attempt to tap the button before the view has even appeared on the screen. In this case, the test will fail on line 2 because the button does not exist yet.

let button = app.buttons["SomeButton"]
button.tap()

Instead, we need to wait for the button to exist before interacting with it. The code below will give the test runner 5 seconds to wait for the button to exist. From a performance standpoint, it’s worth noting that the test runner will not wait for that entire timeout duration to expire before checking its existence. Instead, it will check at regular intervals (in this case, about every second). If the check succeeds, it will continue executing and tap the button, otherwise the timeout will be reached and the test will fail on line 2.

let button = app.buttons["SomeButton"]
XCTAssertTrue(button.waitForExistence(timeout: 5.0))
button.tap()
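Because this wait-then-tap pattern repeats constantly, many teams fold it into a small helper. The waitAndTap extension below is a hypothetical convenience, not an XCTest API:

```swift
import XCTest

extension XCUIElement {
    /// Waits for the element to exist, fails the test at the call site if it
    /// never appears, then taps it. `waitAndTap` is a hypothetical helper.
    func waitAndTap(timeout: TimeInterval = 5.0,
                    file: StaticString = #filePath, line: UInt = #line) {
        XCTAssertTrue(waitForExistence(timeout: timeout),
                      "Element did not exist after \(timeout)s",
                      file: file, line: line)
        tap()
    }
}
```

With it, the two-step example above collapses to app.buttons["SomeButton"].waitAndTap(), and the file/line parameters make failures point at the test that called the helper rather than at the extension itself.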

Use Appropriate Timeouts

Building on the point above, we need to use appropriate timeouts when calling waitForExistence(timeout:). We wrote a struct called UITestTimeout for this exact purpose; it defines timeout cases for common testing scenarios. The cases are well documented, so contributors can more easily determine which one fits the user flow they are testing. Using these cases also lets us change all of our timeouts from a single location, because no magic numbers with custom timeout durations float around our UI tests.

Timeout usage matters so much because UI tests always run slower on remote build machines during CI/CD than they do locally, some code paths are asynchronous, and some have animation or debounce timers associated with them. These scenarios require different timeout durations, so UITestTimeout was created with them in mind. Each case has a timeout longer than is typically required to run UI tests locally, so tests are more stable on CI. This does not impact testing speed so long as our waitForExistence(timeout:) calls are embedded in XCTAssertTrue() statements (more on that later).
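A struct in this spirit might look like the sketch below. The case names and durations are illustrative, not Vrbo’s actual values:

```swift
import Foundation

/// Illustrative sketch of a UITestTimeout-style struct. Durations are
/// deliberately generous so tests stay stable on slower CI machines.
struct UITestTimeout {
    /// Elements already on screen, or appearing after a purely local transition.
    static let navigation: TimeInterval = 10
    /// Flows that wait on a (stubbed) network round trip.
    static let network: TimeInterval = 20
    /// Flows with animations, debounce timers, or chained asynchronous work.
    static let debounce: TimeInterval = 30
}
```

A test then writes XCTAssertTrue(button.waitForExistence(timeout: UITestTimeout.network)); since the check returns as soon as the element exists, the generous CI-friendly durations cost nothing on fast local runs.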

Retry Logic

Sometimes tests fail. Asynchronous code paths can cause timeouts to be hit, simulators can crash, CI/CD can have issues, etc. Modern software is complicated, full of interconnected dependencies all expected to behave a certain way, but sometimes one breaks. We accepted this fact and began searching for a way to retry failed tests. We recently started using fastlane’s multi_scan plugin to retry failed unit tests up to a certain number of times before marking them as failed. We don’t have a retry solution in place for UI tests yet, but we are going to investigate re-using this fastlane plugin for UI tests vs leveraging Flank (a 3rd party test runner for Google’s Firebase Test Lab), which also provides this capability along with many other useful features.

Writing Fast UI Tests

Bad Assertions

Perhaps the most common mistake I’ve seen that leads to longer UI test runs is the misuse of assertions around waitForExistence(timeout:). Wrapped in an XCTAssertFalse(), a waitForExistence(timeout:) call forces the test runner to wait out the entire timeout, because the call only returns early when the element does appear. This differs from the behavior discussed above when wrapping waitForExistence(timeout:) in an XCTAssertTrue(). A non-existence check can make sense in certain circumstances, but should be avoided when possible. In other words, it’s better to check for what you expect to exist than for what you expect not to exist.
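To make the difference concrete, here is a hedged sketch (the element identifiers are hypothetical):

```swift
import XCTest

final class SearchUITests: XCTestCase {
    func testSearchResultsAppear() {
        let app = XCUIApplication()
        app.launch()

        // Anti-pattern: asserting non-existence always burns the full timeout,
        // because waitForExistence(timeout:) only returns early when the
        // element DOES appear.
        // XCTAssertFalse(app.activityIndicators["LoadingSpinner"]
        //     .waitForExistence(timeout: 10.0))

        // Better: wait for what you expect to exist. This returns the moment
        // the results appear and only spends the full timeout on failure.
        XCTAssertTrue(app.tables["SearchResults"].waitForExistence(timeout: 10.0))
    }
}
```

Ten seconds saved per assertion adds up quickly across a suite of 150+ tests.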

Parallelization

As with all software, parallelization can help speed things up but has its trade-offs. Let’s look at a study done at Bitrise (our Mobile CI/CD tool) on parallelizing tests:

The impact of parallelization on build times

  • 1 Simulator: 71 minutes
  • 2 Simulators: 43 minutes
  • 3 Simulators: 38 minutes
  • 4 Simulators: 34 minutes

The impact of parallelization on build failure rates

  • 1 Simulator: 0.0% failure rate
  • 2 Simulators: 0.0% failure rate
  • 3 Simulators: 1.5% failure rate
  • 4 Simulators: 5.4% failure rate

As you increase the number of parallel simulators, you trade speed for stability. The speed gains also fall off, so there’s a balancing act. This is where having retry logic in place, like we discussed above, helps complement parallelization. Our MockServer already supports being parallelized, so after we implement a retry solution for UI tests, we’ll spike a solution for parallelization to further reduce UI test run times.

Modularization Impacts Compile Time

We finished modularizing the iOS Vrbo code base in early 2020, and our revamped UI testing approach now takes advantage of it. UI tests that verify cross module interactions or hit live endpoints still live in our app layer (the top level project), but now each experience module (lower level projects) supports the ability to write UI tests against its demo. Many UI tests can be written against these isolated experiences, and doing so drastically reduces compile time:

  • On average, clean building the app layer’s UITests scheme takes 544s vs 170s for a module’s UI test scheme, making clean builds 69% faster when UI testing within a module.
  • On average, a follow-up build of the app layer’s UITests scheme takes 67s vs 25s for a module’s UI test scheme, making follow-up builds 63% faster when UI testing within a module.

Third Party UI Testing Frameworks

There are 3rd party UI testing frameworks like EarlGrey and KIF that swizzle Apple code and leverage undocumented APIs to increase UI testing speed. We do not use any of these frameworks currently, and I hesitate to do so. We will monitor our UI testing performance and potentially spike these solutions in the future, but our current implementation is built directly on top of Apple’s XCTest APIs, giving us the maximum amount of flexibility and control, without coupling us to an external dependency.

Writing SOLID UI Tests

Writing good UI testing code is hardly different from writing good software in general: Learn from the past, follow SOLID principles, and read Robert Martin’s Clean Code. I’m sure not all software engineers agree with me on that approach, but it’s my go-to! That said, let's look at exactly what we did to establish better software patterns for UI testing this time around:

  • UITesting - a framework that houses reusable UI testing code that can be shared between our various projects: the app, experience modules, and lower level UI frameworks.
  • UITest - a base class that exposes common features and convenience properties to all UI test classes while also abstracting away common setup code necessary to launch and tear down the app between test runs.
  • Protocols - built out for each module to house reusable functions for interacting with the UI within that module (like the login(emailAddress:password:) function shown above in example code). UI test classes can conform to these protocols to gain the ability to verify user flows within those areas of the app without being exposed to functionality in other areas of the app where they have no interest.
  • UITestTimeout - a struct that defines timeout durations for testing different code paths.
  • Mock JSON files - for stubbing network requests with Swifter.
  • CoreAccessibility - a low-level, micro-framework that houses all accessibility identifiers so they can be shared between our various projects.
  • UITestRunner - a protocol whose extension defines the common UI testing behaviors our AppDelegate classes, in both the app and the experience module demos, need in order to run UI tests.

All of the new code mentioned above is fully documented, and the UITesting framework has 96% code coverage. Every helper function within the UITesting protocols wraps its code in a runActivity(named:block:) call to improve readability in console and test logs. These newly established patterns not only allow us to easily write and share UI testing code between the app and experience modules, but also facilitate writing small, readable UI tests.
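A per-module helper protocol in this style might look like the sketch below. The protocol name, identifiers, and steps are illustrative, not Vrbo’s actual code; XCTContext.runActivity(named:block:) is the real XCTest API that groups steps under a named node in the test report:

```swift
import XCTest

/// Hypothetical per-module helper protocol. A UI test class conforms to it
/// (and to other module protocols it needs) to gain that module's flows.
protocol LoginTesting {
    var app: XCUIApplication { get }
}

extension LoginTesting {
    /// Navigates the login form, enters credentials, and submits.
    func login(emailAddress: String, password: String) {
        XCTContext.runActivity(named: "Log in as \(emailAddress)") { _ in
            let emailField = app.textFields["LoginEmailField"]
            XCTAssertTrue(emailField.waitForExistence(timeout: 10.0))
            emailField.tap()
            emailField.typeText(emailAddress)

            let passwordField = app.secureTextFields["LoginPasswordField"]
            passwordField.tap()
            passwordField.typeText(password)

            app.buttons["LoginSubmitButton"].tap()
        }
    }
}
```

This is how the testLogin example earlier can call login(emailAddress:password:) in one line while staying blind to every other module’s helpers.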

Conclusion

Only time will tell how much we’ve improved iOS UI testing at Vrbo, but the initial results are promising. We still need to implement a retry solution and parallelize our UI tests, but those become easy follow-up items now that we have a better foundation for UI testing in place. Internally, we’ve updated documentation and recorded a Tech Talk to better educate our contributors on UI testing best practices. New pain points may arise as we enter a new age of UI testing at Vrbo; after all, there’s always room for improvement. But that’s why we listen and iterate.

Learn more about technology at Expedia Group
