Detox: The Unobtainable Test Stability (or is it?)

Rotem Meidan
Wix Engineering
Jan 30, 2021


Detox is a gray-box testing solution for mobile apps: it manages synchronization between the test code and the app, so users don’t have to do it manually. Yet even with that burden lifted, and despite abundant documentation and guides, developers and testers can still get tripped up by usage patterns and misconfiguration, and suffer from poor test stability. We know; we feel that pain every day…

This article explains how Wix uses Detox internally: how we manage configuration, how we fight flakiness, and some best practices we’ve developed over ~3 years of building Detox and using it in our CI process. We’ll also discuss our never-ending pursuit of “0 manual QA”, which always seems within reach, if only we can overcome that one last technical hurdle.

Gray-box testing and Detox

So let’s start at the beginning, to put everything in the right context. Detox was built at Wix to solve a growing problem of flaky E2E tests, especially in our then-new React Native app (I’ve written about this in depth in the past). The main idea was to follow the approach successfully taken by Espresso and EarlGrey.
In Black-box testing, the tester has to decide how long to wait for the app to finish what it’s doing before sending the next action/expectation. The Gray-box approach instead runs a synchronization mechanism inside the app-under-test’s process that detects whether the process is busy or idle, and only interacts with the app when it is idle, meaning it has no more events to handle, no network requests, no animations, and no transitions. This guarantees that any action/expectation only happens once the app has finished processing everything, and that nothing will change until the next user action.
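To make the busy/idle idea concrete, here is a toy model in TypeScript (Detox tests are written in JavaScript or TypeScript). It is a sketch under our own assumptions, not Detox’s actual implementation: the `IdleTracker` class, its `begin`/`end`/`whenIdle` methods and the `BusyKind` categories are hypothetical. Instrumented app code reports when busy work starts and ends, and the test driver awaits `whenIdle()` before each action or expectation.

```typescript
// Toy model of gray-box synchronization (an illustration, not Detox's actual
// implementation): the app process counts in-flight "busy" work, and the test
// driver only sends the next action/expectation once that count drops to zero.
type BusyKind = "network" | "animation" | "transition" | "event";

class IdleTracker {
  private pending = new Map<BusyKind, number>();
  private waiters: Array<() => void> = [];

  // Called by instrumented app code when work of a given kind starts.
  begin(kind: BusyKind): void {
    this.pending.set(kind, (this.pending.get(kind) ?? 0) + 1);
  }

  // Called when that work finishes; wakes up waiters once nothing is pending.
  end(kind: BusyKind): void {
    const count = this.pending.get(kind) ?? 0;
    if (count <= 1) {
      this.pending.delete(kind);
    } else {
      this.pending.set(kind, count - 1);
    }
    if (this.isIdle()) {
      const waiters = this.waiters;
      this.waiters = [];
      waiters.forEach((wake) => wake());
    }
  }

  isIdle(): boolean {
    return this.pending.size === 0;
  }

  // Which kinds of work are still keeping the app busy.
  busyKinds(): BusyKind[] {
    return Array.from(this.pending.keys());
  }

  // The driver awaits this before every action/expectation.
  whenIdle(): Promise<void> {
    if (this.isIdle()) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}
```

Because the driver waits on this signal rather than on a guessed sleep, tests need no hand-tuned delays; the flip side is that anything that never finishes (a looping animation, a request that never returns) keeps the app permanently busy.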

A very brief intro to Wix’s app architecture

In order to talk about how we incorporate E2E testing at Wix, let’s first discuss how Wix’s mobile app is architected. From a bird’s-eye view, the app is built from 4 types of parts:
1. Engine - the backbone of the app. It includes all the native app code and an API registry for the modules built on top of it, so that they can communicate with it and with one another.
The infrastructure libraries used in the engine (most of which are open source) and the engine itself are usually written in languages native to the platform, alongside a unified JavaScript API.
2. UI components and toolset library (aka React Native UI Lib) - all the UI is built on top of it, and it includes both JavaScript and native components.
3. Module - a single product (blog, store management, CRM, chat, etc.) that exposes its own API and consumes the APIs of other modules (there are many modules). This is the actual product implementation, where most of the screens and business logic are defined.
4. Other libraries - open or closed source, these are somewhat disconnected from the release process, and are transitive dependencies of one of the parts above. New releases are (mostly) adopted manually through PRs to one of those parts. For testing purposes they are treated separately, and have their own test suites.
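The engine’s API registry can be pictured as a name-to-API lookup that modules use to expose themselves and to consume one another. The sketch below is hypothetical: `ModuleRegistry`, `register`, `consume`, `ChatApi` and the `"chat"` module are illustrative names, not Wix’s actual engine API.

```typescript
// Hypothetical engine-level registry: modules publish an API under a name and
// look up other modules' APIs by name at runtime.
class ModuleRegistry {
  private apis = new Map<string, unknown>();

  register(moduleName: string, api: object): void {
    if (this.apis.has(moduleName)) {
      throw new Error(`Module "${moduleName}" is already registered`);
    }
    this.apis.set(moduleName, api);
  }

  // The caller states the API type it expects; nothing checks it at runtime.
  consume<T>(moduleName: string): T {
    if (!this.apis.has(moduleName)) {
      throw new Error(`Module "${moduleName}" is not registered`);
    }
    return this.apis.get(moduleName) as T;
  }
}

// Illustrative usage: a chat module exposes an API, and another module consumes it.
type ChatApi = { unreadCount: () => number };

const registry = new ModuleRegistry();
registry.register("chat", { unreadCount: (): number => 3 });
const chat = registry.consume<ChatApi>("chat");
console.log(`Unread messages: ${chat.unreadCount()}`);
```

Looking modules up by name keeps them decoupled at build time, which fits the independent per-module CI described below; the cost is that a typo or a missing module only surfaces at runtime.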

Each of these parts has its own independent CI process, and when everything looks good, a new version can be published for everyone to use.
The final stage is to take all of the above and build one big binary that includes everything; this is the app we release to the stores at the end of the process.

For a more detailed look at this, I recommend reading Omri Bruchim’s post on how Wix’s app is architected.

Testing strategies

Now that we have a rough view of how everything is laid out, let’s discuss testing. I won’t discuss unit tests here, though every project has its own elaborate unit test suite that runs all the time, mostly because unit tests are very cheap to run.
On the E2E end, we differentiate between a few types of tests, and split our test suites and their execution across different stages of the development cycle.

Production E2E

A fully functional app (mostly) with very minimal mocking. Wix uses experiments and A/B tests quite extensively, so with this type of test we have to make sure we get the same features (and therefore the same tests) on every execution. To that end we developed an experiment override mechanism that configures a predefined experiment blob at the beginning of every test; the app launches with exactly these experiments, ensuring we never get different behaviour while the app is under test.
Production E2E runs whenever a module owner wants to “GA” (publish) their work into the full app.
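The experiment override mechanism boils down to a precedence rule: if the test pinned a value for an experiment, use it; otherwise fall back to the normal A/B assignment. Below is a minimal sketch assuming a flat name-to-variant blob; `resolveExperiment`, `ExperimentBlob` and the experiment names are hypothetical. The post doesn’t say how the blob reaches the app; Detox’s `device.launchApp({ launchArgs })` is one plausible transport, but that is our assumption.

```typescript
// Hypothetical experiment override: a flat map from experiment name to variant.
type ExperimentBlob = Record<string, string>;

// Decides which variant the app uses for one experiment.
function resolveExperiment(
  name: string,
  overrides: ExperimentBlob | undefined,
  assignVariant: (name: string) => string, // the normal (possibly random) A/B assignment
): string {
  // A value pinned by the test always wins, so every run sees the same variant.
  if (overrides !== undefined && Object.prototype.hasOwnProperty.call(overrides, name)) {
    return overrides[name];
  }
  return assignVariant(name);
}
```

Experiments the test did not pin still go through normal assignment, so a production E2E suite would typically pin every experiment its screens depend on.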

Mocked E2E

These tests use mocked server endpoints and do not interact with the production environment. This lets us control the mock servers’ outputs and test the module’s behaviour in predefined states.
The big upside is the ability to control all of the test’s inputs: consistent input from local mock servers → consistent output. These tests are usually more stable and a bit faster than Production E2E.
These run in CI on every push to the module’s codebase.
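Mocked E2E relies on the mock server returning exactly what the test declared. The sketch below models that with an in-memory route table rather than a real HTTP server; `MockServer`, `on` and `handle` are hypothetical names, and answering unmocked calls with a 501 is our own design choice, so that a missing mock fails loudly instead of silently reaching production.

```typescript
// In-memory stand-in for a mock backend (hypothetical API, no real HTTP).
interface MockResponse {
  status: number;
  body: unknown;
}

class MockServer {
  private routes = new Map<string, MockResponse>();
  // Every request the app made, in order, so tests can assert on traffic too.
  readonly requests: string[] = [];

  // Declare a canned response for a method + path before the test runs.
  on(method: string, path: string, response: MockResponse): this {
    this.routes.set(`${method.toUpperCase()} ${path}`, response);
    return this;
  }

  handle(method: string, path: string): MockResponse {
    const key = `${method.toUpperCase()} ${path}`;
    this.requests.push(key);
    const response = this.routes.get(key);
    // Unmocked calls fail loudly rather than falling through to production.
    return response ?? { status: 501, body: { error: `No mock registered for ${key}` } };
  }
}
```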

Screenshot testing

Mostly used in our shared UI library (UILib).
Detox doesn’t have any built-in screenshot comparison mechanism, but it can take screenshots on demand, both device-level and element-level, to be compared with an external library (we use Applitools to get smarter comparisons that avoid false positives on slight pixel variations).
NOTE: It is worth mentioning that slight pixel variations are not an uncommon issue. They can appear when comparing screenshots taken with two different graphics cards or drivers, for instance a local dev machine where the baseline screenshot is taken and the CI machine where all the test screenshots are taken, and they can easily cause tests to fail if not handled properly.
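A common remedy for such variations is to compare with a tolerance rather than pixel-for-pixel. This is a minimal sketch over raw RGBA buffers, assuming both screenshots have the same dimensions; `screenshotsMatch`, `channelTolerance` and `maxDiffRatio` are hypothetical names and thresholds, and this is not how Applitools’ smarter comparisons work.

```typescript
// Tolerant screenshot comparison over raw RGBA buffers (4 bytes per pixel).
interface CompareOptions {
  channelTolerance: number; // max per-channel difference (0-255) still treated as equal
  maxDiffRatio: number; // fraction of pixels (0-1) allowed to exceed the tolerance
}

function screenshotsMatch(
  baseline: Uint8ClampedArray,
  actual: Uint8ClampedArray,
  options: CompareOptions,
): boolean {
  // Different sizes or malformed buffers never match.
  if (baseline.length !== actual.length || baseline.length % 4 !== 0) return false;
  const pixelCount = baseline.length / 4;
  if (pixelCount === 0) return true;
  let differing = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - actual[i + c]) > options.channelTolerance) {
        differing++; // count each pixel at most once
        break;
      }
    }
  }
  return differing / pixelCount <= options.maxDiffRatio;
}
```

Choosing the two thresholds is the hard part: too loose and real regressions slip through, too strict and driver-level noise fails the build again.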

Component testing

Some of the modules test actual React (Native) component states by loading only the components themselves and switching state/props programmatically throughout the test.
This is done on top of Detox with Kompot, our React Native component testing library.
These run in CI on every push to the module’s codebase.

Choosing what to test

Wix app modules are written 100% in JS/TS. This means that most of a module’s business logic runs the same code on both platforms.
Product developers often find that a bug in their module manifests similarly on both platforms, and that a bug which manifests on only one platform is usually an issue in an infrastructure library. This means module devs can get pretty good coverage by running E2E on only one platform.
This does not, however, mean the whole app can be tested on one platform. All of Wix’s infrastructure libraries are running tests on both iOS and Android.

Stable E2E

In order for us to trust E2Es, they must be very stable, but they must also provide good insight into what happened when a test fails. Over the years, we tried to improve on that in two ways.
The first is developing a series of tools to help us figure things out: view hierarchy dumps, trace logs, videos, and test timeline graphs. We also try to make it easier for users to understand why tests hang when they do (usually because the app can’t get into an idle state, due to a network request, an infinite animation, or an infinite loop).
The second is education. This was mostly directed internally at Wix’s engineering team, but some of it also made it into our documentation.
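When a test hangs, the most useful thing to show is which resources kept the app busy. The sketch below turns such a list into a readable message; the `BusyResource` shape, `describeBusy`, `explainHang` and the wording are hypothetical, not Detox’s actual output, and the three kinds simply mirror the usual causes listed above.

```typescript
// Hypothetical shape of "what is still busy" as reported by the app.
type BusyResource =
  | { kind: "network"; url: string }
  | { kind: "animation"; name: string; looping: boolean }
  | { kind: "jsThread"; blockedMs: number };

function describeBusy(resource: BusyResource): string {
  if (resource.kind === "network") {
    return `network request still in flight: ${resource.url}`;
  }
  if (resource.kind === "animation") {
    return resource.looping
      ? `looping animation "${resource.name}" never ends`
      : `animation "${resource.name}" still running`;
  }
  // Remaining case: the JS thread itself is stuck (e.g. an infinite loop).
  return `JS thread blocked for ${resource.blockedMs} ms (possible infinite loop)`;
}

// Builds the message shown when waiting for idle exceeds the timeout.
function explainHang(busy: BusyResource[], waitedMs: number): string {
  if (busy.length === 0) return `App became idle within ${waitedMs} ms`;
  return `App not idle after ${waitedMs} ms:\n- ${busy.map(describeBusy).join("\n- ")}`;
}
```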

Test Isolation

We’re also working hard to make sure all tests are 100% isolated from one another; this alone makes failed test summaries much more readable and traceable.

Current pains

Our main pain point currently is test duration. We are working very hard to solve it, mostly by adding out-of-the-box support for emulator/device cloud providers.

Read This!

As I was writing this section, it turned out to have much more content than I initially anticipated, so instead of keeping it here, I extracted it into a whole different blog post focused solely on Writing Stable Test Suites.

Configuration

As Detox matures and gains more features and tighter integration with Jest (usually in order to leverage features available in Jest, or a new cool feature in Detox itself), configuration becomes harder to handle. Although the documentation is pretty good, it’s hard to keep up with changes in the config.

Our Detox configuration files… Too many?

As mentioned above, the Wix app is built from many independently developed modules. Each has its own Git project and build configuration in CI.
Each of these was required to add Detox configuration files to its repo, and most of those files were nearly identical.
Over the last 6 months we realized that it makes zero sense to keep it that way, and moved to a unified configuration for everyone.

The configuration files are an extremely important yet fragile part of Detox; it is easy to misconfigure them and end up with a subpar experience.
For this reason, we have decided to ship a basic, shareable configuration out of the box sometime in the near future, something we already do internally.
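A unified configuration can be as simple as one shared base object that each module repo extends with only what differs. This is a sketch under assumptions: the keys loosely follow the `apps`/`devices`/`configurations` layout of Detox’s config file, but check them against the Detox documentation for your version; `createModuleConfig`, `deepMerge`, `sharedBaseConfig` and the binary path are hypothetical placeholders.

```typescript
// Hypothetical shared Detox-style configuration. Keys loosely follow the
// apps/devices/configurations layout of Detox's config file; verify them
// against the Detox docs for your version before relying on them.
type Json = string | number | boolean | null | JsonArray | JsonObject;
interface JsonObject {
  [key: string]: Json;
}
interface JsonArray extends Array<Json> {}

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merges overrides into base without mutating either input.
// Arrays and scalars in overrides replace the base value outright.
function deepMerge(base: JsonObject, overrides: JsonObject): JsonObject {
  const result: JsonObject = { ...base };
  for (const key of Object.keys(overrides)) {
    const current = result[key];
    const next = overrides[key];
    result[key] = isPlainObject(current) && isPlainObject(next) ? deepMerge(current, next) : next;
  }
  return result;
}

// The single base config owned centrally (values are placeholders).
const sharedBaseConfig: JsonObject = {
  testRunner: "jest",
  apps: {
    "ios.release": { type: "ios.app", binaryPath: "path/to/YourApp.app" }, // placeholder path
  },
  devices: {
    simulator: { type: "ios.simulator", device: { type: "iPhone 12" } },
  },
  configurations: {
    "ios.sim.release": { device: "simulator", app: "ios.release" },
  },
};

// What each module repo would call, passing only what differs from the base.
function createModuleConfig(overrides: JsonObject = {}): JsonObject {
  return deepMerge(sharedBaseConfig, overrides);
}
```

A module’s own config file could then export the result of `createModuleConfig` with its handful of overrides, so a fix to the shared base reaches every module at once. Replacing arrays instead of merging them keeps overrides predictable.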

CI setup

CI for mobile is one of those things that are still pretty hard to nail with out of the box solutions.

SaaS CI

We’ve tried all the major SaaS CI solutions. Running iOS E2E on CircleCI, Travis CI, or Bitrise is pretty easy; the big problem is Android emulators.
The real insight here is that Android emulators work better on the offered Mac VMs than on the Linux ones, as Mac VMs are guaranteed to run with nested virtualization, which is required for running an x86 emulator (we basically run a VM, the Android x86 emulator, inside another VM, macOS).

Our Solution

Our in-house CI currently runs VMware ESXi on Mac Pros (via MacStadium) with macOS VMs. This works for both iOS and Android, as VMware supports nested virtualization, but performance on those machines is pretty bad: compilation is slow, and test parallelization is somewhat limited.

Next steps

Bare Metal - Our next-gen setup includes “bare metal” Mac Minis that run everything on the host OS. These perform much better, at least 2x faster than a VM on an Intel-based Mac Mini, and hopefully even faster on a Mac Mini with the M1 chip.

Genycloud - On the Android emulators front, we recently released support for Genycloud SaaS device orchestration. By using Genycloud emulators we gain 4 key benefits:
1. Drop all prerequisites from the CI machine, as these emulators run remotely, on Genycloud’s infrastructure.
2. Better performance per individual emulator. In our case it decreased a 35 minute suite with two workers running Android emulators to 15 minutes on two workers running Genycloud emulators.
3. Maybe the biggest benefit: the ability to scale infinitely, and run as many workers in parallel as we want.
4. All SaaS CI options now become viable, as they can be used for builds and test orchestration while device emulation is offloaded to Genycloud.
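The Genycloud numbers above can be turned into a rough estimate of what more workers would buy. The model is an assumption, not a measurement: it treats the suite as a fixed amount of worker-minutes that splits evenly, plus an optional fixed setup cost that does not parallelize; `estimateWallMinutes` is a hypothetical helper.

```typescript
// Back-of-the-envelope model. Assumptions (not measurements): work splits
// evenly across workers, and each run pays a fixed setup cost that does not
// parallelize (e.g. booting devices); setup defaults to 0.
function estimateWallMinutes(totalWorkerMinutes: number, workers: number, setupMinutes = 0): number {
  if (workers < 1) throw new Error("need at least one worker");
  return setupMinutes + totalWorkerMinutes / workers;
}

// Figures quoted above: 35 min -> 15 min, both on two workers.
const emulatorWorkerMinutes = 35 * 2; // 70 worker-minutes on local emulators
const genycloudWorkerMinutes = 15 * 2; // 30 worker-minutes on Genycloud
const perWorkerSpeedup = emulatorWorkerMinutes / genycloudWorkerMinutes; // ~2.33x

// Scaling out with Genycloud under the even-split assumption.
for (const workers of [2, 4, 8]) {
  const minutes = estimateWallMinutes(genycloudWorkerMinutes, workers);
  console.log(`${workers} workers: ~${minutes.toFixed(2)} min`);
}
```

Under this assumption, 30 worker-minutes take about 7.5 minutes on 4 workers and 3.75 on 8; real suites split less evenly, since the longest single test file sets a floor.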

The main caveat with Genycloud is the fact that it is a paid service (even for open source projects).

0 Manual QA

The holy grail of the release process is to have it all automated, and fast(!).
As we continue to solve our problems, we get just a bit closer to that goal at each step.
The main issue we’re facing today is test execution speed in CI, especially for elaborate production E2E suites (see Production E2E above). The absurd part is that it is faster for our QA engineers to go over the test suite manually than to wait 150 minutes for it to finish, so that’s what they do…
This won’t stay this way for long, though: with just a bit more work (mentioned in the previous section), we’ll get much faster test suite execution, with potentially infinite scale.
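Scaling out only pays off if test files are spread evenly across workers. One common heuristic, swapped in here for illustration and not described in the post, is “longest processing time first”: sort files by historical duration and always hand the next one to the least-loaded worker. Jest already schedules files across its own workers, so this would matter mainly when splitting a suite across separate CI machines; `shardTests` and the example files and durations are hypothetical.

```typescript
// Greedy "longest processing time first" sharding (a heuristic chosen for
// illustration). Durations would come from previous CI runs; the example
// files below are made up.
interface TestFile {
  path: string;
  minutes: number; // historical duration of this test file
}

function shardTests(files: TestFile[], workers: number): TestFile[][] {
  if (workers < 1) throw new Error("need at least one worker");
  const shards: TestFile[][] = [];
  const load: number[] = [];
  for (let w = 0; w < workers; w++) {
    shards.push([]);
    load.push(0);
  }
  // Place the longest files first so short ones can fill the gaps at the end.
  const sorted = files.slice().sort((a, b) => b.minutes - a.minutes);
  for (const file of sorted) {
    let target = 0;
    for (let w = 1; w < workers; w++) {
      if (load[w] < load[target]) target = w; // least-loaded worker so far
    }
    shards[target].push(file);
    load[target] += file.minutes;
  }
  return shards;
}

const exampleFiles: TestFile[] = [
  { path: "checkout.e2e.ts", minutes: 60 },
  { path: "blog.e2e.ts", minutes: 50 },
  { path: "chat.e2e.ts", minutes: 40 },
  { path: "crm.e2e.ts", minutes: 30 },
  { path: "settings.e2e.ts", minutes: 20 },
];
shardTests(exampleFiles, 2).forEach((shard, i) =>
  console.log(`worker ${i}: ${shard.map((f) => f.path).join(", ")}`),
);
```

The greedy split is not guaranteed optimal, but it is simple and usually lands close to the ideal of total time divided by the number of workers.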

Next steps

Incorporate faster emulators and simulators, and dramatically reduce the full test suite’s execution time.

Our efforts are paying off…

Final Words

So for us, the goal of achieving stable (and fast) E2E tests is indeed feasible, but it will probably pose many more, as yet unknown, issues in the future. The great thing, in my opinion, is that Wix encounters these issues at a pretty big scale.
This blog post was also delivered as a talk at TestJS Summit, check it (and some other very interesting talks as well) out at https://www.testjssummit.com/#videos
