Tackling asynchronous integration testing

Patrick Sevat
Published in The Startup
10 min read · Nov 23, 2019
Photo of athletes on a running track by Steven Lelham on Unsplash.

It’s been well over a year since any public updates about Senses 2 have been posted. In the meantime we have worked hard, and we were able to release Senses 2 on Android and iOS, with web following really soon.

In this blog, I’ll describe how we’ve set up our integration tests. We rely heavily on them as part of our CI/CD process, and with multiple deployments and millions of logins per day you can imagine how important integration tests are for us.

What’s an integration test?

Every team is responsible for a small part of our application. Integration tests check whether two or more independently developed parts work well together. This is perhaps best explained by a gif of what happens when they don’t:

Two functioning units went to production without integration tests

In this blog I will discuss browser-based integration tests. We have many front-end components that need to interact correctly with each other. For example, we have a user preference component that determines which features are shown on the dashboard.

Those familiar with the testing pyramid might argue that this sounds more like end-to-end testing, but that’s not the case: we mock our back-end and focus solely on the interaction between front-end components.

How do we run/write integration tests?

We want to test all our front-end components on a wide range of browsers and mobile devices. To accomplish that we have chosen WebdriverIO v5, which also has a mature ecosystem of plugins.

Because it is based on Selenium, it is slower than newer competitors such as Cypress.io, but the range of browsers and devices we can test on is much wider and it supports web components.
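As an illustration (not our actual configuration), a minimal WebdriverIO v5 setup along these lines might look like this; the spec glob, capabilities and timeout are placeholders:

```ts
// wdio.conf.js — a minimal sketch, not our actual configuration
exports.config = {
  runner: 'local',
  specs: ['./src/**/*.spec.ts'],      // placeholder glob
  capabilities: [
    { browserName: 'chrome' },
    { browserName: 'firefox' },
  ],
  framework: 'jasmine',               // we use @wdio/jasmine-framework
  services: ['selenium-standalone'],  // see the bonus section at the end
  jasmineNodeOpts: {
    defaultTimeoutInterval: 60000,    // async components need generous timeouts
  },
};
```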

Challenges with integration tests 🚧

Our previous integration test setup had some difficulties we needed to solve:

1. We did not have a way to wait for all the asynchronous components on a page
2. Our Page Objects were inconsistent
3. We did not have firm rules and best practices enforced
4. Some browsers are unstable
5. We did not know which tests were a little flaky and which were flat-out unstable
6. Execution of our test suite took a long time

The building blocks of browser-based integration tests 🧱

To write integration tests you need a couple of things:
1. A test file (also called a spec file) where you run your tests. For example: retrieve the element ‘my-button’ from the DOM, click it, and expect something to happen.
2. To keep your code DRY and readable, it is wise to use an abstraction of the DOM for each page you are going to test. This is called a Page Object.

I will introduce both in the next sections:

Page and Component Objects
From a strictly technical perspective, what we call Page Objects and Component Objects both belong to the Page Object pattern, which is one of the best ways to keep your test code DRY.

In our set-up a Page Object represents a full page and a Component Object represents a Feature (such as the Bank Account Component) or a UI Component (such as the Navigation bar).

Each Page Object consists of one or more Component Objects: a Dashboard Page has a Bank Account Component and a Navigation Bar.

Spec files
The Page Objects are used in spec files, which are what WDIO actually runs. See the official docs for an example.

Tackling concurrency / timing issues ⌚

Before we continue it’s important to stress the asynchronous nature of our app. On a given page, we have Web Components (loaded async), Angular 7 components (potentially async) and AngularJS components in a compatibility layer (loaded async).

To give an example: on our banking dashboard we want to make sure that navigation, bank accounts (async), credit cards (async) and personal loans (async) are all done loading.

We need to make sure that every component on a page is rendered before we kick off our tests. To do so, we require every Component Object to implement a render method that signals that the component is fully rendered. This render Promise resolves when the last DOM node is rendered.

Pro tip: use 6x CPU slowdown in Dev Tools to determine which element renders last.

This is how a Component Object looks in its simplest form:
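In essence it is a class that exposes the selector of its last-rendered node and a render method that waits for it. A minimal sketch, assuming WebdriverIO’s async API and an illustrative data-test selector:

```ts
// bank-account.component-object.ts — a sketch, selector names are illustrative
class BankAccountComponent {
  // The DOM node that is rendered last; used to detect "fully rendered"
  get lastRenderedElement(): string {
    return '[data-test="bank-account-list"]';
  }

  // Resolves once the component is fully rendered
  async render(): Promise<void> {
    const element = await browser.$(this.lastRenderedElement);
    await element.waitForExist(10000); // timeout in ms
  }
}

export default new BankAccountComponent();
```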

When we want to make sure that the page is fully loaded, we group all these Promises together in an array on the Page Object. The Page Object itself has a waitUntilRendered method which calls Promise.all() with the array of Promises.

In our spec file, we await the page.waitUntilRendered() method after which we can safely assume that all components are rendered and we can kick off the tests.
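Sketched out, the Page Object and its usage in a spec file could look roughly like this (again, the component and page names are illustrative):

```ts
// dashboard.page-object.ts — a sketch, not the actual implementation
import BankAccountComponent from './bank-account.component-object';
import NavigationComponent from './navigation.component-object'; // analogous to the sketch above

class DashboardPage {
  // All components that must be rendered before the tests may start
  get components() {
    return [BankAccountComponent, NavigationComponent];
  }

  // Resolves when every registered component has finished rendering
  async waitUntilRendered(): Promise<void> {
    await Promise.all(this.components.map((component) => component.render()));
  }
}

export default new DashboardPage();
```

```ts
// dashboard.spec.ts — how a spec file could use it
import DashboardPage from './dashboard.page-object';

describe('Dashboard', () => {
  it('shows the navigation bar', async () => {
    await browser.url('/dashboard');          // illustrative URL
    await DashboardPage.waitUntilRendered();  // safe to start testing now

    const nav = await browser.$('[data-test="navigation-bar"]'); // illustrative selector
    expect(await nav.isDisplayed()).toBe(true);
  });
});
```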

Making Page Objects and Component Objects scalable

In the previous section we already touched on some of the methods that allow us to wait for a page to be fully loaded. We want to prevent each Page Object and Component Object from implementing these methods individually, as that would potentially lead to diverging implementations.

We decided that it is wise to abstract the main use case. This allows for a better developer experience and more standardization, and it keeps the architecture scalable and maintainable.

To create this abstraction we provide the base classes PageObjectBase and ComponentObjectBase:
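A sketch of what these base classes could look like, under the same assumptions as above:

```ts
// component-object.base.ts — sketch of a possible base class
export abstract class ComponentObjectBase {
  // Implementing classes only provide the selector of their last-rendered DOM node
  abstract get lastRenderedElement(): string;

  // Shared render logic lives here so implementations stay tiny
  async render(): Promise<void> {
    const element = await browser.$(this.lastRenderedElement);
    await element.waitForExist(10000);
  }
}

// page-object.base.ts — sketch of a possible base class
export abstract class PageObjectBase {
  // Implementing classes register the Component Objects that make up the page
  abstract get components(): ComponentObjectBase[];

  // Resolves once every registered component reports that it is rendered
  async waitUntilRendered(): Promise<void> {
    await Promise.all(this.components.map((component) => component.render()));
  }
}
```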

Using these base classes we can refactor our previous examples to simple implementations:
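Under the same assumptions, the refactored implementations could be as small as this:

```ts
// Sketch only — class and selector names are illustrative
import { ComponentObjectBase } from './component-object.base';
import { PageObjectBase } from './page-object.base';

class BankAccountComponent extends ComponentObjectBase {
  get lastRenderedElement(): string {
    return '[data-test="bank-account-list"]';
  }
}

class NavigationComponent extends ComponentObjectBase {
  get lastRenderedElement(): string {
    return '[data-test="navigation-bar"]';
  }
}

class DashboardPage extends PageObjectBase {
  get components() {
    return [new BankAccountComponent(), new NavigationComponent()];
  }
}

export default new DashboardPage();
```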

The nice thing is that the team responsible for maintaining the test architecture can extend these base classes with more advanced features such as:

  • Prefixing a parent selector to other selectors, which is useful if you have multiple pages in the DOM and thus multiple instances of a (UI) component
  • Adding configuration to Component Objects to indicate visibility on certain viewports
  • Runtime exclusion of registered features on pages, which is useful after certain flows.
    Example: if a user indicates a preference to hide credit cards, they are hidden from the dashboard, and the Dashboard Page should not wait for the credit card component to load
  • Runtime addition of extra promises to wait for
    Example: we need to make a POST request before kicking off the tests

You can find the extended base classes using these links: Extended Page Object, Extended Component Object.

Tips for writing Page Objects and Component Objects

1. Implementing class properties should only return strings

Any logic related to the browser happens inside the spec file. This keeps our spec files readable and the Page / Component Objects simple and predictable.

All implementing classes should only return strings: DashboardPage.menuButton would return '[data-test="menu-button"]' (see point 3 below). In the spec file this selector can be used to create a WebdriverIO Element: const menuBtn = await browser.$(DashboardPage.menuButton);.
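In code, the split could look like this (a sketch; the exported instance and the test are illustrative):

```ts
// dashboard.page-object.ts — properties only return selector strings
class DashboardPageObject {
  get menuButton(): string {
    return '[data-test="menu-button"]';
  }
}

export const DashboardPage = new DashboardPageObject();
```

```ts
// dashboard.spec.ts — all browser logic lives in the spec file
import { DashboardPage } from './dashboard.page-object';

it('opens the menu', async () => {
  const menuBtn = await browser.$(DashboardPage.menuButton);
  await menuBtn.click();
});
```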

2. No new methods in implementing classes

Just like the previous point, all logic should happen in the spec file. This means that no methods should be written in implementing classes. Even though it might seem tempting to do (for example to prevent duplication), we value readability of the spec file and predictability of Page / Component Objects more.

3. Use data-attributes as selectors

In a large app refactors are common. It can happen that the structure of your HTML changes and you want the selectors in your Page Objects to be as resilient as possible to changes.

Let’s consider an example where you have a wrapper element: <div id="sidebar-wrapper">. For SEO reasons you might want to use a semantic <aside> tag. If the selector in your test was div#sidebar-wrapper, you now have a broken test 😔.

For this reason we encourage all our developers to use data-test attributes, which in our example would be <aside data-test="sidebar-wrapper">. You can use this in your Page Object to define the selector as [data-test="sidebar-wrapper"].

Browser strategy 🌐

We want to test our app on all major browsers and mobile devices. This does not mean that you should simply turn on your tests for all of them at once.

You should start small: begin with happy flows on stable browsers (Chrome, Firefox). After you’ve reached an “island of stability”, start adding to your island. I’d recommend first adding some more browsers: Safari, Edge, IE11 and WebView. Add them one by one, at least one week apart and make sure to extensively test stability. We defined stability to be at least 19 out of 20 successful runs.

You might run into browser and WebDriver incompatibilities and quirks. We’ve seen them around sending keys and clipboard usage, and there are probably more. These quirks might require you to write some overrides. You can create your own browser-specific commands for these use cases.
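As a rough illustration of such an override (the quirk handling below is hypothetical, not our actual workaround), WebdriverIO lets you register custom commands in a config hook:

```ts
// registered in the `before` hook of wdio.conf.js — hypothetical workaround
browser.addCommand('setValueSafely', async (selector: string, value: string) => {
  const element = await browser.$(selector);
  const browserName = (browser.capabilities as any).browserName;

  if (browserName === 'internet explorer') {
    // Hypothetical fallback: some drivers mishandle sendKeys, so set the value via script
    await browser.execute(
      (el: any, val: string) => { (el as HTMLInputElement).value = val; },
      element,
      value
    );
  } else {
    await element.setValue(value);
  }
});

// inside a spec: await (browser as any).setValueSafely('[data-test="amount-input"]', '100');
```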

If you notice that certain browsers remain flaky (they do not reach a 95% success rate), you can consider moving them to their own command. We’ve created an npm script called test:non-blocking-browser where we do run our tests (so we can collect data on them), but they are not blocking for the time being.

Make sure you plan ahead, because to prevent flakiness you want each new browser to run for about a week to surface any issues. Turning on all browsers should not be done in a single day.

Categorizing and monitoring your tests 📊

Tests that run in a browser are inherently flaky. Unlike unit tests, they have dependencies that can break, be slow or suffer from poor network connections.

Therefore, we want to divide our integration tests into 3 categories:

- ✅ Stable: these tests normally pass; if they break, we know it’s bad.
- 🔁 Retry: these tests can suffer from concurrency/timing issues. Ideally we should fix them, but as a quick fix, feature teams can mark them with a retry flag and they will be retried 3 times, with one-second intervals. Failing 3 times means a broken test.
- 🚨 Flaky: these tests cover a scenario that fails often. If flaky tests fail 3 times, a warning is given that a manual check should be performed, but the test suite as a whole will not fail.

If a test is marked with retry for more than 2 weeks, we sit down with the team and see how we can make it stable. If we cannot get the test stable, we mark it as flaky.
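We don’t show our actual retry mechanism here, but a hypothetical helper with the same behaviour (3 attempts, one second apart) illustrates the idea:

```ts
// Hypothetical helper, not our actual implementation: retry a test body
// up to 3 times with one-second intervals before reporting failure.
async function withRetry(
  testBody: () => Promise<void>,
  attempts = 3,
  intervalMs = 1000
): Promise<void> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await testBody();
      return; // passed
    } catch (error) {
      if (attempt === attempts) {
        throw error; // failed 3 times: the test is considered broken
      }
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// usage inside a spec file
it('loads the credit card overview', async () => {
  await withRetry(async () => {
    const cards = await browser.$('[data-test="credit-cards"]'); // illustrative selector
    expect(await cards.isDisplayed()).toBe(true);
  });
});
```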

In order to determine in which of these categories each individual test belongs, we needed some sort of monitoring. I’m not going to discuss the details, but in order to make informed decisions about your test suite I’d recommend monitoring which tests fail most often and which of the browsers/devices you test on are unstable, and setting up alerting on anomalies.

Categorizing on different axes 📈

In the previous section we categorized our tests on the stability axis, but we ended up categorizing them in some other ways as well:

Important (smoke) vs less important tests
The most important tests in your test suite are the ones that cover the happy flow. Consider a loan application form: we should be able to fill in and submit the form using expected values. This is the happy flow; we group these tests under the #smoke umbrella.

Testing edge cases such as email format validation should not be part of the happy flow / smoke tests.

The benefit of categorizing your tests this way is that you can run a subset with the most important tests if you want or need to. Two real world scenarios in which you only want to run smoke tests:

- Slow browsers/devices: Testing on real devices is much slower than on Chrome. In order to keep our test execution time manageable we only run #smoke tests on devices.
- Hotfixes: consider that your app is completely broken and you want to deploy a hotfix. In that case you want to deploy as soon as possible and not wait for the whole test suite to finish; running only #smoke tests is a good option.

Viewport (in)dependent tests
An obvious categorization, but nonetheless an important one. We use #desktop for tests that should run on large viewports only and #mobile for those that should run on small viewports. If neither #mobile nor #desktop is present, the test will run on both (viewport independent).

Speeding up test execution

The hashtags we use to categorize our tests can be used in combination with the @wdio/jasmine-framework grep functionality. This works great within a test/spec file for cherry-picking the tests to be executed:
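The hashtags live in the test names, and the grep option of @wdio/jasmine-framework filters on them. Feeding it from an environment variable, as below, is our own illustration rather than the exact setup:

```ts
// a spec file: hashtags are part of the test names
describe('Loan application form', () => {
  it('submits the form with valid values #smoke', async () => { /* ... */ });
  it('rejects an invalid email address #desktop', async () => { /* ... */ });
});
```

```ts
// wdio.conf.js (sketch): pass the hashtag to the jasmine framework
exports.config = {
  // ...
  framework: 'jasmine',
  jasmineNodeOpts: {
    grep: process.env.GREP || '', // e.g. GREP='#smoke' npm run test:integration
  },
};
```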

However, even when there is no match on the grep, WebdriverIO creates a browser instance. This does not scale with hundreds of spec files. Starting up a new window takes time, and if you need to execute preparation tasks (logging in, loading the base URL, etc.) it adds up. Additionally, some WebDrivers are slow and, in our case, testing on real devices is even slower.

We needed to optimize. The biggest bottleneck was testing on real devices, so we opted for an approach where we only run #smoke tests on mobile: desktop browsers run the full suite (#desktop plus viewport-independent tests), while real devices and WebView only run the #smoke tests that are tagged #mobile or viewport independent.

We knew that only a small subset of all spec files would match the #smoke tests. A possible approach could be to split up our tests into multiple files (e.g. *.spec.ts, *.smoke.spec.ts, *.mobile.spec.ts), but that would hurt developer experience and also duplicate expensive actions such as preparation steps, navigating to the URL, clean-up, etc.

We decided to create our own grep script, which reads all possible spec files into memory, executes a regex based on our hashtags and returns an array of filenames whose spec files match our requirements.
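Our actual script isn’t shown here; a simplified, non-recursive sketch of the idea could look like this:

```ts
// grep-specs.ts — simplified sketch, not our production script
import * as fs from 'fs';
import * as path from 'path';

// Returns the spec files that contain at least one of the requested hashtags,
// so WebdriverIO only spins up browser instances for files that actually match.
export function grepSpecFiles(specDir: string, hashtags: string[]): string[] {
  const pattern = new RegExp(hashtags.join('|')); // e.g. /#smoke|#mobile/

  return fs
    .readdirSync(specDir)
    .filter((file) => file.endsWith('.spec.ts'))
    .map((file) => path.join(specDir, file))
    .filter((file) => pattern.test(fs.readFileSync(file, 'utf8')));
}

// wdio.conf.js could then build its `specs` array from this:
// specs: grepSpecFiles('./src/specs', ['#smoke'])
```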

This optimization allowed us to reduce the number of created browser instances from over two hundred to a couple dozen, while keeping the developer experience great (developers can still write all their tests in a single spec file).

If you have any comments, potential improvements or completely disagree with this approach, leave a note in the comments, I’d love to hear your thoughts!

Thanks for reading!

Bonus: favorite WDIO plugins ❤

  • Selenium standalone service
    This service sets up a selenium server for you to use without any hassle. It will also provide you with browser drivers.
  • Axe-core
    Best way to test for accessibility issues. Technically not a plug-in, but easily set up using this example
  • wdio-image-comparison-service
    Easy to use, yet powerful way to automatically take screenshots of pages or components and compare them to a baseline
  • Performance
    For performance we use an in-house package called Gonzales. If you have any tips for public packages, drop them in the comments 💬

Senior Software Engineer @Didomi. Find me on Twitter: @_Sevat