The Automated UI Testing Methodology You Need To Try

Ralph Kootker
FactSet
Sep 14, 2021 · 9 min read

I didn’t always love writing automated tests. In fact, some days I still get frustrated wrangling dependencies, tracking down race conditions, and pleading with the compiler. But my new workflow has me smiling during PR reviews. And I want to share this joy.

I’ll start by thanking all the dedicated folks working in open source. Without their efforts, this wouldn’t be possible. And it took several projects, all developed in the open, to converge on this pattern. Test like a user.

Test the Right Interface

In API/interface testing, you might hear “test from the interface back.” This means that you want to write tests in such a way that the implementation doesn’t matter. It’s good advice not to couple tests tightly to an implementation. It makes them less fragile (a fragile test breaks whenever the implementation changes, even if the behavior doesn’t), which frees up developers to innovate and refactor code quickly. Given your experience with front-end and browser testing, how often does that really happen? And even when it does happen, is it really testing the right interface?

If you look at the demos of today’s leading UI automated testing solutions, you’ll frequently find the same shallow examples. There’s a button on the page and we’re going to click it. Use the CSS selector to find the element, then invoke the click method in the HTMLElement prototype. Not exactly like a user would, huh? It's not completely their fault. Many testing tools just aren't that great alone. And they aren't designed to test accessibility, the interface between your code and the user. They test the interface between your app code and your test code. They only show off their native capabilities in a demo. Suggested or required plugins and libraries come later.

Test Like a User

If you want to test like a user, you would Tab to the element, identified by its accessible name and role, and then send an Enter keystroke. Maybe not every user does this, but some of your most vulnerable users do. I’m talking about folks with disabilities. They use keyboards and assistive technology (AT) to access the web. And they’re often excluded when teams don’t check accessibility. But this approach also benefits keyboard-savvy power users. That’s the benefit of prioritizing the most vulnerable. Accessibility first.

So, here’s a pattern that allows you to do your functional testing and accessibility testing in one pass! (Okay, maybe two passes if you test pointer inputs as well.) “Test like a user” tests the right interface, the one between your code and the user’s input devices. Not the interface between your code and the browser. Or worse, between your code and your test code.

An important component of the methodology involves using the accessibility tree (axTree) constructed by the browser to find your interactive elements. While many JavaScript libraries can approximate an element’s role or name, I feel the best approach is to use the browser’s source of truth directly. The information that will be consumed by AT.

After you test the page’s semantics, you can test like a user who isn’t aware of the semantics, a user that doesn’t use AT. For this, you can use the typical click() methods to interact with elements. But finding elements should rely, as much as possible, on visible text. This is a departure from most testing demos but a very important distinction. I'll cover these selectors and utility functions in a future article.

Relying on accessible name and role or visible text will improve your testing. But you also need the second component of this methodology: real keyboard signals. Testing tools haven’t made this methodology easy or obvious. For one, it’s rare for tools to even implement the ability to press Enter in a test. And when they do, it has to be hooked directly into the event simulation of your framework (e.g., React) to make your code work (e.g., cause a form to submit). Beware the testing tools that guess what the next element in tab order is. Their bugs are your coverage issues.

Using the page’s semantics or text labels and not relying on the front-end framework makes your tests less brittle (i.e., tests don’t break just because the code environment changes). What you want is test automation that sends real keyboard signals. Look no further than the Chrome DevTools Protocol (CDP). The CDP lets tools instrument and control the browser from the outside, including input that the browser treats much like signals from the operating system’s input devices. While a Tab keydown event dispatched from in-page JavaScript only fires the event listeners for the specific event you created, an Input.dispatchKeyEvent from the CDP will trigger all of the native events for you, including shifting focus to the next tab stop!
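Here is a rough sketch of what sending those real key signals could look like. It builds the params for Input.dispatchKeyEvent and, when running inside Cypress, registers a custom command that sends a key down followed by a key up. The names KEYS, keyEventParams, and pressKey are my own illustrations, not part of Cypress or the CDP. The key codes mirror the values Puppeteer uses in its key definitions, and the sketch assumes the same Cypress.automation bridge that the axTree command later in this article relies on.

```javascript
// Illustrative key table (hypothetical name). Enter carries text so the
// browser also produces the character input that can trigger a form submit.
const KEYS = {
  Tab: { key: "Tab", code: "Tab", keyCode: 9 },
  Enter: { key: "Enter", code: "Enter", keyCode: 13, text: "\r" },
};

// Build the params for a single CDP Input.dispatchKeyEvent call.
// Key down uses "keyDown" when the key produces text and "rawKeyDown"
// otherwise; key up always uses "keyUp".
function keyEventParams(name, phase) {
  const def = KEYS[name];
  if (!def) throw new Error(`Unknown key: ${name}`);
  const params = {
    type: phase === "up" ? "keyUp" : def.text ? "keyDown" : "rawKeyDown",
    key: def.key,
    code: def.code,
    windowsVirtualKeyCode: def.keyCode,
    nativeVirtualKeyCode: def.keyCode,
  };
  if (phase !== "up" && def.text) params.text = def.text;
  return params;
}

// Only register the command when this file is loaded by Cypress.
if (typeof Cypress !== "undefined") {
  Cypress.Commands.add("pressKey", (name) => {
    const send = (phase) =>
      Cypress.automation("remote:debugger:protocol", {
        command: "Input.dispatchKeyEvent",
        params: keyEventParams(name, phase),
      });
    return send("down").then(() => send("up"));
  });
}
```

With something like this in your support file, a test could call cy.pressKey("Tab") and then cy.pressKey("Enter"). You may need to move focus into your application first so the key events land in the right frame.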

Chrome DevTools Protocol (CDP)

Several projects now use the CDP either as their core functionality or as a supplement to their features. Puppeteer and Playwright are convenience wrappers for the CDP, often implementing the entire API. These tools run the fastest. But the developer experience isn’t exactly “all included.” Other tools, like Selenium 4, TestCafe, and Cypress, connect to the browser using the CDP for edge cases that they can’t solve with JavaScript alone. While tools like Cypress claim to only use in-page JavaScript for testing, exploiting the CDP connection is what gives their tool superpowers in Chromium. This is the main shortcoming of this approach. It really only fully works on Chromium-based browsers. When I test like a user, I’m only focused on Chrome (although Edge would work the same). Since not all users use Chrome, I could amend my methodology to “test like a Chrome user.”

Like many things in testing, it’s a trade-off. In exchange for focusing on the browser with the largest market share and highest spec conformance, we get the highest possible accuracy in our testing. Nothing’s stopping you from doing old-school functional tests in other browsers. It’s up to you whether you want to maintain two testing systems and their separate methodologies. I, for one, believe that the depth of detail in my tests is more important than the breadth across browser/AT/OS combinations. Besides, that’s not a matrix that most organizations can fill out. The investment in developer time brings far too little gain in risk management to justify the costs.

The CDP does a ton of other cool stuff: nearly everything the browser and DevTools can do, including screenshots, debugging, profiling, and emulation. But maybe because it’s so powerful, tools like Cypress haven’t documented their very capable bridge to using it in your tests. I’ve had a lot of experience with Puppeteer and Playwright and with writing tests in hard-to-reach contexts like browser extensions. I want to share what I’ve learned, hear your feedback, and see what you come up with.

Snapshot Testing

This will have to be a series of articles. Sorry, I just can’t fit it all into one post. But rest assured, I’ll give you enough info to get started so you can write your first axTree snapshot test. Something I think will become more useful to your team than the visual regression testing (VRT) or DOM snapshot tests that are popular today. I understand the need for VRT, but I’m not sold on the usefulness of DOM snapshots. Changing any implementation detail that affects the render will likely lead to a new snapshot. And if that’s your only test, it’s unlikely to help you mitigate the risk of a negative impact on your users. There’s often no way to look at rendered HTML without CSS or JS and know exactly how it will behave in AT.

On the other hand, the accessibility snapshot can free you up to fix or improve your render. In fact, you can often write this test so early, you might start scaffolding your views in plain old HTML first. Then add the logic, data, and “guts” of your application all without updating the test. So, what’s in the axTree that makes it so important? Great question!

The axTree consists of all the rendered semantics of a webpage. It’s the name, role, and value of each important element that AT needs to process so people using AT have a comparable experience. Getting the semantics right isn’t always easy. And it should be tested by people who rely on AT often. What you might call a “native AT user.” These types of usability tests are very important but also time-consuming and costly. Not to mention, you might need quite a bit of the app completed to start testing. Once your team has created the best semantic representation of your application, it shouldn’t change. An accessibility snapshot test can help you get the most from your investments in usability testing.
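To make “name, role, and value” concrete, here is a minimal sketch that reduces an axTree response to just those three fields. It assumes the shape the CDP documents for Accessibility.getFullAXTree: a nodes array whose entries have an ignored flag and role, name, and value objects that each wrap a value property. The function name summarizeAxTree and the sample data are hypothetical. The sample is hand-written, not captured from a real page.

```javascript
// Reduce a getFullAXTree-style response to role/name/value triples.
// Assumed shape: { nodes: [{ ignored, role: { value }, name: { value }, value: { value } }] }
function summarizeAxTree(response) {
  return (response.nodes || [])
    .filter((node) => !node.ignored) // ignored nodes are not exposed to AT
    .map((node) => ({
      role: node.role ? node.role.value : "",
      name: node.name ? node.name.value : "",
      value: node.value ? node.value.value : "",
    }));
}

// Hand-written sample data for illustration only.
const sample = {
  nodes: [
    {
      nodeId: "1",
      ignored: false,
      role: { type: "role", value: "button" },
      name: { type: "computedString", value: "Save" },
    },
    { nodeId: "2", ignored: true },
    {
      nodeId: "3",
      ignored: false,
      role: { type: "role", value: "textbox" },
      name: { type: "computedString", value: "Email" },
      value: { type: "string", value: "a@b.co" },
    },
  ],
};

console.log(summarizeAxTree(sample));
```

A flattened summary like this drops the tree hierarchy and the node ids, so it is a trade-off. It may make snapshots easier to read in a PR review, at the cost of not catching changes in nesting.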

I’m going to focus on Cypress because it has a good developer experience and a large, active community around it. First, we need to set up Cypress to get the axTree from the CDP:

// in your cypress/support/index.js
/**
 * get the JSON representation of the axTree from the CDP
 * @see https://chromedevtools.github.io/devtools-protocol/tot/Accessibility/#method-getFullAXTree
 */
Cypress.Commands.add("axTree", () => {
  return Cypress.automation("remote:debugger:protocol", {
    command: "Accessibility.enable",
  }).then(() => {
    return Cypress.automation("remote:debugger:protocol", {
      command: "Accessibility.getFullAXTree",
    });
  });
});

Next, we need the ability to snapshot, which Cypress provides through a command add-on:

  1. Install as a dev dependency
npm i -D @cypress/snapshot
  2. Load and register the add-on
// in your cypress/support/commands.js
require("@cypress/snapshot").register();

Accessibility Testing (Without Functional Testing)

Now you can start planning your application using better semantic structure. And you can write a test to ensure that no one changes the semantics of your initial render by accident. But we need to go a little further before we get into functional testing. We want to ensure that our semantic structure is accessible. I’m going to add an accessibility checker using axe from the cypress-axe add-on. Axe has rules that can quickly check many of the WCAG success criteria for accessibility. It’s not the same as a complete audit, but it can catch a lot of mistakes and it’s simple to set up with most testing tools.

  1. Install the dev dependency (axe-core is a peer dependency of cypress-axe).
npm i -D cypress-axe axe-core
  2. Load the add-on
// in your cypress/support/index.js
require("cypress-axe");

Here’s a sample test using our new .snapshot() command and a check against axe's rules:

context("My app", () => {
  before(() => {
    cy.visit("http://localhost:8080");
    cy.injectAxe();
  });
  it("has the correct semantic structure", () => {
    cy.checkA11y();
    cy.axTree().snapshot();
  });
});

You can start every project like this. Just don’t accept pull requests that violate an axe rule or change your semantic structure in unplanned ways. If you’re adding these tests to an existing app and there’s a lot of work to get it passing, use the skipFailures option of cypress-axe.
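As a sketch of how skipFailures might be used while you work through a backlog: cypress-axe’s checkA11y accepts a context, axe options, a violation callback, and skipFailures as its fourth argument. The helper names summarizeViolations and checkA11yWithoutFailing below are hypothetical, and the cy object is passed in explicitly only so the logic can be exercised outside of Cypress.

```javascript
// Condense axe violations into rows that are easy to scan in a log.
function summarizeViolations(violations) {
  return violations.map(({ id, impact, help, nodes }) => ({
    id,
    impact,
    help,
    nodes: nodes.length, // how many elements triggered this rule
  }));
}

// Hypothetical wrapper: report violations without failing the test.
function checkA11yWithoutFailing(cy) {
  cy.checkA11y(
    null, // context: check the whole document
    null, // options: use axe defaults
    (violations) => console.table(summarizeViolations(violations)),
    true // skipFailures: log violations instead of failing
  );
}
```

Inside a test you would call checkA11yWithoutFailing(cy) where you would otherwise call cy.checkA11y(). Once the logged violations are fixed, switching back to the plain call makes the rules enforceable again.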

As soon as your application adds interactive components, you need to re-run these checks. Every time the DOM changes state, you need to check axe rules. Most of those changes will come with render calls or DOM manipulations. But a few might come from native interactions with CSS. So, keep that in mind.
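One way to keep that habit from slipping is a small helper that re-checks after every interaction. This is only a sketch: checkAfterEachStep is a hypothetical name, and the selectors in the usage comment are placeholders for whatever your app renders. It relies on Cypress queuing commands in the order they are called.

```javascript
// Hypothetical helper: run each interaction, then re-check axe rules
// against the DOM state that interaction produced.
function checkAfterEachStep(cy, steps) {
  cy.checkA11y(); // initial render
  steps.forEach((step) => {
    step(cy);
    cy.checkA11y(); // state after this step
  });
}

// Example usage inside a test (placeholder labels):
// checkAfterEachStep(cy, [
//   (cy) => cy.contains("button", "Open menu").click(),
//   (cy) => cy.contains("a", "Settings").focus(),
// ]);
```

Passing cy in explicitly keeps the helper easy to exercise with a stand-in object, but you could just as well rely on the global cy inside your spec files.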

How’s it Going?

I’ve found bugs using these techniques. And in some cases, they’re bugs that I code reviewed and didn’t catch manually. I have many reasons to not only continue on this path but share this process with others. I could share some early stats, but your mileage may vary (YMMV). And there’s no shame in having a lot of user interface bugs. The only shame is knowing how to find and fix them and doing nothing.

What’s Next

In the next part of the series, I’ll share how I find and interact with elements like a user. For now, if you want to test more views and states of your app, check out Cypress’s API.

Written by Paul Grenier - Lead Software Engineer @FactSet
