Our journey to better UI tests

When we started writing E2E UI tests for our product, we went with the time-tested, traditional UI automation stack — Selenium, Java & sadness.
The source of said sadness was the dreaded F-word of UI automation suites: flakiness.
Flaky, unpredictable results in an automation suite aren’t a new problem; they have long been an affliction for QA teams, and we were no exception. We set out with grand ambitions of having a large, comprehensive & reliable UI test suite but ended up accomplishing only two out of those three desired attributes.
UI test suites are often fragile because of unsuitable practices adopted while writing the tests (bad locators, faulty wait conditions, etc.), but there are also architectural issues in the way Selenium WebDriver tests are executed.
A very quick primer:
- WebDriver defines a standardised API protocol to allow programmatic interaction with browsers. This enables capabilities like automated page navigation, user input, JavaScript execution, etc. on the browsers.
- Browsers implement this protocol via separate server programs (called Drivers) that run alongside the browser and control it. Every browser has its own driver, e.g. geckodriver for Firefox, chromedriver for Chrome/Chromium, etc.
- These Drivers accept requests conforming to the protocol (essentially JSON requests over HTTP) and, in response, “drive” the browser to perform the requested actions.
- Selenium provides bindings for a variety of programming languages, which let you interact with the WebDriver API from the language of your choice.
- Thus, the pairing of a language binding with a browser Driver enables automation of that browser using programs in the chosen language.
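
To make the “JSON over HTTP” part concrete, here’s roughly what a test run looks like at the protocol level. This is a minimal sketch assuming a chromedriver instance is already running on its default port (9515) and a Node version with a global `fetch`; in practice the language bindings issue these requests for you:

```javascript
const base = 'http://localhost:9515';

(async () => {
  // 1. Create a session: the driver launches a browser and hands back a session id.
  const res = await fetch(`${base}/session`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ capabilities: { alwaysMatch: { browserName: 'chrome' } } }),
  });
  const { value: { sessionId } } = await res.json();

  // 2. Every subsequent interaction is its own, independent HTTP request
  //    against that session.
  await fetch(`${base}/session/${sessionId}/url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: 'https://example.com' }),
  });

  // 3. Tear the browser down by deleting the session.
  await fetch(`${base}/session/${sessionId}`, { method: 'DELETE' });
})();
```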

On paper, this architectural pattern looks good: it’s modular, it decouples the choice of language from the choice of browser, and it lets the same test suite be executed on different browsers by simply swapping out one Driver for another. The HTTP-based protocol leaves the browser-specific heavy lifting to the Drivers.
The primary issue with a design like this is that it forces the use of a stateless communication paradigm (multiple HTTP requests) to interact with increasingly stateful systems (modern JS-powered single-page apps). Gone are the halcyon days of the web when practically every significant user action prompted a request to the server to load a new server-rendered page. Most of the interactivity in modern web apps comes from JS code which not only maintains the app’s state within the browser, but also keeps updating this state asynchronously in response to user actions, data from the server, etc.
Responding to these unpredictable, async changes to app state is infeasible using a synchronous, stateless protocol. This is why you find most Selenium test suites littered with:
- wait conditions (with arbitrary timeouts) and/or
- calls to “wait-until-this-happens” methods which poll the browser state until a certain condition is satisfied (sketched below).
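
In practice, the pattern looks something like this (a minimal sketch using Selenium’s JavaScript bindings; the app URL, selector and timeouts are made up for illustration):

```javascript
const { Builder, By, until } = require('selenium-webdriver');

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://app.example.com/dashboard');

    // Poll the browser until the widget appears, giving up after 10 seconds.
    // Too short a timeout and the test flakes; too long and the suite crawls.
    const widget = await driver.wait(
      until.elementLocated(By.css('.report-widget')),
      10000
    );
    await driver.wait(until.elementIsVisible(widget), 10000);
  } finally {
    await driver.quit();
  }
})();
```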
This is a big reason behind the flakiness of Selenium-based test suites. We started facing similar issues in our UI automation efforts as well, and this forced us to examine alternatives.
Enter Cypress.

Cypress is a JavaScript-based testing tool which runs tests inside the browser, in the same run loop as your application code. Because of this, Cypress can access pretty much everything your application code can, which opens up quite a few interesting possibilities. A few examples of things you can do in Cypress with FAR less ceremony than in Selenium (a sketch follows the list):
- Stub API calls & application methods
- Access/manipulate in-memory data (e.g. your Redux state)
- Execute custom event-driven JS code
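
A hypothetical spec touching all three might look like this. The endpoint, the stubbed data and the `window.store` handle are assumptions about the app under test, not Cypress APIs; the stubbing calls shown are the `cy.server()`/`cy.route()` API current at the time of writing:

```javascript
describe('dashboard', () => {
  it('renders reports from a stubbed API', () => {
    // 1. Stub an API call before the app ever makes it.
    cy.server();
    cy.route('GET', '/api/reports', [{ id: 1 }, { id: 2 }, { id: 3 }]).as('getReports');

    cy.visit('/dashboard');
    cy.wait('@getReports');

    // 2. Reach into in-memory data: assumes the app exposes its Redux store
    //    on `window` when running under Cypress.
    cy.window().its('store').invoke('getState')
      .its('reports.items').should('have.length', 3);

    // 3. Run arbitrary JS in the app's own context.
    cy.window().then((win) => win.dispatchEvent(new Event('resize')));
  });
});
```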
Having such low-level control of the browser’s state allowed us to mitigate a lot of the causes of flakiness in our tests. Test initialisation, latency tolerance and assertions all became significantly simpler & more reliable with Cypress.
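For instance, latency tolerance largely comes for free: Cypress keeps retrying both the query and the assertion below until its timeout elapses, so there is no hand-rolled polling (the selector and text are illustrative):

```javascript
cy.get('.save-status').should('contain', 'Saved');
```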
Beyond reducing our problems with fragile tests, Cypress also surprised us with how easy it makes writing and debugging tests. Some of the features we loved:
- A companion UI when executing tests in UI-mode, which makes debugging a lot simpler
- A headless mode which “just works”. Made setting up the suite on our CI server a LOT less painful.
- The concept of plugins, which allow you to execute Node.js code. Pretty useful when you’d like to do things outside of the browser; e.g. we use it to check the delivery of emails from our product (a sketch of the mechanism follows this list).
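
As an illustration of the plugin mechanism, here is a stripped-down, hypothetical version of the email check. The task runs in Cypress’s Node-side plugins file; the inbox URL, the response shape and the `node-fetch` dependency are assumptions for the sketch, not our actual mail setup:

```javascript
// cypress/plugins/index.js -- this file runs in Node, outside the browser.
const fetch = require('node-fetch');

module.exports = (on, config) => {
  on('task', {
    // Ask a local test inbox whether a mail with the given subject has arrived.
    async emailReceived(subject) {
      const res = await fetch('http://localhost:8025/api/messages');
      const messages = await res.json();
      return messages.some((m) => m.subject === subject);
    },
  });
};
```

A spec can then call the task from the browser side and assert on its result:

```javascript
cy.task('emailReceived', 'Welcome aboard!').should('be.true');
```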
Of course, some of the choices that Cypress makes do come at a cost:
- Since Cypress tests execute inside the browser, they can only be written in JavaScript.
- Support for each browser has to be written from the ground up. Cypress only supports Chrome right now, although support for other browsers is on the way.
- It can only operate a single tab and a single browser at a time.
For a more detailed account of the trade-offs involved in using Cypress, take a look at this.
Overall, it’s early days in our adoption of Cypress, but the initial signs have been very encouraging. It has already solved a few very real problems for us & we’re optimistic about the promise that it holds.
