Testing at the UI Level

Aleksandr Chistiakov
SafetyCulture Engineering
11 min read · Feb 12, 2020

Automated UI testing is a common and obvious approach in modern test automation, but despite its apparent simplicity there are two important concepts that are often mixed up:

  1. Testing through UI
  2. Testing the UI

This is why talking about testing at the UI level can lead to misunderstanding and endless debate. Achieving the level of regression coverage you need may require applying both of these concepts, so you should clearly understand when and why each of them is used.

For clarification let’s go through each concept separately.

Testing THROUGH the UI

A form of end-to-end testing that provides regression coverage across key user behaviour. A risk with testing through the UI is that it creates false confidence in releasing a product, especially when these tests are the only ones considered.

Why implement it?

Other levels of testing may not exist (or may be hard to apply), or they may not bring the same level of confidence (for example, previous experiences may have destroyed trust in the other levels).

What do we have in the end?

Take as an example: we grow the set of UI tests weekly, or even daily, because we need to cover a lot of functionality and edge cases. At the beginning it might look good, tests are passing and occasionally even reveal real issues, but then the pattern starts to change and the effort required to maintain the full test suite outweighs the value it returns. This is brilliantly summarised in the following picture by Gojko Adzic.

Where is the wisdom?

UI tests alone do not push us to write a more flexible and scalable application!

Unfortunately, with UI tests as a standalone strategy it is very easy to build bad habits. For example, if the engineering effort required to maintain them does not add to the confidence in the test results, then very quickly they will no longer be seen as fit for purpose. A more subtle risk is that a large number of UI tests ends up acting like a gatekeeper: instead of building a quality application, engineering effort is focused on keeping the tests green, which locks the application into whatever functionality and behaviour matches the test scenarios.

To help mitigate these risks, teams should be challenged on two key points every time they think about adding a new UI test.

  1. Can we reach the same coverage using automation at a lower level in the stack?
  2. If the answer to 1 is No, can we refactor our solution to make it possible to automate at a lower level?

Only if both of these are a ‘No’ should new UI tests be considered as acceptance criteria. Try to avoid testing the app through the UI as much as you can, and if you do, implement only the ‘face saving’ scenarios that are necessary.

Testing to ‘save face’

What should be checked?

Only the key business flows and only that these basic flows complete correctly. For example, an online bookstore has only two key flows: one is registration and the second consists of logging in and purchasing a book.

What should not be checked?

Don’t check details and business rules, like calculating taxes and delivery time; these should be covered at lower levels in the application, otherwise the tests will very quickly become complicated and hard to maintain.

Three levels in automating end-to-end flows

If the team still believes that UI tests are the appropriate solution, readability is critical (perhaps even more so than in the production code): this code acts as a gateway check, so it has a higher priority and needs to make it immediately clear where something should be investigated. Well written tests quickly surface conflicts and are easy to scale and maintain. Maintainability is especially important because the majority of test classes, objects, and elements should be reusable.

To help apply these principles we should design each test according to the three levels below.

  1. Business Rules provide a high-level statement of the key flow, e.g. free delivery is offered to customers who order two or more books.
  2. Workflows provide more details and outline the process in language that an end-user would be familiar with, e.g. add two books to your shopping cart, enter address details, verify that the delivery options include free delivery.
  3. Technical Activity is a more complete and explicit breakdown of what needs to be implemented in the test code, e.g. open the browser and navigate to the homepage URL, log in with “testuser” and “testpassword”, go to the “/book” page, click on the first image with the “book” CSS class, wait for the page to load, click on the “Buy now” link… and so on.

Mapping the levels to our code:

  1. Business Rule => Test name.
  2. Workflow Actions => Page objects and methods for element interactions.
  3. Technical Activity => The functional implementation of the methods for the page(s).

What needs to be emphasised here is that separating out the second and third levels forces us to consider the user’s point of view without (hopefully) jumping directly into the implementation. This helps to create more robust test scenarios.
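
To make the mapping concrete, here is a minimal sketch, assuming a Jest-style test runner and the selenium-webdriver package for Node. The bookstore URL, selectors and page object are hypothetical and only illustrate the three levels, not a real implementation.

```ts
import { Builder, By, WebDriver } from 'selenium-webdriver';

// Level 3: Technical Activity lives inside the page object methods.
class ShopPage {
  constructor(private driver: WebDriver) {}

  async open(): Promise<void> {
    await this.driver.get('https://shop.example.com/books');
  }

  // Level 2: Workflow actions expressed in the user's language.
  async addBooksToCart(count: number): Promise<void> {
    const buttons = await this.driver.findElements(By.css('.book .buy-now'));
    for (const button of buttons.slice(0, count)) {
      await button.click();
    }
  }

  async deliveryOptions(): Promise<string[]> {
    await this.driver.get('https://shop.example.com/cart');
    const options = await this.driver.findElements(By.css('.delivery-option'));
    return Promise.all(options.map(option => option.getText()));
  }
}

// Level 1: the Business Rule is the test name.
it('offers free delivery when two or more books are ordered', async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    const shop = new ShopPage(driver);
    await shop.open();
    await shop.addBooksToCart(2);
    expect(await shop.deliveryOptions()).toContain('Free delivery');
  } finally {
    await driver.quit();
  }
});
```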

We want to decouple the user’s intent from the implementation details.

For instance, recently we had a major login page change. Initially, the form was a single page that allowed the user to type their email and password and be logged in immediately after clicking the button. Now we have a two-step process: first you type your email and click the continue button, the form updates, and you then enter your password and complete the login by clicking the button again.

In this case, if the tests had described the login steps directly, every test that used the login step would have required refactoring; instead, only the single login method in the implemented PageObject had to change (see the sketch below).
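
As a hedged illustration of that change (using the same hypothetical selenium-webdriver setup as above), the sketch below keeps the user’s intent in a single workflow-level method, so switching from a one-step to a two-step form only touches the page object, not the tests that call it. The selectors and URL are made up.

```ts
import { By, until, WebDriver } from 'selenium-webdriver';

class LoginPage {
  constructor(private driver: WebDriver) {}

  // Workflow-level intent: "log in as this user".
  async login(email: string, password: string): Promise<void> {
    await this.driver.get('https://app.example.com/login');

    // Step 1: enter the email and continue (the new two-step behaviour).
    await this.driver.findElement(By.css('#email')).sendKeys(email);
    await this.driver.findElement(By.css('#continue')).click();

    // Step 2: wait for the form to update, then enter the password.
    await this.driver.wait(until.elementLocated(By.css('#password')), 5000);
    await this.driver.findElement(By.css('#password')).sendKeys(password);
    await this.driver.findElement(By.css('#continue')).click();
  }
}

// Tests keep calling loginPage.login(email, password) exactly as before.
```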

The concept above is based on Gojko Adzic’s article — How to implement UI testing without shooting yourself in the foot.

Testing THE UI

The goal for these tests is to provide confidence in the logic coded in the frontend and ONLY the frontend! Remembering that these scenarios have already been vetted as not being testable at lower levels of the application, all we need to prove is that the elements have been integrated correctly, especially if third-party libraries are involved.

Testing the UI should focus on

  1. Changes in the UI that should happen, e.g. after a mouse click or button event, while still skipping details. Again, business rules and detailed checks should be implemented and verified at other levels in the application.
  2. Data rendering, i.e. from the API to the UI (visual testing is especially useful here).
  3. Serialising changes correctly from the UI to the API, but no further (both directions are sketched below).
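
The article doesn’t prescribe a tool for this level, but as a sketch, here is how both directions could look with Cypress and cy.intercept so the backend stays mocked; the routes, selectors and payloads are invented for illustration.

```ts
describe('book list page', () => {
  it('renders books returned by the API', () => {
    // Data rendering: the mocked API payload should end up on the screen.
    cy.intercept('GET', '/api/books', {
      body: [{ id: 42, title: 'Test-Driven Development' }],
    }).as('getBooks');

    cy.visit('/books');
    cy.wait('@getBooks');
    cy.contains('Test-Driven Development');
  });

  it('serialises the add-to-cart request correctly', () => {
    cy.intercept('GET', '/api/books', {
      body: [{ id: 42, title: 'Test-Driven Development' }],
    });
    cy.intercept('POST', '/api/cart', { statusCode: 201, body: {} }).as('addToCart');

    cy.visit('/books');
    cy.get('[data-test="buy-now"]').first().click();

    // Serialisation: the UI sends the right payload to the API, and we stop there.
    cy.wait('@addToCart').its('request.body.bookId').should('eq', 42);
  });
});
```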

How does this approach differ from Testing through the UI?

Instead of focusing on flow completion, we focus on the functional parts of the page. It is important to note that this is a significant conceptual difference which is essential to stopping UI tests from becoming end-to-end testing. We want to perform testing of the UI, not testing through the UI.

General principles

  1. NO PageObjects. As each test class is focused on a particular feature, the number of shared objects across tests should be very limited. This helps to avoid extra complexity in the framework architecture, and because each class is isolated from the others it is much easier to maintain tests as there are no unexpected dependencies across test cases.
  2. Limit scope to an explicit feature state. As with unit tests, a failing test immediately shows where the problem actually is, rather than having to trace scenarios through the entire application stack. It also provides a significantly faster feedback loop.
  3. Isolated environment. UI tests involve a lot of interacting parts, so having rigid control of the test environment limits the chance of unforeseen changes affecting the test results. This is one of the key reasons for the generally perceived ‘flakiness’ of UI test suites. For the maintaining engineer there can be a serious cognitive disconnect between a UI test that is in error and the backend issue it represents: the UI code will most likely have nothing to do with the error being traced, yet from their perspective they have no other context or indicators. Using mocks or dumps for the backend (a minimal example follows this list) helps avoid inconsistencies coming from the part of the system that is most often affected by changes. Yes, we need to test for these events to help ensure system stability, but the UI is not the most appropriate level for ascertaining problems deep in the stack.
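
As a minimal example of that last principle, the backend can be replaced with a small deterministic stub that serves canned fixtures, so every test run sees exactly the same data. The sketch below uses a plain Node http server; the routes and payloads are made up.

```ts
import { createServer } from 'node:http';

// Canned responses for the endpoints the UI under test calls.
const fixtures: Record<string, unknown> = {
  '/api/books': [{ id: 42, title: 'Test-Driven Development' }],
  '/api/user': { id: 1, name: 'testuser' },
};

export const mockBackend = createServer((req, res) => {
  const body = fixtures[req.url ?? ''];
  if (body === undefined) {
    res.writeHead(404).end();
    return;
  }
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(body));
});

// Started once for the test run and pointed at by the frontend's API base URL.
mockBackend.listen(4000);
```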

Best practices or principles for ANY UI testing

  1. Assertions — How easy is it for someone to understand why the test has failed?
  2. Readability is everything. Always write comments! Refactor the code or extract the method, even if it is only used once, to encapsulate the logic into smaller coherent blocks.
  3. Operational Context — Each test run should be easily connected to the changeset that affected it. Time is money, especially developer time. It should be immediately observable that a test is failing and, from that, what is broken and, most importantly, who should fix it. If you multiply investigation time by the number of possible changes that happened between recent UI test runs, you will find yourself in an unpleasant position. There are generally few volunteers to investigate such issues, which results in the need to assign special lifeguards to deep dive into the complexity; this should be avoided wherever possible.
  4. Controlled environment — as much as you can, establish a reliable and deterministic test environment. UI tests include a lot of moving parts already. Not just the particular actions available from these parts but even the ordering of these actions can affect the end result of the test. Having the ability to avoid or exclude any external layers of complexity is crucial (especially if these do not exist in a real production environment). For instance, the choice of test framework can impact test stability e.g. Selenium tests use WebDriver to interact with the browser. Having an understanding of the known issues and limitations in these helps to triage when failures are possibly not due to our application.
  5. Logical Context — keep the tests in the same repo as the system under test (SUT). UI tests are tightly coupled to the frontend anyway, which makes them dependent on changes in that code; generally, changes in any other part of the application should not require the tests to be updated. Cognitively it is easier for an engineer to perform implementations and investigations if they do not have to trace through multiple repositories, e.g. a global search for element ids will eliminate potential surprises later during a pipeline build.

What about cross-browser testing (CBT)?

First, let’s be clear about what the purpose of this testing type is. The goal is not to catch the functional issues described above; CBT is all about feedback on browser- or platform-specific issues. If we have a build that takes hours to complete because we need to run it on multiple environments, the fact that it catches an issue doesn’t mean the test project was successful. From experience, we have found that these failures are generally not specific to a particular browser or version, so there was little benefit in executing these long and complex test runs across all known browsers and versions.

Ok, you ask “…but what about when we do catch a real browser-specific issue?!” Then let’s think about this potential scenario. Firstly, what matters most is when we can catch these kinds of issues. Due to the time required for CBT runs, it’s impossible to use them as a gateway check for PRs (blocking releases). This means the changes have generally already slipped into production in some form.

Secondly, again due to the delay in the feedback, the potential changeset to review is large and generally results in a request to the lifeguard(s) with the domain knowledge to help investigate the true root cause. All of which adds further delays to the time before a fix can even be identified, let alone implemented.

And lastly, what if this browser-specific issue is not 100% reproducible? The general conclusion becomes “oh, it’s just flaky tests!” Tests get ignored, or they keep eroding trust. Unfortunately, we have faced exactly this kind of hard-to-reproduce, browser-specific issue ourselves, which is why it’s not just an edge case.

Unfortunately, CBT does not give us any extra benefit for free. Not only are the test runs time-consuming, but developers also have to spend extra effort making tests pass on multiple browsers, especially those that handle JavaScript in an inconsistent or non-standard manner, e.g. IE11.

What is the solution?

It may sound surprising and simple, but performing manual regression testing for each required browser saves you from most of these problems. Of course, this doesn’t mean that somebody has to do all the regression testing before every release; it is relevant to new features only. The majority of issues with CBT (especially with IE11) are not about functionality but about visibility. These sorts of checks cannot be covered by automated tests anyway, as they are not known beforehand (otherwise the engineer would have fixed them, wouldn’t they?). They require the human mind to differentiate discrepancies and invalid behaviour.

But what about regressions?

Fortunately, a solution already exists: Sentry. With this tool, the browser-specific incident mentioned above was revealed and scheduled for a fix before customers reached out to customer support. Sentry can alert engineers about any issue that happens in the browser, with the information required to assess the impact, priority and potential root cause before users are seriously impacted.
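
For context, wiring this up is mostly a one-off initialisation. Below is a minimal sketch with the @sentry/browser SDK; the DSN, release tag and error are placeholders.

```ts
import * as Sentry from '@sentry/browser';

Sentry.init({
  dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0',
  release: 'web-app@1.2.3',      // lets an alert be tied back to a changeset
  environment: 'production',
});

// Unhandled exceptions are reported automatically; known failure points can
// also be captured explicitly with extra context for triage.
Sentry.captureException(new Error('Checkout page failed to render'), {
  tags: { feature: 'checkout' },
});
```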

Yeah but what about IE11!

Unfortunately, a large proportion of our user base is restricted to IE11 as their only browser.

A while ago IE11 issues were seriously impacting our customers and embarrassing our engineers. Unfortunately, the user experience was pretty dramatic and resulted in pages not loading at all. The root cause was found in a library that wasn’t supported by IE11. Knowing this, we created a simple test that navigates through all the main web pages and checks for errors or infinite loading spinners.

What this gives us is performant UI tests that are easy to scale. For each new page you only need to add four lines of code:

  1. Load the URL
  2. Perform an existence check on a unique element that is specific to the page (check that we weren’t redirected)
  3. Ensure that there are no errors (error components are not visible)
  4. Check that there are no loading bars (the component we display while waiting for a response from the API)

To make it robust and as simple as possible, no interactions on the pages have been implemented (which is generally where all the pain with CBT comes from). Even for logging in, the tests don’t use form interactions, just plain HTTP requests. A sketch of this approach follows.
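
Here is a hedged sketch of that smoke check, assuming a Jest-style runner, selenium-webdriver, and Node 18+ for the built-in fetch. The URLs, selectors and login endpoint are illustrative, not the real application’s.

```ts
import { Builder, By, until, WebDriver } from 'selenium-webdriver';

async function checkPage(driver: WebDriver, path: string, uniqueSelector: string) {
  // 1. Load the URL.
  await driver.get(`https://app.example.com${path}`);

  // 2. Existence check on an element unique to this page (we weren't redirected).
  await driver.wait(until.elementLocated(By.css(uniqueSelector)), 10000);

  // 3. No error components are visible.
  expect(await driver.findElements(By.css('[data-test="error-banner"]'))).toHaveLength(0);

  // 4. No loading spinners are still waiting on the API.
  expect(await driver.findElements(By.css('[data-test="loading-spinner"]'))).toHaveLength(0);
}

it('smoke-checks the main pages in IE11', async () => {
  const driver = await new Builder().forBrowser('internet explorer').build();
  try {
    // Log in with a plain HTTP request and inject the session cookie,
    // avoiding any form interaction.
    const response = await fetch('https://app.example.com/api/login', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ email: 'testuser', password: 'testpassword' }),
    });
    const { token } = await response.json();
    await driver.get('https://app.example.com');
    await driver.manage().addCookie({ name: 'session', value: token });

    await checkPage(driver, '/dashboard', '[data-test="dashboard-header"]');
    await checkPage(driver, '/reports', '[data-test="reports-list"]');
  } finally {
    await driver.quit();
  }
});
```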
