Screenshot testing 101

Published in XM Global, Jun 6, 2019

Automated testing is the only way to ensure maintainable and high-quality software. This applies to all aspects of a software application. Whether we are testing a function for password validation, an action performed when a button is pressed, or the appearance of a UI element, we are in every case asserting that something is “as it’s supposed to be”.

Most of us are quite comfortable testing the first two (unit) cases. The appearance problem isn’t as straightforward though. UI rarely comes with formalized specs, and human perception and taste differ from developer to designer and vice versa.

Count the times you’ve been in a “this feels wrong” discussion or even in a “that’s it, now it looks great” meeting. Being able to “lock” in the correct appearance or quickly identify problematic UI has been a long-standing request.

So what’s all the fuss about? We are all good citizens: we have unit tests to cover the business logic, and we even write end-to-end UI tests to check how our app behaves. However, none of these cover the “how it actually looks” ambiguity mentioned above. Screenshot testing sits right in between unit and integration testing.

Screenshot testing will take some time to configure in the project. Depending on the platform this could mean custom Gradle tasks, build target configuration or something else. It will also add the usual overhead that comes with writing tests. However, manually testing normal and edge cases in the UI while developing a new feature, and then periodically revisiting the same code to test for regressions, is so 2014.

What is Screenshot Testing

Screenshot testing is the process of rendering a UI component in isolation, using precomputed or static data, and capturing a screenshot of it. This screenshot is then compared with a reference screenshot that was previously recorded under the same preconditions. If something has changed, the assertion will fail.
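To make the comparison step concrete, here is a minimal sketch in Python. It is not how any particular library does it: images are modelled as plain nested lists of RGB tuples, and the names `diff_pixels` and `assert_matches_reference` are made up for illustration. Real tools decode actual image files and often allow a tolerance.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels, top to bottom


def diff_pixels(reference: Image, candidate: Image) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates at which the two images differ."""
    same_height = len(reference) == len(candidate)
    same_widths = all(len(r) == len(c) for r, c in zip(reference, candidate))
    if not (same_height and same_widths):
        raise ValueError("images have different dimensions")
    return [
        (x, y)
        for y, (ref_row, cand_row) in enumerate(zip(reference, candidate))
        for x, (ref_px, cand_px) in enumerate(zip(ref_row, cand_row))
        if ref_px != cand_px
    ]


def assert_matches_reference(reference: Image, candidate: Image) -> None:
    """Fail, like a screenshot test would, if any pixel has changed."""
    changed = diff_pixels(reference, candidate)
    if changed:
        raise AssertionError(
            f"{len(changed)} pixel(s) differ, first at {changed[0]}"
        )


# Hypothetical 2x2 "screenshots" of the same component.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
reference = [[WHITE, WHITE], [WHITE, BLACK]]
unchanged = [[WHITE, WHITE], [WHITE, BLACK]]
regressed = [[WHITE, BLACK], [WHITE, BLACK]]

assert_matches_reference(reference, unchanged)  # passes silently
```

Calling `assert_matches_reference(reference, regressed)` would raise, because the pixel at `(1, 0)` no longer matches the reference.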

Pull request showing diff in the closing time string

This falls into the category of what Michael Feathers in his book Working Effectively with Legacy Code identifies as Characterisation Testing (aka Golden Master Testing).

“They aren’t really tests written as a gold standard that the software must live up to. We aren’t trying to find bugs right now. We are trying to put in a mechanism to find bugs later, bugs that show up as differences from the system’s current behavior.” — Michael C. Feathers.

Screenshot testing is widely adopted nowadays. While originally introduced by Facebook, it is now supported by many different companies and individuals. The most well-known flavors (depending on the platform) are Jest, Screenshot Tests for Android and FBSnapshotTestCase. The two mobile libraries do a pixel-by-pixel comparison, while Jest compares text output, more specifically serialized React view trees.

note: sometimes you might see the term snapshot testing instead of screenshot testing. Most of the time they mean exactly the same thing. Snapshot testing is just broader in the sense that it’s not limited to a pixel-by-pixel comparison. Snapshot testing compares any output, such as pixels or generated source code, whereas screenshot testing strictly compares pixels.

Screenshot testing Unagi

“Unagi is a state of total awareness; only by achieving true unagi can you be prepared for any danger that may befall you.” — Ross Geller.

A failure in a screenshot test usually means one of two things.

We changed something inadvertently ➡️ we’ll get a nice image comparison report portraying the error in perfect context.
We actually meant to alter the UI ➡️ we must re-record the reference screenshot so that it matches the newly desired appearance.
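These two outcomes usually correspond to two modes of running the tests, often called "record" and "verify". The sketch below illustrates the idea under simplifying assumptions: screenshots are treated as opaque bytes, references live in a local directory, and `check_screenshot` with its `record` flag is a hypothetical helper, not the API of any of the libraries mentioned.

```python
from pathlib import Path


def check_screenshot(
    name: str, rendered: bytes, reference_dir: Path, record: bool = False
) -> str:
    """Verify `rendered` against the stored reference, or overwrite it when recording."""
    reference_path = reference_dir / f"{name}.png"

    if record:
        # Intentional UI change: store the new output as the reference.
        reference_dir.mkdir(parents=True, exist_ok=True)
        reference_path.write_bytes(rendered)
        return "recorded"

    if not reference_path.exists():
        raise FileNotFoundError(f"no reference recorded for {name!r}")

    if reference_path.read_bytes() != rendered:
        # Unintentional change: fail so the difference gets reviewed.
        raise AssertionError(f"{name}: rendered output differs from reference")
    return "verified"
```

A typical flow would be to run with `record=True` once, commit the references, and run in verify mode from then on. When a change is intentional, recording again updates the reference, and the updated image shows up in the pull request.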

Screenshot testing enables us to be aware, at all times, of intentional and unintentional changes in our UI. Gone are the days when a color change in “Button A” or a padding increase in “Label B” messed up seemingly irrelevant parts of our app. We can now refactor UI-related code knowing that everything else will stay exactly the same.

Visual regressions will become a thing of the past.

Half UI — Half unit! The best of both worlds

Depending on the framework someone uses, the subject under test or the process followed, a screenshot test can be considered either a unit or an instrumentation test.

We believe that it’s actually both. It can test a single component, and it’s very fast to run and to iterate over multiple configurations, while simultaneously covering big chunks of our code base in each pass.

The many faces of screenshot testing

One of the most important aspects of screenshot testing when it comes to mobile is the ability to test how the UI behaves in different configurations. Each kind of configuration adds another dimension to our testing space. Simply put, a dimension is anything that makes a developer wonder “How will it look if the device has X?”, e.g. “a large screen”. Anything you can think of for X is a dimension candidate.

Dimensions should be considered as configuration categories that can coexist. In our application we have identified the following dimensions, though yours may differ significantly:

Screen size: we use two configurations, a small and a large setup. These suffice for now but a folded configuration is not far :)
Theme: Our app supports a light and a dark theme. We need to preview and test against both to make sure everything shows as expected.
Localization: Every developer who supports more than one language knows the pain of maintenance and the “will the translated texts fit?” uncertainty that every new language brings: length variation, descenders or ascenders in characters and, of course, RTL. While we could have a dimension variation for each supported locale, this would explode the number of generated screenshots. To address the explosion, one might introduce pseudo-localization. Instead of translating the text into all foreign languages, we only need to test a pseudo-language where the string resources of the application are replaced by an altered version of the original language, usually English. These pseudo-resources might have padding text to test against longer texts, and special characters that stress the UI in terms of vertical line height, font and encoding support.
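As an illustration of the pseudo-localization idea, the following Python sketch accents the vowels of an English string and pads it to simulate longer translations. The function name `pseudo_localize`, the 40% expansion factor, the `~` padding and the bracket markers are all assumptions chosen for the example, not what our app or any platform tool actually uses.

```python
import unicodedata

VOWELS = "aeiouAEIOU"


def _accent(ch: str) -> str:
    """Add an acute accent to vowels (e.g. "e" becomes "é"); leave other characters alone."""
    if ch in VOWELS:
        # Append a combining acute accent, then compose into a single character.
        return unicodedata.normalize("NFC", ch + "\u0301")
    return ch


def pseudo_localize(text: str, expansion: float = 0.4) -> str:
    """Return an altered version of `text` that stresses length and glyph support."""
    accented = "".join(_accent(ch) for ch in text)
    # Pad the string to mimic languages that need more room than English.
    padding = "~" * max(1, round(len(text) * expansion))
    # Brackets make truncated strings easy to spot in a screenshot.
    return f"[{accented} {padding}]"


print(pseudo_localize("Closing time"))
```

A label that still fits and renders correctly with the pseudo-localized string is more likely to survive real translations, while the test matrix only grows by a single locale.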

The total number of all possible combinations when picking one element from each dimension is the product of the cardinality of each dimension, i.e. |D1| * |D2| * |D3|.

When screen_size = {small, large}, theme = {light, dark}, localization = {pseudo-localization} the total screenshots generated for each test case would be |screen_size| * |theme| * |localization| = 2 * 2 * 1 = 4.
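The same arithmetic can be sketched in Python by enumerating the Cartesian product of the dimensions. The dictionary layout and the `configurations` helper are illustrative assumptions; the dimension names and values are the ones from the example above.

```python
from itertools import product

# One entry per dimension, with the values taken from the example above.
dimensions = {
    "screen_size": ["small", "large"],
    "theme": ["light", "dark"],
    "localization": ["pseudo-localization"],
}


def configurations(dims):
    """Return every combination that picks one value from each dimension."""
    names = list(dims)
    return [dict(zip(names, combo)) for combo in product(*dims.values())]


configs = configurations(dimensions)
for config in configs:
    print(config)

print(len(configs))  # 2 * 2 * 1 = 4
```

Adding a value to any dimension multiplies the total, which is why a full per-locale dimension gets expensive quickly.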

Coding side effects (the good ones)

Even though the main purpose of screenshot testing is to guard against unwanted UI changes, we have found some very interesting “side effects” when using it. Coding with screenshot tests actually shifted the whole mindset of how we develop UI.

UI prototyping on steroids 💊

In mobile development, actually displaying your UI means spinning up an emulator, navigating to the view, however deep it sits in your navigation flow, and praying you won’t have to do it all over again.

Having screenshot tests in place enables us to preview the UI we are developing in no time. There’s no need to wait for ViewControllers/Activities, ViewModels, etc, to be in place just to render the view. You can write the view code, use screenshot tests to verify it, and push to repo ✅!

Feedback cycle

Being able to immediately see how the UI behaves in various scenarios, like when a string is too long or when a network call fails, is extremely helpful. Getting that feedback early on and sharing it with designers and copywriters will save you tons of time later on.

The perfect PR

Imagine a pull request that is more than just code and tests. A pull request that actually shows the changes right there and then. The reviewer will be able to quickly evaluate the committed UI changes, leaving out the guesswork or the need to run the code to check it out.

You can find a sample in one of our demo Android projects at https://github.com/trading-point/screenshot-testing-android-demo/pull/3/files

Gotchas

The road to hell is paved with good intentions and screenshot testing is no exception. There are some pain points, especially in projects where requirement changes (affecting UI) are very common.

Reference screenshots must be updated every time something changes, since even the slightest difference will make the tests fail.
Developers will have to deal with lots of false positives.
Screenshots can grow your Git repo size significantly. This can be mitigated by keeping them in a Git submodule, but that adds more setup overhead.

What’s next?

This article is the first part of a blog post series. Platform-specific posts will follow for iOS and Android, so stay tuned. In the meantime, you can check out our demo repositories for implementation details:

Presentations

On the topic of screenshot testing, we also gave a couple of talks that you can check out:

Voxxed Days Athens 2019

Special thanks to Sotiropoulos Georgios, Xristoforos Filippou, Vagelis Koutkias, Kwnstantinos Natsios and Natalia Chalkidou for contributing and proofreading.