Overview of BackstopJS, a tool to test a web application’s UI

Jan 16, 2019

In this article, we’ll talk about the BackstopJS CSS regression testing framework, and how we integrated it into our business application to answer the following questions:

How can we iterate faster on UI changes without inadvertently breaking the design?
Can we increase confidence in a stable UI design while reducing the user testing effort?
Can we do that without considerably slowing down the daily work of our frontend developers?
How can we test a highly dynamic application?

To save some people time, let’s start with the conclusion and problems encountered.

Our UI development flow has certainly improved, and we were able to spot and fix issues that we wouldn’t have been aware of without BackstopJS. We’ll therefore keep using it and improve our integration over time.

But the integration hasn’t been easy, and some issues remain unsolved:

This library can only use your machine’s resources to run the tests. While other libraries like Puppeteer, the headless Chrome Node API, offer ways to simulate other devices, you’re still taking screenshots from your browser at different resolutions and not from a real device. We sometimes encounter UI issues on iOS devices only that do not appear in Chrome’s device toolbar (e.g. trying to display an icon on top of another element’s padding area), and BackstopJS won’t help identify these errors early.
Although BackstopJS seems to have done a good job of optimizing things, taking screenshots remains a slow and time-consuming operation. At the time of writing, it takes us about 10 minutes to take 71 screenshots.

It is sometimes difficult to stabilize a screenshot. Because our application is dynamically generated and contains animations, ensuring that a screenshot is taken at the right time on every developer’s machine is tedious.
Since some time is required to load a page, if your machine is busy running other software, chances are high that your screenshot will differ and your test will fail. To avoid this issue, we currently line up some low-computation work before starting the tests, and do that while they run.
At this time, only Linux developers are able to run the test suite. That’s because other operating systems render fonts slightly differently, which results in screenshots differing by up to 5%. We’ll resolve this issue when we update our integration to take screenshots from a Docker container.

Now that the drawbacks are out of the way, let’s go back to the article.

At GOunite, we take design seriously. Since the product is driven by brand design, every little detail matters.

But when you develop a new service with countless new features in the backlog, and try to follow software engineering standards such as writing DRY code, it is REALLY difficult to not break some part of the design at some point.

It’s even easier to break something without noticing it when using highly volatile libraries such as Material-UI.

If you have limited resources (almost always the case for startups and new ventures) or limited skills, this can result in months lost fixing bugs that weren’t caught during reviews, or that were detected on the staging server and blocked further production releases until they were addressed.

With a strong infrastructure (thanks, GCP and Kubernetes), a good development flow (thanks, GitHub) and a somewhat okay codebase (thanks, linters, code formatters, type checkers and unit/integration test runners) in place, UI integration had become our biggest weakness, holding back the development of the application.

How can we iterate faster while preserving a high quality level of design, supported on various viewports, browsers and devices?

Various solutions exist and often involve paying a 3rd-party service, with all benefits and disadvantages that come with it. Services aren’t cheap, and we weren’t interested in a host of options that came included in the price. I then found BackstopJS, which appears to be used by most of these 3rd-party services in the backend, and looked easy enough to integrate at first glance.

I’ll discuss how we quickly integrated it to reduce the user testing effort and increase UI reliability.

Backstop is a simple-to-use tool to take screenshots of pages through various viewports.

viewport: a framed area on a display screen for viewing information.

The most common names used to describe viewports are device categories: mobile, tablet and desktop.
Another interesting option is to use breakpoints like those defined by Material Design. Material-UI, a React library on top of Material Design, implemented a subset of them which are quite intuitive.
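
As a rough sketch of the breakpoint idea, the snippet below turns a breakpoint map into a BackstopJS viewports array (objects with a label, a width and a height). The sm/md/lg/xl widths are assumed to be Material-UI's default breakpoint values at the time of writing; the xs width, the height and the toViewports helper are made up for illustration.

```javascript
// Breakpoint widths in px. sm/md/lg/xl are assumed to be Material-UI's
// defaults; xs (which starts at 0) is given an arbitrary phone width.
const breakpoints = { xs: 360, sm: 600, md: 960, lg: 1280, xl: 1920 };

// Hypothetical helper: one BackstopJS viewport per breakpoint.
// The height is an arbitrary assumption.
function toViewports(map, height = 800) {
  return Object.entries(map).map(([label, width]) => ({ label, width, height }));
}

const viewports = toViewports(breakpoints);
console.log(viewports);
```

Labels derived this way can then be reused in scenario names or filters, which keeps the test vocabulary aligned with the one used in the stylesheets.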

Overview

Let’s first have a look at BackstopJS’s workflow (once it’s set up):

Generate references with backstop reference (this creates the screenshots that later runs will be compared against) and save them (if you use a versioning tool like git, you could commit them, but we’ll see a better alternative later).
Modify some code, then run the tests with backstop test to generate screenshots of the current version.
If a test screenshot differs from its reference, an error will be shown! To fix it, either fix your code because the reference is correct, or…
Approve the new screenshot with backstop approve because the reference is outdated (for example because you added a new feature to your design). This will replace the reference, so you need to save it again.

It is recommended to only version the references and ignore the test screenshots.

The workflow is very straightforward, so let’s set up BackstopJS in our project and try it out.

Setup

First run backstop init to set up your project.

backstop creates by default a backstop.json file which contains all the configuration, including all scenarios you’ll want to run.

A very minimal configuration with a single scenario would look like this:
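
A sketch of such a single-scenario configuration, written here as a JavaScript object and serialized to JSON. The key names are the ones found in the file generated by backstop init; the id and viewport values are placeholders, and the URLs are borrowed from the scenarios shown later in this article.

```javascript
// Sketch of a single-scenario BackstopJS configuration.
// Key names mirror the backstop.json generated by `backstop init`;
// the id and viewport values are placeholders.
const config = {
  id: "example",
  viewports: [
    { label: "phone", width: 320, height: 480 },
    { label: "tablet", width: 1024, height: 768 },
  ],
  onBeforeScript: "puppet/onBefore.js",
  onReadyScript: "puppet/onReady.js",
  scenarios: [
    {
      label: "home",
      url: "https://gounite.local/",
      referenceUrl: "https://staging.gounite.com/",
      readySelector: "",
      delay: 0,
      hideSelectors: [],
      removeSelectors: [],
      postInteractionWait: 0,
      selectors: [],
      misMatchThreshold: 0.1,
      requireSameDimensions: true,
    },
  ],
  paths: {
    bitmaps_reference: "backstop_data/bitmaps_reference",
    bitmaps_test: "backstop_data/bitmaps_test",
    engine_scripts: "backstop_data/engine_scripts",
    html_report: "backstop_data/html_report",
    ci_report: "backstop_data/ci_report",
  },
  report: ["browser"],
  engine: "puppeteer",
  asyncCaptureLimit: 5,
  asyncCompareLimit: 50,
  debug: false,
};

// Serialize the object to get the content of backstop.json.
const json = JSON.stringify(config, null, 2);
console.log(json);
```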

Whaaat? It is very long already. I could already see a bottleneck appearing in the near future, when we have hundreds of scenarios (very common for a business application). Such a file will be very hard to maintain.

Moreover, don’t we need several of these configuration files? One to run all scenarios once on the developer’s machine, one for continuous testing on file changes that only runs a subset of scenarios/viewports, and one for a CI engine like Travis.

jq to the rescue!

jq is a lightweight and flexible command-line JSON processor. It is like sed for JSON data.

JSON is a great format for APIs and such, but it is not configuration file material. In a configuration file, you want to document the non-obvious parts, leave messages for other developers, and keep the size small.

That’s what makes YAML more suitable. So I decided to split the config into several files.

Since backstop still needs a JSON file, let’s auto-generate it and gitignore the generated file. It’s a good rule of thumb to never commit auto-generated files unless there’s a good reason for it, like lock files that ensure all developers use the same package versions.

Let’s dive into the css:config command.

First, we use yq, which is a wrapper around jq, to parse the yaml files and return json content. All the scenarios are slurped together to form a single array of scenarios. Then we pipe the json content to jq which will merge all the scenarios (input) in the `scenarios` key of the configuration object.

(
  # read the config file, overridden by the local one if needed
  cat backstop.yaml backstop_local.yaml |
    yq .
  # read the scenario files, extract the scenarios and slurp them into one array
  cat backstop_data/scenarios/* |
    yq '.scenarios[]' |
    jq -s .
) |
  # merge the scenarios (second input) into the config, then save the JSON result
  jq '.scenarios = input' > backstop.json
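
For readers less familiar with jq, the merge performed by this pipeline can be sketched in plain JavaScript. The snippet assumes the YAML files have already been parsed into objects (parsing is left out to keep it dependency-free) and that local overrides are a shallow, top-level merge; buildConfig and the sample data are hypothetical.

```javascript
// Hypothetical equivalent of the jq pipeline, operating on already-parsed YAML.
// base/local: objects parsed from backstop.yaml and backstop_local.yaml.
// scenarioFiles: one parsed object per scenario document, each with a `scenarios` array.
function buildConfig(base, local, scenarioFiles) {
  // Keys from the local file override the shared ones (assumed shallow merge).
  const config = { ...base, ...local };
  // Like `yq '.scenarios[]' | jq -s .`: flatten every file's scenarios into one array.
  const scenarios = scenarioFiles.flatMap((file) => file.scenarios || []);
  // Like `jq '.scenarios = input'`: store the array under the `scenarios` key.
  return { ...config, scenarios };
}

const merged = buildConfig(
  { id: "example", engine: "puppeteer", asyncCaptureLimit: 5 },
  { asyncCaptureLimit: 1 },
  [
    { scenarios: [{ label: "home" }] },
    { scenarios: [{ label: "host-basic-sm" }] },
  ]
);
console.log(JSON.stringify(merged, null, 2));
```

The shell version has the advantage of needing no build step, while a script like this one is easier to extend if the merge rules ever become more involved.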

Hey, your yq command fails when I try it, do you know why?

There are several different tools called yq, so my guess is that you installed the wrong one. At GOunite, we use kislyuk/yq.

In some cases you may be able to avoid writing a config file by using the node invocation, e.g. by passing a config object to the command.

References

Once your setup is ready, execute backstop reference to generate your reference screenshots. It can be either a staging or production application, but we’ll see later why a staging instance makes more sense for dynamic applications.
Since we created a custom configuration, and to ensure that the generated JSON file is present, we can instead run yarn css:reference.

This has created as many images in the directory backstop_data/bitmaps_reference/ as there are viewports times scenarios.

While you want to version these files so that your team can keep testing new UI screenshots against these references, do not commit them directly to git! Your git repository’s size would quickly explode, making it very inconvenient for everybody.

A good solution for versioning large files is git lfs. This way, instead of saving full blobs of data in git, you only save a pointer (a hash) to the remotely stored file (GitHub integrates it seamlessly).

$ git lfs install
$ git lfs track "backstop_data/bitmaps_reference/*"
$ git add .gitattributes
$ git commit -am "feat: save backstop references with LFS"
$ git push

Then ask your colleagues to install git lfs and run git lfs pull to be up to date.

Tests

Run backstop test to create the test screenshots that will be compared with the references. As with the references, we’ve wrapped it in a new command, yarn css:test, to ensure the config file is there.

As a developer, if you run an instance of the application locally, you can quickly spot how different your local application and the staging one are.

This is what makes CSS regression testing suites so powerful. Imagine you added a 10px padding to an element to fix a design issue. Many non-designers wouldn’t even be able to point out the issue (I certainly wouldn’t!), but if a screenshot has a long purple band showing the difference between screenshots, you’ll notice it immediately.

Let’s have a look at one screenshot:

The screenshot above shows a fix for an address field in the mobile viewport which wasn’t aligned with the other fields. While a trained eye could spot the error, it isn’t obvious, especially when the page containing this information is very long and when it only breaks under special conditions such as a specific viewport.

Approve

Finally run backstop approve whenever an error is spotted but your local version is correct. That will replace the reference with the new screenshot.

Now we can call yarn css:approve --filter="host.*mobile" and only the mobile viewport reference for the host scenario will be updated.
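
To illustrate how such a filter narrows things down, here is a small sketch that applies the same regular expression to a list of screenshot file names. It assumes the filter is matched against file names; the names themselves are made up for the example (real ones depend on your config id, scenario labels and viewport labels), and only the pattern comes from the command above.

```javascript
// Hypothetical screenshot file names; real names depend on your configuration.
const files = [
  "gounite_home_0_document_0_mobile.png",
  "gounite_home_0_document_1_desktop.png",
  "gounite_host_0_document_0_mobile.png",
  "gounite_host_0_document_1_desktop.png",
];

// Same pattern as in `--filter="host.*mobile"`.
const filter = new RegExp("host.*mobile");

// Keep only the names in which "host" is followed, somewhere later, by "mobile".
const approved = files.filter((name) => filter.test(name));
console.log(approved);
```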

Great, now BackstopJS is integrated into our project. Compared to other tools, the integration was rather easy :)

Git hook: run Backstop on changed files

Running the test suite with 70 tests already takes us 10 minutes, and we’ve only just started covering the application. Running the full suite is expected to take more than an hour, which would critically impact the speed and efficiency of our frontend developers.

We want to reduce the number of scenarios executed on the developer’s side as much as possible.

Ensuring that every scenario runs as expected across all viewports is the job of the CI component (we use Travis CI), which usually comes into play at the code review phase (during a GitHub pull request in our case).

How can we make the frontend developer’s experience as painless as possible?

One solution would be to detect what files were modified by the developer, and via labels only execute the related scenarios.

Since we use Husky to handle git hooks, let’s update the git pre-commit one. Note that BackstopJS recommends integrating this in the build process, so we might update our webpack setup to support it in the near future.

One testing approach to consider is incorporating BackstopJS into your build process and just let the CLI report run on each build or before each deploy.

Back to our pre-commit hook:

Whenever we try to commit something, it’ll figure out what css/js files have been modified, parse their path for a match to a label, then use the matched list of filters to only run backstop with said labels.

Tip: As a convention, always write labels in lowercase, otherwise they won’t match against the related file path.

Let’s say we modified the Header component of the Home page:

$ git status
Changes not staged for commit:
  src/components/pages/Home/Header.js

Since our home scenario has the label “home”, which matches the lowercase path src/components/pages/home/header.js, it will be executed when committing this change.
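
The matching logic described above can be sketched as follows. This is a minimal, hypothetical version: matchLabels and toFilter are illustrative names, the list of changed files is passed in directly (in a real hook it could come from git diff --cached --name-only), and a label is considered a match when it appears anywhere in the lowercased path.

```javascript
// Hypothetical sketch of the label matching done in a pre-commit hook.
// changedFiles: paths reported by git; labels: scenario labels (lowercase).
function matchLabels(changedFiles, labels) {
  const paths = changedFiles
    .filter((file) => /\.(css|js)$/.test(file)) // only css/js files are relevant
    .map((file) => file.toLowerCase()); // labels are lowercase by convention
  // Keep the labels that appear in at least one changed path.
  return labels.filter((label) => paths.some((path) => path.includes(label)));
}

// Build the value to pass to `--filter`, or null when nothing matches.
function toFilter(matched) {
  return matched.length > 0 ? `(${matched.join("|")})` : null;
}

const matched = matchLabels(
  ["src/components/pages/Home/Header.js", "README.md"],
  ["home", "host"]
);
console.log(matched, toFilter(matched));
```

When toFilter returns null, the hook can simply skip BackstopJS, so commits that touch no UI files are not slowed down at all.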

Let’s get to the meat of the topic:

How can we take screenshots of pages which are mostly composed of dynamic data?

Backstop Fixtures

If you’re testing a page with dynamic content, BackstopJS recommends either hiding or removing the related content from the screenshot, or using fixtures.

The best way to test a dynamic app would be to use a known static content data stub — or ideally many content stubs of varying lengths which, regardless of input length, should produce certain specific bitmap output.

This is why using the production application as reference for your screenshots is not a good idea.

A staging environment is supposed to be identical to production, albeit with different data. That makes it a very suitable reference candidate.

I’ll mention two ways of updating a staging server’s database:

Use a fixtures generator. Fixtures could be data stored in a YAML file that can easily be loaded into the database. We often use fixtures in unit tests to bypass the database altogether and return fixture data through mocks.
Copy production data (from old enough backups) and process it (for example, replacing non-staff members’ emails with mocked @example.com addresses).

Once you have a data file (often .sql or .json), you’ll need to concatenate it with another set of data dedicated to BackstopJS screenshots.

With a Django server in a Kubernetes cluster, we could approach this staging database update as follows:

Download JSON data with the Django manage.py dumpdata command
Replace emails, then concatenate the BackstopJS fixtures with jq
Remove all data from the staging db, and load the new fixtures with manage.py flush && manage.py loaddata
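
The "replace emails, then concatenate" step can be sketched in JavaScript as follows. This is a hypothetical illustration rather than the jq command used in practice: records follow the { model, pk, fields } layout produced by dumpdata, the email and is_staff field names assume Django's default user model, and the sample records are invented.

```javascript
// Hypothetical sketch of the "replace emails, then concatenate" step.
// Records follow Django's dumpdata JSON layout: { model, pk, fields }.
// The `email` and `is_staff` field names assume Django's default user model.
function anonymize(records) {
  return records.map((record) => {
    const { fields } = record;
    // Leave records without an email, and staff members, untouched.
    if (!fields || !("email" in fields) || fields.is_staff) return record;
    return { ...record, fields: { ...fields, email: `user${record.pk}@example.com` } };
  });
}

// Production-derived data first, then the fixtures dedicated to BackstopJS pages.
function buildFixtures(productionDump, backstopFixtures) {
  return [...anonymize(productionDump), ...backstopFixtures];
}

const fixtures = buildFixtures(
  [
    { model: "auth.user", pk: 1, fields: { email: "staff@example.org", is_staff: true } },
    { model: "auth.user", pk: 2, fields: { email: "jane@example.net", is_staff: false } },
  ],
  [{ model: "auth.user", pk: 1000, fields: { email: "backstop@example.com", is_staff: false } }]
);
console.log(JSON.stringify(fixtures, null, 2));
```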

Repeat the database update locally, and your test and staging environments will use the same set of BackstopJS fixtures.

Why do we need to use Django and Kubernetes for BackstopJS?

You don’t. I’m just sharing our configuration, but you will need to write your own script to adapt to your application’s environment.

We can now take screenshots of backstop-related pages being confident that dynamic content will not cause screenshots to break.

In some situations where it cannot be avoided — like randomly generated feeds in a list page — we should tweak the scenarios to hide or remove the placeholders using the appropriate selectors.

# HOME PAGE
scenarios:
- label: "home"
  url: "https://gounite.local/"
  referenceUrl: "https://staging.gounite.com/"
  removeSelectors:
  - "#djDebug"
  hideSelectors:
  - ".card-wrap"
  delay: 1000
---
# BASIC HOST PAGE
scenarios:
- label: "host-basic-sm"
  url: "https://gounite.local/host/backstop-basic"
  referenceUrl: "https://staging.gounite.com/host/backstop-basic"
  removeSelectors:
  - "#djDebug"
  delay: 500
  onReadyScript: "puppet/sortFeed.js"
  scrollToSelector: "#sticky-end"
  postInteractionWait: 1000

Since we use Django Debug Toolbar, we need to remove it from all our tests via the selector #djDebug.
Our pages have feeds of randomly selected cards: the home scenario hides them via .card-wrap, while the host scenario runs a script (puppet/sortFeed.js) that sorts them before the screenshot is taken.

Use the following template to create a new onReadyScript script:

module.exports = async (page, scenario, vp) => {
  console.log(`SCENARIO > ${scenario.label}`);
  await require("./clickAndHoverHelper")(page, scenario);
  // add more ready handlers here...
};

Here’s our sortFeed script:

Awesome, our development environment is ready, and although there is still plenty of room for improvement, we’ll be able to iterate upon it over time when necessary.

Since we started using this tool, the number of recurring design bugs has fallen dramatically.

Summary

Only 3 commands to remember: backstop reference (create reference screenshots), backstop test (take new screenshots and compare) and backstop approve (accept new screenshots as the new references).
The BackstopJS configuration can quickly become huge. I shared a tip to split the configuration into several YAML files for easy maintenance.
BackstopJS will spawn a headless browser, wait for the page to load, execute JavaScript scripts, then take a screenshot. Mind the load time and adjust your scenarios using the appropriate properties, like readySelector, delay and postInteractionWait.
Comparing screenshots means that you need predictable output. To get it, make sure that your reference and test environments use the same data (share the same fixtures). You can also use JS scripts to remove unpredictable page behaviors like randomized lists.
Taking screenshots takes a long time. Leave the full run to your continuous integration tool, and only run the scenarios related to your staged files to iterate faster on the development side. Git hooks or dist builds are a good time to execute the CSS regression tests.

In the next post, I’ll talk about running a Docker testing environment from Travis CI and how to run our full CSS regression testing suite from it, to validate or reject a pull request on GitHub.

Update 2019/02/27: Fixed approve part following Garris’s comment.