How to use Golden Master testing for your iOS apps

Jan Olbrich
7 min read · Oct 30, 2017

Human perception is an interesting topic. We don’t perceive absolute values; instead, we see things relative to each other. Ever wondered why pirates wore eye patches? The eye underneath the patch stayed adapted to darkness, so when going below deck they could still see without waiting for their sight to adjust. A long time ago, this could be life-saving. Perceiving colors and grayscales in relative values often makes it easier to spot differences (e.g. a tiger in the jungle). Sadly, at the same time, it makes us blind to subtle changes. We don’t notice small differences, e.g. between 36% and 36.5% in a pie chart.

Luckily for us, computers work with absolute values. This allows for high precision, and the difference above would be detected without any issues.

During testing, we often need to compare our current version with some kind of prior version. So-called Golden Master testing captures the result of a process and compares it to a previously verified, correct version to discover unexpected changes. A common implementation of this approach is UI-snapshot testing. But don’t be mistaken! Checking your backend APIs by comparing their responses to prior results is also a form of Golden Master testing.

Since we are working with iOS apps, let’s have a deeper look into UI-snapshot testing.

Reasons

Golden Master testing won’t test your code for correctness, but you can be sure the behavior stays the same. This in itself can be more important than correctness. It is especially useful when you don’t have an easy way to integrate other kinds of automated tests. Have you ever worked with a huge legacy code base? I was lucky, and my first two jobs were precisely like this. Introducing any kind of test was a pain. It went so far that we weren’t able to introduce XCTest into our iOS app, since we used libstdc++. One option would have been to add Golden Master testing. At least we would have known whether we broke something or not. Sadly, I didn’t know of this option back then.

On the other hand, if you already have a huge base of unit tests and UI tests, it can still help you find an unexpected regression that is not covered by your tests (you can’t test every possible path). Furthermore, the analysis can be quite detailed: it detects even small changes and is suitable when your code depends on the result of some external library.

As I’ve already mentioned, Golden Master testing can help us visualize perceptual differences in our UI.

UI-Snapshot testing

Let’s have a more detailed look into visual perception testing (also called UI-snapshot testing).

The basic idea is to take an image of the screen while it displays known correct data. This is our golden master. Every time we change our software, we create a new image of the same screen and compare it to our golden master. If everything is the same, we know we don’t have any visual difference. If not, we know there is a difference, but how do we proceed? It doesn’t necessarily mean that we broke some code. Maybe it contains expected changes, maybe not. We don’t know without looking! The only thing we know is that something has changed. Further investigation is needed, so UI-snapshot testing is not our answer to everything.
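Stripped of any framework, the core of the comparison is tiny. Here is a naive sketch (all names are illustrative) that renders a view and compares the raw PNG bytes against a stored golden master; the real tools below compare pixel data and can allow tolerances:

import UIKit

// Naive golden-master check: render the view and compare the PNG bytes.
// Any visual difference makes the comparison fail.
func matchesGoldenMaster(view: UIView, referencePNG: Data) -> Bool {
    let renderer = UIGraphicsImageRenderer(bounds: view.bounds)
    let snapshot = renderer.image { _ in
        _ = view.drawHierarchy(in: view.bounds, afterScreenUpdates: true)
    }
    return snapshot.pngData() == referencePNG
}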

Test data

I’ve created a small screen which displays the German flag as a ball, two labels, two text views and a button.

Our golden master is:

This is the result of a failing test:

How many differences can you find, just by looking at it?

PerceptualDiff

Google has written a small tool which will calculate the difference between two images.

Here is the PerceptualDiff output using the two images above:

FAIL: Images are visibly different
1839 pixels are different

I guess you found the error in the username, but did you see the difference in the German flag? It wasn’t a square anymore; instead, it was 127.5x128 pixels. This change is nearly impossible to detect with the naked eye without an image to compare to. By default, pdiff doesn’t generate an output file, but it does have an output flag:

pdiff -output diff.ppm testimage.png goldenMaster.png

This will create a file in which you can visually see the differences.

A nice feature PerceptualDiff has is setting the number of changed pixels it will tolerate. Sometimes you don’t want a test to fail due to small changes, but most often I would advise you to stick to the default.
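Assuming your pdiff build supports the -threshold flag (the number of differing pixels below which the comparison still passes), a more lenient run would look something like this:

pdiff -threshold 100 testimage.png goldenMaster.png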

There are more tools to create a diff of two images. You could achieve a similar result with ImageMagick:

compare testimage.png goldenMaster.png -compose src diff.png

FBSnapshotTestCase

Of course, sitting down, creating screenshots and then comparing them manually is quite a tedious task. We could use fastlane snapshot (or Xcode 9’s new snapshot feature) to create the screenshots, store them with a specific name in a directory and use pdiff to compare them to our golden master.

Or we could use FBSnapshotTestCase. Facebook released this tool to ease snapshot testing in iOS development. In the following, I will look into how to set it up and how to use it. I will describe its usage within unit test targets, but you can also use UI test targets.

Setup

Facebook describes on their GitHub page how to set up FBSnapshotTestCase via CocoaPods. But you also have the option of integrating it via Carthage. It’s as straightforward as any other Carthage integration.
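At the time of writing, the repository lives under Facebook’s GitHub account, so the Cartfile entry would simply be the following (adjust it if the repository has moved):

github "facebook/ios-snapshot-test-case"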

Furthermore, add the following environment variables to your test scheme so the golden master images (and failure diffs) end up inside our repository:

IMAGE_DIFF_DIR: $(SOURCE_ROOT)/$(PROJECT_NAME)Tests/FailureDiffs
FB_REFERENCE_IMAGE_DIR: $(SOURCE_ROOT)/$(PROJECT_NAME)Tests

In our unit test target, create a new file and change import XCTest to import FBSnapshotTestCase. Do the same for the superclass of your test class: it should inherit from FBSnapshotTestCase instead of XCTestCase.

Add a test like this to your TestCase:
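A minimal sketch of such a test, assuming a ViewController with a usernameLabel outlet loaded from the main storyboard (all names are illustrative):

import FBSnapshotTestCase
@testable import GoldenMasterDemo  // hypothetical app module name

class ViewControllerSnapshotTests: FBSnapshotTestCase {

    override func setUp() {
        super.setUp()
        // recordMode = true   // enable once to (re)create the golden master
    }

    func testMainScreenMatchesGoldenMaster() {
        // Instantiate the ViewController from the storyboard
        let storyboard = UIStoryboard(name: "Main", bundle: nil)
        let viewController = storyboard.instantiateInitialViewController() as! ViewController

        // Start the view cycle so outlets are connected and layout happens
        viewController.loadViewIfNeeded()
        viewController.view.frame = UIScreen.main.bounds
        viewController.view.layoutIfNeeded()

        // Enter the test data
        viewController.usernameLabel.text = "John Appleseed"

        // Let FBSnapshotTestCase compare the view against the golden master
        FBSnapshotVerifyView(viewController.view)
    }
}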

This does the following:

  • instantiate a ViewController
  • start the view cycle
  • enter test data
  • use FBSnapshotTestCase to verify its correctness

To create our golden master, we have to add recordMode = true to setUp(). When you run this test, it will fail, but reading a little bit into the logs, you will find:

failed — Test ran in record mode. The reference image is now saved. Disable record mode to perform an actual snapshot comparison!

Your golden master has been created in the directory you’ve specified above.

Removing recordMode = true will configure your test to actually compare the test data with your golden master.
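In the sketch above, the only difference between recording and comparing is this one line in setUp():

    override func setUp() {
        super.setUp()
        recordMode = true   // remove (or set to false) to compare against the golden master
    }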

Let’s see what the test result looks like for the images shown above.

Within the test output, you can find a directory containing all the created images: the test image, the reference image and a diff image. I’ve used these three images above.

CI Process

Sadly, FBSnapshotTestCase is not able to judge whether a difference is intended or not, so it can’t give you any details except for “test failed”. This judgment needs to be made by you. We are not the first to look into visual perception testing. A team at Google did this, and thankfully they described their CI process in a video. In general, it looks like this:

  1. Run Tests
  2. Create images
  3. Compare Images to GM
  4. Release Manager checks whether changes were expected

I think point 4 is the most interesting. Whenever the UI changes, your release manager checks the failing tests and decides for or against the release. They stress it too:

“Failing UI-snapshot tests do not mean something is broken”

Gotchas

Even though I focused on UI-snapshot testing, I mentioned other use cases for Golden Master tests. One issue arises when comparing floating point values: you know they shouldn’t be compared for exact equality, but this is exactly what Golden Master testing does. So you might want to exclude them, or define special rules for them. Another problem can be timestamps (e.g. the expiration date of coupons). Whenever you are not in control of these, you might want to exclude them.
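As a sketch of such a rule, you could compare recorded floating point fields with a tolerance instead of exact equality (names are illustrative):

// Compare a recorded floating point value against the golden master with a tolerance
// instead of exact equality.
func matchesGoldenMaster(_ value: Double, reference: Double, tolerance: Double = 0.0001) -> Bool {
    return abs(value - reference) <= tolerance
}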

Conclusion

Golden Master testing is a really useful tool. You know when something unexpected arises, but sadly you are not always able to tell what it was. This kind of test can easily be added to any legacy code base, even when high complexity keeps you from writing a lot of unit tests. Running these tests ensures you are not changing behavior unnoticed. It doesn’t check for correctness; instead, it checks for changes. This way, if the tests pass, you can be sure the behavior didn’t change.

I’m curious to know your experience with Golden Master testing. Feel free to share your case in the comments below.
