expect(anything).toMatchSnapshot("Here be dragons")

Craig Bilner
Jun 25, 2018


This week I’ve mostly been thinking about snapshots.

Working on times-components has provided a unique and fascinating opportunity to build something different across the whole gamut of the software lifecycle. In particular, it makes you focus on the developer experience (DX), how you build cross-platform components and how you test them. AFAIK no one else is taking quite this approach, which gives us plenty of scope to do things wrong, do them badly, and make a multitude of mistakes. One area in which we have particularly excelled is generating a lot of questionable Jest snapshots.

Snapping at React

Let’s take a step back and look at how the React docs position snapshot testing. The main docs are very much focused on using React itself, and testing is left to fall under the API Reference. There, in the Test Utilities section, we’re directed to either Enzyme or react-testing-library (RTL), both of which make it easier to walk and manipulate rendered trees but offer no opinion on assertions at all. RTL does mention that it works just as well with snapshots, but it is more geared towards simple node assertions and interactive testing.

There is, however, a link to the Jest tutorial, which drops us straight into a basic snapshot example, __tests__/__snapshots__/Link.react.test.js.snap, which for reasons we’ll get to later is already questionable.
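
For orientation, the test behind that kind of snapshot file looks roughly like the following (a sketch using react-test-renderer and a hypothetical Link component, not the tutorial’s exact code):

import React from "react";
import renderer from "react-test-renderer";
import Link from "../Link";

it("renders correctly", () => {
  // toJSON() gives a serializable tree of the rendered output
  const tree = renderer
    .create(<Link page="http://www.example.com">Example</Link>)
    .toJSON();
  // on the first run Jest writes the .snap file; later runs diff against it
  expect(tree).toMatchSnapshot();
});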

The other sections, Shallow Renderer and Test Renderer, provide simple examples of how you could assert on a shallow-rendered tree.
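
By contrast, a shallow-render assertion along those lines needs no snapshot at all; this is a sketch against a hypothetical MyComponent that renders a div with two children:

import React from "react";
import ShallowRenderer from "react-test-renderer/shallow";
import MyComponent from "../MyComponent";

it("renders a div with two children", () => {
  const shallowRenderer = new ShallowRenderer();
  shallowRenderer.render(<MyComponent />);
  // only one level deep, so child components are not rendered themselves
  const output = shallowRenderer.getRenderOutput();

  expect(output.type).toBe("div");
  expect(output.props.children.length).toBe(2);
});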

If we were trying to work out how to test a simple React component at this point, we would know our options for testing interactivity and making simple render assertions, with only a hint of snapshot testing, and it is certainly not forced on us.

Native Snaps

The main React Native docs offer even less testing advice than React, other than a potential flag here where weary travellers in search of testing approaches must oft have stumbled. Fortunately the signpost takes them back to familiar territory: Jest. A single suggestion of snapshot testing from React feels incidental, but React Native (RN) also (and only) points here… and Jest is also from Facebook, so snapshot testing must be the idiomatic approach!

If we follow the two-year-old tutorial on offer, it does give us some insight into how one goes about snapshot testing.

Jest can capture snapshots of React trees or other serializable values to write tests quickly and it provides a seamless update experience.

And I can attest to this: they are definitely useful for capturing trees and they are quick to write. The post also gives us reasons why traditional assertions may be inappropriate for UI tests.

The bigger picture (full render) is complicated to create, tedious to maintain and very repetitive (copy-paste). The second approach (using find, contains…) is unstable: We need to remember to implement all the cases since the new ones won’t break those tests.

With this useful piece of advice

You need to make sure that the __snapshots__ folders are reviewed in your PRs

Yet we do get a glimpse into the dangers ahead

Snapshots can be huge and sometimes it might be tedious to find the part you are looking for (lot of noise). In contrast, pull requests will highlight any changes for you.

It’s not clear why snapshot-diff would help with huge snapshots in general, but it does provide a way to make smaller ones in certain scenarios, for instance when you only care about the difference between two component states.
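
For what it’s worth, that scenario looks something like this sketch, assuming a hypothetical Toggle component whose open prop is the state change we care about:

import React from "react";
import snapshotDiff from "snapshot-diff";
import Toggle from "../Toggle";

it("records only what changes when the toggle opens", () => {
  // the stored snapshot is just the diff between the two renders,
  // rather than two full copies of the tree
  expect(snapshotDiff(<Toggle />, <Toggle open />)).toMatchSnapshot();
});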

Back to the source

By now we have the concept of snapshotting, and in the snapshot section of the Jest docs another post gives us some insight into why they came about in the first place

Engineers told us all they wanted was to make sure their components don’t change unexpectedly.

And another one gives us this word of warning

Snapshot tests are a complement for conventional tests not a replacement

They’re for trivial UI changes that are easy to read in PR reviews, and they can leverage auto-mocking to reduce the burden of snapshotting dependencies. This echoes the oft-mentioned mantra of “No flakiness, Fast iteration speed, Debugging”.

And lest we forget, Jest itself gives us some Best Practices. In particular

Ensure that your snapshots are readable by keeping them focused, short, and by using tools that enforce these stylistic conventions

Apart from the formatting, which is taken care of for us, the only tool provided is eslint-plugin-jest, which does offer the invaluable no-large-snapshots rule. Other than that we’re largely left to our own devices when it comes to policing snapshots and being extremely disciplined during PRs.
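
Wiring that rule up in the eslint config is straightforward; here is a sketch, assuming eslint-plugin-jest is installed (the 50-line threshold is arbitrary and the exact option shape is worth checking against the plugin’s README):

// .eslintrc.js
module.exports = {
  plugins: ["jest"],
  rules: {
    // flag stored snapshots that grow beyond roughly 50 lines
    "jest/no-large-snapshots": ["error", { maxSize: 50 }]
  }
};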

What happened?

When you’re building a large repo with a multitude of developers (who are often changing) and you have deadlines to build features, snapshots are going to be the first thing to be neglected, particularly because of the ease of their creation. As we’ve already discovered for RN, there aren’t an awful lot of testing options, so initially we were naively performing “integration” tests with the snapshots to ensure a composite component was rendering “correctly”. RN is also particularly bad for spewing out huge snapshots of inconsequential rendering information that you can’t turn off.

The tech debt creeps in slowly, almost unnoticed. One PR with an overly large snapshot that looks to be doing roughly the right thing goes in. It happens a few more times, then partial changes to those snapshots follow, and before you know it… you have a 2MB snapshot file. This is exacerbated when you’re snapshot testing across three platforms (iOS, Android, web) with various approaches from different developers. We’re doing snapshot testing at scale ;-)

We weren’t using the no-large-snapshots rule, but even with it this can still happen. Alarm bells start ringing when, for innocuous changes, you can’t even see the diff on GitHub any more; changing some padding becomes a 20-file change that you still need to verify looks correct, making you question the value of the snapshots at all. This is compounded when you upgrade react-native-web (RNW) or RN, which slightly change a prop or className here and there, ballooning into 200+ file changes when the components themselves do nothing differently. Yet you can’t just go and delete them all…

You can address the debt one snapshot at a time, but how do you know which one to start with, and what even makes a “good” snapshot? While we’ve learnt general rules for snapshots (they should be human readable…), how do you curate a subjectively “good” one? Not using Jest’s auto-mocking or the eslint rule were certainly mistakes, but even these don’t go far enough for me; ideally we need to stop “bad” snapshots from being written in the first place.

Are we alone?

Not everyone is building a library for React and RN components with snapshot testing, so maybe it’s just us and maybe we missed something. I trawled through awesome-jest and it’s a little bereft of tooling around snapshots themselves, save for the eslint plugin we should have used.

I did the same thing with awesome-react-components to see if anyone else had fallen foul… and, despite what Twitter says, it turns out not everyone is snapshot testing. The closest examples I could find had some very questionable snapshots and some that were pretty much OK, perhaps due to their light use. It’s quite a different story for RN though, where the large majority of snapshots are less than ideal.

On the one hand it is relatively good news that it’s not just us, but on the other we’re still feeling a lot of snapshot pain, and making changes across components and reviewing the resulting snapshots has become unwieldy. This provided the opportunity to think about what makes a “good” snapshot, how to identify “bad” ones, and to write a tool to stop this from happening again.

jest-lint

With a plethora of less-than-ideal snapshots to work with, I started to codify the problems with them, which formed the basis for a POC that became jest-lint. The focus is currently on linting JSX in snapshots, but it can obviously be extended.

It doesn’t use eslint itself because eslint is limited in the types of assertion I’d like to make and would stifle its growth into a more holistic snapshot validation tool. Through iteration I ended up with the rules detailed in the README. This is already contentious because snapshots aren’t supposed to be machine readable, and there’s a certain amount of hacking around the prettified output. It would be nice if there were a snapshot standard that was both human and machine readable.

Now that we can pinpoint bad snapshots per package and have a goal to work towards (no lint errors), refactoring the snapshots has become much easier. I’ve also run it on several other open source repos to get a better idea of how other people write them, and it generally concurs with my earlier conclusion: in Reactland people don’t tend to use snapshots much, so the ones that exist tend to be OK, whereas RN tends to produce many more, and poorer, snapshots.

If we return to that Link example in the docs, I would say it should fail the linter for a couple of reasons. The [Function] prop values you so often see mean absolutely nothing in terms of code correctness; the functions should be tested separately, ideally in combination with a type system. The three tests are clearly testing className changes (snapshot-diff would be useful here, as it goes), yet the snapshots contain text, functions and an href, all of which are completely unrelated to the test itself but will show up as a diff on a PR if they change. This analysis doesn’t actually exist in jest-lint yet, but it would be a nice feature, i.e. warning on code duplication across snapshots.
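
To make that concrete, the stored snapshot for each of those tests looks roughly like this (an approximation rather than the verbatim file); only the className is relevant to the tests, while the text, href and [Function] props are noise waiting to produce an unrelated diff:

exports[`renders correctly 1`] = `
<a
  className="normal"
  href="http://www.example.com"
  onMouseEnter={[Function]}
  onMouseLeave={[Function]}
>
  Example Site
</a>
`;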

Whittling the perfect snapshot

The next problem was trying to fix the linting errors, which proved more than a little difficult. If we look at our options for creating snapshots, we can use the react-test-renderer (RTR) seen previously, or Enzyme. The latter doesn’t have an RN adapter as yet and the former doesn’t provide the “host” elements (which we’ll visit later). This means Enzyme is usually preferable for web, and RTR for native.

Unfortunately, to provide “placeholder” components in the snapshot for external dependencies, jest.mock is the preferred approach in Jest. I don’t love global mocking myself, and it doesn’t quite sit right with me to globally mock something just to curate a snapshot of it. Unfortunately createNodeMock doesn’t work with RN either.
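
For reference, the global-mock approach looks like this sketch, assuming a hypothetical ./IconDiamond dependency we want to collapse to a placeholder:

// returning a string from the factory makes React treat the mock as a host
// element, so the snapshot shows a single <IconDiamond /> instead of the
// dependency's full rendered output
jest.mock("./IconDiamond", () => "IconDiamond");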

On the web side of things we have a much more powerful tool belt in Enzyme where we can dive and filter for pretty much whatever we like. There are also three flavours of rendering on offer from the enzyme-to-json package giving us a lot of control during the curation process.
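
Those flavours map onto Enzyme’s shallow, mount and render; here is a sketch against a hypothetical Card component:

import React from "react";
import { shallow, mount, render } from "enzyme";
import { shallowToJson, mountToJson, renderToJson } from "enzyme-to-json";
import Card from "../Card";

it("can be snapshotted at different depths", () => {
  expect(shallowToJson(shallow(<Card />))).toMatchSnapshot(); // one level deep
  expect(mountToJson(mount(<Card />))).toMatchSnapshot(); // the full React tree
  expect(renderToJson(render(<Card />))).toMatchSnapshot(); // static HTML output
});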

With this in mind I wanted the DX for making a snapshot to be as nice as possible. This would mean not caring too much whether Enzyme or RTR is underneath, using a single API to whittle the snapshot, and having total control over every aspect of it. It’s far from ideal to have to use jest.mock for some things, the Enzyme API for others and then custom serializers for anything else. This led to several POCs in jest-serializer.

Tano had already created the initial concept of stripping out all the RNW classNames, which made the snapshots infinitely more readable… however, stripping out all the classNames also made some of the snapshots meaningless. This was tackled with the rnw transformer, which allows the developer to selectively snapshot the styles they care about.

Several iterations and refactoring later…

We now have a standard (albeit slightly hacky) way of traversing the snapshot tree, which effectively acts as a transducer, applying each given transformer. Developers can now write code that looks like this:

addSerializers(
  expect,
  compose(
    print,
    minimalNativeTransform,
    minimaliseTransform((value, key) => key === "style")
  )
);

where expect is the global Jest function and the others are exposed from jest-serializer. The end result, when using RTR on native, is a minimal snapshot (“irrelevant” props removed) with no styles, so that the test can focus on some sort of structural presentation. This approach may go against the original concept of snapshots in that it doesn’t look at the “big picture” and ignores some of the rendering logic, but it does provide independent snapshots that don’t change with unrelated features or fixes.

For web we have a couple more options. We can use RTR with a single serializer such as

addSerializers(expect, rnw(["height", "width"]));

which will print out something rather useful in a similar (though admittedly uglier) way to jest-styled-components:

exports[`Placeholder renders a large placeholder 1`] = `
<style>
  {
    "S1": {
      "height": "300px",
      "width": "970px",
    },
  }
</style>
<div>
  <div className="S1">
    <div>
      watermark
    </div>
    <div dir="auto">
      ADVERTISEMENT
    </div>
  </div>
</div>
`;

or, if we want a deeper render of the tree, we can use mount from Enzyme, such as

addSerializers(
  expect,
  enzymeTreeSerializer(),
  compose(
    stylePrinter,
    replaceTransform({
      ArticleFlag: justChildren,
      IconDiamond: propsNoChildren,
      ...meltNative
    }),
    minimalWebTransform,
    hoistStyleTransform,
    rnwTransform()
  )
);

This is extremely experimental in terms of both functionality and DX, but essentially the enzymeTreeSerializer renders the “host” objects, such as <TestComponent><div>...</div></TestComponent>, allowing us to act on TestComponent if we so wish. Native doesn’t have this luxury, so we have to use jest.mock. We could use filtering in Enzyme, but as discussed the idea of this experiment is to create a single API with which to “compose” a snapshot. The replaceTransform then swaps out nodes, either giving us just their children (because we don’t care about the component) or a skinny version of the component itself. The bizarrely named meltNative is just a convenience: because we write everything in RN and use RNW, Native elements are left in the tree. minimalWebTransform is very similar to the native one and again (perhaps controversially) will remove props not pertinent to the test. If nothing is passed to rnwTransform then all classes are simply stripped.

The commandments of Justin

To conclude our snapshot journey, Justin Searls (who knows a couple of things about testing) has this to say on why he doesn’t like snapshot testing. I wholeheartedly agree and have felt each point to some degree.

1. You don’t understand them

Each PR can feel obvious, but as snapshots grow and grow they become noisy and unintelligible. Even if you limit their length, the snapshot file as a whole can balloon, and they don’t necessarily represent what is being tested. This is where jest-lint steps in, ready to wave a flag when a snapshot shows warning signs of being irrelevant. A certain amount of human intervention is still needed to ensure tests are correctly written, of course.

2. Can’t author them

As it stands, no, you can’t… but our experiments with jest-serializer have shown that it’s at least possible to start to engage in snapshot topiary. If there were a nicer way of serializing per test and mocking components per test render, the DX would be fairly nice.

3. Easy to update without care

As soon as developers start saying, “We need a global snapshot update npm script”, you’re pretty far down the slippery slope. We’ve always tried to put in a certain amount of friction to updating snapshots, because while their intended purpose is to be “easy to update”, devs don’t read them. This is part cultural shift and part getting snapshots to the point where they are as valuable as the code and other tests, so they’re taken seriously and only change when appropriate.

4. False negatives

This is born of the points raised above. Curating correct snapshots, and being disciplined enough on PRs to call out snapshots that updated when they shouldn’t have, should address it.

Fin

Is snapshot testing even worth this effort? Despite the pain, I would say yes. As someone who reviews many PRs, being able to see a representation of the rendered component separate from the test code makes it much easier to read. When done correctly, a good diff will make the PR almost instantly mergeable, which gets us much closer to our CI/CD aspirations.

For our use case, which is verifying the rendered state of many dumb components, it still feels like a natural fit, but perhaps we just need to turn the original intention of snapshots around. I think they serve much better as just another visual assertion, carefully selected like any other unit test, rather than as a “big picture” of everything that’s easy to update.

Where we were misusing snapshots, other tools have filled in, such as dextrose and ayespy, which give us actual visual regression testing both at the PR level and when integrated into the various platforms. To support dextrose in particular we did a lot of work to make our storybooks react/storybook agnostic, enabling other types of testing, especially for RN. We’ve also introduced expo into our PRs to give immediate visual feedback on a device, and all of these complement snapshot testing so it doesn’t need to do too much.
