A decade of shade: automated testing from someone who wants the robots to take it already

Katy Williamson
The Storyblocks Tech Blog
9 min read · Oct 25, 2023

I’ve been in the software development industry for almost 10 years (wow, old). While I’ve worked on teams of all spots and stripes, I’ve only worked at two companies, and I hope that’s the reason I haven’t seen the mythical unicorn — a Painless Automated Testing Strategy (TM) — in the wild. It’s out there, right? Somebody has it down? It feels like most of us are “still figuring it out” or “just trying to make the old sh** work” so we can get on with the more interesting parts of our day. At Storyblocks, we have mixed feelings. Some good…

  • In general, pull requests include tests
  • Passing tests is a step in each CI/CD pipeline
  • Our QA team is invested in our automated test coverage

… and some bad:

  • No clear strategy on when to write what type of test
  • Some apathy towards writing “good” tests (I feel you)
  • A lot of legacy tests that are not good, pin us to old technologies, or flake often, which give us negative feelings about our suite as a whole

We’ll name our (very real) Painful Automated Testing Strategy “Patsy”. A fitting name, because Patsy gets blamed for a poor developer experience and for lag in getting code to production, and, for all the lines of code that compose her, she never gives us the coverage and confidence we need. When she’s slow, we’re mad. When she doesn’t catch an issue, we’re mad. When we can’t debug her, we’re mad. Poor Patsy.

Photo by Stephen Leonardi on Unsplash

Although I’ve seen more bad than good in my career, I’ve developed a few opinions along the way.

Some of them are about deciding to create new tests that are valuable to you… and worrying about the old stuff later. There’s value in simply not making the problem worse.

Others are about how to get test writing off my plate — while still writing good code that we’re confident in, of course.

Test requirements, not implementation

Katy’s simple truths of engineering posit that:

  1. The same goal can be accomplished in a million different ways. This is beautiful.
  2. Engineers, as protectors of a secret and powerful realm of knowledge, hold the power to do things how we want. This is also beautiful.

I got a little righteous there, but what I’m getting at is: we don’t have to make test writing hard on ourselves. Implementation details change over time and shouldn’t be the subject of our tests. If you find yourself writing a test to confirm a certain function is called with certain arguments, what you’re really doing is adding friction for yourself or another engineer when the target code has to change. Think instead about what would make you confident your code is working as designed. That’s usually about observable outputs and behavior, not the internal calls that produce them.
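
As a concrete illustration, compare a test that pins down an internal call with one that asserts the requirement itself. The component, props, and helper names below are hypothetical; the pattern is Jest plus React Testing Library:

import React from 'react';
import '@testing-library/jest-dom';
import { render, screen } from '@testing-library/react';
import PricingTable from './PricingTable'; // hypothetical component

// Brittle: couples the test to how the component happens to fetch its data
it('calls fetchPlans with the visitor flag', () => {
  const fetchPlans = jest.fn().mockResolvedValue([]);
  render(<PricingTable fetchPlans={fetchPlans} isVisitor />);
  expect(fetchPlans).toHaveBeenCalledWith({ visitor: true });
});

// Sturdier: asserts the requirement a visitor actually cares about
it('shows the free plan to visitors', async () => {
  render(<PricingTable isVisitor />);
  expect(await screen.findByText('Free plan')).toBeInTheDocument();
});

The first test breaks the moment the data-fetching approach changes, even if visitors still see exactly what they should. The second only breaks when the requirement does.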

At a previous job, I worked on a team where our Business Analyst got involved in the testing strategy. Plain-English requirements were translated into it('should …') Cypress test signatures before development started, and the manual QA team even scaffolded them in a PR so that feature engineers could fill them in throughout development (a sketch of one of those scaffolds follows the list below). This had a few benefits:

  1. Non-engineers got way more invested in writing complete requirements, including edge cases.
  2. Non-engineers were more able to make a priority call when a test broke or failed, because they shared ownership of the test (and the requirements).
  3. Engineers, an often mischievous breed, were not left to their own devices / allowed to skip tricky tests.
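
Here’s roughly what one of those pre-scaffolded specs looked like. The feature and requirements below are invented for illustration; it.skip keeps the empty cases from failing the suite until an engineer fills them in:

// checkout.cy.js: scaffolded from the analyst's written requirements (Cypress)
describe('Promo codes at checkout', () => {
  it.skip('should apply a valid promo code entered before payment', () => {
    // filled in by the feature engineer during development
  });

  it.skip('should show an inline error for an expired promo code', () => {
    // filled in by the feature engineer during development
  });
});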

It was a lot of overhead to get this practice started, but I liked the theory of it. Is it part of your mythical Patsy?

Photo by Jason Leung on Unsplash

Feed it to the robots

I am so ready for robots to take over test writing. When AI is more able than I am to analyze requirements and code and generate tests to cover them… boy howdy.

One thing we’re trying out at Storyblocks is GitHub Copilot. My Copilot AI buddy is getting better every day at using context clues and my personal habits to suggest code snippets. I can definitely see the potential to pass off test-writing responsibility in the future, and focus more on quality and coverage.

Copilot found a gap in my test scaffold!

Re-evaluate your feelings about snapshot tests

Snapshot testing is an approach to making sure all downstream effects of a change were intended by the developer. By adding a snapshot test, you’re saving an output at a point in time to source control, so that next time you make a change, any impact to that output is clear in a diff. Snapshots are usually used for DOM elements, but they can capture function outputs or state too.

Say I added a snapshot test for this element, called SideNav.

Check out https://www.maker.storyblocks.com, it’s pretty cool!

Here’s an old test we have to confirm the right options are shown in SideNav for a Visitor (as opposed to a Member). It’s ok, but it’s actually a pretty lossy test.

it('should show the right set of drawers for a visitor', async () => {
  renderWrapped(<AssetRailNav />, { initialState });
  expect(await screen.findByText('Stock')).toBeInTheDocument();
  expect(await screen.findByText('Uploads')).toBeInTheDocument();
  expect(await screen.findByText('Styles')).toBeInTheDocument();
  expect(await screen.findByText('Text')).toBeInTheDocument();
  expect(await screen.findByText('Overlays')).toBeInTheDocument();
  expect(await screen.findByText('Logos')).toBeInTheDocument();
  expect(await screen.findByText('Folders')).toBeInTheDocument();
  expect(screen.queryByText('Record')).not.toBeInTheDocument();
  expect(screen.queryByText('Brands')).not.toBeInTheDocument();
  expect(screen.queryByText('Templates')).not.toBeInTheDocument();
});

It doesn’t make any assertions about the order of SideNav items, which icons go with which labels, or whether the elements are interactive. It also requires manual rewrite if any of the labels change.

Here’s a snapshot test for the same component.

it('should show the right set of drawers for a visitor', () => {
  // assuming renderWrapped returns React Testing Library's render result
  const { asFragment } = renderWrapped(<AssetRailNav />, { initialState });
  expect(asFragment()).toMatchSnapshot();
});

When you run the test, a snapshot is generated, and you add it to git. It holds a lot of information. The items in SideNav are clearly buttons that contain SVGs and labels, in a specific order.

Beginning of the new snapshot file in GitHub Desktop
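
In plain text, the start of a generated snapshot file looks roughly like this (the markup is illustrative, not the real SideNav output):

// __snapshots__/AssetRailNav.test.jsx.snap (illustrative excerpt)
exports[`should show the right set of drawers for a visitor 1`] = `
<DocumentFragment>
  <nav>
    <button type="button">
      <svg class="icon-stock" />
      Stock
    </button>
    <button type="button">
      <svg class="icon-uploads" />
      Uploads
    </button>
    ...
  </nav>
</DocumentFragment>
`;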

Uh oh, Design wants the items in SideNav to be reordered! Your wish is my command. Make the change in the component, run the test again, and see that the snapshot doesn’t match anymore.

Snapshot failure in Terminal

Looks exactly like I intended! Accept the changes, commit, move on with my life.

This example doesn’t capture the main complaints I hear, which are: Too many snapshots have to get regenerated too often! The generated files are huge and hard to read! It adds friction to making a code change!

In my opinion, they’re still super readable if you show restraint when generating snapshots. Stick to the component level, don’t snapshot whole web pages. If you’re well-versed in semantic HTML, the diffs should be readable at that level… and if they’re not, either your code or your component is too complex. They could also illuminate something you didn’t mean to change, or help your teammates review your code better.

Friction is minimal, imo, because visual diffs are the easiest way to understand changes, and updates are as simple as pressing u while the tests are running in watch mode.

There’s lots more to read about in the debate, but if used responsibly, snapshots can be useful for teams weary of writing custom tests.

Photo by Patrick Miyaoka on Unsplash

Choose the test type based on the situation

Think of a software application as an obstacle course. (We are now obstacle course overlords.) Each time we develop a new feature for the app, we have to run the obstacle course again to make sure it’s working as expected, getting sliced by barbed wire and bruised by mechanical arms with boxing gloves that come out of nowhere. We take the fastest and easiest way through, and just get it behind us, every time.

But if we write tests, we can more easily observe that the obstacle course is working. Envision our trustworthy tests as robot assistants. The unit tests are drones with mechanical arms, focusing on specific parts of the course and interacting to get different outcomes. The end-to-end tests are cyborgs that run the course for us while we sit in our comfy chairs. We, the overlords, spend less effort and take on less risk.

One important decision, before test writing starts, is “should it be a unit test, component test, integration test, or end-to-end test?” Here are some questions I ask myself before choosing one:

  • Do I want to cover a lot of inputs and verify a lot of outputs? → Probably a unit test. I’ll write it with React Testing Library and come up with my edge cases ahead of time.
  • Do I want to verify different display experiences based on different conditions? → Probably a component test. I’ll make sure I cover both the Visitor and Member experience.
  • Would I trust the test result more if it’s based on realistic mock data? → Probably an integration test. I’ll make sure that mock server response is up to date.
  • Am I validating a whole user flow? → Probably an end-to-end test. I’ll write it with Cypress and get feedback from the requirements folks.

There are a lot more distinctions you can draw here too, but deciding which type of test before writing will help you write fewer tests in the long run.
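
As an example of that first bullet, “lots of inputs, lots of outputs” maps nicely onto a table-driven unit test. The formatDuration helper here is hypothetical; the pattern is plain Jest:

// formatDuration.test.js: one assertion shape, many edge cases decided up front
import { formatDuration } from './formatDuration'; // hypothetical helper

it.each([
  [0, '0:00'],
  [59, '0:59'],
  [61, '1:01'],
  [600, '10:00'],
  [3661, '61:01'],
])('formats %i seconds as %s', (seconds, expected) => {
  expect(formatDuration(seconds)).toBe(expected);
});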

Consider smoke tests and sanity tests

Some might say Storyblocks has too many tests! This is a rare problem in the engineering world, where tests are often seen as annoying to write and avoided when possible. But nevertheless, here we are. Our various test suites provide a decent amount of coverage and the main complaint is that they take too long to run.

One approach I’ve seen in the past is splitting tests into a few different levels. The resulting sub-suites can be run in different situations, and even tied to the severity of the situation.

“Sanity tests” are the smallest possible set of trustworthy tests that cover critical functionality. These should be run before deploying code in the situation of a critical bug or outage. Run time should be on the order of seconds, allowing us to get fixes out faster than our normal pipeline.
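
One low-ceremony way to carve out a sub-suite like that is a filename convention plus Jest’s projects config. This is just a sketch; the naming convention and display names are assumptions, not what we actually run at Storyblocks:

// jest.config.js: "sanity" runs the tiny critical subset, "full" runs everything else
module.exports = {
  projects: [
    {
      displayName: 'sanity',
      testMatch: ['<rootDir>/src/**/*.sanity.test.js'],
    },
    {
      displayName: 'full',
      testMatch: ['<rootDir>/src/**/*.test.js'],
      testPathIgnorePatterns: ['\\.sanity\\.test\\.js$'],
    },
  ],
};

Then jest --selectProjects sanity becomes the command you reach for when production is on fire, while the full suite stays in the regular pipeline.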

“Smoke tests” is a term I borrowed from a QA team I worked with, roughly translating to “what does the QA team need to check to green-light this [change|feature|release]?” Related to the “Test requirements, not implementation” section above, this is where those who didn’t build the change have input into how to verify the change with confidence. Ideally, the tests that are identified here stack up and are automated over time to bolster the general test suite.

Photo by Hannah Busing on Unsplash

Invest in SDETs

Taking a moment to shout out our QAs, the folks who keep us honest, catch our mistakes with grace, and understand us better than anyone else! You work hard, and we appreciate you.

I’ve always felt weird about engineers writing tests to prove their own code passes. If only I’d been allowed to grade my own tests in school…

“Software Development Engineer in Test” is a job title that acknowledges that a trustworthy automated test strategy is a full-time job. Without this specialization, tests are always the first thing to punt under time pressure, and the first thing to blame when bugs sneak through.

Let’s hire and grow more SDETs, people! Here at SB, our QA team is awesome and hungry to learn coding to support their (our) goals. A little bit of Cypress can go a long way to improving quality, confidence, and the developer experience. Company leadership can help make this time and training a priority.

If you ever spot the mythical unicorn…

I’m always listening and learning about new ways to make the day-to-day better for software engineers! If you feel good about your testing strategy, please share your lore with others. And if not, know you’re not alone.

Thanks for reading! Storyblocks is building a Talent Community of awesome people like you. You can also reach out in the comments or find me on LinkedIn to connect.
