Debugging Intermittent Test Failures with Karma and Jasmine

Oliver Carlson
7 min read · Jun 13, 2023
Test failures! If you’re still debugging with console.log, there are better ways in Step #5.

Have you ever had an intermittent or random failure in your unit tests?

I did, and I was pulling my hair out trying to figure out the problem. And I’m not alone: even big-name companies like Google and Microsoft struggle with flaky tests, with one study indicating that 41% and 26% of their test targets, respectively, had previously both passed and failed at least once.

Intermittent test failures cause companies to miss sprint and release targets and lead to a lot of wasted developer time.

Below, I outline my experience dealing with them, along with step-by-step recommendations to help anyone else facing these issues resolve them faster.

Background:

While working on a frontend project, I took a maintenance ticket for a major Node.js version upgrade and some npm package updates. We use vanilla JS, write unit tests with Jasmine, and run them with Karma.

The Problem

At the outset, this sounded like it would be a rather straightforward problem to solve given that this is a frontend library, so the effects of a Node version upgrade should (theoretically) be minimal and relate only to the build and test steps of the project.

And the first few fixes appeared to go according to plan: add some simple scripts to handle test setup and teardown more cleanly, and modify a few parts of the codebase that were affected by the package updates.

Once I was down to one remaining failure, tests which I had previously fixed began to fail again, with the same error messages as before.

I was greeted with the following 5 runs of tests with no changes in between:

Run #1: FAILURE! 2 tests failed.
Run #2: SUCCESS! All 200 tests pass.
Run #3: FAILURE! 12 tests failed.
Run #4: FAILURE! 1 test failed, but the test script stopped prematurely and 75 tests weren’t run at all.
Run #5: SUCCESS! All 200 tests pass.

Faced with these results, I decided to isolate the failing tests, and then the failing suites, to see whether I had missed some specific combination that was causing problems.

#1 Sanity Check — Isolate the failing test(s) and suite(s)

After many test reruns with no failures, I had a strong suspicion the test failures were due to test order dependency, one of the two likely culprits for intermittent test failures.
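
For the isolation step itself, Jasmine’s focused specs are the quickest tool: prefix a suite or spec with f and only the focused ones run. A minimal sketch, with made-up suite and spec names:

// Only focused suites/specs run while any fdescribe/fit is present.
fdescribe('ShoppingCart', function () {
  fit('recalculates the total when an item is removed', function () {
    const cart = new ShoppingCart();   // hypothetical class under test
    cart.add({ id: 1, price: 10 });
    cart.remove(1);
    expect(cart.total()).toBe(0);
  });

  it('applies a discount code', function () {
    // skipped while a focused spec exists above
  });
});

If a spec passes reliably in isolation like this but fails in the full run, order dependency becomes the prime suspect.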

Typically, tests fail intermittently due to either (a) asynchronous code or (b) incomplete setup or teardown (i.e., order-dependent failures). After thoroughly double-checking my asynchronous code, none of the tests appeared to be failing due to race conditions or unhandled promise rejections, which led me to think the problem lay in the setup and teardown of my tests.

Test frameworks default to running your tests in a randomized order to help prevent exactly this scenario from occurring in the first place. This was something I was vaguely aware of but hadn’t given much thought to before now. So the next step in resolving the issue is to try to get your tests to fail consistently, and the simplest way to do that is to…

#2 Disable random test ordering

Turning off randomization in your karma.conf
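
Here’s a minimal sketch of the relevant part of karma.conf.js, using the jasmine options that karma-jasmine passes through to the framework; everything else in your config stays as-is:

// karma.conf.js
module.exports = function (config) {
  config.set({
    frameworks: ['jasmine'],
    client: {
      jasmine: {
        // Run specs in declaration order instead of a random order.
        random: false
      }
    }
    // ...the rest of your existing configuration
  });
};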

(NOTE: if you are running the tests in watch mode, you must manually restart the script for changes to karma.conf to take effect).

In my case, I was lucky: this uncovered several errors, so I began debugging those with the tips mentioned in Step #5 below.

Once I had resolved the tests that failed consistently with randomization disabled, I re-enabled randomization and ran the test suite repeatedly to see if there were more errors. Many tests failed across this set of re-runs, so I began to look into how to get some of them failing consistently.

#3 Let’s get some seed values

A seed value is what allows us to “lock in” a random sequence of test execution.

As an analogy, imagine a book with 1,000 pages numbered 1–1,000 whose pages we then randomly shuffle. The book still has all 1,000 pages, but the page numbers no longer match how far into the book each page is.

For example, if we open the book and flip 10 pages in, instead of being on page 10, we are on page 72; the next page is numbered 341, then 290, then 862, and so on.

The seed value can be thought of as how many pages you flip into the book initially. So every time we open the book and flip the page 10 times (seed 10), we get the same sequence of pages [72, 341, 290, 862, … ].

We could instead flip 11 pages in (seed 11) and get [341, 290, 862, …], or flip X pages in and get a different random sequence, e.g. [7, 899, 42, 605, …].
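
In code, a seed just fixes the starting state of the pseudo-random number generator, so the same seed always produces the same shuffle. Here’s a toy sketch using a small linear congruential generator; it only illustrates the idea and is not Jasmine’s actual implementation:

// A tiny seeded PRNG: same seed in, same sequence out.
function makeRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

const rand = makeRandom(10);          // "flip 10 pages in"
console.log(rand(), rand(), rand());  // the same three numbers on every run

const replay = makeRandom(10);        // same seed...
console.log(replay());                // ...restarts the identical sequence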

Using seed values gives us a way to reliably fail a test, which means we can reliably fix it. Hooray!

To get this information out of Karma and Jasmine, we need to install an npm package and update our karma.conf file.

Adding karma-jasmine-order-reporter will output the seed value used for each test run. We can now isolate broken seed numbers!
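
The setup is a normal dev dependency plus one line in karma.conf. A sketch of what that looks like; the exact reporter name string (‘jasmine-order’ below) is an assumption on my part, so check the package’s README for your version:

$ npm install --save-dev karma-jasmine-order-reporter

// karma.conf.js (only the changed line shown)
reporters: ['progress', 'jasmine-order'],  // reporter name is an assumption; see the package README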

Once Karma is set up, we’re ready to run the tests a bunch and find some reliably failing seed numbers.

#4 Find some broken seed numbers

With a smidgen of bash, we can start running the test suite 5–10+ times with one command to find some bad seeds to start debugging:

$ for i in {1..10}; do karma start >> Karma_Log_Output.txt; done

Depending on the size of your project, this command will likely produce 100,000+ lines of output, so I opted to redirect it into a file.
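
Once the log shows a seed that fails, you can replay that exact ordering on demand by pinning the seed in karma.conf, using the same jasmine client options from Step #2. The seed string below is just a placeholder; use the value from your own log:

// karma.conf.js (inside config.set)
client: {
  jasmine: {
    random: true,   // keep randomization on...
    seed: '48126'   // ...but lock it to the failing seed from the log (placeholder value)
  }
}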

With consistently failing seed values, you can begin debugging the test(s) in question. Ideally, the stack trace is quite clear and will give you a good idea as to what’s going wrong.

However, in my case, several of the failures provided no helpful call stacks. Here’s what to do in that case:

#5 Step-by-step debug with Chrome’s Debugger

Conditional Breakpoints:

One of the most critical tools here is the conditional breakpoint. Adding one in the developer tools is easy: right-click the line number in the Sources panel, choose “Add conditional breakpoint…”, and type in the condition.

A conditional breakpoint on this line with the condition `!window` will only pause execution when the window object is missing.
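
If you’d rather keep the condition in the code while you hunt (for example, inside a shared beforeEach), a guarded debugger statement has the same effect. A small sketch; window.myAppState is a made-up global standing in for whatever your setup is supposed to create:

beforeEach(function () {
  // Same idea as a conditional breakpoint: only pause when something is wrong.
  if (!window.myAppState) {  // hypothetical global that setup should have created
    debugger;                // DevTools pauses here only when the condition is true
  }
});

(The pause only takes effect when DevTools is open, e.g. on Karma’s debug page in Chrome.)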

Looking backwards with Chrome’s Call Stack:

If you need to go backwards from a debugger stop point, you can use the function call stack to examine the previous function calls.

The Chrome debugger’s Call Stack pane lets us see a preview of all the function calls leading up to the current line.

Clean up the call stack with ignore lists:

It’s likely that node_modules and other third-party code you don’t care about are cluttering the call stack, so you can clear that up with Chrome’s ignore list: right-click a file in the Sources panel and choose “Add script to ignore list”, or add patterns (for example, node_modules) under Settings > Ignore List.

Adding patterns to the ignore list will help clear up the call stack and let you focus on what’s important.

With these tools and some elbow grease, you should be able to dig through intermittent test failures with ease, even if the test failure messages in your terminal don’t display a call stack.

What Didn’t Work:

Now that I’ve outlined what worked and what I would recommend, I’d like to mention something I have seen recommended by others that I would caution against: manually excluding tests or suites.

Once you have consistent failures from a seed value, excluding tests can be helpful for narrowing down specific tests or pieces of code. Without consistently failing tests, though, you’ll just end up wasting time, because you can’t tell whether or not disabling a test actually fixed anything.
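
For reference, exclusion in Jasmine is just the x-prefixed counterpart of the focus functions; if you do reach for it, do so only once a seed reproduces the failure:

// Excluded: this suite is reported as pending and its specs do not run.
xdescribe('DatePicker', function () {   // illustrative suite name
  xit('rolls over to the next month', function () {
    // ...
  });
});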

Summary

  • Tests run in a random order by default, and that’s the ideal scenario; only turn it off temporarily for debugging.
  • Double check that the tests are actually failing intermittently by testing them in isolation with `fit` and `fdescribe`.
  • Turn off random execution to try to get some tests failing consistently.
  • After you have fixed any test failures that show up with sequential test execution, start getting some seed values to fail consistently.
  • Use your tools! Conditional breakpoints and line-by-line debugging with Chrome Developer tools are your friend.
  • Avoid disabling tests as a first-line fix for intermittent test failures.

I hope you found this post useful and, ideally, that it prompts you to investigate and fix any intermittent test failures in your project (your coworkers will appreciate it!).

If you have any tricks or tips of your own when it comes to diagnosing unit tests, please feel free to share them here with me.
