The Change in Testing that Reduced Mobile Release Bugs by 93%

Simon Labute
Dec 1, 2021 · 9 min read

At Samsara, we reduced escaped bugs in mobile releases by 93%, while also reducing manual testing time by 97.5%. We did this primarily by changing our culture around the quality of our testing when writing features, and by landing new testing infra that drastically reduced the marginal effort to write tests that fully cover a feature. Let me get into what I mean by this.

Previous state of the world: Unit tests & lots of manual tests

Before we overhauled our testing approach, we ensured quality through a few layered checks. On the automated side, we had unit tests around each piece of our javascript stack (where the features live).

Previous testing approach

These gave a baseline for regression detection and allowed us to enforce an aggressive 90% test coverage threshold in CI, but they didn't really give us confidence that a feature works end to end as a customer would experience it. So we kept a long list of manual feature tests that we ran through whenever we evaluated a release build.

The manual tests were interesting. First and foremost, they were fairly costly in time and coordination, but they were also prone to human error and frequently got out of date as engineering updated features. It's near impossible to maintain an exponentially expanding list of configurations a feature can execute under. Ideally we want our QA engineers doing impactful exploratory work, not trying to stay on top of an ever-growing list of regression tests.

Keeping on top of regression tests is especially difficult with our mobile application which is expected to operate as an eventually consistent system. Our users are truck drivers out on the road, where frequently they don’t have cell coverage. Our app requires full functionality while offline, and needs to reconcile that with the server once it comes back online.
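An eventually consistent client like this typically buffers mutations while offline and replays them in order once connectivity returns. Here's a minimal sketch of that pattern; the names and shape are hypothetical, not Samsara's actual implementation:

```javascript
// Minimal sketch of an offline mutation queue (hypothetical names,
// not Samsara's actual code).
class OfflineQueue {
  constructor(sendFn) {
    this.sendFn = sendFn; // performs the real network call
    this.online = true;
    this.pending = []; // mutations buffered while offline
  }

  async dispatch(mutation) {
    if (!this.online) {
      // Offline: queue the mutation so the UI can update optimistically.
      this.pending.push(mutation);
      return { optimistic: true };
    }
    return this.sendFn(mutation);
  }

  setOnline(online) {
    this.online = online;
    if (online) this.flush();
  }

  async flush() {
    // Replay queued mutations in order once connectivity returns.
    while (this.pending.length > 0) {
      const next = this.pending.shift();
      await this.sendFn(next);
    }
  }
}
```

The interesting (and bug-prone) part is everything this sketch glosses over: conflict resolution when the server state has moved on, and surfacing failures for mutations the user thinks already succeeded. That's exactly the logic that's hard to exercise manually.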

Looking at this process, we were facing:

  1. An expensive quality process
  2. A lot of bugs that still made it out
  3. Engineers spending time writing unit tests that weren't significantly reducing the burden of (1) and (2)

Overhauling our testing approach

We wanted to enable developers to write tests that resemble user flows as much as possible. Tests should be written describing what users do, and assertions should be on user-facing items, not internal state. The automated tests should be powerful enough that we can confidently remove manual feature testing as a result. Other frameworks out there encourage this style of testing; we certainly aren't the first (React Testing Library, Puppeteer, Selenium, Appium, etc.).

When thinking through solutions, there are a few technical attributes we thought fundamental to having a framework that would actually be used (and liked!) by engineers:

  1. It should be fast: it can run per pull request (PR), not as some nightly job.
  2. It shouldn't be flaky: early prototyping with Appium (running tests on a real device) had a 60% flake rate for feature tests. Running on real hardware can be quite flaky, especially for mobile.
  3. It shouldn't be difficult to maintain when making desired code changes.
  4. A secondary goal: engineers should be able to write one test for their feature as a whole instead of a unit test at each layer of the javascript stack, spending less time overall.

We also observed that most of the annoying parts of integration frameworks (bullets 2 and 3) come from testing against a real server, real production data, or real devices, and that most of our bugs didn't come from there, but from application-layer logic in mobile.

🌊 Introducing Flows: User focused integration tests

Given the above, we built a small in-house framework called Flows (named for user flows, the philosophy behind the framework). At a fundamental level, Flows is just an in-memory javascript context object in which we mock out anything external to our javascript code: server endpoint calls, local device storage, device interactions like push notifications, etc. It then presents a consistent API that test authors can use to configure and assert against these mocks as necessary.

Flows runs the whole javascript layer

We run all of our javascript application logic (React, Redux, utils, infra) in memory and use standard jest assertions for everything. We render our react layer using react's test renderer, and we built some utilities around it to enable easy searches for test authors. Here's what a test ends up looking like in Flows:

test("My feature works offline", async () => {
  // Initialize a "world" to hold the mock device. The world
  // holds a rendered react tree, a full redux store, along with
  // any other modules (mocked or not) that the app needs to run.
  const world = await makeWorld();

  // The flow registry holds very common sets of user actions. In
  // this case logging in and landing at the home page of the app.
  await world.flow.startAtHome(world)();

  // A user takes an action. We add testId props to react components
  // to interact with:
  // <Button testId="some-test-id" />
  await world.act("some-test-id");

  // Our app loses internet connection (a root requirement of a lot
  // of complexity in one of our mobile apps). This is achieved by
  // our server connection mock module knowing how to error
  // appropriately in a mocked "offline" mode. (Method names below
  // are illustrative.)
  world.core.goOffline();

  // A user takes another action and we see an optimistic update
  // applied to the app.
  await world.act("some-other-test-id");

  // We use standard jest assertions. A piece of text is rendered.
  expect(world.getByText("User facing rendered text")).toBeTruthy();

  // We go back online and see the queued server request go out (our
  // connection mock collects any calls throughout the test).
  world.core.goOnline();
  expect(world.core.getServerCalls()).toContainEqual({
    variables: {
      data: "input parameters were wired up correctly",
    },
  });
});
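As an aside on what those search utilities do under the hood: finding the element with a given testId is essentially a depth-first search over the rendered tree. A minimal sketch over a simplified tree shape; the names and structure here are hypothetical, and the real Flows utilities wrap react's test renderer output:

```javascript
// Sketch of a testId lookup over a simplified rendered tree
// (hypothetical shape, not the actual Flows internals).
function findByTestId(node, testId) {
  // Match on the node's own props first.
  if (node.props && node.props.testId === testId) return node;
  // Otherwise recurse into children, depth first.
  for (const child of node.children || []) {
    const found = findByTestId(child, testId);
    if (found) return found;
  }
  return null;
}
```

An `act`-style helper can then be a thin wrapper: find the node, fire its press handler, and flush any pending state updates.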

When we add new infra to our mobile codebase, the onus is on the infra developer to make sure their infra runs in both production and test environments, improving the Flows framework as necessary. This means that any work mocking out the externalities is done once and is available to all test authors. It's obvious in hindsight, but there's real power in having an official pattern for sharing test code. Any externality needs to get mocked as part of the unit tests that target it, and one benefit of a formalized test runner like Flows is that there's a common place for folks to share the mocking work they do.
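The sharing pattern can be pictured as a simple registry: an infra author registers a mock factory once, and every test world instantiates fresh copies so state never leaks between tests. A sketch with hypothetical names, not Samsara's actual code:

```javascript
// Sketch of a shared mock registry (hypothetical names).
const mockRegistry = new Map();

// Infra authors call this once when they land new infra.
function registerMock(name, factory) {
  mockRegistry.set(name, factory);
}

// Called per test (e.g. inside makeWorld) so every test gets a
// fresh, isolated instance of each registered mock.
function makeWorldModules() {
  const modules = {};
  for (const [name, factory] of mockRegistry) {
    modules[name] = factory();
  }
  return modules;
}

// Example: the push notification infra author mocks it once...
registerMock("pushNotifications", () => {
  const delivered = [];
  return {
    deliver: (msg) => delivered.push(msg),
    getDelivered: () => delivered,
  };
});
```

...and every test author can now assert against `world.pushNotifications` without ever writing their own push notification mock.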

We also get fairly granular control over our server endpoints by keeping them mocked in Flows. This would be quite difficult in a fully end-to-end environment like Appium. For example, we offer an API that gives test authors full control over the timing of server responses:

// We mock a server endpoint, marking it as manual mode which gives
// us full control over when the server will "send" its response.
const { flush } = world.core.updateServerResponse(
  {
    query: SomeQueryName,
    result: {
      updateDevice: true,
    },
  },
  { manual: true },
);

// The user takes actions which trigger the network request.
await world.act("some-test-id");

// The user sees the app respond to their actions while the network
// request is pending.

// Now our request resolves and we can assert on the app's state.
await flush();
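Under the hood, manual mode can be as simple as a deferred promise: the mocked endpoint hands the app's network layer a promise that only settles when the test calls flush. A sketch of the mechanism with hypothetical names, not the actual Flows implementation:

```javascript
// Sketch of a manual-mode mock response (hypothetical names): the
// app awaits respond(), and the test controls when it settles.
function makeManualResponse(result) {
  let resolve;
  const response = new Promise((r) => {
    resolve = r;
  });
  return {
    // The app's (mocked) network layer awaits this "server call".
    respond: () => response,
    // The test calls this to let the "server" reply.
    flush: () => {
      resolve(result);
      return response;
    },
  };
}
```

Until flush is called the app sits in its genuine pending state, which is what lets tests assert on loading spinners and optimistic updates deterministically instead of racing a real server.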

For a second opinion, you can check out a talk from LadderLife, who have a similar testing philosophy. They leverage having client and server in the same language to run the real server in tests instead of mocking it.

💥 Impact a year later

We’ve been using Flows for a little over a year and have seen measurable improvements to quality.

  1. Manual testing time is down 97.5% after adding Flows coverage for the old manual tests (minus the handful remaining that test things outside the scope of Flows) (left chart)
  2. Release blockers (bugs found at manual release testing time, as opposed to in CI before PR merge) are down 90% (right chart)
  3. Escaped bugs in releases are down 93% (right chart)
We dropped manual test time by 97.5% (left), Release blockers dropped by 90% & Escaped Bugs dropped by 93% (right)

It’s worth noting that trends (1) and (2) could have been achieved by just not testing anything anymore, but then we’d expect (3) to shoot way up. Seeing all 3 trends together is what gives us confidence our approach had the impact we intended.

💞 Oh, and Engineers Love It

The impact didn’t come with a huge cost to engineering velocity, either. Engineers are hugely in favor of the framework. In our most recent survey of mobile developers, 50% of respondents specifically called out Flows as one of the items they’re most thankful for.

Engineers definitely recognize the value of having fewer bugs: it means fewer interruptions and more time to focus on delivering new value to customers. But I think a test runner like Flows has benefits beyond just bug reduction. Integration-style tests that run as much of our mobile javascript stack as possible in each test clearly reduce the engineering lift needed to get coverage of features:

  • You write one test for your feature instead of one at each layer.
  • When writing a feature, you don't have to contort your code to make it testable. Frequently in my career I've seen PR comments to the effect of "if you refactor this to pull out such and such helpers, we can unit test them". That's additional complexity for an engineer to consider. A more powerful testing framework means that in mobile dev at Samsara, you just write the feature however it makes the most sense, and you know it'll be testable!
  • When writing infra, you make sure your code runs in production and in testing, and then others never have to think about it. Mocks are written once and shared with all test authors; furthermore, when jumping into another feature’s tests, you already know the common mocks everybody is using.
  • There are great docs to go along with the common patterns. We can actually write these docs since tests aren’t using bespoke mocks on a per test basis.

Final thoughts

Creating a new paradigm and getting everybody to use it isn't trivial. We allocated meaningful engineering time to a concerted effort to cover all the previously manual tests in Flows. This meant that we:

  1. Got to stop doing the manual tests which were time consuming for Samsara (the 97.5% reduction in manual testing time)
  2. Got a broad footprint of initial tests, which influenced culture across teams. Engineers can tweak tests and patterns they come across to fit their own needs, which jumpstarted the adoption of Flows. Engineers now hold each other to the new cultural norm for how much Flows coverage is expected with their PRs.

We ended up dropping our 90% code coverage check in CI after Flows was well adopted. The coverage instrumentation made our CI test times 25% longer, and at the end of the day, code coverage is a crude proxy for "are the features covered in ways that match users' workflows?". With Flows, we saw we had coverage for those user flows and could drop coverage as a metric.

I think one thing that stuck with me from this experience is that as an engineering team, you should think of your testing infrastructure as first class. We’re often very good at investing in making it easy to write features quickly, but sometimes we neglect to do the same for testing them. Engineers spend a large percentage of the product development process writing tests. You should make that easy.

We’re just starting adoption of Flows in our website as well. Hopefully, we’ll be back in another year with similar impact numbers there.

Interested in working at Samsara? Check out our open positions. We’re always looking for great people to join us as we learn and grow together, and if you love learning and building things in a highly collaborative environment, we’d love to hear from you! 👋
