Find bugs before your users do: closing the software development risk exposure gap

ITHAKA Tech Staff
Published in ITHAKA Tech · 6 min read · Jul 28, 2023

By Dane Hillard

JSTOR has used a variety of methods for finding software bugs through the years. Here is what we have learned. Image courtesy Dane Hillard.

Let’s face it: we all want to deploy bug-free software to users, but achieving that 100% of the time isn’t possible. There are, however, ways to minimize risk exposure. Our testing process at ITHAKA has evolved over the years to reduce live bugs while accounting for scale and pace.

Every organization and product is different. Software deployed at high frequency benefits from fast feedback loops to avoid drift, because it’s difficult to sift through coincident changes to find the point at which a bug was introduced. Software deployed at low frequency benefits from verification approaches that minimize users’ exposure to bugs, because the next release may be several months away. Our testing process for our digital education platform JSTOR has evolved over the years to address issues of scale and pace, and the approach has grown increasingly sophisticated. Here’s how we’ve learned to close the risk exposure gap in software development for JSTOR.

Test after release

The test-after-release approach to software delivery. Image courtesy Dane Hillard.

This is perhaps the most common approach to software delivery in the industry thanks to its relative ease of implementation, and it’s the one we used early on. In this mode you first make your changes live for all users, and only then run integration and end-to-end tests.

The major challenge in this approach is that changes are live from the time you complete a deployment until the time the integration and end-to-end test suites complete. During this time, users are experiencing whatever breakages you may have introduced into the code. You’re effectively shipping unverified changes into production and verifying them a while later.

This approach is good enough for a wide swath of use cases; if you have low and infrequent usage or you’re only releasing a few times a year, testing after release might serve you for a long time. You also have some room to optimize your success with this approach. You can improve the speed of your deployment and testing phases, or ensure you can roll changes back using well-known mechanisms so that if something does go wrong you can minimize the duration of the exposure.
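
To make the ordering concrete, here’s a minimal sketch of a test-after-release pipeline in TypeScript. The helper functions are illustrative stubs rather than our actual tooling; the point is that the change is live before any verification runs, so the exposure window lasts as long as the deploy-plus-test cycle.

```typescript
// Illustrative stubs: a real pipeline would call your deploy and test systems.
async function deployLive(version: string): Promise<void> {
  console.log(`making ${version} live for all users`);
}

async function runIntegrationTests(version: string): Promise<boolean> {
  console.log(`running integration tests against ${version}`);
  return true; // stub result
}

async function runEndToEndTests(version: string): Promise<boolean> {
  console.log(`running end-to-end tests against ${version}`);
  return true; // stub result
}

async function rollBackTo(version: string): Promise<void> {
  console.log(`rolling back to ${version}`);
}

async function testAfterRelease(previous: string, next: string): Promise<void> {
  await deployLive(next);
  // Users see `next` from here until both suites finish: the risk exposure gap.
  const passed = (await runIntegrationTests(next)) && (await runEndToEndTests(next));
  if (!passed) {
    await rollBackTo(previous); // breakage was live the whole time the tests ran
  }
}

testAfterRelease("v41", "v42");
```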

On JSTOR, where we have millions of active users at all hours of the day and release changes multiple times per day, exposing any breakage for more than a few minutes typically leads us to call an incident. We’ve come to dislike testing after release, and we’ve explored other methods when possible. Keep reading to see how you might be able to do the same.

Separating availability from liveness

Separating availability from liveness in software development. Image courtesy Dane Hillard.

The main risk in testing after release is the lag between making changes live and receiving feedback about any issues those changes introduced. A valuable improvement to this approach separates the act of making changes available from the act of making changes live for all users. In this new mode, you deploy your changes and run integration tests against them in a sandbox environment, then make the changes live to run end-to-end tests in the full environment.
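
Here’s a sketch of the same pipeline with availability separated from liveness, again using illustrative stubs rather than our real tooling. The key difference is that integration tests gate the step that makes the change live, so integration-level failures never reach users.

```typescript
// Illustrative stubs again; each "step" just logs and reports success.
async function step(description: string): Promise<boolean> {
  console.log(description);
  return true; // stub: every step passes in this sketch
}

async function deployThenPromote(previous: string, next: string): Promise<void> {
  await step(`deploying ${next} to a sandbox environment (available, not live)`);

  if (!(await step(`running integration tests against the ${next} sandbox`))) {
    return; // integration-level failures stop here and never reach users
  }

  await step(`making ${next} live for all users`);

  if (!(await step(`running end-to-end tests in the full environment`))) {
    await step(`rolling back to ${previous}`); // only end-to-end failures are briefly exposed
  }
}

deployThenPromote("v41", "v42");
```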

This decoupling of availability from liveness creates a number of benefits for developers and quality engineers:

  • Engineers can perform exploratory testing in a sandbox environment using live data before making a change live for all users.
  • Deployments can happen early and often without creating churn for all users.
  • Users aren’t exposed to integration-level failures because changes aren’t live until they pass integration testing.

This approach isn’t without its own challenges; the capability to create a sandbox environment for every change requires up-front investment and may increase your ongoing operational costs or logistical overhead to the point that they outweigh the benefits. For dynamic applications, it may not even be possible to achieve within your infrastructure constraints. On JSTOR, we began by using this approach for our micro frontend architecture, because we can build arbitrarily many copies of static applications with only marginal increases in cost, mainly attributable to storage.

We gained a lot from this change alone, but the lag between integration and end-to-end test feedback still nagged at us. We also saw an interesting capability arise from this decoupling. By design we needed a signifier to indicate which of many possible versions to serve to all users, and from there it was just a short hop to be able to signify which version to serve to a specific user. This further broadened our thinking to the next phase.

Aside: We found that this change forced us to reckon with language. Words like “deployed” start to become less specific and potentially confusing under this more sophisticated model. We’ve updated “deployed” to mean “available” and started using the word “pinned” to mean “live for all users.” Staff can “override” the version they see for themselves. You should consider your cultural norms and how they need to adapt as you introduce new delivery paradigms.
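
As a rough illustration of how that vocabulary can map to data, here’s a small sketch using hypothetical names (not our actual schema): deployed versions are available, one of them is pinned live for all users, and a per-user override wins for whoever set it.

```typescript
// Hypothetical shape, for illustration only.
interface VersionState {
  available: string[];            // deployed versions that could be served
  pinned: string;                 // the version currently live for all users
  overrides: Map<string, string>; // per-user overrides, e.g. for staff
}

function resolveVersion(state: VersionState, userId: string): string {
  const override = state.overrides.get(userId);
  if (override !== undefined && state.available.includes(override)) {
    return override; // staff see the version they chose for themselves
  }
  return state.pinned; // everyone else sees the pinned version
}

const state: VersionState = {
  available: ["v41", "v42", "v43"],
  pinned: "v42",
  overrides: new Map([["staff-123", "v43"]]),
};

console.log(resolveVersion(state, "staff-123"));    // "v43"
console.log(resolveVersion(state, "someone-else")); // "v42"
```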

Separating integration from liveness

Separating integration from liveness in software development. Image courtesy Dane Hillard.

This approach is the current pinnacle of our testing practice for JSTOR, but is also the most sophisticated of the three described here. In this mode you deploy your changes, run integration tests against them in a sandbox environment, then run end-to-end tests against the full environment with your changes stitched in through an override. You can run all functional testing against a set of changes without ever exposing them to external users, only integrating the changes after all checks have passed.

It might sound like a big leap to achieve this level of freedom, but it turns out to be a natural extension of the decoupling created by separating availability from liveness. Because your design must already allow one of a number of deployed versions to be live, this approach requires only that you have a way to signify the live version at the per-request level. Staff and automated suites then need only to know the deployed version of interest and override the system to serve that version. This provides some of the same benefits as feature flagging but is part of the deployment process for every change.
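
For example, an automated end-to-end check might stitch the version under test into the full environment by sending an override with each request. The header name and URL below are assumptions for illustration; the real mechanism could just as well be a cookie or query parameter.

```typescript
// The header name and URL are placeholders, not JSTOR's actual interface.
const VERSION_UNDER_TEST = "v43";

async function checkSearchPage(): Promise<void> {
  const response = await fetch("https://example.org/search?q=ecology", {
    headers: { "X-Version-Override": VERSION_UNDER_TEST }, // ask the platform to serve v43 for this request
  });
  if (!response.ok) {
    throw new Error(`search page returned ${response.status} for ${VERSION_UNDER_TEST}`);
  }
  // Only after checks like this pass would v43 be made live for all users.
}

checkSearchPage().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```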

Aside: One pitfall of this approach is people setting overrides for themselves and forgetting to remove them later, so that they keep getting old functionality after new functionality has been introduced. We’ve discussed automated expiration of overrides and visual debugging indicators as potential opportunities to reduce this. It’s a great lesson in how to build on-platform features whose audience is internal staff instead of our typical users.

Where this testing and delivery approach leads

We believe this approach to testing and delivery will support us over the next several years. We’ve continued finding natural extensions to this approach, including the ability to serve different versions to different populations — effectively integrating with our experimentation framework for hypothesis testing, which we’ll be writing about in the future as well.

Because tight feedback loops are increasingly important for our pace of development, we’re also exploring ways to invert the override concept so that developers can run integration and end-to-end tests against their local environments, falling back to the full environment for requests that aren’t handled by locally running applications. This helps uncover surprises earlier, closer to the developer’s working context.
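
A rough sketch of that inverted idea: a small local proxy serves the routes handled by the application under development and forwards everything else to the full environment. The port, route prefix, and upstream URL are placeholders, not our actual setup.

```typescript
import http from "node:http";

const FULL_ENVIRONMENT = "https://example.org"; // placeholder for the full environment's base URL

http
  .createServer(async (req, res) => {
    const path = req.url ?? "/";

    if (path.startsWith("/my-feature")) {
      // Routes handled by the locally running application under development.
      res.writeHead(200, { "content-type": "text/plain" });
      res.end("served by the local application\n");
      return;
    }

    // Anything the local application doesn't handle falls back to the full
    // environment (GET-only here, to keep the sketch short).
    try {
      const upstream = await fetch(new URL(path, FULL_ENVIRONMENT));
      res.writeHead(upstream.status, {
        "content-type": upstream.headers.get("content-type") ?? "application/octet-stream",
      });
      res.end(Buffer.from(await upstream.arrayBuffer()));
    } catch {
      res.writeHead(502);
      res.end("fallback to the full environment failed\n");
    }
  })
  .listen(3000, () => console.log("local fallback proxy on http://localhost:3000"));
```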

Finally, we’re starting to supplement our functional testing with practices like canary deployments, automated load testing, and steady-state monitoring to ensure architectural and configuration changes don’t introduce performance regressions.

However these shake out, they’ll certainly lead to more learning that we look forward to sharing later on. See you next time!

Interested in exploring engineering careers with ITHAKA, whether remote software development jobs, Ann Arbor engineering jobs, or New York edtech jobs? Check out our ITHAKA jobs page to learn more and speak with recruiting.
