How to Ensure Release Candidates Are Good2Go? Automated Performance Pipelines.

Harshavardhan Srinivasan
Intuit Engineering
Published in Intuit Engineering · 6 min read · Apr 12, 2023

This blog post is authored by Intuit’s Harsha Srinivasan, Staff Engineer, and Raj Parameswaran, Software Development Manager, Intuit Persistence Service.

Today’s users of mission-critical financial software expect it to be extremely fast and utterly reliable. Applications must be able to handle hundreds of thousands of transactions per second, while providing 99.999% (five nines) availability. Even a few milliseconds of delay deep in the technology stack can slow things down noticeably for end users, leading them to abandon what they’re doing or, worse, to abandon the product altogether.

Intuit is the global financial technology platform that powers prosperity for more than 100 million consumer and small business customers. To provide the speed and reliability our customers expect, we demand perfection in our lower tech stacks. We use our Persistence Platform to create awesome experiences with TurboTax, Credit Karma, QuickBooks and Mailchimp. The platform relies heavily on performance testing to ensure new features meet key performance indicators (KPIs) and integrate seamlessly into our existing applications.

Rigorous testing is traditionally manual and resource-intensive

Any time we add a new feature to our products, we must be aware of ripple effects across the system, including downstream effects on overall resource utilization and our ability to maintain the service level agreements of existing features. Before we integrate new features, we have to measure these impacts so we can identify and implement any alterations to the code or the platform itself that will ultimately allow for a seamless deployment.

Typically, engineers throughout our industry have relied primarily on manual overlays from monitoring tools like Wavefront and Telegraf to perform this type of testing. Running two different queries across multiple timeframes can help determine whether a release candidate under test meets or exceeds expectations. Depending on the type of application and where the application fits in the technology stack, engineers apply a diverse set of metrics and a variety of methodologies to certify a release candidate.

This manual process is time-consuming and creates the potential for human error. If an engineer happens to misread the results, they could deploy a suboptimal candidate, which can degrade performance and even necessitate a rollback of a critical feature.

Based on sprint data, we’ve found that in a sprint of 80 person-hours, engineers typically spend about eight person-hours just on manual testing. In other words, roughly 10% of an engineer’s effort goes to certifying that a release candidate is suitable for production.

Automated testing can improve efficiency and reliability

To save engineers time and reduce the potential for human error, we sought to create a generic, automated performance decision framework. That framework became Good2Go, our automated performance pipeline and decision-making platform that ensures release candidates truly are ready for deployment.

We started by enumerating the steps an engineer goes through to certify a candidate under test:

  1. Run the baseline test with the candidate for the specified amount of time.
  2. Collect all the KPIs for the application. Typical KPIs include:
  • CPU
  • IO utilization %
  • Load average
  • Application response time metrics
  • Request wait times, etc.

Then, for each KPI listed in the release process:

  1. Get all the KPIs for the candidate from the critical monitoring tools.
  2. Get all the KPIs for the “golden baseline” (a candidate that has satisfied all the entry and exit criteria for a good release to production).
  3. Overlay the data, and manually make decisions on each of the metrics.
  4. If everything looks good, certify the release. Otherwise, provide feedback to the team on what the issue is and recommend a fix.
  5. Repeat as needed.

To automate this process, we had to perform the following steps and analyses:

  1. Identify APIs for gathering the critical metrics from the release candidate and the golden baseline, if available.
  2. Determine the optimal way to gather the data from step 1 and execute the comparison.
  3. Pinpoint the specific metrics that failed the comparison.
  4. Communicate the status of the test to the team for analysis and rerun.
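
To make steps 2 and 3 concrete, the heart of such a framework is a per-KPI comparison against the golden baseline within a tolerance. Here's a simplified sketch — illustrative only, not Good2Go's actual implementation — that assumes every KPI is "lower is better" (latency, CPU, IO utilization, wait time):

```python
from typing import Dict, List, Tuple

def compare_kpis(
    candidate: Dict[str, float],   # KPI name -> value measured for the release candidate
    golden: Dict[str, float],      # KPI name -> value from the golden baseline
    tolerance_pct: float = 5.0,    # allowed regression as a percentage of the baseline
) -> Tuple[bool, List[str]]:
    """Return (passed, failed_metrics) for a candidate vs. the golden baseline."""
    failed: List[str] = []
    for name, baseline_value in golden.items():
        candidate_value = candidate.get(name)
        if candidate_value is None:
            failed.append(f"{name}: missing from the candidate run")
            continue
        # Treat all KPIs as "lower is better"; a higher value is a regression.
        allowed = baseline_value * (1 + tolerance_pct / 100.0)
        if candidate_value > allowed:
            failed.append(
                f"{name}: {candidate_value:.2f} vs. baseline {baseline_value:.2f} "
                f"exceeds the {tolerance_pct}% tolerance"
            )
    return (len(failed) == 0, failed)
```

Any failed metrics returned by a comparison like this are what feed the feedback loop in step 4.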

How did we ensure our solution was Good2Go?

As we went through the development journey, we found that well-defined APIs already existed for our use case, so we used the following API sets:

  1. https://docs.wavefront.com/wavefront_api.html
  2. https://docs.splunk.com/Documentation/Splunk/latest/RESTREF/RESTprolog
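
For example, pulling a KPI time series from Wavefront's query endpoint is a simple HTTP call. The snippet below is a minimal sketch; the endpoint, parameters, and example query are assumptions to verify against the docs linked above for your own cluster:

```python
import requests

WAVEFRONT_URL = "https://YOUR_CLUSTER.wavefront.com"  # placeholder cluster URL
API_TOKEN = "YOUR_API_TOKEN"                          # placeholder API token

def fetch_kpi_series(ts_query: str, start_epoch: int, end_epoch: int) -> dict:
    """Fetch raw time-series data for one KPI query over a test window."""
    response = requests.get(
        f"{WAVEFRONT_URL}/api/v2/chart/api",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"q": ts_query, "s": start_epoch, "e": end_epoch, "g": "m"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# e.g. fetch_kpi_series('avg(ts("app.response.time.ms"))', start, end)
```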

A sample test and report run by the Intuit Persistence Services Team shows such a comparison.

We integrated this automated testing with our CI/CD (continuous integration/continuous deployment) pipeline in two stages:

  1. In the functional testing environment, to monitor for early latency changes and stop the pipeline if we observe a delta greater than our tolerance limit.
  2. In the release pipeline for production, to run baseline tests and monitor KPIs at a more granular level.
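
In both stages, the gate itself is simple: if any KPI delta exceeds its tolerance, the stage exits non-zero and the pipeline stops. A minimal sketch (illustrative, not Good2Go's actual code) that consumes the list of failing metrics produced by a comparison like the one above:

```python
import sys

def performance_gate(failed_metrics: list) -> None:
    """Stop the CI/CD stage when any KPI delta exceeded its tolerance."""
    if failed_metrics:
        print("Performance gate FAILED:")
        for failure in failed_metrics:
            print(f"  - {failure}")
        sys.exit(1)  # a non-zero exit code halts the pipeline stage
    print("Performance gate passed: candidate is within tolerance of the golden baseline.")
```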

To deal with core changes at the infrastructure and the container level [such as Java Development Kit (JDK) upgrades, core library upgrades or EC2 instance type changes], we also run a stress and endurance test to compare KPIs in two steps:

Step 1: Workflow for code check-in

Step 2: Workflow for production release certification

Once the tests are run, the platform sends Slack messages to the teams with a link to the results and the overall status of the run. If failed metrics exist, the platform highlights them so engineers have the context they need to troubleshoot the issue.
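
The notification step can be as simple as posting to a Slack incoming webhook. The sketch below is illustrative rather than Good2Go's actual code; the webhook URL and message shape are assumptions:

```python
import requests

def notify_team(webhook_url: str, run_status: str, results_url: str, failed_metrics: list) -> None:
    """Post the overall run status, a link to the results, and any failed metrics to Slack."""
    lines = [f"Good2Go run finished: {run_status}", f"Results: {results_url}"]
    if failed_metrics:
        lines.append("Failed metrics: " + ", ".join(failed_metrics))
    requests.post(webhook_url, json={"text": "\n".join(lines)}, timeout=10)
```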

Good2Go reduces certification and deployment times by 75%

Integrating Good2Go into our pipeline produced substantial improvements:

  1. The average time to certify a release fell by 75%, from 8–10 hours to about two hours.
  2. Engineers’ efforts are limited to looking at results and troubleshooting failures.
  3. There are times when engineers optimize an API for performance, resulting in a delta that is flagged by Good2Go. With our automated performance pipeline, developers are provided with the data they need to demonstrate that these optimizations outperform a “golden baseline.”

The speed and efficiency we achieve with Good2Go have other benefits as well. Since teams aren't executing manual overlays and comparisons, we have the bandwidth to run more experiments and test more configurations in less time. For example, the persistence team experimented with a back-end configuration that looked like it would be cost effective while maintaining performance parity. Over multiple iterations, testing with Good2Go showed that this particular configuration wasn't suitable for the team, so we shelved it. This not only saved time, but avoided the possibility of rolling out a suboptimal solution.

All in a day’s work!

Further refinements are yielding efficiency gains

We’ve continued to refine Good2Go since it first launched.

The launch edition of Good2Go had a self-contained configuration file. As we experimented with different teams, we found that some deployment strategies require configuration files stored in different locations. To accommodate these use cases, we made the config file location injectable.
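
As an illustration only (the environment variable and default file name below are hypothetical, not Good2Go's actual interface), a common way to make the config location injectable is to accept it from a CLI flag or an environment variable:

```python
import argparse
import os

def resolve_config_path() -> str:
    """Resolve the config location from a CLI flag, an environment variable, or a default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--config",
        default=os.environ.get("GOOD2GO_CONFIG", "good2go.yaml"),  # hypothetical names
        help="Path to the performance-pipeline configuration file",
    )
    return parser.parse_args().config
```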

We also learned that some teams use different tools to monitor metrics, in addition to the standard set. To accommodate their needs, we modularized the configuration to enable and disable certain auxiliary monitoring tools so that teams could customize their toolsets.

In the first version of Good2Go, we found that our tolerances were too strict: a difference of even a single millisecond would fail the pipeline. So we went back to the drawing board and refactored Good2Go to allow teams to set individual tolerances according to their needs.
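
Putting the last two refinements together, the configuration might look something like the sketch below: per-tool enable/disable switches and per-metric tolerances instead of a single strict global threshold. This is a hypothetical shape, not Good2Go's actual schema; the tool names, metric names, and values are illustrative.

```python
EXAMPLE_CONFIG = {
    "monitoring_tools": {
        "wavefront": {"enabled": True},
        "splunk": {"enabled": True},
        "auxiliary_tool": {"enabled": False},  # teams can switch extra tools on or off
    },
    "tolerances_pct": {
        "response_time_p99_ms": 5.0,   # allow up to a 5% regression
        "cpu_utilization_pct": 10.0,
        "io_utilization_pct": 10.0,
    },
}

def within_tolerance(metric: str, candidate: float, baseline: float) -> bool:
    """Pass a metric only if the candidate stays within its configured tolerance."""
    tolerance = EXAMPLE_CONFIG["tolerances_pct"].get(metric, 0.0)
    return candidate <= baseline * (1 + tolerance / 100.0)
```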

We’re working actively toward genericizing Good2Go even further, and making it an open source project. We’re excited to share this flexible automated testing and certification pipeline with the community, so stay tuned!

Many thanks to the team!

Many talented engineers participated in developing Good2Go. The gestation and success of the project are thanks to the efforts of Karthic Muthuvelu, aided by Achal Kumar.
