Test Reporting in a Monolith: From Concept to Delivery

Building a Consumer-Driven Software Development Tool for Upstart Engineering

stephskardal
Upstart Tech
7 min read · Nov 8, 2022


The Monolith CI/CD Background

I’ve written about it before: we have a Ruby on Rails monolith. Our monolith build process is complicated. We are working to move things out of the monolith via a multi-step process, which involves defining boundaries in the monolith and moving towards enforcement of those boundaries with the help of packwerk. This strategy requires that we move code into engines that each run their own set of tests, and later move that code out of the monolith.

In a microservice world, the build process for a single microservice might be succinct — e.g., an API might have just one type of test, or a backend-for-frontend single-page app might have backend API tests and frontend UI tests. But in our world, we run a number of different types of tests to support our complex monolith. Our monolith continuous integration (CI) process includes:

  • RSpec unit tests (no database access allowed)
  • RSpec integration tests (database access permitted; not unit level)
  • Cucumber tests (feature tests / multi-boundary testing)
  • Engine-level tests (tests limited to the microservices-to-be)
  • Jest tests (UI tests)
  • End-to-end tests that provide contract-like signals (to be replaced later with contract tests)
  • Static analysis checks to minimize or prevent other suboptimal patterns
  • And so much more!

A significant amount of complexity is baked into this monolith build. We leverage Jenkins + Blue Ocean to communicate the results of the build.

A simplified view of our test pipeline.

Early this year, our engineering team spent a lot of time optimizing our CI pipeline. Having been involved in that work, I concluded there was an opportunity to create a more actionable UI that aggregated and communicated data from the build and test output. While the Jenkins Blue Ocean view is more usable than raw text output, it also includes a large amount of build context, which can lead to a low signal-to-noise ratio. This is especially true because a couple of our steps are labeled “Unstable” and regularly fail, yet we proceed anyway; a new engineer jumping in wouldn’t know whether that failure was signal or noise.

Allure for Test Reporting

A while back, we added Allure for test reporting. This was a first step away from raw text output and towards a better user interface.

Allure demo / example visualization from https://docs.qameta.io/allure-report/.

Our Allure integration required the following workflow:

  • In the monolith, generate Allure result files (XML or JSON) for all tests via the Allure RSpec gem
  • Upload those result files to S3
  • In a separate synchronous process, download the results from S3 and generate a static Allure report for that single build (a sketch of this step follows the list)
  • Upload the resulting static report to another S3 bucket
  • Point engineers at a URL for that static report so they can review test results
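For readers unfamiliar with this kind of pipeline, here is a minimal sketch of what a report-generation step like that could look like. It is illustrative only, not our actual script: it assumes Node with the AWS CLI and Allure CLI available on the PATH, and the bucket names and paths are hypothetical.

```typescript
// Illustrative sketch of a per-build Allure report-generation step (not our actual script).
// Assumes the AWS CLI and Allure CLI are installed; bucket names and paths are hypothetical.
import { execSync } from "child_process";

const buildId = process.argv[2] ?? "example-build";
const resultsBucket = "s3://example-allure-results"; // hypothetical
const reportsBucket = "s3://example-allure-reports"; // hypothetical

// 1. Download the raw Allure result files the build uploaded to S3.
execSync(`aws s3 sync ${resultsBucket}/${buildId} ./allure-results`, { stdio: "inherit" });

// 2. Generate the static Allure report for this single build.
execSync("allure generate ./allure-results --clean -o ./allure-report", { stdio: "inherit" });

// 3. Upload the static report to the bucket engineers browse.
execSync(`aws s3 sync ./allure-report ${reportsBucket}/${buildId}`, { stdio: "inherit" });

console.log(`Report uploaded to ${reportsBucket}/${buildId}`);
```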

Some of the time spent optimizing the build early this year went to improving test reporting. Our build created XML files for every single test (passing, failed, or skipped) in every single build, generating ~75,000,000 files per week that did not translate to action. In early 2022, I modified the Allure data generation to stop generating XML for passing tests, since they accounted for a large portion of the file transfer and report-generation time. Another engineer moved report generation to a background process as a further enhancement.

As entropy would suggest, things got more complicated when two additional Allure outputs were hooked into the build. We had Allure reports for three different test types (RSpec integration, RSpec unit, and Cucumber) and sent software engineers to three different static report URLs to debug builds. I saw an opportunity to present test and build information in one place and abstract away much of the monolith build complexity. I also found Allure difficult to customize and the integration unnecessarily complicated.

Enter Hackweek and Greendot

Prior to our 2022 engineering hackweek, and based on my experience working in the build, I hypothesized that there was a better way to communicate the signals from our complex continuous integration. I believed we could present those signals in a more organized, actionable way that better served our consumers: our engineers, including me.

The value proposition was reducing time spent parsing build output, via a user interface that provided an organizational hierarchy and an increased signal-to-noise ratio. Rather than being distracted by translating various output signals, engineers could spend that time fixing their tests and getting code shipped!

The high-level hackweek technical plan, at its earliest stage, was:

  • Generate JSON upstream in the build (a sketch of a possible payload shape follows this list)
  • Send the JSON to S3
  • Consume the S3 JSON files in a single-page app (static build or server-side rendering, TBD)
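To make the first step concrete, here is a hypothetical sketch of the kind of per-build JSON payload the monolith could emit and upload to S3. The field names are illustrative only, not our actual schema.

```typescript
// Hypothetical shape of the build/test JSON uploaded to S3 (field names are illustrative).
interface TestResult {
  name: string;                // e.g. "user signs up with a valid email"
  suite: "rspec-unit" | "rspec-integration" | "cucumber";
  status: "passed" | "failed" | "skipped";
  durationMs: number;
  failureMessage?: string;     // populated only for failures
  pod: string;                 // which CI pod ran the test
}

interface BuildReport {
  buildId: string;
  branch: string;
  startedAt: string;           // ISO 8601 timestamp
  results: TestResult[];
}
```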
A very early-stage proof-of-concept UI for the Greendot hackweek pitch: a “successful” build.
A very early-stage proof-of-concept UI for the Greendot hackweek pitch: a “pending” build.

I named the project “Greendot” and pitched a proof-of-concept UI with a very basic setup in React via Next.js, sharing an overall vision of the project. Read more about our hackweek pitch process here.

Hackweek Begins!

The hackweek team of ~9 got to work, picking up from the proof-of-concept Next.js app I had shared in my pitch. At the suggestion of one of the engineers, we immediately moved all untyped JavaScript to TypeScript. We evaluated two options for retrieving and consuming the JSON files (server-side rendering versus AWS Lambda functions) and ultimately settled on Lambdas, leveraging our internal CDK toolkit.
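For readers who haven’t used CDK, the overall shape of that setup looks something like the sketch below, written against the public aws-cdk-lib constructs rather than our internal toolkit; the names and asset paths are illustrative.

```typescript
// Minimal CDK sketch: API Gateway -> Lambda -> S3, using public aws-cdk-lib constructs.
// Illustrative only; our real stack is built with an internal CDK toolkit.
import { Stack, StackProps } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as apigateway from "aws-cdk-lib/aws-apigateway";
import * as s3 from "aws-cdk-lib/aws-s3";

export class GreendotApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Bucket the monolith build writes its JSON test output to (name is hypothetical).
    const reportBucket = s3.Bucket.fromBucketName(this, "ReportBucket", "example-greendot-reports");

    // Lambda that reads report JSON from S3 and returns it to the frontend.
    const reportFn = new lambda.Function(this, "ReportFn", {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "index.handler",
      code: lambda.Code.fromAsset("lambda"),
      environment: { REPORT_BUCKET: reportBucket.bucketName },
    });
    reportBucket.grantRead(reportFn);

    // API Gateway in front of the Lambda; the static Next.js build calls this API.
    new apigateway.LambdaRestApi(this, "GreendotApi", { handler: reportFn });
  }
}
```

One nice property of this shape: the frontend stays a static build, and all S3 access happens behind the Lambda, so no bucket credentials ever reach the browser.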

Our hackweek goal was divided into a few major milestones:

  • Complete the communication between the static build and Lambda-based file retrieval via API Gateway (a sketch of such a retrieval function follows this list)
  • Build out the UI to display test failure output and test pod output, since our build runs across a large number of pods
  • Make upstream changes in our monolith to generate basic build information and test output and send to S3
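Here is a minimal sketch of the kind of Lambda file retrieval that first milestone describes. The bucket name, key layout, and query parameters are hypothetical, not our real API.

```typescript
// Hypothetical Lambda handler: returns the JSON test output stored in S3 for one build.
// Bucket name, key layout, and query parameters are illustrative, not our real API.
import { S3Client, ListObjectsV2Command, GetObjectCommand } from "@aws-sdk/client-s3";
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

const s3 = new S3Client({});
const BUCKET = process.env.REPORT_BUCKET ?? "example-greendot-reports";

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  const buildId = event.queryStringParameters?.buildId;
  if (!buildId) {
    return { statusCode: 400, body: JSON.stringify({ error: "buildId is required" }) };
  }

  // List every JSON file the build uploaded under its prefix.
  const listed = await s3.send(
    new ListObjectsV2Command({ Bucket: BUCKET, Prefix: `builds/${buildId}/` })
  );

  // Fetch and parse each file, then return the combined payload to the frontend.
  const results = await Promise.all(
    (listed.Contents ?? []).map(async (obj) => {
      const file = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: obj.Key! }));
      return JSON.parse(await file.Body!.transformToString());
    })
  );

  return { statusCode: 200, body: JSON.stringify({ buildId, results }) };
};
```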

We divided the work among the team and proceeded. We didn’t have an end-to-end working solution at the end of hackweek, but we did make progress on all of our milestones. We shared the result in a hackweek demo, and I was able to gather support to move forward with the project.

An early systems diagram for Greendot.

Post Hackweek: Greendot Launch

In the weeks after hackweek and with support from several engineers, Greendot was launched while keeping parity with the existing Allure test reporting. I collected feedback from a small group of pilot testers before we were ready to deprecate Allure. After several weeks of testing and iteration, we switched over and deprecated the previous Allure test reports. Greendot was live!

Depiction of the present-day Greendot UI: a “successful” build.
Depiction of the present-day Greendot UI: a “just started” build.
Depiction of the present-day Greendot UI: a “failed” build.

It has been ~6 months since Greendot launched. In that time, a few other significant changes have been made:

  • We’ve improved usability and optimized data retrieval. For example, we moved to the popular useQuery hook instead of our own homegrown file-retrieval mechanism (a sketch follows this list), and we added pagination to the Lambda to ensure all of our builds are retrieved for a searchable dropdown of recent builds.
  • We introduced our in-house React atomic component system that is leveraged by a number of our growing frontends. We moved hosting of the static build to Vercel, which is now supporting our other frontend apps. This provides a layer of consistency to engineers jumping in and out of our frontend code.
  • We added test reporting for our Rails engines inside the monolith — introducing a fourth type of pod that we collect test output from. This allows us to view test report output for code that is being incrementally shifted out of the monolith.
  • We built out test metrics on a single dashboard. We also leverage DataDog for this, but the custom dashboard includes additional metrics not available in DataDog.
  • That dashboard also highlights trends in flaky tests and build status; for example, we now have better visibility into whether a large number of build failures is due to a test flake or a build infrastructure change. This data lets us prioritize and act on flaky test management.
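As an example of the data-retrieval change in the first bullet, here is a rough sketch of how the recent-builds dropdown could fetch its data, assuming TanStack Query’s useQuery hook and a hypothetical /builds endpoint on the Lambda-backed API.

```typescript
// Sketch of the builds-dropdown data fetch, assuming TanStack Query's useQuery hook
// and a hypothetical /builds endpoint on the Lambda-backed API.
import { useQuery } from "@tanstack/react-query";

interface BuildSummary {
  buildId: string;
  branch: string;
  status: "passed" | "failed" | "running";
}

// The Lambda paginates its S3 listing internally and returns the full set of recent builds.
async function fetchRecentBuilds(): Promise<BuildSummary[]> {
  const response = await fetch("https://api.example.com/builds"); // hypothetical URL
  if (!response.ok) throw new Error(`Failed to load builds: ${response.status}`);
  return response.json();
}

export function useRecentBuilds() {
  return useQuery({
    queryKey: ["recent-builds"],
    queryFn: fetchRecentBuilds,
    refetchInterval: 60_000, // keep the dropdown reasonably fresh
  });
}
```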

What’s Next for Greendot?

Perhaps I am biased, but I consider the adoption of Greendot a success! There are a number of features in development and up for consideration:

  • We are actively working on release metrics that summarize our release successes in the monolith, giving us visibility into the change sets we roll out every day. This data is automatically generated during builds and then supplemented with additional information.
  • We plan to send our Jest (UI) test output to Greendot for improved usability.
  • We have considered moving the upstream test-reporting generation into a Ruby gem or library that would make it easier for microservices to hook into Greendot. However, I hope our microservice build processes never get even 10% as complex as our monolith’s!
  • We have considered reporting on pod memory and usage stats to better understand the cost of our CI builds.
  • We have considered leveraging Chromatic in our Greendot build process for integration-style automation.

For now, Greendot serves its primary purpose of abstracting away complexity from the monolith build. It allows engineers to iterate quickly instead of spending that time trying to understand continuous integration build interfaces.

Hope you enjoyed the read & happy Greendotting!
