Speed Up CI (Continuous Integration) 2022

stephskardal
Published in Upstart Tech
6 min read · Oct 19, 2022

In an exciting late-summer event, the Upstart Engineering Testing Guild hosted its first-ever “Speed Up CI” competition.

One of the awesome graphics created by our graphic designers for our first “Speed Up CI” competition.

Let’s Rewind!

Before we dig into the details, let’s rewind to share a bit of context!

Upstart has a growing engineering organization that is working towards breaking up our monolithic Ruby on Rails application into microservices. Our platform team is busy building out components that can be leveraged by these microservices-to-be. In the meantime, many of our product engineering teams remain focused on building in the monolith.

Earlier this year, we put significant effort into speeding up our pipeline, shared in this blog post. Before that work began in late 2021, our continuous integration (CI) pipeline runtime for the monolith was hovering at around an hour. We made significant code and infrastructure improvements in that cross-team initiative to bring the build time under 20 minutes.

Fast forward 6 months, and our build times had been steadily increasing as we continued to add to the monolith. The end-user build time was ~20 minutes, but that represented a total of 24 hours of compute time across parallelized test pods. We run all of our tests in every trunk branch build, and we are actively improving test selectivity so that feature builds run only the subset of tests impacted by a particular change set.

Because CI time had continued to climb, our Upstart testing guild leaders came up with the fun idea of gamifying CI optimization with a competition. Engineers love gamification! For a monetary incentive and bragging rights, any software engineer could participate in speeding up the build. The incentive was designed to pay for itself easily within 1–2 months through goodwill, increased engineering productivity, and reduced CI costs. The competition lasted a few weeks, and pull requests went through the normal peer review process. Participants were asked to self-report data throughout the challenge.

What was the outcome?

Before I dig into some of the specific changes, I’ll summarize the overall outcome. In a matter of a few weeks, 15–20 active participants reduced the overall build compute time from 24 hours to 15 hours.

Compute times per build and the improvement from the competition. The competition took place from late July to mid-August, when there was significant improvement in the overall CI runtime.

End users saw the build time drop from around 20 minutes to 15. While 5 minutes doesn’t sound like much, saving 5 minutes across 200+ test pods was significant and considered a big success!

What changes were made?

There were a few high-level impactful changes that drove much of the success, described below.

Owning Teams Evaluating and Improving Their Tests

When the testing guild board originally conceived the idea for a competition, they expected participants to focus on their team-owned tests and dive into optimization there. This happened!

Several participants took ownership of their team’s tests. They ramped up on best practices and advocated for those best practices moving forward. They participated in the competition community and are now in a position to evangelize the lessons they learned for greater impact.

Reevaluating Test Coverage and Removing Tests

A significant amount of the speedup came from removing tests.

We found a large number of integration (Cucumber feature) tests that were redundant. Coverage metrics on the integration tests were used to identify redundant signal in the feature tests, and manual examination confirmed that the UI elements in question were already covered by JavaScript unit tests. As a result, some of the most expensive tests in our suite were removed.

We also removed a large number of tests that were no longer necessary because of process changes or feature deprecation. For example, we were able to remove a number of localization tests because our process for managing localization has changed since its launch.

Infrastructure Changes

While a number of infrastructure changes were implemented earlier this year, one participant contributed greatly to further improvements and iterations on the infrastructure and test-splitting side. These changes included:

  • Expanding the Spring application preloader to the majority of tests by merging multiple test profiles.
  • Adding the parallel gem for speedier automated annotation checks (see the sketch after this list).
  • Further improving test load balancing and splitting across test pods. This work continues even now, after the competition.
  • Adjusting test pod database configuration for optimization: work_mem, fsync, full_page_writes.
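
To make the parallel gem item above a little more concrete, here is a rough sketch (not our exact implementation; the file glob and the check_annotations helper are hypothetical) of fanning a check out across CPU cores with Parallel.map:

require "parallel"

# Hypothetical sketch: run an annotation check on every model file, one
# worker process per CPU core, and collect any failures.
files = Dir.glob("app/models/**/*.rb")

failures = Parallel.map(files, in_processes: Parallel.processor_count) do |file|
  check_annotations(file) # hypothetical helper; returns nil when annotations are current
end.compact

abort("Stale annotations:\n#{failures.join("\n")}") if failures.any?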

I had previously thought that there were not many opportunities for systemic changes, but was happy to be proven wrong.

Reducing Data Setup

A surprising amount of the gains came from reducing data setup. Because we work in a monolith with minimal restrictions on bounded-context interactions, much of our code and many of our tests rely on a base model and its many associations. The code depends on understanding the complex relationships that come from ActiveRecord.

For example, it is not uncommon to see something like this in our codebase:

def has_resolved_other_thing?
  base_object
    .some_association
    .thing_that_belongs_to_association
    .any? { |z| z.other_association.resolved? }
end

At first glance, that may not look terrible. However, this base object has grown over the 10+ years that Upstart has existed, as have the complexity and dependencies of these associations. This code is hard to test without proper setup or extensive mocking. To expedite test writing, we created custom factories to create all the corresponding necessary objects, for example:

def create_object_with_resolution
  # create base object
  # create some_association
  # create thing that belongs to some_association
  # create other_association on thing_that_belongs_to_association that is resolved
end

def create_object_without_resolution
  # create base object
  # create some_association
  # create thing that belongs to some_association
  # create other_association on thing_that_belongs_to_association that is not resolved
end

With a large, growing engineering team contributing to the monolith and a lack of well-defined interaction boundaries between objects, this test factory abstraction got out of hand! The increase in complexity and model size had a significant impact on performance and created a large disconnect: an engineer could unknowingly hurt test performance with a code change that does not appear to be related to the tests at all. These factories became the easy go-to option for setting up all the data when only a portion of it was actually needed.

The competition highlighted the overuse of this pattern, and we took steps to move away from those custom factories:

  • We strongly encouraged moving to unit tests with no database setup
  • We moved custom factory setup to FactoryBot, with simultaneous efforts to control some bad patterns that come up with FactoryBot, described more in this article
  • If all else fails, we advised minimizing object creation by sharing objects across tests with redundant setup via let_it_be and before_all (sketched below)
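
Here is a minimal sketch of the let_it_be approach using the test-prof gem. The BaseObject model, factory, and trait names are hypothetical, and create assumes FactoryBot’s syntax methods are available in rails_helper:

require "rails_helper"
require "test_prof/recipes/rspec/let_it_be"

# Minimal sketch (hypothetical model, factory, and trait names) of sharing
# expensive data setup across examples with test-prof's let_it_be.
RSpec.describe BaseObject do
  # Created once for the whole example group, wrapped in a transaction that is
  # rolled back afterwards, instead of rebuilding the object graph per example.
  let_it_be(:base_object) { create(:base_object, :with_resolution) }

  it "sees the resolved association without per-example inserts" do
    expect(base_object.some_association).to be_present
  end

  it "reuses the same record across examples" do
    expect(base_object).to be_persisted
  end
end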

Data setup improvements yielded significant gains and set us on a better path to write more testable code with an improved separation of logic.
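
As a loose illustration of that direction (the ResolutionCheck class and its names are hypothetical, not our production code), the association traversal from the earlier example can be separated from the decision it feeds, so the decision can be unit tested with plain doubles and no database setup:

# Hypothetical sketch of separating the traversal from the decision so the
# decision is unit testable with plain Ruby objects.
class ResolutionCheck
  # `things` can be any enumerable of objects responding to #other_association.
  def initialize(things)
    @things = things
  end

  def resolved_other_thing?
    @things.any? { |thing| thing.other_association.resolved? }
  end
end

RSpec.describe ResolutionCheck do
  it "is true when any other_association is resolved" do
    # Plain doubles stand in for ActiveRecord rows; no database inserts needed.
    resolved   = double(other_association: double(resolved?: true))
    unresolved = double(other_association: double(resolved?: false))

    expect(ResolutionCheck.new([unresolved, resolved]).resolved_other_thing?).to be(true)
  end
end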

Another great graphic created by the Upstart graphic designers!

That was fun! Shall we go again?

As a participant, I learned a lot during the competition while working in code that I hadn’t touched before. I got a view into some of the greatest opportunities for improving our test suite and looked ahead to systemic changes to reduce the introduction and adoption of these bad patterns.

One lesson the testing guild learned in this inaugural competition was that we need a bit more clarity on how to measure systemic, impactful changes. It is difficult to measure the cumulative impact of engineers who make broader systemic changes. We aim to provide better metrics for measuring this impact and look forward to another “friendly” competition!
