Breaking changes detected

Nick Lemoing · Engineering@Noom · Apr 12, 2024

How Noom created a developer-friendly early warning system for breaking changes using GitHub Actions

Kintsugi is the art of repairing broken pottery with a mix of lacquer and gold dust. While beautiful, the result is never the same as the original.

Backward Compatibility and Narrow Framing

We’d like to be able to say “Never have I ever … launched a backward breaking change!” but unfortunately, we have. Many, many times :)

In the mildest cases, a backward breaking change means a compile error for someone. In the worst cases, entire business systems stop functioning. In either case, our response to a production incident is to follow up with a 5-whys analysis to determine what went wrong and how to keep it from happening again.

Our 5-whys culture is oriented towards identifying the tools and processes that failed instead of putting the blame on a human. If a human forgot to check something, for example, we read it as a gap in our tooling and not the individual’s fault.

Scrolling through past 5-whys documents at Noom, we noticed a clear pattern: a breaking change is inadvertently introduced with a code change, somehow slips past code review and continuous integration, and ultimately wreaks havoc in production. We knew it was crucial that we find a way to correct this pattern.

When we look back at the pull request that introduced a breaking change, the problem always looks so clear. For example:

  • The old date field was removed but was still used by a consumer we didn’t consider
  • The old enum value was removed, but still used by a consumer we didn’t consider
  • A new gender enum value was added, breaking a consumer we didn’t consider
  • The field type was changed without considering a consumer

In hindsight, it might be easy to say that the code author and the reviewer missed an obvious bug. But in the spirit of our 5-whys process of blaming the tools and not the person, we knew we wanted to do better. We wanted to put safeguards in place to help developers, not blame them.

When a code change is being written, the author focuses on the task at hand. They implement their changes, execute their test plan, and everything seems fine. Both the code author and the reviewer have a narrow scope bias and that’s OK: this is what gives them focus and velocity. But narrow focus makes it difficult for them to fully understand the impact of their change. In the broader scope of the entire Noom codebase, it’s very easy for them to miss an unrelated consumer. So, what if our tooling could automatically raise awareness and let us know if we are changing a public interface in a backward-breaking way? This is where we decided to bring diff reports into play.

What is a Diff Report?

A diff report provides a way to visualize the effects of a code change. Surprises can be cool, but we prefer to save them for birthdays, not for code in production. :)

The following example shows how automated tooling raises awareness about the effects of API schema changes. The pull request in question changed a field from an array type to a scalar type:

When engineers see the warning, it encourages them to think about why the warning is appearing and to offer a justification for the change.

In other situations, diff reports can help us with:

  • Understanding the performance impact of a code change (example in diagram)
  • Docker image size impact report (example in code)
  • Breaking changes to APIs
  • Visualizing changes to auto-generated code or data structures (e.g. OpenAPI codegen, AWS CDK, Kubernetes evaluated config, Airflow DAG structure)
  • Interpreting data pipeline result changes over a test dataset

Every code change can be viewed as two snapshots of the codebase: base and head. Base is the baseline version of the codebase being compared against, while head is the version of the codebase that has changes applied to it. A diff report is generated by two processes:

  • the transformation process transforms a codebase to an artifact that is “diffable”
  • the report process consumes two diffable artifacts and produces a diff report readable by humans.

The transformation process is run on both base and head, while the report process takes the two artifacts and reports their differences. This is illustrated in the following diagram, using the performance diff evaluation as an example:

Each version produces an artifact in parallel, and those artifacts are compared to produce the final report.

To achieve noiseless and repeatable reports, the transformation process should be a deterministic function of the codebase: artifact = f(codebase).
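As a toy illustration of this shape, here is a tiny “report” whose metric is total line count. The logic below is purely illustrative (the file extension and the metric are arbitrary), but even this toy example follows the rule above: the same snapshot always yields the same artifact.

# Toy transformation: the “diffable artifact” is a single line count.
# $1 is the snapshot directory (base/ or head/); '*.kt' is just an example.
transform() {
  find "$1" -name '*.kt' -exec cat {} + | wc -l
}

# Run the transformation on both snapshots, then report the difference.
transform base/ > base-result.txt
transform head/ > head-result.txt
echo "Lines of code: $(cat base-result.txt) -> $(cat head-result.txt)"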

Building Diff Reports with GitHub Actions

To run our diff reports automatically, we decided to use GitHub Actions, a popular tool for configuring and executing software development workflows. There are dozens of events that can trigger a workflow, but for diff reports the relevant one is the pull_request event. It fires when a pull request is opened or reopened, or when new commits are pushed to the head branch.

Step One: Compare Docker Images

Let’s suppose our codebase contains a Docker image definition in a file named “Dockerfile”. We will build a simple diff report that compares the size of the Docker image before and after the change, and assume that git, GitHub and GitHub Actions are used to manage the codebase.

The user story for what we are trying to accomplish looks like this:

On every pull request that changes the Docker image definition, we would like to see a report that answers the following:

How big was the image before this code change? How big is the image now?

What is the absolute and relative difference?
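The “changes the Docker image definition” part of this user story can be expressed directly in the workflow trigger. A paths filter is one way to limit runs to pull requests that touch the Dockerfile (the rest of this post uses a plain pull_request trigger for simplicity):

on:
  pull_request:
    paths:
      - "Dockerfile"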

The transformation process will build the Docker image and analyze its size. The output artifact will be a single integer (a size in bytes).

The report process will input the two numbers and write out their comparison in a readable report.

Transformation process source code in bash:

# Build the image (equivalent of compiling for programming languages).
# Although 'dive' could do this inline, we split it for clarity.
docker build . --tag "${VERSION}"

# Analyze the image size and report as a JSON object.
dive "${VERSION}" --json analysis.json

# Extract the size in bytes from the analysis.
jq ".image.sizeBytes" analysis.json > result.txt

The VERSION environment variable will be set to “base” or “head”, depending on the version of the code we are analyzing. The diffable artifact is written to the result.txt file in the last line.

The report process will take the two artifacts and compare them:

BASE_SIZE=$(cat base/result.txt)
HEAD_SIZE=$(cat head/result.txt)
ABS_INCREASE=$(( HEAD_SIZE - BASE_SIZE ))
REL_INCREASE=$(( 100 * ABS_INCREASE / BASE_SIZE ))
printf "The image size before this change was: %.2e bytes.\n" "${BASE_SIZE}"
printf "The image size with this change is: %.2e bytes.\n" "${HEAD_SIZE}"
printf "Absolute change: %+.2e bytes.\n" "${ABS_INCREASE}"
printf "Relative change: %+.0f%%.\n" "${REL_INCREASE}"

Step Two: Run Through GitHub Actions

In the first step, we outlined the commands that we would like to automate on code changes. Every GitHub Actions workflow run is provided with a runtime context. The official context documentation describes the fields that are available. The context differs depending on the workflow type. In our case the workflow type is pull_request, and the following relevant context fields are present:

  • github.head_ref, the source branch reference of the pull request in a workflow run. Other names used are feature branch or head branch.
  • github.base_ref, the target branch reference of the pull request in a workflow run. Other names used are base branch, or baseline.
  • github.sha, which for pull_request workflows is the commit SHA of the latest merge commit between head_ref and base_ref.
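These fields can be printed from a workflow step, which is a handy way to check exactly which refs and SHA a particular run is working with (passing them through env is just a precaution against odd characters in branch names):

- name: Show pull request context
  env:
    BASE_REF: ${{ github.base_ref }}
    HEAD_REF: ${{ github.head_ref }}
    MERGE_SHA: ${{ github.sha }}
  run: |
    echo "base_ref: ${BASE_REF}"
    echo "head_ref: ${HEAD_REF}"
    echo "sha (merge commit): ${MERGE_SHA}"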

The merge commit (m1) is worth explaining in more detail:

  • It is created by GitHub automatically whenever a new commit is pushed to the feature branch of a PR. If there are merge conflicts in this commit, the pull_request workflow will not run.
  • It includes changes on the base branch [b3, b4, b5] that happened after the head branch was forked off.
  • The automatically created merge commit can’t be fetched locally and inspected.

When it comes to diff reports, we need to pick a pair of snapshots (commits) to be our base and head versions. We have two options nicknamed latest and stable:

The difference between the two choices is the commits [b3, b4, b5]. In “latest,” they are included in both snapshots, while in “stable” they are not. If the base branch hasn’t moved since the PR was created, the two choices will be identical.

Here are some tips to help you decide which option to use:

  • The “stable” diff is easier to reproduce locally, since all of the commits involved are available locally (see the sketch after this list).
  • The “stable” diff is consistent with the “Files changed” tab in the GitHub PR UI. There, by default, the left side shows the code from the common ancestor commit, while the right side shows the head code.
  • The “latest” report more precisely portrays the state that will happen when the PR is merged.
  • The “latest” report is easier to implement in GitHub Actions, since we don’t need to run git merge-base and check out the commit based on its output.
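For the first tip, reproducing the “stable” base locally takes only a couple of git commands. A sketch (assuming the base branch is called main, and using a separate worktree so both snapshots can sit side by side):

git fetch origin main
MERGE_BASE=$(git merge-base origin/main HEAD)
# Check out the common ancestor into a second working tree, leaving the
# current checkout (the head snapshot) untouched.
git worktree add ../base-snapshot "$MERGE_BASE"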

Step Three: Implementation

Let’s see some YAML! The “latest” option is the simplest to implement. There, we first check out two versions of our repository to different destination paths.

Latest

name: Diff report (latest)
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout base
        uses: actions/checkout@v3
        with:
          ref: ${{ github.base_ref }}
          path: base/

      - name: Checkout head
        uses: actions/checkout@v3
        with:
          # Note that this is the default value for 'ref:', we could omit it.
          ref: ${{ github.sha }}
          path: head/

The transform process steps can then run their transformations on each snapshot by changing the “working-directory.” For example:

      - name: Transform base
        env:
          VERSION: base
        working-directory: base/
        run: |
          ...
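Putting the Docker example from Step One into this structure, the head transformation and the final report could look roughly like the sketch below. It assumes dive and jq are available on the runner and writes the report to the job log; the “Transform base” step is identical except for VERSION and working-directory.

      - name: Transform head
        env:
          VERSION: head
        working-directory: head/
        run: |
          docker build . --tag "${VERSION}"
          dive "${VERSION}" --json analysis.json
          jq ".image.sizeBytes" analysis.json > result.txt

      - name: Report
        run: |
          BASE_SIZE=$(cat base/result.txt)
          HEAD_SIZE=$(cat head/result.txt)
          ABS_INCREASE=$(( HEAD_SIZE - BASE_SIZE ))
          printf "Image size: %s -> %s bytes (%+d bytes)\n" \
            "${BASE_SIZE}" "${HEAD_SIZE}" "${ABS_INCREASE}"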

Stable

For the “stable” option, the situation is a bit more complex. As a refresher, our base snapshot is the common ancestor commit between the base branch and the head branch (the result of running “git merge-base base head”), while the head snapshot is the latest commit on the head branch.

Checking out the head snapshot is similar to above. The only change is in the “ref:” value:

name: Diff report (stable)
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout head
        uses: actions/checkout@v3
        with:
          ref: ${{ github.head_ref }}
          path: head/

To check out the base snapshot, we need the common ancestor commit. To get it, we clone the whole repository (fetch-depth: 0), determine the common ancestor commit, and check it out manually:

      - name: Checkout base, step 1
        uses: actions/checkout@v3
        with:
          # We don’t have to specify the ref here since we will checkout the
          # merge-base manually in the next step.
          fetch-depth: 0
          path: base/

      - name: Checkout base, step 2
        working-directory: base/
        run: |
          # Note that omitting the “remotes/origin” prefix yields the following
          # error:
          #   fatal: Not a valid object name master
          # This is because the checkout action doesn't create any local branches.
          BASE_REF="remotes/origin/${{ github.base_ref }}"
          HEAD_REF="remotes/origin/${{ github.head_ref }}"
          MERGE_BASE=$(git merge-base "$BASE_REF" "$HEAD_REF")
          git checkout "$MERGE_BASE"

End-to-end examples demonstrating both “stable” and “latest” diff reports can be viewed in the diff-report-example-playground repo. See also the pull request that shows the diff report in action.

Diff Reports in Practice: Backward Breaking Change Advisory

At Noom, schemas for messages that are sent between services are stored in Protobuf files. To avoid accidentally making breaking changes to these schemas, we implemented a diff report that analyzes changes to those files and notifies engineers when a change is not backward compatible.

We approached the problem by taking the following steps:

  1. Transformation step: for both versions of the code (base and head), create a data structure that describes the schema of all messages we have stored.
  2. Report step: compare the two schema data structures and identify changes that are backwards incompatible. Generate a human-readable report that gets posted to the PR.

Transformation Step

Protobuf has built-in utilities for creating a data structure describing the schemas stored in a directory, making this step a simple CLI command. If schemas are stored in *.proto files, one can generate a Protobuf file containing a FileDescriptorSet object, a data structure that contains structured information about the schema.

protoc --descriptor_set_out=schema.pb ./**/*.proto

Report Step

With two FileDescriptorSets created, the next step is to compare them and check for breaking changes. We did this in two phases. First, we created a data structure that represented all changes between the two schemas, and second, we filtered out all non-breaking changes. Separating this into two actions allowed us to write simple code in the first step and keep the list of breaking change reasons in one place during the second step.
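As a rough sketch of how this fits the transformation/report structure from earlier, a descriptor set can be produced per snapshot and handed to the comparison tool (compare_descriptors below is an illustrative stand-in name, not the real interface of the tool):

# Enable recursive globbing so ./**/*.proto matches nested directories.
shopt -s globstar

# Transformation: one FileDescriptorSet per snapshot.
(cd base && protoc --descriptor_set_out=../base.pb ./**/*.proto)
(cd head && protoc --descriptor_set_out=../head.pb ./**/*.proto)

# Report: compare the two descriptor sets and keep only the breaking changes.
# "compare_descriptors" is an illustrative placeholder for the comparison tool.
compare_descriptors base.pb head.pb > breaking_changes.md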

With a list of breaking changes in hand, the final step in this process is communicating them to the developer. We do this by posting a comment to the PR that lists the breaking changes along with an explanation as to why the change is breaking.
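One lightweight way to post such a comment from a workflow is the GitHub CLI, which is preinstalled on GitHub-hosted runners. A sketch (breaking_changes.md is the illustrative file name from above; the workflow’s token needs pull-requests: write permission):

      - name: Post breaking change advisory
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh pr comment "${{ github.event.pull_request.number }}" \
            --body-file breaking_changes.md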

Outcome

With these visual alerts, it’s much easier for developers to understand the impact of the change they’re proposing and either edit their code accordingly before merging or decide that, although the change is breaking, it’s still safe to merge because there are no active consumers.

What’s Next for Diff Reports

For the Backward Breaking Change Advisory, we are working on making the report process more useful by including pointers to tools that can help uncover all consumers of a specific message or API endpoint that is being changed in a backward-breaking way. Here we rely on static analysis (a fancy name for code search) and, for API endpoints, HTTP log analysis.

When it comes to diff reports more generally, we are closely reviewing our 5-whys documents to see where a code review missed effects that could have been automatically visualized. For example:

  • We already use diff reports in some of our infrastructure-as-code projects because it’s important to be able to see how resources will be changed.
  • We started applying diff reports whenever we have code that generates other code. We learned that reviewing both the source code and the generated code is useful to spot bugs.
  • We are hypothesizing that visualizing the resulting DB schema before and after the SQL migration is applied in a non-prod environment could avoid some outages.
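As an example of what the transformation step for that last idea could look like, a schema-only dump of a scratch database gives a small, diffable artifact (a sketch assuming PostgreSQL; SCRATCH_DB_URL is an illustrative variable, and the migration would be applied to the scratch database beforehand):

# Dump only the schema so the artifact contains no data and stays diffable.
pg_dump --schema-only --no-owner "$SCRATCH_DB_URL" > result.sql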

About the authors

Nick Lemoing and Anton Grbin are backend engineers working on Noom’s API gateway team.
