Using a Continuous Integration pipeline to guard against database performance regressions

Maciej Brencz · Published in Bethink · Feb 15, 2021 · 5 min read
Database performance regression tests in action

Nowadays, CI pipelines are a common practice. Most of us perceive them as a set of tests, linters and other types of checks run against the code. Even ThoughtWorks mentions the word “code” in their short description of what Continuous Integration is.

Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early. By integrating regularly, you can detect errors quickly, and locate them more easily.

Martin Fowler adds that “Continuous Integration doesn’t get rid of bugs, but it does make them dramatically easier to find and remove.”

Is it all just about the code?

Is the code the only measure of the quality of changes we’re merging into the main branch of the repository? Can we introduce regressions other than bugs in the feature? Can we impact the performance with changes other than those made in the code? Sure, we can. And we often do.

A few years ago, in one of my blog posts, I mentioned that the code is not the only factor we should keep in mind when thinking about delivering and maintaining software:

When we think about software we tend to look from a developer perspective and focus mostly (if not only) on the code. But what IT is all about is processing the data. Code is just the tool and a way to describe the process.

So the question is not “should we”, but “how can we” check other aspects of the software we build: namely, the database table schemas and the SQL queries we perform, in a repeatable and automated way.

Database checks as a part of your CI pipeline?

So, you already have your integration tests set up and running on every pull request. You use GitHub Actions to set up workflows easily. When one of the test cases fails, you get notified immediately.
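For illustration, a bare-bones workflow of that kind might look like the sketch below. The workflow, job and command names are made up for this example; the important part is that it runs on every pull request:

    # a minimal test workflow sketch; assumes the test suite is started with `make test`
    name: Tests
    on: [pull_request]

    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          # check out the repository and run the existing test suite
          - uses: actions/checkout@v2
          - name: Run the test suite
            run: make test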

What if you’re introducing a new database table or modifying an existing one?

What if you’re adding or changing an SQL query?

How can we make sure that they do not turn out to be a change for the worse?

We usually find that out when it’s too late and sub-optimal queries are causing problems on production.

The story of Chronos

My team is currently working on a service called Chronos that aggregates user activity on our e-learning platform. We make quite precise measurements here: three weeks of user activity is kept in ~480k rows in a MySQL database. We want to have access to per-day statistics from a few perspectives. With such a volume of data to query through (these numbers are for one week only and will only increase over time), the database structure and performance play a significant role.

Chronos has an extensive set of unit and integration tests. We insert fixtures into the test database and assert the results of queries run against it. Hence, thanks to the tests, we have a tiny version of the production database (the same schema, but far fewer rows) and the queries collected in the CI environment, namely GitHub Actions.

Initially, we decided to have just a primary key in our tables, and as we got closer to the release date, we would take a closer look at the database queries and improve their performance. This can be done manually using Percona’s Toolkit or index-digest. A few hours of adding and changing indices and you should be done. But then, what if we decide to add a column or modify an existing query? Should we perform this manual labour over and over again?

As mentioned above, we already have all the required pieces in the CI environment: the database itself and the queries. So, why not automate the entire process of “the database performance review”?

Database performance regression testing on GitHub Actions

The idea behind the index-digest action is to let you lint your database schema and the queries you perform from within your GitHub Actions workflow. There are just a few requirements:

  • You keep your test database instance up and running after your test suite completes, ideally with some meaningful data in it (see the workflow sketch after this list).
  • Your application collects an SQL query log that you can provide to the index-digest action. Alternatively, you can use the MySQL slow query log and have the database collect the queries for you.
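Here is a minimal sketch of how the first requirement can be met with a job-level MySQL service container in GitHub Actions. The image tag, credentials and health-check options are assumptions for this example; they just need to match the DSN you pass to index-digest later on:

    # inside the job that runs your tests: the service stays up for the whole job,
    # so the database is still available when the index-digest step runs afterwards
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_DATABASE: test_db
          MYSQL_USER: test_user
          MYSQL_PASSWORD: test_password
          MYSQL_ROOT_PASSWORD: root_password
        ports:
          - "3306:3306"
        options: >-
          --health-cmd="mysqladmin ping --silent"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5

The second requirement is application-specific: your test suite writes the queries it performs to a file (the /tmp/queries_log.sql used below) that the index-digest step can then read.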

With these at hand you simply add and set up this step in your workflow:

    - name: Install and run index-digest
      id: run-index-digest
      uses: macbre/actions-index-digest@0.7.0
      with:
        index-digest-version: "1.5.0"
        index-digest-dsn: "mysql://test_user:test_password@127.0.0.1:3306/test_db"
        index-digest-sql-log: "/tmp/queries_log.sql" # use an absolute path here!
        index-digest-report-file: "./index-digest-report.yml"

This will fetch and run the index-digest Docker container image. The provided MySQL DSN and an optional SQL query log will be used to perform various checks on:

  • the table schemas (are primary keys defined? are there any redundant indices?),
  • the queries themselves (do they use indices? do they perform a full scan or a file sort? do they use a temporary table?).

A YAML report will be saved to the provided file. Additionally, the step above sets an output (steps.run-index-digest.outputs.number-of-issues) that you can use to make an assertion later on:

    - name: Verify index-digest results
      if: steps.run-index-digest.outputs.number-of-issues > 0
      run: |
        echo "::warning::index-digest issues reported: ${{ steps.run-index-digest.outputs.number-of-issues }}"
        echo "::group::index-digest report"
        cat ./index-digest-report.yml
        echo "::endgroup::"
        echo "::error::index-digest database performance check failed"
        exit 1

With this step added, our CI pipeline not only checks our tables and queries, but can also block a pull request from being merged because of a database performance regression. And thanks to index-digest, we get suggestions on how to improve our schema or queries.

Failed index-digest action as reported by GitHub Actions
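If you want to keep the full report around for later inspection, you can additionally upload it as a build artifact. Below is a small, optional sketch using the standard actions/upload-artifact action; the artifact name is arbitrary:

    # upload the report even when the verification step above has failed the build
    - name: Upload the index-digest report
      if: always()
      uses: actions/upload-artifact@v2
      with:
        name: index-digest-report
        path: ./index-digest-report.yml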

Our database performance regression testing pipeline is still in its early days. There are obviously some trade-offs: we do not run these tests on production-scale data, nor in the production environment. However, a small fraction of the data can still provide quite a good proving ground to test queries against. Even if this does not cover 100% of the cases that can cause performance issues in production, something is still better than nothing.

We’ll keep you posted on how this pipeline serves us as we develop Chronos and other services at Bethink.
