Eliminating human error in legacy systems using Configuration as Code and Behavioral Tests

Tzach Zohar
skai engineering blog
Feb 7, 2024

In this post, we’ll share a recent success story of migrating a legacy DB-based configuration store to Git. This transition made it safer and easier to introduce changes to this critical, developer-generated, frequently changed configuration: every change is now audited, validated by a rigorous set of automated tests, and no longer exposed to human error.

Legacy

The system in question is an internal web scraping system that underwent a drastic change in how it’s used: it was originally built to let analysts produce internal data on a quarterly basis, but we decided to employ this useful capability to produce critical client-facing data in short cycles and at high throughput.

The legacy system, while not robust enough for these new demands, was luckily rather flexible: it allowed users to define a “Source”, a complex JSON object describing the page that should be scraped, the way to obtain it, “selectors” for extracting specific fields, and some formatting and normalization rules. New Sources could thus be added to scrape new sites with no code changes. However, these Source definitions were rather sensitive (mistakes would render them invalid) and far from static: the pages they describe can change often, requiring a change to the Source configuration, and new field selectors are added fairly regularly.
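
For illustration only, a Source definition along these lines might look roughly like the following (sketched in yaml for readability; the fields and selector names are made up, not the actual schema):

    name: acme-product-page
    request:
      url: https://www.example.com/products/{product_id}
      method: GET
    selectors:
      title:
        css: "h1.product-title"
      price:
        css: "span.price"
        normalize: strip_currency
      rating:
        css: "div.rating"
        normalize: to_float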

These Source configurations were stored in a simple model (2–3 tables) in a relational database. The records in these tables were modified via a homegrown user interface providing simple CRUD operations over this model, which gradually became less and less usable, driving developers (with DB WRITE access) to the “bare hands” approach of running UPDATE statements directly against the DB.

Configuration used to be updated either manually or via a legacy UI, and read by the scraping system

As long as this system served its original internal purpose, this wasn’t a critical issue. Mistakes, if they happened, could be fixed once the analyst reviewing the data caught them. But once we started onboarding use cases that generate critical client-facing data daily, this approach couldn’t hold. We started seeing too many incidents of bad data (or no data) reaching our clients due to misconfiguration. It was clear that the process of adding or updating Sources needed far more guardrails, including robust automated tests applied before changes take effect. What should those guardrails be, and how could we introduce them without overhauling a large legacy system?

Alternatives

Once the issue of unintentional or erroneous configuration changes became apparent, we considered a few solutions:

  • Resurrect the UI: The old UI included some basic validations, and it made it much less likely for the wrong records to get updated. However, the codebase for the UI was old and hard to manage, making it difficult to add further necessary validations
  • Move the configuration out of the DB: working directly against a relational database isn’t ideal, as it offers little in the way of auditing, undo capabilities, or diff views. And yet, moving the configuration out of the DB would require a rather large refactoring effort in the legacy system that reads it
  • Improve the Source data model: one cause of errors was that the Source configuration data model was itself suboptimal, as any model that has evolved over years tends to become. But changing the data model would be costly for the exact same reason the previous option was disqualified: it would require significant changes to the legacy system

These considerations led us to the following “requirements” for our ideal solution:

  1. Full auditing capabilities (who made what changes, when and why, plus easy rollback)
  2. Full validation capabilities before applying a change
  3. Low impact on the legacy system consuming these configurations

Git-based Solution

All of these pointed to an almost-obvious solution: Source configuration should be source controlled (no pun intended). Using source control for configuration (on top of being an absolute standard for managing codebases) has become ubiquitous with the relatively recent shift to “configuration as code” and GitOps approaches. As a company, we’ve already transitioned all of our provisioning configuration to code, so it was only natural to apply the same approach here too:

  • We’ll create a new GitHub repository to store the Sources configuration (represented as yaml files)
  • We’ll block changes to the main branch and only allow changes through Pull Requests (as we do for all of our codebases)
  • We’ll build a Jenkins job that tests each pull request (upon creation or change) and validates the change
  • We’ll build another Jenkins job that “releases” any change merged to the main branch, by reading the yaml files, identifying changes, and updating the same DB where the Sources are currently stored

In other words — we’ll build a CI/CD pipeline identical in concept to what we build for any system we deploy — where automated tests enable continuous delivery with no manual testing and low risk. The “release” would update the existing DB from which the legacy system reads.
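
As a rough sketch of what that release step could boil down to (the file layout, table and column names, and DB driver below are assumptions for illustration, not the actual implementation):

    # release.py - a minimal sketch of the "release" job (all names are assumed)
    from pathlib import Path

    import psycopg2  # assuming a PostgreSQL-backed legacy store, purely for illustration
    import yaml      # PyYAML

    def load_sources(repo_dir: str) -> dict:
        """Read every Source yaml file in the repo into {source_name: definition}."""
        return {
            path.stem: yaml.safe_load(path.read_text())
            for path in Path(repo_dir).glob("sources/*.yaml")
        }

    def release(repo_dir: str, dsn: str) -> None:
        """Upsert any Source whose definition in Git differs from its DB copy."""
        sources = load_sources(repo_dir)
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            for name, definition in sources.items():
                new_config = yaml.safe_dump(definition, sort_keys=True)
                cur.execute("SELECT config FROM sources WHERE name = %s", (name,))
                row = cur.fetchone()
                if row is None:
                    cur.execute(
                        "INSERT INTO sources (name, config) VALUES (%s, %s)",
                        (name, new_config),
                    )
                elif row[0] != new_config:
                    cur.execute(
                        "UPDATE sources SET config = %s WHERE name = %s",
                        (new_config, name),
                    )

In practice this step runs only for changes already merged to main, after the release tests described below have passed.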

Our git repository now becomes the source of truth for Source configuration, and its release job the only way to update Sources.

Test Implementation

Moving the JSON-based Source configurations to yaml files stored in Git was relatively easy. The tricky part was designing the tests: we want a fast, stable, reliable way to test a changed Source, one that won’t be affected by external changes (e.g. the scraped website being modified). On the other hand, we do want some way of catching such external changes when they happen, as they are just as likely to break our scraping. We’ll need a minimal periodic “smoke” test for all Sources, as well as specific and robust tests for every changed Source.

To account for that, we ended up defining two separate test suites:

Nightly Tests:

  • A suite of real-world end-to-end tests that scrape a single page per Source, and validate the response’s structure — i.e. that all fields are present
  • This suite should catch errors arising mostly from changes to the scraped website, not configuration changes
  • This suite is executed on all Sources nightly, and failures trigger an alert to the owning team

Release Tests:

  • A faster and more reliable suite of tests that use stubbed HTML pages in place of actual scraping, thus avoiding slowness and instability
  • For each Source, developers must provide such an HTML sample, as well as the detailed corresponding expected scraping result
  • The test will run the (real) scraping system while stubbing the HTML page, and verify parsing results (based on the Source configuration) match expectations
  • These tests are executed on every Pull Request for modified Sources only, providing the author of the change immediate feedback
  • These tests are also executed during release, i.e. just before updating the actual production DB with the new version of a modified Source configuration

Our Jenkins jobs: the release and pull-request jobs run the “release” suite; the nightly job runs the “nightly” suite

This separation into two test suites, which differ in goal and impact, proved to strike a good balance between coverage and cost. Tests running on PRs are quick enough to provide immediate feedback and don’t break due to external changes; tests running nightly run often enough to catch issues before they become incidents.

In practice, both suites are implemented using behave, Python’s most popular behavioral testing tool. Each Source has a single Feature file with separate scenarios annotated with nightly and release tags. Here’s an abbreviated example of such a test:
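
(The Source, URL, and field names below are illustrative rather than taken from a real Source.)

    Feature: acme-product-page Source

      @nightly
      Scenario: Scrape a live product page
        Given the Source "acme-product-page"
        When I scrape the page "https://www.example.com/products/123"
        Then I expect the following fields to be present:
          | field  |
          | title  |
          | price  |
          | rating |

      @release
      Scenario: Parse a stubbed product page
        Given the Source "acme-product-page"
        And the stubbed HTML page "samples/acme_product.html"
        When I scrape the stubbed page
        Then I expect the field "title" to equal "ACME Anvil"
        And I expect the following fields to be numeric:
          | field  |
          | price  |
          | rating |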

Steps like I expect the following fields to be numeric were implemented once and can easily be reused across Sources. Such decoupling of the “what” from the “how” is one of the benefits of behavioral tests, and it results in tests that are very readable and easy to write.
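
For example, a reusable step along those lines might be implemented once roughly like this (the step text matches the sketch above; the context.result attribute and its structure are assumptions):

    # steps/fields.py - sketch of a reusable behave step (context layout is assumed)
    from behave import then

    @then("I expect the following fields to be numeric")
    def expect_numeric_fields(context):
        # context.result is assumed to hold the parsed scraping output as a dict
        for row in context.table:
            field = row["field"]
            value = context.result[field]
            assert isinstance(value, (int, float)), (
                f"Expected field '{field}' to be numeric, got {value!r}"
            )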

To run each suite separately, we’ve wrapped the behave command with simple invoke tasks that we can easily call from our Jenkins jobs:
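
(Task names and behave arguments here are illustrative.)

    # tasks.py - sketch of the invoke wrappers around behave
    from invoke import task

    @task
    def release_tests(c, features="features"):
        """Run the fast, stubbed 'release' suite (used on PRs and before release)."""
        c.run(f"behave --tags=release {features}")

    @task
    def nightly_tests(c, features="features"):
        """Run the real-world 'nightly' suite against live pages."""
        c.run(f"behave --tags=nightly {features}")

Jenkins can then call invoke release-tests or invoke nightly-tests, since invoke exposes task names with dashes on the command line.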

Results

Transitioning from a quick (and risky) manual approach to the slower-but-steadier fully tested Git-based approach wasn’t without friction. For simple changes, the new approach felt like it added a lot of overhead, especially in the early days when the tests were not as stable.

However, after stabilizing the tests and getting used to the safety this approach provides, developers on the team could implement Source configuration changes much more freely and confidently. Incidents related to misconfiguration dropped significantly. The ability (in fact, the requirement!) to test every change made these changes feel “normal”, like any other code change, which is exactly the goal of the Configuration as Code approach. A few weeks’ effort turned this burning issue into a thing of the past.

Conclusions

Our key takeaways from this process:

  • We can benefit from Configuration as Code without having to make large-scale changes to legacy systems
  • It’s enough that Git becomes the source of truth for the configuration, without necessarily replacing the existing configuration store entirely
  • Separate failure scenarios justify separate and different testing approaches

With these conclusions in mind, we’re constantly looking to eliminate the few remaining instances of manual configuration changes across our products.


Tzach Zohar
skai engineering blog

System Architect. Functional Programming, Continuous Delivery and Clean Code make me happier. Mutable state and for loops make me sad.