Validating Environment Stability with Keptn

Andreas Grabner
Published in keptn, Jul 19, 2021

Like many other organizations, Vitality Group International, Inc. (“VGI”) is building, deploying, and operating true multi-tenant software systems for their global customer base. New builds are pushed through different stages via automation, and in each stage, different tests are executed before the build is promoted to the next stage.

But automation is more than deploying containers into a k8s cluster. It also includes tenant-specific configuration and executing tests that cover the use cases specific to that tenant, e.g., testing a new feature that is only available on Tenant 123.

Once testing is over, the next important step is validation, and that’s where Keptn comes in.

Keptn automates validation across multi-tenant test executions

VGI adopted the CNCF open-source project Keptn in its early days to automate quality validation. What drew them to the project was the SLO (Service Level Objective) Quality Gate capability, which automates the validation of SLIs (Service Level Indicators) as part of continuous delivery. This capability addressed VGI’s need to extend their automation to include automated quality evaluation.

To start with Keptn Quality Gates, one must define which SLIs (metrics) are important and against which SLOs (objectives) they should be compared. While Keptn allows users to define SLIs and SLOs through YAML files, VGI decided to take a more visual approach: SLO dashboards.
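To make the SLI/SLO pairing concrete, here is a minimal Python sketch of comparing measured SLI values against objectives. It is not Keptn's implementation: the SLI names, the thresholds, and the helper functions `meets` and `evaluate` are hypothetical, and it only handles simple absolute criteria such as "<600".

```python
import operator
import re

# Hypothetical objectives; names and thresholds are illustrative, not VGI's.
OBJECTIVES = {
    "response_time_p95_ms": ["<600"],
    "error_rate_percent": ["<=1"],
}

_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}


def meets(value, criterion):
    """Check one SLI value against one absolute criterion such as '<600'."""
    match = re.fullmatch(r"\s*(<=|>=|<|>|=)\s*(\d+(?:\.\d+)?)\s*", criterion)
    if match is None:
        raise ValueError(f"unsupported criterion: {criterion!r}")
    op, threshold = match.groups()
    return _OPS[op](value, float(threshold))


def evaluate(sli_values):
    """Return, per SLI, whether every criterion of its objective holds."""
    return {
        name: all(meets(sli_values[name], c) for c in criteria)
        for name, criteria in OBJECTIVES.items()
    }


if __name__ == "__main__":
    # Made-up measurements from one test run against one tenant.
    print(evaluate({"response_time_p95_ms": 720.0, "error_rate_percent": 0.4}))
```

The point of the sketch is the separation of concerns: the SLI is the measured value, the SLO is the rule it is checked against, and the evaluation is a mechanical comparison that needs no human in the loop.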

The following screenshot shows the Dynatrace SLO dashboard that was built by the performance engineering team. It contains all relevant metrics to be analyzed automatically after each test run for a specific tenant under test:

Defining SLOs visually through a dashboard is a popular option Keptn provides

The name of the dashboard automatically links it to a Keptn project, service, and stage. This allows Keptn to automate the analysis instead of having somebody manually look at the dashboard after every test run.
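How a dashboard name can carry that link is easy to sketch. The snippet below assumes a naming pattern of the form `KQG;project=<project>;service=<service>;stage=<stage>`; treat the exact pattern, the example values, and the helper `parse_dashboard_name` as assumptions for illustration, and check the documentation of your Dynatrace integration for the convention it actually expects.

```python
def parse_dashboard_name(name):
    """Split a 'KQG;key=value;...' dashboard name into project, service, stage.

    Returns None when the name does not follow the assumed convention.
    """
    prefix, *pairs = name.split(";")
    if prefix != "KQG":
        return None  # not a quality-gate dashboard under this assumed pattern
    parsed = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    required = ("project", "service", "stage")
    if any(key not in parsed for key in required):
        return None
    return {key: parsed[key] for key in required}


if __name__ == "__main__":
    # Hypothetical project, service, and stage names.
    print(parse_dashboard_name("KQG;project=tenants;service=tenant-123;stage=test"))
    print(parse_dashboard_name("Team overview"))
```

Encoding the target in the name keeps the dashboard self-describing: anyone who can edit the dashboard can also decide which project, service, and stage it applies to.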

Now let’s have a look at what this looks like in Keptn: Quality Gates in Action!

The following screenshot shows one of VGI’s Keptn projects, which is used to automate the validation of tests executed against a number of tenants in a shared test environment. These tests are executed across all tenants whenever a new build is deployed by their deployment pipeline into the test environment. Keptn is then triggered after the tests are executed to automate the evaluation based on the SLOs defined in the dashboard shown above:

Keptn automates the evaluation of SLOs across a number of tenants after deployment and tests have finished in a shared test environment

If you look closely, then you can see that all the previous tests on that particular tenant failed Keptn’s SLO evaluation. A closer look at the SLO Evaluation Heatmap in Keptn gives more details on which SLOs are failing and contributing to the overall failures:

Keptn automatically evaluates all SLOs per tenant and aggregates them to a total SLO Score

Besides the heatmap, Keptn also provides detailed metrics per evaluation in a table overview — making it easier to understand which SLOs have failed which criteria:

Each individual SLI value and why it failed the SLO validation is shown in the details view
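The aggregation from individual SLO results to a total score can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Keptn's code: a passed objective earns its full weight, a warning earns half, a failure earns nothing, and the 90% and 75% thresholds are illustrative defaults. The function name `total_score` is hypothetical, and real evaluations have more rules (for example, comparisons against previous runs) that are left out here.

```python
def total_score(results, pass_threshold=90.0, warning_threshold=75.0):
    """Aggregate per-objective results into a percentage score and a verdict.

    results: list of (status, weight) tuples, status in {'pass', 'warning', 'fail'}.
    """
    # Assumed point scheme: full weight, half weight, nothing.
    points = {"pass": 1.0, "warning": 0.5, "fail": 0.0}
    max_score = sum(weight for _, weight in results)
    if max_score == 0:
        raise ValueError("at least one weighted objective is required")
    achieved = sum(points[status] * weight for status, weight in results)
    score = 100.0 * achieved / max_score
    if score >= pass_threshold:
        verdict = "pass"
    elif score >= warning_threshold:
        verdict = "warning"
    else:
        verdict = "fail"
    return score, verdict


if __name__ == "__main__":
    # Four equally weighted objectives for one tenant: two pass, one warns, one fails.
    print(total_score([("pass", 1), ("pass", 1), ("warning", 1), ("fail", 1)]))
```

With four equally weighted objectives, a single failure plus a single warning is already enough to drop the score to 62.5% and fail the gate under these thresholds, which is why a heatmap of individual SLOs is useful for finding out what dragged the total down.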

Seeing these results, you may be asking yourself: “So… Keptn basically tells us everything is red all the time? But what is the reason behind it, and how can we get green builds?”

With Keptn towards more Stable Environments

When I was talking with Vitality Group about Keptn, they shared that their big aha moment was realizing how strongly the tenants in their multi-tenant environments affect one another. Remember: all tenants share the same deployed services, but each tenant has a different configuration and a different use case. One bad feature in one tenant can then impact all other tenants as well. This is exactly what Keptn highlighted, in a fully automated way. Deploying all changes of all tenants in a test environment and then running all tests will always result in an unstable environment and therefore in failing tests.

With these insights, the team is now moving quality evaluations even further left in their delivery pipeline. They will bring Keptn into the QA environment where individual tests against individual tenants are executed. This enables them to automatically detect any regressions on individual tenants, features, and use cases before promoting those changes into the larger test environment. This will help reduce volatility and lead to more stable environments and better production deployments.

If you are interested in exploring Keptn, check out our public demo environment or run through one of our Keptn tutorials.
