How the Client Configurator team ensured zero rollbacks for two years!

cwan-engineering · Published in Clearwater Analytics Engineering · Jul 28, 2022 · 5 min read

Introduction

The Client Configurator team is one of the first development teams formed in our Noida development centre. The team currently owns a few internal tools, one of the core services of our platform (with a throughput of more than 40,000 requests per minute), and a tool for forecasting our clients’ income. The focus of this article is to share some of the engineering best practices we have followed that ensured rock-solid production deliveries.

As a case study, we will see how these practices were applied to Client Configurator, an internal tool widely used by our Global Delivery and Client Services team for onboarding customers and their accounts.

The Journey

Two years ago, when we took ownership of Client Configurator, the codebase lacked automated testing and relied on manual QA to verify features before every production deployment. To improve the state of the system, instead of attempting a big-bang overhaul in one go, we made several incremental changes over time, which gradually improved our development lifecycle.

Quality is Everyone’s Responsibility

To improve the stability of the system, we started with the following team rules:

  • Pull requests are accepted only if they are accompanied by unit tests for the changes made.
  • Everyone on the team (not just the QA engineer) verifies a few features of the application, so that collectively we thoroughly verify the complete application before every deployment.

These changes gradually improved our code quality and brought a sense of shared responsibility for the quality of our deliverables. This was a step in the right direction, but our goal of complete automation was still far away!
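As an illustration of the first rule, here is the shape of unit test we would expect to accompany a change. This is a minimal sketch: the service, method, and validation rule are hypothetical, not Client Configurator’s actual code.

```typescript
// A hypothetical Jasmine spec for an Angular service; all names are illustrative.
import { TestBed } from '@angular/core/testing';
import { AccountService } from './account.service'; // hypothetical service under test

describe('AccountService', () => {
  let service: AccountService;

  beforeEach(() => {
    TestBed.configureTestingModule({ providers: [AccountService] });
    service = TestBed.inject(AccountService);
  });

  it('rejects an empty account code', () => {
    expect(service.isValidAccountCode('')).toBeFalse();
  });

  it('accepts a well-formed account code', () => {
    expect(service.isValidAccountCode('ACC-1042')).toBeTrue();
  });
});
```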

Automation

Since we were still largely testing manually, there was always a chance of bugs creeping into the system. As the next incremental step, we decided to tackle this. Client Configurator has a diverse UI that handles several complex use cases and is powered by several web services on the backend. To reach our goal of complete automation within a quarter, we did the following:

  • Ensured a minimum of 80% code coverage for any new code written
  • Created test automation suites for our UI and backend services

This ensured that the whole application was tested holistically.
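One way a coverage floor like this can be enforced in a Karma-based Angular setup is karma-coverage’s check option, which fails the test run when thresholds are not met. The excerpt below is a sketch under that assumption; our actual enforcement mechanism may be wired differently.

```javascript
// karma.conf.js (excerpt): sketch of enforcing an 80% coverage floor
// with karma-coverage; the rest of the Angular/Karma config is omitted.
module.exports = function (config) {
  config.set({
    // ...existing framework, plugin, and reporter configuration...
    coverageReporter: {
      reporters: [{ type: 'html' }, { type: 'text-summary' }],
      check: {
        // Fail the build if any global metric drops below 80%.
        global: { statements: 80, branches: 80, functions: 80, lines: 80 },
      },
    },
  });
};
```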

UI Test Automation

Our UI application is implemented in Angular; it is a diverse UI with a wide array of components. We used Cypress to build our end-to-end UI automation suite. In this suite, we emulate a real user and perform every operation that is possible in the application. The suite also contains several error scenarios, which helps us verify that users are shown appropriate warnings and error messages. Due to its breadth, a full run normally takes approximately 30 minutes.
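The sketch below shows the shape of a spec in this suite: one happy-path flow and one error scenario. The route, selectors, and messages are illustrative, not the application’s real ones.

```typescript
// A minimal Cypress end-to-end spec; all selectors and text are hypothetical.
describe('Client onboarding', () => {
  beforeEach(() => {
    cy.visit('/clients/new'); // hypothetical route in the Client Configurator UI
  });

  it('creates a client when valid details are submitted', () => {
    cy.get('[data-cy=client-name]').type('Acme Capital');
    cy.get('[data-cy=submit]').click();
    cy.contains('Client created successfully').should('be.visible');
  });

  it('warns the user when required fields are missing', () => {
    cy.get('[data-cy=submit]').click();
    cy.contains('Client name is required').should('be.visible');
  });
});
```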

API Test Automation

To thoroughly test our backend services, we leveraged Postman to create a comprehensive API test automation suite that verifies every endpoint of our services. The suite currently contains approximately 500 test cases covering both positive and negative scenarios (such as invalid inputs). In addition, we maintain a separate stress test that verifies behaviour against bulk inputs. A typical run takes 6–7 minutes.
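For a flavour of what these test cases look like, the snippet below sketches a Postman test script attached to one request; the endpoint’s response shape is hypothetical. Postman runs these scripts as JavaScript, and a collection like this can also be executed from a CI pipeline with Newman, Postman’s command-line runner.

```javascript
// Sketch of a Postman test script for a single endpoint; the response
// fields asserted here are illustrative.
pm.test('returns 200 for a valid request', function () {
  pm.response.to.have.status(200);
});

pm.test('response contains the created account id', function () {
  const body = pm.response.json();
  pm.expect(body).to.have.property('accountId');
});

pm.test('responds within an acceptable time', function () {
  pm.expect(pm.response.responseTime).to.be.below(2000);
});
```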

As we continue to add features and fix issues, it is extremely important that both automation suites stay up to date. To ensure this is consistently practiced, updating the automation suites is part of the definition of done for every story we work on.

Branching Strategy and Early Error Detection

We typically deploy every month and hence follow this branching strategy:

  • Master — This is our long-lived branch. For every production release we create a new tag.
  • Integration — As we do a monthly deploy, after every deploy we create a new integration branch from “master.”
  • Feature — For every feature that is being developed, a branch is carved out of the “integration” branch. This is a short-lived branch which gets deleted once changes are merged into integration.

To detect errors early, we made both automation suites part of our CI/CD pipeline: before any new code is merged into the integration branch, the feature branch is deployed to our test environment and both suites are executed. The build fails if either suite fails, so a successful build implies that the code compiled, all unit tests passed, and both automation suites succeeded. This ensures no bad code enters our integration branch, guaranteeing the quality of our deliverable at any point in time.

With automation in place, human verification is no longer required before deploying to production.

About the Author

Anuj Mehta is the Software Development Manager of the Client Configurator team. Over the years he has built several enterprise-level products, and he likes to learn and apply new technologies in real-world projects. In his free time, he enjoys reading books and playing badminton.
