Introducing Sentinel: Cross-Platform Test Metrics Framework
A single framework for all test frameworks!
At Airtel, innovation knows no bounds, and when it comes to technology choices, we proudly use and support open-source solutions. Consequently, our test engineers use many different test frameworks for frontend and backend automation. With such a varied stack of test frameworks, it is hard to converge their reporting into a "single dashboard" to map, track, and monitor automated test executions, their deviations from earlier executions, and results across different test matrices. For that to happen, some homogeneity has to be followed across all the test frameworks, i.e.:
- Structure of logging
- A homogeneous test run lifecycle
- A centralized storage
However, we didn’t want to force our engineers to adhere to any imposed technical norms or structure.
So, what’s the solution?
Sentinel, that’s it!
The idea
A holistic logging framework for automated testing that sits behind the scenes like an éminence grise. It supports all test frameworks irrespective of programming language and requires minimal to no change to existing test frameworks. Moreover, it also caters to the logging requirements of UI (web and app) as well as API automation test executions, again with minimal to no alteration of existing frameworks.
Some salient features of it are:
- Centralized reporting portal for different types of automation and executions, such as app, API, or UI.
- Provides quantifiable data per release to check release quality.
- Facilitates comparison of multiple test runs/releases.
- Live tracking of automation executions.
- Direct integration with our canary releases.
The Architecture
To support easy integration and centralized reporting, three components are primarily required: a facility to capture test framework events (the lifecycle), a mechanism to send and store logs/events at a central junction, and a comprehensible user interface that caters to the requirements stated above; in short, a standard 3-tier application.
To begin with, we needed a way to capture test run lifecycles, so we created subscribers for each of the test frameworks used in our engineering, such as TestNG, JUnit, and pytest. These subscribers are a one-time job; their only role is to subscribe to the events of the test execution lifecycle and report them to the Sentinel core mediator. Most test frameworks (e.g., TestNG) expose these lifecycle events as an interface, so it is quite easy to create such subscribers. Since each subscriber is created within the ecosystem of its respective test framework, it is tightly bound to that framework and adheres to the principles of that very environment. Test frameworks that do not expose these events can rely on a RESTful service that does the job for them.
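To make this concrete, here is a minimal sketch of what a TestNG subscriber can look like. The `Mediator` interface below is a hypothetical stand-in for Sentinel's core mediator, whose real API is internal; the TestNG listener interface itself is the standard one.

```java
import org.testng.ITestContext;
import org.testng.ITestListener;
import org.testng.ITestResult;

// Sketch of a Sentinel subscriber for TestNG (assumes TestNG 7+,
// where ITestListener methods have default implementations).
public class SentinelTestNgSubscriber implements ITestListener {

    /** Hypothetical stand-in for the Sentinel core mediator. */
    interface Mediator {
        void event(String type, String name, long timestamp);
    }

    // For the sketch we just print events; the real subscriber would hand
    // them to the mediator, which pushes them onward to central storage.
    private final Mediator mediator =
        (type, name, ts) -> System.out.printf("%d %s %s%n", ts, type, name);

    @Override
    public void onStart(ITestContext context) {
        mediator.event("SUITE_STARTED", context.getName(), System.currentTimeMillis());
    }

    @Override
    public void onTestStart(ITestResult result) {
        mediator.event("TEST_STARTED", result.getName(), System.currentTimeMillis());
    }

    @Override
    public void onTestSuccess(ITestResult result) {
        mediator.event("TEST_PASSED", result.getName(), System.currentTimeMillis());
    }

    @Override
    public void onTestFailure(ITestResult result) {
        mediator.event("TEST_FAILED", result.getName(), System.currentTimeMillis());
    }

    @Override
    public void onFinish(ITestContext context) {
        mediator.event("SUITE_FINISHED", context.getName(), System.currentTimeMillis());
    }
}
```

A listener like this is registered through the `<listeners>` tag in testng.xml or the `@Listeners` annotation, which is why existing test code needs no changes.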
Secondly, to build a centralized reporting portal, an extensive logging structure needs to be created. We achieve this with a schema that is holistic enough to cater for various log items: these can contain text/strings/XML/JSON in the case of API automation and screenshots in the case of UI automation, with different fields accordingly. This is accomplished by leveraging the schema registry from the Confluent stack, which exists precisely for this purpose and provides multi-version support for schemas.
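As an illustration, the sketch below defines an Avro schema for a log item and produces a record through Confluent's Avro serializer, which registers and validates the schema against the registry automatically. The field names, topic, and host names are illustrative assumptions, not Sentinel's actual schema.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: a log-item schema broad enough for text/JSON/XML payloads and
// screenshot references, serialized via the Confluent schema registry.
public class LogItemProducer {

    private static final String LOG_ITEM_SCHEMA = "{"
        + "\"type\":\"record\",\"name\":\"LogItem\",\"fields\":["
        + "{\"name\":\"testId\",\"type\":\"string\"},"
        + "{\"name\":\"logType\",\"type\":{\"type\":\"enum\",\"name\":\"LogType\","
        + "\"symbols\":[\"TEXT\",\"JSON\",\"XML\",\"SCREENSHOT\"]}},"
        + "{\"name\":\"payload\",\"type\":\"string\"},"
        + "{\"name\":\"timestamp\",\"type\":\"long\"}]}";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's serializer registers/validates schemas with the registry.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081");

        Schema schema = new Schema.Parser().parse(LOG_ITEM_SCHEMA);
        GenericRecord logItem = new GenericData.Record(schema);
        logItem.put("testId", "checkout-suite/testPayment");
        logItem.put("logType",
            new GenericData.EnumSymbol(schema.getField("logType").schema(), "JSON"));
        logItem.put("payload", "{\"status\":200}");
        logItem.put("timestamp", System.currentTimeMillis());

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("sentinel-log-items",
                logItem.get("testId").toString(), logItem));
        }
    }
}
```

Evolving the schema (say, adding a field for video references later) then becomes a registry concern rather than a change in every test framework.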
Now, emitting these structured logs, maintaining the state and hierarchy of test execution events (which run in a multithreaded model), and trimming and formatting the log items accordingly is the job of a mediator, which also provides other validations and utilities. This mediator primarily holds the entire execution state and thread contexts: it directly consumes the events from the subscribers and eventually pushes these events and log items to schema-less storage via a queue, which provides the required throughput, scalability, and elasticity. Execution states are pushed synchronously so that the portal can display them in real time; log items within a specific test case/suite, however, are pushed asynchronously and in bulk.

The mediator removes the need to build such support into each test framework or its subscriber. Keeping it an isolated entity makes it easy to maintain and congregates the core logic into a single module that can then be loosely coupled with any test framework. Benefiting from this loose coupling, we consume this component directly in Java-based subscribers, while non-Java frameworks consume it over the RESTful service we wrapped around it.
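The two delivery paths can be sketched as follows; this is a simplified illustration, assuming a pre-configured Kafka producer and placeholder topic names, not the mediator's actual code.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch of the mediator's delivery paths: state changes block until the
// broker acknowledges them (real-time portal display), while log items
// are buffered and flushed in bulk for throughput.
public class MediatorSketch {
    private final KafkaProducer<String, String> producer;
    private final Queue<ProducerRecord<String, String>> logBuffer =
        new ConcurrentLinkedQueue<>();
    private static final int BULK_SIZE = 100;

    MediatorSketch(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    // Synchronous: wait for the broker ack before returning.
    public void pushState(String runId, String state) throws Exception {
        producer.send(new ProducerRecord<>("sentinel-states", runId, state)).get();
    }

    // Asynchronous: buffer log items and flush in bulk.
    public void pushLog(String runId, String logItem) {
        logBuffer.add(new ProducerRecord<>("sentinel-log-items", runId, logItem));
        if (logBuffer.size() >= BULK_SIZE) {
            flushLogs();
        }
    }

    public void flushLogs() {
        ProducerRecord<String, String> record;
        while ((record = logBuffer.poll()) != null) {
            producer.send(record); // fire-and-forget; the producer batches internally
        }
        producer.flush();
    }
}
```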
The Technology Stack
Given the DevOps model and our canary deployments, the pace and scale at which we release have been shooting up since last year, and test automation running every day is a constant challenge to keep track of. On average, there are 500+ test cases per test execution, with at least 20+ log/test items per test case, and nearly all releases take place between 12 PM and 5 PM. Thus, our technology stack has to be elastic, fault tolerant, easily scalable, and low maintenance. As for our choice of a queuing system, we had two options, RabbitMQ and Kafka, and we chose to go with Kafka for the reasons below:
- Kafka works as phenomenal schema-less storage in addition to being a queuing system, without performance penalties, thanks to its sequential reads and writes.
- Kafka with the Confluent stack also provides a schema registry with versioning, which serves us well in choosing between enforcing and loosening standards and also helps reduce the effort of message validation pre/post-delivery.
- Confluent Kafka also offers ready-made integration connectors (Kafka Connect), which save us the effort of message processing with a few mere REST API calls; see the sketch after this list.
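To illustrate how little glue this needs: registering a sink connector is a single call to the Kafka Connect REST API. The sketch below registers Confluent's stock Elasticsearch sink connector (the storage we introduce in the next section); the host names, connector name, topic, and index settings are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: registering an Elasticsearch sink connector through the
// Kafka Connect REST API. All names/URLs below are illustrative.
public class RegisterEsSink {
    public static void main(String[] args) throws Exception {
        String connectorConfig = "{"
            + "\"name\": \"sentinel-es-sink\","
            + "\"config\": {"
            + "\"connector.class\": \"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector\","
            + "\"topics\": \"sentinel-log-items\","
            + "\"connection.url\": \"http://elasticsearch:9200\","
            + "\"key.ignore\": \"true\""
            + "}}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://kafka-connect:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(connectorConfig))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```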
Additionally, data in Kafka resides in raw bytes and is “type free”; it then needs to be consumed and finally converged for our ultimate goal of a single dashboard. For this, there needs to be an engine that can:
- Consume and store data schemalessly (any NoSQL DB would do)
- Provide REST services that can be stitched to a UI dashboard
- Work with distributed data sets
- Be highly available
Elasticsearch is our best choice for four reasons: Kibana, scalability, search speed, and schemaless-ness. Kibana provides us with a beautiful user interface to visualize our data in almost any form, saving us the effort of coding a custom user interface. Additionally, ES is robustly fast (the most critical part) and schema-less, requiring no maintenance as the types of data change on the producer end. Moreover, its blazing-fast search over myriad distributed data makes it just the right choice for our database. Finally, ES stitching neatly into the Confluent stack becomes the cherry on top for us.
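For a flavour of what the portal can ask of ES, here is the kind of aggregation that summarises a run's pass/fail split. This is a sketch only: the index name (`sentinel-states`) and field names (`runId`, `status`) are hypothetical, not Sentinel's actual mapping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: bucket test results by status for a single run via the ES
// search API. Index and field names are hypothetical placeholders.
public class RunSummaryQuery {
    public static void main(String[] args) throws Exception {
        String query = "{"
            + "\"size\": 0,"
            + "\"query\": {\"term\": {\"runId\": \"release-42\"}},"
            + "\"aggs\": {\"by_status\": {\"terms\": {\"field\": \"status\"}}}"
            + "}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://elasticsearch:9200/sentinel-states/_search"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(query))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // bucketed PASSED/FAILED/SKIPPED counts
    }
}
```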
Furthermore, Kafka and ES go along with our engineering ecology, which means monitoring and alerting are already taken care of by our in-house alerting ecosystems, since our developers use the same stacks.
Conclusion
Thinking out of the box pays off.
With Confluent Kafka and ES, we can provide a highly scalable, fault-tolerant, and robustly elastic test metrics framework that plugs beautifully into almost any framework on the market without any structural or code-level changes to existing automation frameworks: put in some properties files and dependencies, and that is it. Moreover, building the entire framework only requires two components to be coded, the subscribers and the mediator (three if you count the RESTful layer separately), of which only the mediator requires some maintenance; the subscribers are a one-time job. On top of that, the stack used exposes metrics over JMX, which enables direct integration with our existing alerting framework over time-series databases.
Future Plan
However, there are some improvements we still wish to achieve: for instance, provisions to support video recordings, integrations with third-party services like Jira and Slack, and many more.
Thanks to Anirudh Bhardwaj for guidance and support.