Ramanujan Engine, Airtel’s answer to dealing with business decisions on Millions of records quickly! — Part 2

Ashish Santuka
Published in Airtel Digital
Mar 31, 2022

With digital transformation underway, many new systems are emerging at Airtel and similar organizations. These systems publish business cases that require decisions and actions based on milestones. To address this need, the “Rule Engine” was conceptualized. The Rule Engine ingests data from multiple systems (real-time, near real-time, and batch mode), which helps maintain speed and agility against frequently changing rules.
The Rule Engine can really be a game-changer in the following areas:
1. Incentive Compensation
2. Gamification
3. Sales SR
4. Common Order Monitor
5. Lead Prioritization
6. Dispatcher
7. Risk Monitor
8. Next best action
9. Many more in the future ….
For all of the above requirements, rules are computed against transactional data, both online and offline. As this data is generated across systems, business logic with pre-defined criteria is applied and the appropriate rules are triggered. Once the rules are evaluated, the relevant actions are taken.
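Conceptually, each rule pairs a pre-defined criterion with an action to trigger. The following minimal Go sketch is purely illustrative; the `Transaction` fields and the rule shape are assumptions, not Airtel's actual schema:

```go
// Illustrative only: a rule couples a pre-defined condition with the action
// that fires when a transactional record matches it.
package rules

// Transaction is a simplified view of a record ingested from a source system.
type Transaction struct {
	LeadID string
	Stage  string
	Amount float64
}

// Rule couples a matching criterion with the action to trigger.
type Rule struct {
	Name      string
	Condition func(t Transaction) bool
	Action    func(t Transaction)
}

// Evaluate triggers the action of every rule whose condition the record satisfies.
func Evaluate(t Transaction, ruleBase []Rule) {
	for _, r := range ruleBase {
		if r.Condition(t) {
			r.Action(t)
		}
	}
}
```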

Recap:

In Part 1 of this two-part series, we discussed the various approaches and which of them were feasible and reliable at Airtel for making smart business decisions over millions of leads. In this part, we discuss the solution approach in detail and explain its technicalities.

High-Level block diagram of Ramanujan Rule Engine:

As we can see, there are a few key elements to the Ramanujan Rule Management Engine:

1) Rule Management

2) Inference Engine

3) API

4) Rule Management Environment

5) Action Engine

6) Rule Base

Process Flow:

The Process flow diagram is explained below:

  1. The channel sends a request to calculate the total points accrued; this includes both previous and current online data.
  2. The Rule Engine API receives the request and executes certain pre-validations.
  3. Once the conditions are evaluated, it loads the program rules that are available in memory.
  4. The wrapper identifies the appropriate program to be applied and collects its list of rules.
  5. The core engine applies the Rete method to evaluate rule matches in parallel and takes actions accordingly.
  6. The core engine prepares the result and sends the response back to the channel.
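The sequence above can be condensed into a small Go sketch. It is only an illustration with invented names; in particular, the naive parallel matcher below merely stands in for the Rete network referred to in step 5:

```go
package engine

import "sync"

// Request mirrors step 1: the channel asks for the total points accrued over its data.
type Request struct {
	Program string
	Events  []Event
}

// Event is a simplified transactional record (previous or current online data).
type Event struct {
	Points    int
	Qualifies bool
}

// programRules is the in-memory rule base keyed by program name (step 3).
// Each rule maps an event to the points it awards.
var programRules = map[string][]func(Event) int{
	"festive-incentive": {
		func(e Event) int {
			if e.Qualifies {
				return e.Points
			}
			return 0
		},
	},
}

// TotalPoints runs pre-validation (step 2), loads the program's rules
// (steps 3-4), matches them against each event in parallel (step 5),
// and returns the aggregate sent back to the channel (step 6).
func TotalPoints(req Request) int {
	rules, ok := programRules[req.Program]
	if !ok || len(req.Events) == 0 {
		return 0 // pre-validation failed: unknown program or empty payload
	}

	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total int
	)
	for _, ev := range req.Events {
		wg.Add(1)
		go func(e Event) {
			defer wg.Done()
			for _, rule := range rules {
				p := rule(e)
				mu.Lock()
				total += p
				mu.Unlock()
			}
		}(ev)
	}
	wg.Wait()
	return total
}
```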

Further Considerations:

1. When these rules are loaded, the corresponding actions are taken directly as long as there are no inter-dependencies. Where rules depend on one another, we deviate from the Rete-based approach and follow a traditional forward-chaining method for only those sets of rules (a minimal sketch follows this list). Fortunately, fewer than 5% of the rules fall into this category.

2. At Airtel, we built a responsive, UI-based rule editor that enables business users to understand the existing rule base, define rules on the fly, and easily configure them.

3. We generate a statement of the rules applied and the points accrued, based on the data collected across systems.
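For the small inter-dependent subset mentioned in point 1, a plain forward-chaining loop can be sketched as follows. The fact and rule shapes are illustrative, not the production implementation:

```go
// A minimal forward-chaining sketch: each pass fires rules whose premises
// already hold, adding new facts, until nothing changes (a fixpoint).
package chaining

// Rule derives a new fact once all of its premises are known.
type Rule struct {
	Premises []string
	Derives  string
}

// ForwardChain repeatedly applies the rules until no new fact is derived.
func ForwardChain(facts map[string]bool, rules []Rule) map[string]bool {
	for changed := true; changed; {
		changed = false
		for _, r := range rules {
			if facts[r.Derives] {
				continue // already derived
			}
			allHold := true
			for _, p := range r.Premises {
				if !facts[p] {
					allHold = false
					break
				}
			}
			if allHold {
				facts[r.Derives] = true
				changed = true
			}
		}
	}
	return facts
}
```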

Architectural Considerations:

After going through multiple architectures, we at Airtel implemented an Active-Active site setup for our use case, since we required high availability:

1) The application is not exposed to the public Internet; only the internal workgroup accesses this service.

2) From the internal workgroup, traffic is routed to a proxy.

3) Based on the Active-Active setup, traffic is routed to the appropriate node.

4) Based on the request and program, calls are made across the servers.

5) Protobufs are used for internal API calls between services.

6) Appropriate entries are made in the data store.

7) Data ingestion happens via Kafka, and the same is inserted into the DB.

Active-Active Node Setup High-Level Architecture:

After carefully evaluating various distributed-services architecture patterns, we decided to go ahead with the Y-axis scaling approach, in which the application is split into multiple services. Each service is responsible for one or more closely related functions. Hence, our architecture closely resembles the microservices architecture pattern.

Below is the architecture currently followed for the Active-Active setup.

Data Ingestion Process:

  1. Data ingestion is achieved using open-source Apache Kafka to capture real-time changes in source systems, e.g. Order Stage Updates.
  2. A multi-node cluster setup with multiple brokers.
  3. Data is collected from different source systems (e.g. BCRM, HBCRM, Retail Portal, OE) via Kafka Connect source connectors, and simple message transformations are performed at the app-layer level.
  4. The Rule Engine application subscribes to a topic to read data (see the sketch after this list).
  5. Topic data is synchronized into the Rule Engine database via sink connectors, e.g. employee data.
  6. Kafka Streams is used for merging topics and for transformation.
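Step 4, the subscription itself, can be sketched in Go. This assumes the segmentio/kafka-go client; the broker addresses, topic, and consumer-group names are illustrative, not Airtel's actual configuration:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Subscribe to a change-event topic on the multi-broker cluster.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"broker-1:9092", "broker-2:9092"}, // multi-node cluster
		GroupID: "rule-engine",
		Topic:   "order-stage-updates",
	})
	defer r.Close()

	for {
		// ReadMessage blocks until the next change event arrives from the source system.
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Printf("read failed: %v", err)
			break
		}
		log.Printf("offset %d: %s", msg.Offset, string(msg.Value))
		// ...apply transformations and hand the record to the rule engine / DB sink...
	}
}
```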

Language and Platform used to support the Stack:

Go

Performance — Go excels at concurrent programming and its highly optimized garbage collector helps to prevent memory leaks.

Statically Typed — Go is a statically typed language which means most of the issues are identified during compile time.

Code Formatting/Readability — “Gofmt’s style is no one’s favourite, yet gofmt is everyone’s favourite.” — Rob Pike

gRPC (Protobuf)

At Airtel, we use the gRPC protocol for communication between our different microservices. It uses the Protocol Buffers (Protobuf) binary format for data exchange, which complements Go’s statically typed nature.

gRPC (Remote Procedure Call) is a data-exchange technology that leverages the HTTP/2 protocol, making it faster than REST. REST uses JSON or XML, which requires serialization and conversion on both client and server, increasing response time and the possibility of errors while parsing the request/response. gRPC’s strongly typed messages, by contrast, are converted automatically using the Protobuf exchange format.
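The exchange can be pictured with a minimal Go sketch, assuming the standard google.golang.org/grpc client library. The service name, address, and generated `pb` stubs are illustrative, not Airtel's actual definitions:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Dial an internal service over gRPC (plaintext, since traffic never
	// leaves the internal network in this setup).
	conn, err := grpc.Dial("rule-engine.internal:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// With stubs generated from the service's .proto file, the call itself
	// would look like:
	//   client := pb.NewRuleEngineClient(conn)
	//   resp, err := client.Evaluate(ctx, &pb.EvaluateRequest{LeadId: "L-123"})
	_ = ctx // placeholder: the actual call depends on the generated client
}
```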

Docker

We are using Docker as our container engine. It is one of the most used container technologies. It helps remove the complexities of underlying OS architecture and allows us to focus on the actual logic and code thus resulting in rapid development and deployment cycles.

Kubernetes

We are using Kubernetes as our container orchestration engine. Kubernetes provides a ton of features out of the box like load balancing, replication, self-healing, and auto-scaling.

Logging and Monitoring

Apache Kafka

We use Kafka messaging queues to ingest logs from our containers. This decouples the load of writing logs to a file from the microservices and provides a centralized place for logs from different nodes and microservices. Each log message includes all the information needed to pinpoint the originating microservice, line of code, and request.
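A minimal sketch of publishing such a structured log entry to Kafka, again assuming the segmentio/kafka-go client (field names and topic are illustrative):

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

// logEntry carries enough context to pinpoint the originating service, file, line and request.
type logEntry struct {
	Service   string `json:"service"`
	File      string `json:"file"`
	Line      int    `json:"line"`
	RequestID string `json:"request_id"`
	Message   string `json:"message"`
}

func main() {
	// Writer publishes log messages to a central topic consumed by the log pipeline.
	w := &kafka.Writer{
		Addr:  kafka.TCP("broker-1:9092"),
		Topic: "service-logs",
	}
	defer w.Close()

	entry := logEntry{
		Service:   "rule-engine-api",
		File:      "handler.go",
		Line:      42,
		RequestID: "req-123",
		Message:   "pre-validation passed",
	}
	payload, err := json.Marshal(entry)
	if err != nil {
		log.Fatalf("marshal failed: %v", err)
	}

	if err := w.WriteMessages(context.Background(), kafka.Message{Value: payload}); err != nil {
		log.Printf("log publish failed: %v", err)
	}
}
```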

Filebeat + Kibana + Elasticsearch

Logs from various sources, such as microservices, Docker, and Kubernetes, are shipped via Filebeat and ingested into Elasticsearch. This allows us to search and filter our logs using Kibana.

Monitoring

Kubernetes does an excellent job of self-healing when an error occurs. When pods crash for any reason, Kubernetes simply restarts them. We still monitor our Kubernetes cluster using the Kubernetes daemon stats. These stats are sent to the ELK stack, and custom dashboards are created in Kibana to keep a check on cluster node health and utilization, container crashes and resource usage, and each microservice’s stats.

Implementing CI/CD processes

Since we follow the Agile methodology, there is a need to streamline the CI/CD process.

We have leveraged open-source tools available in the market to manage it:

  1. Jenkins
  2. Git
  3. Kubernetes

Automated Deployment — CI/CD pipeline using Jenkins for automated deployment from DEV to SIT to PT to Prod.

Blue-Green Deployments

To deploy an updated version, new pods are created in the cluster, but the Kubernetes service still points to the existing version until sanity checks are completed.

Both the existing and new versions exist at the same time.

After some time of stability in production, earlier versions of pods are removed.

This results in zero downtime deployment with an effortless way of rolling back to an earlier version in case of any issues.

CI/CD Pipeline

With the Blue-Green deployment strategy in place, we were able to streamline the complete build pipeline.

The latest version of code is automatically picked from the deployed Bitbucket branch, and a new Docker image is created and pushed to a local image repository.

Then the image is deployed to the Kubernetes cluster according to the Blue-Green strategy and the load balancer backend is updated accordingly.

Conclusion:

With this, we at Airtel achieved the business objective of having an advanced Rule Engine built on artificial-intelligence and knowledge-reasoning principles with an open-source stack. This finally answers the question of how to make business decisions on millions of records quickly.
