CERE — Configurable Experimentation and Rule Engine

Published in TravelTriangle, Dec 16, 2019

Is your code riddled with if-then-else statements? Do your product managers need to test several product flows to see which one performs best, only to find that the turnaround time is too long? If you and your team have faced these problems while developing software, we were in the same boat until half a year ago. We designed a solution that tackles most of these problems with low effort and a short turnaround time. We designed CERE.

Why do we need it?

Experimentation
A good way to find out which version of a feature best suits your product is to experiment with various flows for that feature. At TravelTriangle we are constantly experimenting with new product features on our users, testing time and again what works best for them. The need to control all these experiments gave birth to the experimentation framework. For example, when a traveler puts in a request for a trip on our website, we need to decide whether to send an IVR call or a WhatsApp message to check his/her intent. Here the experimentation framework controls which variant (IVR or WhatsApp) is sent, and if the experiment has already been invoked for that user, the same variant is applied to him/her again.
Volatile Logic
Consider a simple case of coupon logic. I need to assign a coupon to a user who created a trip request on TravelTriangle in November and is planning to travel in December.
Another case: I've run a promotion for travelers going to Dubai in January with four passengers. A user requests that type of trip and applies the coupon code given.
Now consider the above two cases with hundreds of different conditions that change every week or month. This can be a nightmare for a software engineer.
Hence the need arose for a rule engine to handle this kind of volatile logic.
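To make the idea concrete, here is a minimal sketch of the two coupon cases expressed as data instead of if-then-else code. The coupon codes, field names and the exact-match condition format are hypothetical and only meant to show the shape of the approach.

```python
from typing import Any, Dict, List

# Each rule is plain data, so it can be edited without touching code.
# Coupon codes and field names are made up for this example.
COUPON_RULES: List[Dict[str, Any]] = [
    {"coupon": "NOV2DEC", "when": {"created_month": 11, "travel_month": 12}},
    {"coupon": "DUBAI4", "when": {"destination": "Dubai", "travel_month": 1, "passengers": 4}},
]


def applicable_coupons(trip: Dict[str, Any], rules: List[Dict[str, Any]] = COUPON_RULES) -> List[str]:
    """Return the coupons whose conditions all match the trip attributes."""
    return [
        rule["coupon"]
        for rule in rules
        if all(trip.get(field) == expected for field, expected in rule["when"].items())
    ]


trip = {"created_month": 11, "travel_month": 12, "destination": "Bali", "passengers": 2}
print(applicable_coupons(trip))  # ['NOV2DEC']
```

Adding a new promotion then means adding one more entry to the rule list, not another branch in the code.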

How did we come up with it?

Rule engines and experimentation engines are pretty much standard designs in product-based software development, but we needed a system that would provide a smooth flow among various services.

For example, my payment service interacts with my rewards service, and I need a flow where, based on a payment received, I can calculate the reward points to be given to my agents or travelers. The information needed includes: When was the payment received? Was it made through CC, DC or Netbanking? Which card was used, Visa or Mastercard? And much more. But the calculations may vary from time to time and the rules will change constantly. Coding this would be pretty messy.

This is just one example of communication and product flow between the payment service and the rewards service. What if I need a similar flow between my core service and notification service, or between any other services?

Event-based communication is good, but it is even better if one service facilitates this communication and applies some logic to these events. The necessity of such a system gave birth to the idea of CERE.

Let’s get technical!

CERE Terminology

Context
Main object against which rules and experiments are processed.
Parent Context
Generally, a parent context has one or many contexts.
In most cases, the context and the parent context can be the same.
Input
It contains the stored as well as derived data for the context (both input and context are required in CERE). Input is all the data the experiment engine and rule engine need to produce their output.
Identifier
Used to find all the experiments for a given context and parent context. Other services provide an identifier when they want to run experiments.
Rule Set
Collection of rules.
Variant
The output of an experiment. It’s a simple string like ‘test’, ‘control’, ‘A’, ‘B’
Action
An experiment yields a variant and a list of actions; a rule set yields a list of actions. An action is just a JSON object with a set of keys whose values can be configured in the rule engine and experiment engine. Its structure is defined by an Action Handler, to which we can add as many keys as needed. We can then configure an action based on the action handler in experiments and rules. Each action handler also has a fixed category, which determines the queue the action is published to in async mode.
Event Name
Event Name is mapped to identifiers and rule sets. When an event is triggered in async mode, each identifier (experiments) and rule set (rules) mapped to that event name is processed.
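As a rough illustration of how an action relates to its action handler, the sketch below models both as small Python classes. The class names, field names and the validation step are assumptions made for this example; the article does not describe CERE's internal data model.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ActionHandler:
    """Defines the structure of an action (hypothetical model)."""
    name: str
    category: str            # fixed per handler; decides the queue in async mode
    keys: Tuple[str, ...]    # keys an action built on this handler may set


@dataclass
class Action:
    """A configured action: a set of key/value pairs shaped by its handler."""
    handler: ActionHandler
    payload: Dict[str, Any]

    def __post_init__(self) -> None:
        # Reject keys the handler does not define (an assumption of this sketch).
        unknown = set(self.payload) - set(self.handler.keys)
        if unknown:
            raise ValueError(f"keys not defined by handler {self.handler.name}: {sorted(unknown)}")


@dataclass
class ExperimentResult:
    """Experiments yield a variant plus actions; rule sets yield only actions."""
    variant: str
    actions: List[Action]
```

For example, a handler such as `ActionHandler("send_whatsapp", "notification", ("template", "phone"))` would accept an action with a `template` key and reject one with a key it does not know about.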

CERE Components

Experiment Engine

Rule Engine

CERE Flow

CERE can be executed in two modes

Async
Event-based mode of execution which will interact with various functionalities of the system to provide a workflow solution to any problem
Sync
Direct call to CERE service to invoke rule engine or experiment engine

Async Mode

CERE consumes events with a homogeneous structure from a Kafka queue and, based on the mapping stored in CERE, decides which experiments or rule sets need to be invoked. Actions (output from the Rule Engine and the Experimentation Engine) and variants (output from the Experimentation Engine) interact with different services in our system to produce the configured outcomes, such as changing the status of a lead, making an IVR call, sending a WhatsApp message or sending an email.
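The async flow can be sketched roughly like this, leaving Kafka out and treating an event as a plain dictionary. The event fields, the mapping format and the callables `run_experiments` and `run_rule_set` are hypothetical stand-ins for the two engines, not CERE's real interfaces.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
ActionDict = Dict[str, Any]
Engine = Callable[[str, Dict[str, Any], Dict[str, Any]], List[ActionDict]]

# Hypothetical mapping: event name -> identifiers (experiments) and rule sets.
EVENT_MAPPING: Dict[str, Dict[str, List[str]]] = {
    "quote_created": {
        "identifiers": ["quote_communication"],
        "rule_sets": ["lead_tagging"],
    },
}


def handle_event(
    event: Event,
    run_experiments: Engine,
    run_rule_set: Engine,
    mapping: Dict[str, Dict[str, List[str]]] = EVENT_MAPPING,
) -> Dict[str, List[ActionDict]]:
    """Process one event and group the resulting actions by category."""
    targets = mapping.get(event["event_name"], {})
    actions: List[ActionDict] = []

    # Run every experiment identifier and rule set mapped to this event name.
    for identifier in targets.get("identifiers", []):
        actions.extend(run_experiments(identifier, event["context"], event["input"]))
    for rule_set in targets.get("rule_sets", []):
        actions.extend(run_rule_set(rule_set, event["context"], event["input"]))

    # The action's category decides which queue it would be published to.
    by_category: Dict[str, List[ActionDict]] = {}
    for action in actions:
        by_category.setdefault(action["category"], []).append(action)
    return by_category
```

In a real deployment the returned groups would be published to their respective queues for the downstream services to consume; an event name with no mapping simply produces no actions.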

Sync Mode

The rule engine or the experimentation engine can also be invoked on its own: any service can call it directly by providing the necessary details, such as context and input. A prime example is the coupon logic described above, where I just need to list all the coupons applicable to a trip lead based on user and lead attributes. Here only the Rule Engine is needed; it receives all the attributes of the lead and returns the list of coupons that can be applied.

How are we using it?

Currently, most of our experiments revolve around communication with our travelers over various platforms such as email, IVR, SMS and WhatsApp. CERE plays an important role in deciding which platform to use for that communication. For example, a quote is created by one of the agents and we need to test which communication platform decreases the time between quote sent and quote seen. CERE enabled us to divide our user base on certain conditions and check which communication platform works best. We created an experiment for which travelers going to Dubai are eligible, with 3 variants (Test-1, Test-2, Control). The Test-1 variant is configured for the IVR call, the Test-2 variant for WhatsApp, and Control sends an email (the default case, already happening). Each variant has actions that communicate with the Notification service to send the IVR call, WhatsApp message or email. Creating this experiment took 2 hours, including setting up the IVR recording, the message template and everything else. Coding it would normally have taken 2–3 days including testing and deployment.
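An experiment like the one above might be configured along the following lines. The dictionary layout, key names and the `actions_for` helper are hypothetical; they only illustrate how eligibility and per-variant actions could be kept as configuration. What happens to ineligible travelers is not stated above, so this sketch assumes the experiment simply yields no actions for them.

```python
from typing import Any, Dict, List

# Hypothetical configuration for the quote communication experiment.
QUOTE_EXPERIMENT: Dict[str, Any] = {
    "identifier": "quote_communication",
    "eligibility": {"destination": "Dubai"},
    "variants": {
        "Test-1": [{"handler": "notification", "channel": "ivr"}],
        "Test-2": [{"handler": "notification", "channel": "whatsapp"}],
        "Control": [{"handler": "notification", "channel": "email"}],
    },
}


def actions_for(experiment: Dict[str, Any], input_data: Dict[str, Any], variant: str) -> List[Dict[str, Any]]:
    """Return the actions configured for a variant, or nothing if not eligible."""
    eligible = all(
        input_data.get(field) == expected
        for field, expected in experiment["eligibility"].items()
    )
    if not eligible:
        # Assumption: ineligible travelers get no actions from this experiment.
        return []
    return experiment["variants"][variant]
```

Switching Test-1 from IVR to SMS would then be a one-line configuration change instead of a code deployment.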

Another example where we are using CERE is lead segregation. Let's say we need to segregate leads for our agents based on lead attributes and user attributes. To put it simply, I need a system where I can tag a lead under certain buckets to make it easier for agents to categorize their work and divide their time accordingly. This type of logic has many conditions, and they change from time to time. The rule engine in CERE helps us handle this kind of volatile logic. But remember, it is not a replacement for if-then-else. Only changing logic that is governed by product and business needs should go into CERE. It is not a home for stable, rigid logic, because invoking a microservice has a cost.