Building a Rule Engine
At Disney Streaming, we take our subscribers' experience very seriously. Our service serves over 100 million subscribers across the globe. We want to give them the best experience when watching our content, and a low-friction monthly payment process is part of that experience.
The payment team at Disney+ is responsible for designing and maintaining a payment platform that enables efficient, low-friction, high-approval-rate monthly and annual transactions for 100M+ global customers. Our team also helps to improve the subscriber experience for Disney Streaming by optimizing payment processing.
This post details how the payment team designed and implemented a rule engine framework that breaks complex business rules down into composable pieces, with the goal of reducing payment failures at scale.
Before we dive in
This article summarizes the problem from a high-level point of view, but it also describes some implementation decisions. Those parts assume some knowledge of the Scala programming language and of the pure-functional constructs provided by libraries such as cats.
The Problem: Sporadic Payment Failures
Subscribers to Disney+ are billed on a monthly or yearly basis. There are many reasons why a recurring payment could fail, such as insufficient funds in the account or an expired credit card.
When a payment fails, a subscriber will log in to Disney+ or ESPN+ only to discover that they cannot watch content. This is a poor user experience. Therefore, our job is to do our best to prevent this from happening by retrying these failed payments.
The initial implementation of the retry logic was the same for all subscribers. However, as Disney+ progressively launched in new locations across the world, it became obvious that the “one size fits all” logic was suboptimal, and that we needed to experiment with various implementations, by leveraging user-specific information customized for each account.
For example: “if the user is in country X, retry next Monday.”
The challenge of maintainability
With ongoing experimentation on various retry logic implementations, the cost of maintaining each business rule grows quickly. Parts of one implementation may overlap with others, and engineers are burdened with rewriting these parts, which can take days of testing and numerous service releases before an implementation is adopted.
To solve this problem, we created a microservice that is responsible for optimizing how we recover from recurring billing failures.
The service embeds a smart-retry mechanism that uses invoice information, such as the subscriber's home country, to decide how failed recurring payments are retried.
We defined the core principles of the service:
- Real-Time Evaluation: We want to create rules to evaluate each retry schedule in real-time.
- Make Decisions Based on Previously Computed Values: The system needs to support expressing decisions based on dynamically computed values. For instance, having rules that mention, “If the transaction belongs to a Disney+ subscriber, the system chooses to use execution A. If the transaction belongs to an ESPN+ subscriber, the system chooses to use execution B.”
- Extensibility: With increasing experimentation on business rules within a cross-functional team, and the need to support A/B testing, developers need to be able to easily add rules to the system and remove them from it.
- Auditability: We need the ability to track which retry schedules lead to better user authentication rates and to replay the sequence of operations for troubleshooting.
A Custom Rule Engine Framework
Building a scalable platform for optimizing recurring payment failures, one that allows rules to be added and changed rapidly with minimal engineering effort, can be challenging. It helps to abstract complex business rules into a framework that lets engineers quickly evolve and pivot.
We created our own internal rule management and evaluation library. Our custom rule engine lets us organize complex business use-cases into composable rules.
The rule engine relies on declarative composition as we found it was the best way to accurately reflect business use cases. In brief, the rule engine allows the developers to declare “What to do?” instead of “How to do it?” It allows the developer to clearly see “How the system came up with this solution?” and “How was each ‘decision’ made?”
In the next sections, we will describe the individual components that form the rule engine.
Philosophy Behind our Rule Engine Library
This section will briefly summarize the engine’s underlying technology and philosophy. It skips over each component’s code implementation details.
We adopt the functional programming paradigm to specify our business actions declaratively and use Scala for implementation. Once developers have de-structured the sentences into the rule engine’s semantics, called components, they can implement each component’s logic with the library. Developers will need to fill in the blanks on each of the components (RuleF, RuleBuffer, and RuleNode).
The RuleBuffer serves as the state that can move from one rule to another. During run-time, each RuleF executes an action and changes the state of the RuleBuffer. The RuleNode executes the result tracker to append a new log from the execution results. Then, it relays the updated RuleBuffer and the result tracker to conditional anonymous router functions, which decide what RuleNode should evaluate the ongoing transaction next.
We abstract complicated instructions into a succinct pipeline using the Monad abstraction to accommodate dynamic sequential operations.
Through this approach, we can avoid any unexpected side-effects in conducting each of the rulesets. Furthermore, the Monad abstraction helps to unify and abstract away boilerplate code needed by the program logic.
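To make the idea of a monadic pipeline concrete, here is a minimal, dependency-free sketch. It is not the library's code: the `Monad` trait below is a hand-rolled stand-in for the cats type class, and `Buffer`, `runAll`, `requireCountry`, and `denyFriday` are hypothetical names chosen for illustration. It shows how a list of state-updating steps can be sequenced with `flatMap`, so that a failing step stops the rest of the pipeline.

```scala
object MonadPipelineSketch {
  // Minimal stand-in for a Monad type class (cats provides a complete one).
  trait Monad[F[_]] {
    def pure[A](a: A): F[A]
    def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  }

  object Monad {
    // Option as the effect: None means "this evaluation stops here".
    implicit val optionMonad: Monad[Option] = new Monad[Option] {
      def pure[A](a: A): Option[A] = Some(a)
      def flatMap[A, B](fa: Option[A])(f: A => Option[B]): Option[B] = fa.flatMap(f)
    }
  }

  // Hypothetical state passed from step to step.
  final case class Buffer(country: String, deniedDays: Set[String])

  // Run the steps in order; each step sees the buffer produced by the previous one.
  def runAll[F[_], RB](start: RB, steps: List[RB => F[RB]])(implicit M: Monad[F]): F[RB] =
    steps.foldLeft(M.pure(start))((acc, step) => M.flatMap(acc)(step))

  // Hypothetical steps: a validation and a state update.
  val requireCountry: Buffer => Option[Buffer] =
    b => if (b.country.nonEmpty) Some(b) else None

  val denyFriday: Buffer => Option[Buffer] =
    b => Some(b.copy(deniedDays = b.deniedDays + "Friday"))

  // Succeeds: both steps run in sequence.
  val ok: Option[Buffer] =
    runAll[Option, Buffer](Buffer("FR", Set.empty), List(requireCountry, denyFriday))

  // Short-circuits: the first step fails, so denyFriday never runs.
  val stopped: Option[Buffer] =
    runAll[Option, Buffer](Buffer("", Set.empty), List(requireCountry, denyFriday))
}
```

Because `runAll` only depends on the `Monad` interface, the same pipeline could in principle run in a different effect (for example, an asynchronous one) without changing the steps' sequencing logic.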
RuleF (Rule Function)
Rules define the logic for executing a single business action for a specific input. A RuleF is a case class that defines the logic for executing a single rule against a specific input type.
import cats.{Monad, Monoid}

case class RuleF[F[_]: Monad, RB, D: Monoid](
  ruleDefinition: RB => F[RB],
  traceDefinition: RB => D
)
It consists of:
- The rule definition, which can be the main business logic of the application or a series of actions to execute when certain conditions are met. It is a function that takes in a RuleBuffer (which we will explain in the next section) and returns an updated RuleBuffer.
- The trace definition, which tracks all the intermediate results for auditing purposes. It is a function that takes in the updated RuleBuffer and returns updated trace results.
The rule engine evaluates a RuleF by executing its rule definition and then invoking its trace definition, which provides the tracking mechanism. The trace definition is invoked automatically, and its result is combined with the existing trace.
As an example of what a RuleF could be in the context of payment retries, we might want to avoid retrying on a Friday if the current transaction is from France (because of how banking works there). The RuleF will consist of the logic that adds Friday to a deny-list. The accumulated trace can be retrieved at the end of the ruleset evaluation.
A code example of a “Do not retry on Friday” RuleF:
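val doNotRetryOnFridayRule =
  RuleF(
    // rule definition: add Friday to the deny-list
    ruleBuffer => denyList(ruleBuffer, Friday),
    // trace definition: record the updated buffer for auditing
    updatedRuleBuffer => List(s"updated ruleBuffer ${updatedRuleBuffer}")
  )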
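The snippet above relies on the library to run the rule and merge its trace. The following self-contained sketch shows one way that step could work. It is a simplification under stated assumptions, not the library's implementation: the effect is fixed to `Option` and the trace to `List[String]` instead of the generic `F[_]: Monad` and `D: Monoid`, and `SimpleRuleF`, `RetryBuffer`, and this `denyList` helper are hypothetical names.

```scala
object RuleFSketch {
  // Hypothetical buffer shape; the real RuleBuffer is whatever the developer defines.
  final case class RetryBuffer(country: String, deniedDays: Set[String])

  // Simplified RuleF: effect fixed to Option, trace fixed to List[String].
  final case class SimpleRuleF[RB](
      ruleDefinition: RB => Option[RB],
      traceDefinition: RB => List[String]
  ) {
    // Run the rule, then append the trace computed from the UPDATED buffer.
    def run(rb: RB, traceSoFar: List[String]): Option[(RB, List[String])] =
      ruleDefinition(rb).map(updated => (updated, traceSoFar ++ traceDefinition(updated)))
  }

  // Hypothetical helper: add a weekday to the deny-list.
  def denyList(rb: RetryBuffer, day: String): Option[RetryBuffer] =
    Some(rb.copy(deniedDays = rb.deniedDays + day))

  // "Do not retry on Friday if the transaction is from France."
  val doNotRetryOnFridayRule: SimpleRuleF[RetryBuffer] =
    SimpleRuleF[RetryBuffer](
      rb => if (rb.country == "FR") denyList(rb, "Friday") else Some(rb),
      updated => List(s"denied days: ${updated.deniedDays.toList.sorted.mkString(",")}")
    )
}
```

Calling `doNotRetryOnFridayRule.run(RetryBuffer("FR", Set.empty), Nil)` yields the updated buffer together with a one-entry trace, which is the pair a later rule would receive as its input.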
RuleBuffer
The RuleBuffer class encapsulates the data that flows from one rule to the next. It provides an abstraction that unifies the input and output models so that each RuleF gets the model it needs to execute its business action.
A RuleBuffer object is passed between nodes during the evaluation of a rule. Each node updates the state of the current evaluation by performing a shallow copy of the RuleBuffer it receives.
At the end of the execution, we can transform the final RuleBuffer into the desired output model.
The library provides a BufOperation interface that developers have to implement.
trait BufOperation[RB, D, O] {
  def build(buf: RB, d: D): O
}
The build method above takes in the RuleBuffer of type RB and the trace result of type D, and it returns an output of type O.
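As an illustration of implementing this interface, the sketch below converts a final buffer and its trace into an output model. The `BufOperation` trait is copied from the text; `RetryBuffer`, `RetrySchedule`, and `toSchedule` are hypothetical types and names invented for this example.

```scala
object BufOperationSketch {
  // Interface as given in the article.
  trait BufOperation[RB, D, O] {
    def build(buf: RB, d: D): O
  }

  // Hypothetical internal state carried between rules.
  final case class RetryBuffer(allowedDays: Set[Int], deniedWeekdays: Set[String])

  // Hypothetical output model returned to the caller.
  final case class RetrySchedule(
      allowedDays: List[Int],
      deniedWeekdays: List[String],
      audit: List[String]
  )

  // Turn the final buffer plus the accumulated trace into the output model.
  val toSchedule: BufOperation[RetryBuffer, List[String], RetrySchedule] =
    new BufOperation[RetryBuffer, List[String], RetrySchedule] {
      def build(buf: RetryBuffer, d: List[String]): RetrySchedule =
        RetrySchedule(buf.allowedDays.toList.sorted, buf.deniedWeekdays.toList.sorted, d)
    }
}
```

Keeping this conversion in one place means the rules themselves never need to know what the service's response model looks like.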
Developers are able to use any shape as a RuleBuffer, as long as they implement the interface above. The diagram below illustrates high-level concepts on how RuleBuffer is used in the Rule Engine.
RuleTree
RuleTree is an interface that encapsulates the routing component, which helps users define clear routing conditions in the computation model. A RuleTree contains a number of RuleF instances.
Developers can create more complex rules (including conditional ones) by composing existing rules and leveraging pattern matching for dynamic routing. As a hypothetical scenario, take the business rule, “Do not retry on a Friday if the current transaction is from France. Then, mark the first and second day of the month as the retry day.” We decouple this business logic statement by implementing it as two RuleF instances, each encapsulated in its own RuleTree.
val doNotRetryOnFridayRule = ??? // RuleF definition
val markFirstSecondRule = ??? // RuleF definition

val ruleTree = RuleTree(doNotRetryOnFridayRule) chain RuleTree(markFirstSecondRule)

// chaining with a condition (example)
RuleTree(doNotRetryOnFridayRule) chainWithCondition { rb =>
  rb match {
    case RuleBuffer("sunday") => someOtherRuleF
    case _ => someOtherOtherRuleF
  }
}
Let us call the first RuleF DoNotRetryOnFriday, which adds Friday to the deny-list, and the other RuleF MarkFirstAndSecond, which adds the first and second of the month to the allow-list. RuleTree then calls a conditional function that routes from one rule to the next. Together, RuleTree takes in a RuleF as the corresponding action to execute after the condition is met.
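The chaining behavior described here can be sketched without the library. The code below is a simplified model under stated assumptions, not the real RuleTree: rules are plain named functions rather than RuleF values, there is no effect type, and `Buffer`, `Rule`, `noOp`, and the method bodies are hypothetical. It shows unconditional chaining and router-based conditional chaining, with the rule names recorded as a trace.

```scala
object RuleTreeSketch {
  // Hypothetical buffer: the fields are illustrative only.
  final case class Buffer(country: String, denied: Set[String], allowedDays: Set[Int])

  // Simplified stand-in for RuleF: a named, pure state update.
  final case class Rule(name: String, action: Buffer => Buffer)

  // A node runs its rule, logs the rule name, then asks `next` which tree (if any) follows.
  final case class RuleTree(rule: Rule, next: Buffer => Option[RuleTree] = _ => None) {

    def run(rb: Buffer, trace: List[String]): (Buffer, List[String]) = {
      val updated = rule.action(rb)
      val log = trace :+ rule.name
      next(updated) match {
        case Some(tree) => tree.run(updated, log)
        case None       => (updated, log)
      }
    }

    // Unconditional chaining: `other` runs once this tree has finished.
    def chain(other: RuleTree): RuleTree =
      copy(next = b => Some(this.next(b).fold(other)(_.chain(other))))

    // Conditional chaining: the router picks the follow-up rule from the updated buffer.
    def chainWithCondition(router: Buffer => Rule): RuleTree =
      copy(next = b => Some(this.next(b).fold(RuleTree(router(b)))(_.chainWithCondition(router))))
  }

  val denyFriday: Rule =
    Rule("denyFriday", b => if (b.country == "FR") b.copy(denied = b.denied + "Friday") else b)
  val markFirstSecond: Rule =
    Rule("markFirstSecond", b => b.copy(allowedDays = b.allowedDays ++ Set(1, 2)))
  val noOp: Rule = Rule("noOp", b => b)

  // "Do not retry on Friday (France), then mark the 1st and 2nd as retry days."
  val tree: RuleTree = RuleTree(denyFriday).chain(RuleTree(markFirstSecond))

  // Route on the updated buffer: only mark retry days when Friday was denied.
  val conditional: RuleTree =
    RuleTree(denyFriday).chainWithCondition { b =>
      if (b.denied.contains("Friday")) markFirstSecond else noOp
    }
}
```

In this model the router is just a function from the updated buffer to the next rule, which is why pattern matching on the buffer is enough to express dynamic routing.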
RuleEngine Package
RuleEngine manages multiple RuleTrees using a key-value store. This collection of RuleTrees is called a Forest.
val forest = Map(
  "1" -> ruleTreeOne,
  "2" -> ruleTreeTwo,
  // ... and so on
)

// getting a specific RuleTree
val ruleTree = RuleEngine[Map, String].get(forest)("1")

// running that RuleTree
ruleTree.run(rb = exampleBuffer, traceInit = List.empty[String])
The container that holds all the RuleTrees, a.k.a. the Forest, is generalized. Therefore, RuleEngine requires the container type (Map in the example above) to be specified in the function definition. This allows developers to swap an in-memory store for a persistent store when they want a different tradeoff between performance and durability. Developers can easily interact with the engine by providing input and selecting a ruleset to execute. Together, the entire rule engine forms a trie structure.
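One way to generalize the container is a small type class over the store type. The sketch below is an assumption about how such an abstraction might look, not the library's actual design: `Store`, `lookup`, and the `Option`-returning `get` are hypothetical, and `RuleTree` is reduced to a named placeholder.

```scala
object ForestSketch {
  // Hypothetical type class abstracting over the key-value container that holds the forest.
  trait Store[S[_, _]] {
    def lookup[K, V](store: S[K, V], key: K): Option[V]
  }

  object Store {
    // In-memory instance backed by an immutable Map.
    implicit val mapStore: Store[Map] = new Store[Map] {
      def lookup[K, V](store: Map[K, V], key: K): Option[V] = store.get(key)
    }
  }

  // The engine is parameterized by the store type S and the key type K.
  final class RuleEngine[S[_, _], K](implicit store: Store[S]) {
    def get[V](forest: S[K, V])(key: K): Option[V] = store.lookup(forest, key)
  }

  object RuleEngine {
    def apply[S[_, _], K](implicit store: Store[S]): RuleEngine[S, K] = new RuleEngine[S, K]
  }

  // Placeholder for a real rule tree.
  final case class RuleTree(name: String)

  val forest: Map[String, RuleTree] = Map(
    "1" -> RuleTree("france-retry"),
    "2" -> RuleTree("default-retry")
  )

  val found: Option[RuleTree] = RuleEngine[Map, String].get(forest)("1")
  val missing: Option[RuleTree] = RuleEngine[Map, String].get(forest)("unknown")
}
```

Supporting a persistent store would then mean providing another `Store` instance for that container type, while callers keep using the same `get` call. Returning `Option` here is a choice made for the sketch so that a missing key is explicit.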
Summary of Development Workflow
In brief, the framework will guide the clients to deconstruct business logic descriptions into a series of components:
- The developers construct a RuleF for each predicate in the sentence.
- The developers construct a RuleTree for each conditional statement in the sentence.
- Multiple sentences construct a forest.
- The framework will encapsulate the description of the forest into a rule engine.
What is Next
Our Rule Engine Library has been very successful in helping developers build the service. Engineering efficiency got a significant boost, which enabled cross-functional teams to conduct A/B testing experiments, switch various rules within a ruleset, and optimize certain rulesets through machine learning models.
As a next step, we want to enhance the platform further by creating a one-stop UI for rules management, so that engineers and product managers can construct rules based on the business use case. We are also continuously evaluating opportunities for other payment domains to adopt rule-based solutions.
Please check out our other articles to learn more about our work on building streaming infrastructure at scale. Lastly, we’re usually looking for talent. Check here to see our open roles at Disney Streaming!