How to Design Software — Rules Engines

Walk through the components of a minimal rules engine and understand some of the thought processes behind approaching its design.

Joseph Gefroh
Published in The Startup
11 min read · Jul 9, 2020

Imagine this scenario

You’re a busy engineer working in a company with too much to build and too few resources to do it. Departments constantly ask for changes only an engineer can make. Marketing continuously asks for email on-boarding changes. Operations wants more and more batch tooling. Sales wants to test new pricing structures every other week. You and your team are overwhelmed with requests.

What are your options?

Stop progress on planned work to make these requested changes? A constant stream of interruptions is a terrible way to get any real work done, and who knows if these asks are even worth interrupting the real work for.

Ignore it? You’ll make progress on your planned work, but you’ll also have a bunch of external stakeholders now suddenly very angry at your department. Besides, isn’t technology supposed to help the business? Is it really helpful to have the business miss out on potentially significant opportunities just because you couldn’t spare an hour or two?

Triage the asks around a planned schedule? Planned interruptions are better than unplanned ones, but you’re still potentially limiting the progress of other departments and forcing them to work around your schedule — not quite as collaborative as one could be.

Imagine if there were a way to prevent these requirements from ever reaching you, and to give the other departments tools to pursue these opportunities without your involvement. They’d be able to make internal pivots without having to rely on engineering to perform the work.

If such a solution existed, you’d save a lot of time and be able to focus on more important problems.

Enter the rules engine.

What is a rules engine?

A rules engine is a system that performs a set of actions based on specific conditions which can be configured during runtime. This means that an effectively set up rules engine would not require engineers to change a system’s business logic.

Engineers build rules-based systems all the time, albeit unconsciously. Every time you code an if-else statement, you are effectively creating a hardcoded rule that is followed by the system.

You can go a step further and make these systems configurable dynamically. When I say dynamic, I mean that the behavior of the system is not determined by flows coded by engineers, but by configuration through data.

These so-called “codeless” systems are not new. Software like Zapier, Hubspot, and IFTTT operate off of the same concepts, as do more technical implementations like Boomi or Drools.

The fragility of concrete

Why are rules engines even needed?

By default, most engineers concern themselves with the details of the rules they are coding. They write concrete implementations specific to the business use case they are encountering.

This makes sense — code does exactly what it says it does, so engineers need to know what needs to be done to write the code properly.

However, this coupling to the details of the use case makes the code they write churn significantly when those details change. While this approach works in many cases, sometimes the rate of change can become overwhelming, especially on smaller teams or in more dynamic environments like post-investment startups.

It’s easy to see how these things can change. Business logic is implemented to fulfill a requirement in what behavior the software should exhibit. These requirements can come from many sources — business demands, improvements in UX, regulatory needs, technical drivers, etc.

Examples include:

  • A “welcome” email must be sent 2 hours after a user signs up
  • If a payment account holds more than $100,000, over 3 consecutive days, the entire amount must be disbursed immediately
  • Employees that are a member of this union should accrue vacation time at double the standard rate for 2019, but not 2021.

All of these can also vary in levels of stability, or how often the details change.

For example, a requirement based on data obtained through user analytics may change weekly, whereas a requirement based on a financial regulation may change once every couple of years. A requirement based on a union contract may only change when the contract is renegotiated.

When a requirement does change, the implementation must change alongside it.

The evolution of an implementation

Let’s examine a hypothetical implementation of the “welcome” email requirement as it evolves over time.

A “welcome” email must be sent 2 hours after a user signs up

The first stage

Developers can be relied on to do the easiest thing. Most engineers boil business logic down to an if-this-do-that: if this happens, perform this action.

The first (and unfortunately often last) implementation most developers take is the direct approach.

It’s so easy — a single line of code can fulfill this requirement:

# Callback that runs once a user record has been created.
def after_user_create
  WelcomeEmail.send(in: 2.hours)
end

The second stage

As time passes, the business often demands changes to details that seemed concrete and set in stone in the past.

Change is a constant, and software is SOFTware for a reason — it has to be malleable to fit the situation and circumstances.

What if we discover that 2 hours is too long, and we want to change it to 15 minutes? If we do, an engineer needs to go and change it.

def after_user_create
  WelcomeEmail.send(in: 15.minutes)
end

The third stage

If the business is performing experiments to see what kind of on-boarding leads to the best retention and conversion rates, it’s likely this value will change a lot.

If this happens enough times, the developer will hopefully get smart and move this detail out of the code so it can be configured during runtime.

def after_user_create
  # The wait time is now looked up at runtime rather than hard-coded.
  WelcomeEmail.send(in: Configuration.get('welcome_email_wait'))
end

Configuration.get could access the database, read a configuration file, look at an environment variable, or perhaps even randomly decide. The important factor is that it is now determined at runtime instead of hard-coded.
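As a rough sketch of one of those options, here is a Configuration.get backed by an environment variable with a fallback default. Everything in it is an assumption made for illustration: the DEFAULTS table, the upper-cased variable names, and the choice of seconds as the unit are not part of the snippets above.

```ruby
# Hypothetical Configuration module: looks up a setting at runtime and
# falls back to a built-in default when nothing is configured.
module Configuration
  # Assumed defaults, in seconds (2 hours and 1 day).
  DEFAULTS = {
    'welcome_email_wait' => 2 * 60 * 60,
    'tips_email_wait'    => 24 * 60 * 60
  }.freeze

  # Reads e.g. WELCOME_EMAIL_WAIT from the environment. The `env`
  # parameter exists so the lookup can be exercised without touching ENV.
  def self.get(key, env: ENV)
    raw = env[key.upcase]
    return Integer(raw) if raw

    DEFAULTS.fetch(key)
  end
end

puts Configuration.get('welcome_email_wait')
```

Swapping the environment lookup for a database query or a config file would not change the calling code, which is the point of the third stage.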

The fourth stage

Each individual requirement taken in isolation often ends at the third stage.

However, if there is a collection of related requirements, patterns can start to emerge which the intuitive engineer can pick up on.

Let’s say the business also wants to send a tips and tricks email. An engineer is likely to code the following implementation, being consistent with prior work:

def after_user_create
  WelcomeEmail.send(in: Configuration.get('welcome_email_wait'))
  TipsEmail.send(in: Configuration.get('tips_email_wait'))
end

Once again, however, it requires an engineer to add support for the new kind of email.

Examining the evolution

As we’ve seen above — each change is easy and small, but it adds up over time, quite insidiously. It’s easy to see how it can quickly grow out of control in a system where hundreds or thousands of requirements may be implemented in parallel, often under high delivery pressure.

Even in this simple circumstance, time has caused unrelated emails to be tightly coupled to the record lifecycle, which makes the system more complicated.

In the above scenario, the developer made the judgement to couple the mechanism to the use case. This means the developer tied together what was being done to how something should be done to why something was being done. They coupled the business domain to the technical domain — the mechanism became tied to the use case.

When we introduced the tips and tricks email, we essentially had to repeat the implementation. The implementation relied on specific details: namely when something was being done, how long to wait, and what was actually being sent. These details then changed, forcing changes to the implementation (a.k.a. extra work for engineers).

The details don’t matter

The truth is that these details don’t matter.

From a technical perspective, it shouldn’t matter whether a welcome email is sent 2 hours after the user signs up or 15 minutes after. Nor should it matter whether it was a welcome email or a tips and tricks email.

That’s the realm of the business — these are merely use cases for the base technology.

The implementation itself should remain the same. As far as the system is concerned, it sent a Foobar notification upon creation of a Whatsittoya record, and it can use (mostly) the same exact code to send a Fizzbuzz notification upon the deletion of a Whatchamacallit record.
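To make that a little more tangible, here is a minimal sketch in which the event, record type, notification name, and delay are all data. The NotificationRule struct, the RULES table, and the notifications_for helper are hypothetical names invented for this illustration.

```ruby
# Each row is configuration, not code: which event on which record type
# sends which notification after how long (in seconds).
NotificationRule = Struct.new(:event, :record_type, :notification, :wait)

RULES = [
  NotificationRule.new(:create, 'User', 'welcome', 2 * 60 * 60),
  NotificationRule.new(:create, 'User', 'tips',    24 * 60 * 60)
].freeze

# Generic mechanism: it has no idea what a "welcome" email is.
# Returns the notifications that should be scheduled for this event.
def notifications_for(event, record_type, rules = RULES)
  rules.select { |r| r.event == event && r.record_type == record_type }
       .map { |r| { notification: r.notification, wait: r.wait } }
end

p notifications_for(:create, 'User')
```

Adding a third email here means adding a row to RULES, and the method body stays untouched.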

When it is sent or what is being sent are details that just aren’t as concrete. These are the details that should be abstracted away because these are the details that are the most likely to change over time.

You don’t want to have to engineer something every time a requirement changes. You want to give that power to the business, barring some contractual, security[1], or regulatory requirement.

[1] Security also includes job security.

Building a rules engine

So, now that we’ve examined the context in which a rules engine would be useful, how do you actually build one?

There are many ways, but conceptually a minimal rules engine includes the following components:

Rule

A rule is a business policy. Within the technical domain of the rules engine, it is a collection of triggers, conditions, and effects that are applied by the rules engine.

Trigger

The trigger is the thing that determines whether the engine should attempt to run through a rule or not. In most cases, it is contextual.

In simple systems, it could be a simple string check or even hard-coded, such as a string on a Rule with the value on_create and PaymentAccount to indicate that the rule should only be executed when a record of type PaymentAccount is created.

In more complex systems, it could be a full context check that looks at things like whether the user is logged in or the kind of record being worked on.
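A simple trigger of the first kind might look something like the sketch below. The Trigger struct and the shape of the context hash are assumptions for illustration only.

```ruby
# Hypothetical trigger: a rule is only considered when both the event
# name and the record type match the context the engine was called with.
Trigger = Struct.new(:event, :record_type) do
  def matches?(context)
    context[:event] == event && context[:record_type] == record_type
  end
end

trigger = Trigger.new('on_create', 'PaymentAccount')

created = { event: 'on_create', record_type: 'PaymentAccount' }
updated = { event: 'on_update', record_type: 'PaymentAccount' }

puts trigger.matches?(created)
puts trigger.matches?(updated)
```

A more complex system could add further keys to the context (the current user, for example) without changing how rules are selected.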

Condition

The Condition determines if the Rule should be applied in that particular circumstance and on that particular record.

It has some minor overlap conceptually with a trigger, but also enough differences to warrant a separate discussion.

A trigger is more of a generalized determination of whether a rule’s conditions should even be checked, whereas a condition is more of an instance-specific check. Smaller systems can potentially fold the trigger into a condition, but more complex systems will likely benefit from separating the two.

The Condition itself will likely have some sort of reference to actual code that performs the check itself, with some parameters to pass in to the function.

For example, you could store a Condition record in the database that stores the name of the condition class as well as some accompanying parameters. The code could then dynamically initialize the condition class it identifies, passing in the parameters and the record to check against.
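Here is one way that could look, as a sketch rather than a prescription. The row format, the GreaterThan class, and the build_condition helper are all hypothetical, and the record is a plain Hash to keep the example self-contained.

```ruby
# A stored condition row, as it might come back from the database:
# the class name is data, and so are its parameters. (Shape is assumed.)
STORED_CONDITION = {
  'class_name' => 'GreaterThan',
  'params'     => { 'field' => 'balance', 'value' => 100_000 }
}.freeze

# Hypothetical condition class that compares one field to a threshold.
class GreaterThan
  def initialize(params)
    @field = params.fetch('field')
    @value = params.fetch('value')
  end

  def satisfied_by?(record)
    record.fetch(@field) > @value
  end
end

# Only names on this allowlist may be instantiated, so a bad row cannot
# make the system load an arbitrary class.
CONDITION_CLASSES = { 'GreaterThan' => GreaterThan }.freeze

def build_condition(row)
  CONDITION_CLASSES.fetch(row['class_name']).new(row['params'])
end

condition = build_condition(STORED_CONDITION)
puts condition.satisfied_by?({ 'balance' => 150_000 })
```

The allowlist is a design choice on my part: looking the class up with something like Object.const_get would also work, but it lets stored data name any class in the process.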

Effect

The Effect is what happens once a Rule is triggered and its Conditions pass. It is typically a function or execution of a function. It may be something as simple as setting a field or something as complex as kicking off an entire workflow.

Like the Condition, it will likely have some sort of reference to actual code that applies the Effect itself.
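A minimal sketch of the simple end of that spectrum, setting a field, could look like this. SetField and its parameter names are assumptions, and the record is again a plain Hash.

```ruby
# Hypothetical effect: sets one field on the record to a fixed value.
class SetField
  def initialize(params)
    @field = params.fetch('field')
    @value = params.fetch('value')
  end

  # Mutates the record (a plain Hash in this sketch) and returns it.
  def apply(record)
    record[@field] = @value
    record
  end
end

effect = SetField.new({ 'field' => 'status', 'value' => 'disbursed' })
p effect.apply({ 'id' => 1 })
```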

Engine

Finally, you have the engine itself. This is the thing that will actually perform the bulk of the work. It’ll accept records, load a list of rules, check whether those rules should be applied based on specific triggers and conditions, and then apply the effects of the rules.
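Putting the pieces together, a bare-bones engine might be sketched as follows. To keep it short, triggers, conditions, and effects are plain lambdas here, and the Rule struct and the "all conditions must pass" policy are assumptions of this sketch.

```ruby
# A rule bundles one trigger with its conditions and effects.
Rule = Struct.new(:trigger, :conditions, :effects)

class Engine
  def initialize(rules)
    @rules = rules
  end

  # Applies every rule whose trigger matches the context and whose
  # conditions all pass. Returns the rules that were applied.
  def run(record, context)
    @rules.select do |rule|
      next false unless rule.trigger.call(context)
      next false unless rule.conditions.all? { |c| c.call(record) }

      rule.effects.each { |e| e.call(record) }
      true
    end
  end
end

rule = Rule.new(
  ->(ctx) { ctx[:event] == :create },
  [->(rec) { rec[:balance] > 100 }],
  [->(rec) { rec[:flagged] = true }]
)

record = { balance: 500 }
Engine.new([rule]).run(record, { event: :create })
p record
```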

Some other areas of concern

In addition to the above components, you’ll also likely encounter secondary areas of concern when building the rules engine. Things like auditing changes to rules, tracking the history of rules being applied, restricting rule configuration to specific people, event-based effects, and rule DSLs are all related subsystems, but they are outside the scope of the core of the rules engine.

The flow of a rules engine

How would this rules engine actually work?

  • Step 1: Trigger the engine
  • Step 2: Get the Rules
  • Step 3: Check the Conditions
  • Step 4: Apply the Effects

Step 1: Trigger the Engine

The first step is triggering the engine. The entry point could be a function such as Engine#run, which is passed a record and its associated context (e.g. whether the record is newly created or the engine is being called during an update).

Step 2: Get the Rules

The engine will get the list of rules that apply for that specific context. Perhaps it will load it in real-time from the database. Maybe it pre-loaded it from a configuration file when the system booted.
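For the pre-loaded variant, one possible sketch is to parse the configuration once and index it by context. The JSON format, the rule names, and the rules_for helper below are assumptions made up for this example.

```ruby
require 'json'

# Assumed configuration format: one JSON array of rule definitions.
RULES_JSON = <<~JSON
  [
    { "name": "mark_popular",   "trigger": "view",      "record_type": "User" },
    { "name": "send_welcome",   "trigger": "on_create", "record_type": "User" },
    { "name": "disburse_funds", "trigger": "on_update", "record_type": "PaymentAccount" }
  ]
JSON

# Parse once at boot and index by [trigger, record_type] so that the
# lookup at run time is a single hash access.
RULES_BY_CONTEXT = JSON.parse(RULES_JSON)
                       .group_by { |r| [r['trigger'], r['record_type']] }
                       .freeze

def rules_for(trigger, record_type)
  RULES_BY_CONTEXT.fetch([trigger, record_type], [])
end

p rules_for('view', 'User').map { |r| r['name'] }
```

Loading from the database in real time trades this speed for freshness: rule changes take effect immediately instead of at the next boot.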

Either way, these Rules will have associations to Conditions and Effects, which are the important part for the next couple of steps.

Step 3: Check the Conditions

A Condition is a boolean check that determines whether the Rule’s Effects should be applied. If a Rule has multiple Conditions, further boolean logic is needed to combine their results into a single decision (e.g. all must pass, any may pass, or a quorum must pass).
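The combining step can be sketched in a few lines. The strategy names and the quorum parameter are my own assumptions about how such a setting might be stored on a Rule.

```ruby
# Combines already-computed condition results under a named strategy.
def conditions_pass?(results, strategy: :all, quorum: 1)
  case strategy
  when :all    then results.all?
  when :any    then results.any?
  when :quorum then results.count(true) >= quorum
  else raise ArgumentError, "unknown strategy: #{strategy}"
  end
end

results = [true, false, true]
puts conditions_pass?(results, strategy: :all)               # false
puts conditions_pass?(results, strategy: :any)               # true
puts conditions_pass?(results, strategy: :quorum, quorum: 2) # true
```

One edge case worth deciding deliberately: with the :all strategy, a Rule with no Conditions passes, because all? on an empty array returns true.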

Step 4: Apply the Effects

If the Rule’s Conditions pass, it is time to apply the Effects. The Effect itself is arbitrary. You’ll want to determine how to handle cases in which the application of an Effect failed.
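One possible failure policy, sketched below, is to keep going and report what failed. This is only one option among several (stopping at the first failure or rolling back are equally valid), and apply_effects is a hypothetical helper.

```ruby
# Applies each effect in order, collecting failures instead of letting
# one bad effect abort the rest.
def apply_effects(effects, record)
  failures = []
  effects.each do |effect|
    effect.call(record)
  rescue StandardError => e
    failures << { effect: effect, error: e.message }
  end
  failures
end

effects = [
  ->(rec) { rec[:popular] = true },
  ->(_rec) { raise 'mail server unavailable' },
  ->(rec) { rec[:notified] = true }
]

record = {}
failures = apply_effects(effects, record)
p record
p failures.map { |f| f[:error] }
```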

Once this happens, the Rule has been applied. You’ve done it!

A concrete example

Suppose you had to set up a user account to be marked as a “Popular” member once a certain number of views for the profile has been reached. You could theoretically hard-code this, but it would require an engineer to change it in the future. Good engineering minimizes the cost and impact of likely future changes.

If you had a rule engine that you could use to configure this instead, engineers would not be required.

A Rule could be created that:

  • had the Trigger of view
  • had a Condition of view_count being greater than 100
  • had an Effect of setting a field of the user to popular

Records would then run through the rules engine automatically, and the rule itself could be changed in the future without an engineer.
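Expressed as data, that Rule might look something like the sketch below. The hash layout, the apply_rule interpreter, and the choice to store "popular" as a boolean field are assumptions for illustration, not the format used by any particular engine.

```ruby
# The "Popular" rule expressed as data. Changing the threshold means
# editing this hash (or the row it came from), not the code below.
POPULAR_RULE = {
  trigger:   'view',
  condition: { field: :view_count, greater_than: 100 },
  effect:    { set: :popular, to: true }
}.freeze

# Tiny interpreter for rules of exactly this shape.
def apply_rule(rule, event, user)
  return user unless rule[:trigger] == event

  cond = rule[:condition]
  return user unless user[cond[:field]] > cond[:greater_than]

  user[rule[:effect][:set]] = rule[:effect][:to]
  user
end

p apply_rule(POPULAR_RULE, 'view', { view_count: 150 })
p apply_rule(POPULAR_RULE, 'view', { view_count: 42 })
```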

Sample code

Having a hard time conceptualizing this? Not to fear: I’ve created a small GitHub repository to illustrate some of the concepts here.

Note that this isn’t intended to be a production-grade system, but rather a simplified illustration of the above concepts. Actual rules engines have many more moving parts and take into consideration a whole host of other cases.

As a final disclaimer: improperly implemented rules engines or rules engines applied outside of the appropriate context can also make your system incredibly complicated and lead to a loss of organizational agility, so be judicious in their usage.

Rules engines are highly useful tools — in the right context. If implemented properly, they can significantly ease development burden and improve the agility of other departments within your organization.

This article is a part of my series How to Design Software.

VP of Product and Engineering @ HealthSherpa. Opinions my own. Moved to Substack. https://jgefroh.substack.com/