How to Handle a 100k-Row Decision Table in Drools: Part 1

Ryan Zhangcheng
Published in Nerd For Tech
7 min read · Jan 30, 2021

TL;DR

How to handle 100k decision table rows in Drools?

With large decision tables, one of the main pain points is performance. In this article, I build a prototype around a simple scenario that simulates the large decision table use case, and I provide three solutions based on Drools (a rules-oriented application framework). The focus is on rule execution performance for the decision table.

To explain the core idea, I prepared two decision tables with 10k and 100k rows of data to simulate how a rules application makes decisions.

The three solutions live in three Git branches. If you prefer reading code, feel free to jump directly to the following GitHub links:

rule-template-solution: use a rule template plus a raw-format XLS decision table

precompile-rule-solution: use kie-maven-plugin to precompile a Drools-formatted decision table

row-as-fact-solution: treat the large row data as facts instead of rules

For an overview, the performance comparison in my demonstration code is:

10k Rows Decision Table Scenario
100K Rows Decision Table Scenario

(Warm-up time includes loading the rules, facts, and XLS file, creating the KieSession, etc.)

The performance difference is obvious; however, each solution has its pros and cons.

If you are interested in this topic, let me walk you through the details.

Scenario Briefing

Recently I helped a customer handle very large decision tables in Drools with reasonable performance.

In industries such as insurance, healthcare, banking, and logistics, it is not rare to maintain a huge number of rules, keywords, or values for computing a result.

A decision table is usually recommended for such scenarios because it is easy to understand and maintain from a business user’s perspective.

Decision tables are very convenient for handling large numbers of rules. However, performance is a big concern, as you can see from the tables above. Can my rule framework (Drools, in this article) satisfy my performance requirements?

It can take minutes just to compile the rules, and it can take seconds to serve a single rule request.

So in my setup, I prototype a rule usage scenario that reproduces the rule firing pattern of a large decision table.

Assume I have a decision table that matches keywords. When a keyword is matched, the result is decided as true or false.

When a ClientObject’s description matches a keyword in my list, clientObject.result is false; otherwise clientObject.result is true (note that I make true the default).
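The decision described above can be sketched in plain Java to make the default-true behaviour concrete. This only illustrates the semantics and is not how Drools evaluates it; the nested `ClientObject` class, its accessors, and the helper `KeywordDecision.decide` are assumptions, and "match" is assumed here to mean exact equality with a keyword.

```java
import java.util.Set;

public class KeywordDecision {

    // Hypothetical fact class; the ClientObject in the demo repository may differ.
    static class ClientObject {
        private final String description;
        private boolean result = true; // true is the default outcome

        ClientObject(String description) {
            this.description = description;
        }

        String getDescription() { return description; }
        boolean isResult() { return result; }
        void setResult(boolean result) { this.result = result; }
    }

    // Sets result to false when the description equals one of the keywords
    // (exact match is an assumption), otherwise leaves the default untouched.
    static ClientObject decide(ClientObject client, Set<String> keywords) {
        if (keywords.contains(client.getDescription())) {
            client.setResult(false);
        }
        return client;
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("keyword-1", "keyword-2");
        System.out.println(decide(new ClientObject("keyword-1"), keywords).isResult());
        System.out.println(decide(new ClientObject("something else"), keywords).isResult());
    }
}
```

In the rule-based versions, each keyword in the set corresponds to one row of the decision table.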

Of course, in reality there are more complex rules behind a business decision. Drools has sophisticated solutions for complex rule handling, for example writing the rules in DRL or integrating the other rules into the same decision tables. However, the problem we want to solve in this article is handling a very large number of rows, so I leave out the complex rule configurations.

100k Decision Table

For the purpose of comparison and efficiency, I provided two decision tables:

10kTable.xls and 100kTable.xls, which contain 10k rows and 100k rows respectively.

ClientObject

We also want to separate business rule code from application code, so we have set up two separate Maven projects:

  1. rules: used to store and manage the rule logic
  2. myapp: used to maintain the generic application code

Project Setup

In production, we would usually run rules in a standalone process instead of embedding them in myapp. However, since this setup focuses on rule execution performance, we simplify it and run the rules in embedded mode, while still packaging them in a separate jar file (a.k.a. kjar). There should be no performance difference for rule execution between embedded and standalone mode.

As for CPU and memory, my laptop has 8 cores and 64 GB of RAM. Because of the size of the decision table, some test runs need a lot of compute resources to compile and run. If the 100k decision table scenario takes too long for you, I suggest running only the 10k scenario. The 100k decision table is also deactivated by default; follow the README in the GitHub repository to activate the 100k test.

Solution 1: Rule Template + XLS File

In Drools, we can use a rule template to handle Excel files.

In my example code, there are two main steps:

  1. Configure kmodule.xml and package the kjar in the Maven project (using kie-maven-plugin).
  2. Trigger your rules in client code as follows:

The decision table is in plain XLS format:

The rule configuration is managed in the rule template:

The syntax is almost self-explanatory. First you define a variable for each column, here “description” and “result”, then you reference the variables in the DRL as @{description} and @{result}.

As long as you are familiar with the basic syntax of drools rule language, it’s quite straightforward.

Note that the rule template generates one rule per row in the decision table. So if you have a 100k-row table, you end up with 100k rules in runtime memory.
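To make the one-rule-per-row behaviour concrete, here is a minimal plain-Java sketch that mimics the expansion by substituting @{description} and @{result} into a rule body for each row. It does not use the Drools template API and is not how Drools implements it; the class `TemplateExpansionSketch`, the rule text, the `@{row}` placeholder, and the sample rows are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateExpansionSketch {

    // Illustrative rule body; the real template in the repository may differ.
    static final String TEMPLATE =
        "rule \"keyword_@{row}\"\n"
      + "when\n"
      + "    $c : ClientObject(description == \"@{description}\")\n"
      + "then\n"
      + "    $c.setResult(@{result});\n"
      + "end\n";

    // Produces one rule text per data row by plain string substitution.
    static List<String> expand(List<String[]> rows) {
        List<String> rules = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] row = rows.get(i); // row[0] = description, row[1] = result
            rules.add(TEMPLATE
                .replace("@{row}", String.valueOf(i))
                .replace("@{description}", row[0])
                .replace("@{result}", row[1]));
        }
        return rules;
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for the 100k-row spreadsheet.
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            rows.add(new String[] {"keyword-" + i, "false"});
        }
        List<String> rules = expand(rows);
        System.out.println(rules.size() + " rules generated");
        System.out.println(rules.get(0));
    }
}
```

The number of generated rules grows linearly with the number of rows, which is why the size of the table directly affects compile time and session memory.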

Pro

This solution does not need a special header in the Excel data. The header and conditions are configured in the rule template.

As you might notice, our decision table is plain and simple:

By contrast, let’s see what a typical Drools-specific header looks like in a spreadsheet decision table:

So the first advantage is that the rule data is simple. A plain spreadsheet is easier to understand and maintain for any user who does not yet know the Drools-specific syntax. You probably already maintained your business rules in such a form before you even adopted Drools for rule management.

The configuration is managed in a separate rule template file, which can be owned by a different person or team.

Secondly, the rule template makes rule conditions and actions very flexible: you can write multiple lines of logic just like in any code block, instead of putting them in Excel cells, which lose readability once they span multiple lines.

Also, you can store the rule data in a database or a CSV file if you like, since it’s damn simple row data without meta information. Drools provides an interface to handle this.

In some cases, users might want to build their own solution for editing and managing rule data. I would usually suggest KIE Workbench, which is a sophisticated, feature-rich rule authoring and governance tool. However, if you want to develop your own portal or integrate rule authoring into an existing application, you can edit the business rules as database rows and convert them to Drools rules through rule templates.

Con

The disadvantages are also obvious from my point of view.

  1. Raw Excel is not easy to version control

Believe it or not, separating the business logic from application code is vital in rules-oriented applications. That makes version control of rule data both vital and difficult.

The Drools community provides a sophisticated and fully functional governance tool, KIE Workbench, to manage rule authoring, version control, and so on. Unfortunately, it recognizes neither rule templates nor the raw Excel format.

So basically you can’t import them into KIE Workbench or export them from it.

  2. Performance is not good

This can be observed by executing the client code in my demonstration code:

So for a large decision table this is clearly not an ideal solution, considering that a single rule execution might take several seconds, even though it is quite flexible and the rule data is easy to manage in an Excel file.

I think there are two reasons for the slow execution:

  1. There is a huge number of rules: in my case, 10k or 100k rules in the rule execution session.
  2. Converting row data into Drools rules is slow and slows down the application; it possibly also causes a lot of JVM overhead.

Although this article mainly focuses on performance, I think rule governance is at least as important as performance, if not more. Otherwise I could handle this logic in plain Java without involving Drools at all.

I have also seen some users choose a partial solution:

  1. Separate the big row data from the rules engine
  2. Keep the other rule data in the rules engine as usual

It can indeed fix the performance issue by moving the challenge outside of the Drools domain.

Personally, I think it is far from an ideal solution as well, because it doesn’t keep all the business rules in one place. The rules leak into, or let’s say pollute, your generic application code.

For a rules-oriented application, a half barrel is like an empty barrel. It is very hard to manage the business rule lifecycle, such as editing, version control, and deployment, if you don’t make full use of a rules application framework such as Drools.

I will describe solution 2, the precompiled rule solution, in my next article. The problems I want to fix are:

  1. Don’t load Excel data dynamically at runtime; precompile it at build time.
  2. Use a Drools spreadsheet decision table so that it can be “version controlled” by KIE Workbench.

Ryan Zhangcheng
Nerd For Tech

Red Hat Senior Consultant. Focus on App Dev, DevOps, OpenShift technology.