Making Sequential Execution Fast(er)

Clayton Long
deliveredtechnologies
6 min read · Mar 13, 2017

You may or may not have read my last post, A Small Step Towards Simplifying Software Development. In that post, I described RuleBook, a simple and intuitive business rules abstraction (aka a sequential rules engine). There is a lot to love about sequential execution, including a predictable execution order, which is critical if you need rules for workflow-type applications. But the one stigma that weighs down sequential execution is performance.

I’ve heard “sequential execution doesn’t scale” and “sequential execution is too slow for larger applications.” Those are two very different but related statements. The first implies a growth in execution time or resource consumption beyond O(n). The second statement implies that O(n) is not sufficient.

As far as the first statement goes, properly implemented sequential [rules] execution should scale linearly, or at O(n). So, the remainder of this post will deal with the second statement in the above paragraph: sequential execution is too slow for larger applications.

My assertion is that predictable execution and ease of maintenance should outweigh raw performance and that for most applications, sequential execution should be sufficient. If you have so many rules that they can’t be reasonably run without leveraging a specialized [and likely complex] algorithm for the needed performance, then it’s doubtful that the aggregate of those rules is fully understood [by anyone]. Still, if additional performance can be [easily] achieved while still providing the predictability afforded to sequential execution, then why not have that additional performance?

What Does It Mean to Execute Sequentially?

Sequential execution really describes an expected outcome, not necessarily a process. If step 1, step 2 and step 3 were executed in the order step 3, step 1, step 2, but the result is the same, including any observable state changes along the way, then the execution can still be said to be sequential, so long as that holds true for every possible scenario.
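To make that concrete, here's a trivial sketch in plain Java (hypothetical steps, not RuleBook code): two steps that touch disjoint data can be swapped without changing the final state, so the outcome remains "sequential" either way.

```java
import java.util.HashMap;
import java.util.Map;

public class SequentialOutcome {
  public static void main(String[] args) {
    // Two orderings of three steps; step 1 and step 2 touch disjoint keys,
    // so running them in either order yields the same final state.
    Map<String, Integer> a = run(false);
    Map<String, Integer> b = run(true);
    System.out.println(a.equals(b)); // true: the outcome is "sequential"
  }

  static Map<String, Integer> run(boolean swapIndependentSteps) {
    Map<String, Integer> state = new HashMap<>();
    if (!swapIndependentSteps) {
      state.put("x", 1);  // step 1: writes x
      state.put("y", 2);  // step 2: writes y (independent of x)
    } else {
      state.put("y", 2);  // step 2 first...
      state.put("x", 1);  // ...then step 1; same effect
    }
    state.put("z", state.get("x") + state.get("y")); // step 3: reads x and y
    return state;
  }
}
```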

This is not a new concept. Out-of-order processing (OOP) that behaves like sequential execution has been around for a long time; it's how instructions are executed on many CPUs. Dynamic scheduling (e.g. Tomasulo's algorithm), pipeline operand forwarding and branch prediction are all examples of how out-of-order execution is used to improve performance while still providing predictable results that mimic sequential execution.

What Does It Mean To Be Faster?

First, let's look at the limitations of sequential execution, using RuleBook as an example. RuleBook uses references for all the data it stores during execution. So, it relies heavily on pointing to data that already exists instead of creating new data. This means that in terms of memory, it's actually pretty efficient. Instead, its inefficiencies lie in how it processes rules.

RuleBook uses the Chain of Responsibility (CoR) pattern to chain rules together. And although CoR is a very flexible pattern that provides for predictable sequential execution, its focus is on simplicity and flexibility over performance. This is not a bad thing. But it does leave room for performance optimization. Since each rule object is chained to the next in sequence, evaluating n [equivalent] rules takes n steps, so execution time grows linearly with n. Making that execution faster means completing those n rules in fewer than n sequential steps.
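For reference, here is a minimal sketch of what CoR-style rule chaining looks like; the Rule class below is a hypothetical stand-in for illustration, not RuleBook's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal Chain of Responsibility sketch for rules.
public class CorSketch {
  static abstract class Rule {
    private Rule next;

    Rule setNext(Rule next) {
      this.next = next;
      return next; // enables fluent chaining: a.setNext(b).setNext(c)
    }

    final void run(Map<String, Object> facts) {
      apply(facts);                      // evaluate this rule...
      if (next != null) next.run(facts); // ...then hand off down the chain
    }

    abstract void apply(Map<String, Object> facts);
  }

  public static void main(String[] args) {
    Rule first = new Rule() {
      void apply(Map<String, Object> f) { f.put("greeting", "Hello"); }
    };
    Rule second = new Rule() {
      void apply(Map<String, Object> f) { f.put("greeting", f.get("greeting") + ", World"); }
    };
    first.setNext(second);

    Map<String, Object> facts = new HashMap<>();
    first.run(facts); // walks all n links, one at a time
    System.out.println(facts.get("greeting")); // Hello, World
  }
}
```

Because run simply walks the chain link by link, n rules always cost n steps; that linear walk is exactly the threshold the rest of this post tries to beat.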

The Approach: Using OOP to Improve Performance

Again, this is nothing new. Rete, a pattern-matching algorithm used by inference rule engines, relies on out-of-order rule execution. The big difference between what Rete achieves and what we are trying to achieve here is that Rete does not presume an ordering between rules. But we are trying first to simplify development and second to provide improved performance. Basically, we have the goals of Rete inverted.

Still, we can learn a few things from Rete and other OOP approaches (and by learn, I mean borrow/steal). Rete supposes that all rules are independent, but it also resolves data dependencies when they exist. And like other OOP approaches, we must do the same. However, unlike Rete, we also want predictable results that mimic sequential execution. In other words, the only real difference between our approach and using, say, vanilla CoR (e.g. RuleBook) should be the time it takes to process the rules.

Step 1: Determining the Best Case

Continuing with the example of RuleBook, the best case scenario (for performance) is when each rule is completely independent. In this case, the order that any of the rules are executed in really doesn’t matter. All that matters is that all rules must complete their execution before we can say execution is complete.
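In code, the best case amounts to fanning every rule out to a thread pool and treating execution as complete only when all of them have finished. A minimal sketch, assuming rules can be modeled as plain Runnables:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BestCase {
  public static void main(String[] args) {
    // Three fully independent rules; relative order doesn't matter.
    List<Runnable> independentRules = List.of(
        () -> System.out.println("rule A"),
        () -> System.out.println("rule B"),
        () -> System.out.println("rule C"));

    ExecutorService pool = Executors.newFixedThreadPool(independentRules.size());
    // Fan every independent rule out to the pool...
    CompletableFuture<?>[] futures = independentRules.stream()
        .map(r -> CompletableFuture.runAsync(r, pool))
        .toArray(CompletableFuture[]::new);
    // ...and execution is "complete" only when all of them have finished.
    CompletableFuture.allOf(futures).join();
    pool.shutdown();
  }
}
```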

Step 2: Determining the Worst Case

In the worst case, all rules modify the same object(s) in some sequence. This scenario means that every rule must be executed sequentially in the order in which it was specified. Here, executing each rule, one at a time, is the only option.
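In code, the worst case degenerates to a plain ordered loop; a sketch, again modeling rules as simple functions over shared facts:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class WorstCase {
  public static void main(String[] args) {
    Map<String, Integer> shared = new HashMap<>();
    shared.put("total", 0);

    // Every rule reads and writes the same fact, so order is everything.
    List<Consumer<Map<String, Integer>>> rules = List.of(
        f -> f.put("total", f.get("total") + 1),
        f -> f.put("total", f.get("total") * 10),
        f -> f.put("total", f.get("total") - 3));

    for (Consumer<Map<String, Integer>> rule : rules) {
      rule.accept(shared); // one at a time, in the order specified
    }
    System.out.println(shared.get("total")); // (0 + 1) * 10 - 3 = 7
  }
}
```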

Step 3: Determining the Average Case

In the average case, some rules will have dependencies on others and some won't. Where a rule has a [data] dependency on another rule, those rules must be executed sequentially in the order specified. Where rules have no [data] dependencies between them, no ordering needs to be maintained and those rules may be executed out of order or in parallel.

The Solution: Dependency Tree

The great part about this solution is that since the expectation is that rules are added in order (remember, we are moving from a pure sequential execution model), a dependency tree can be built incrementally as rules are added, following the rules below (a sketch follows the list).

  1. If a rule modifies known dependencies, it's added as a leaf node on the branch for that dependency. Existing read nodes on that branch are then collapsed into a single branch ending with the write.
  2. If a rule modifies new dependencies, a new branch is created for those dependencies.
  3. If a rule only reads existing dependencies, then a new branch is created from the node where the dependency was last written.
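Here is one way that bookkeeping might be sketched. Rather than literal tree nodes, this version records each rule's depth (its level in the tree), since the level is what ultimately decides when a rule may run. All names here are illustrative assumptions, not RuleBook's actual code.

```java
import java.util.*;

public class DependencyLevels {
  static class RuleSpec {
    final String name;
    final Set<String> reads;   // facts the rule reads but does not change
    final Set<String> writes;  // facts the rule changes
    int level;                 // assigned depth in the dependency tree
    RuleSpec(String name, Set<String> reads, Set<String> writes) {
      this.name = name; this.reads = reads; this.writes = writes;
    }
  }

  // For each fact: the level of its most recent write, and the deepest
  // read-only node attached since that write (pending "collapse").
  private final Map<String, Integer> lastWriteLevel = new HashMap<>();
  private final Map<String, Integer> lastReadLevel = new HashMap<>();
  private final List<RuleSpec> rules = new ArrayList<>();

  // Rules must be added in their intended sequential order.
  void addRule(RuleSpec rule) {
    int level = 1; // a rule touching no prior facts starts a new branch
    for (String fact : rule.reads) {
      // A read branches from the node where the fact was last written.
      level = Math.max(level, lastWriteLevel.getOrDefault(fact, 0) + 1);
    }
    for (String fact : rule.writes) {
      // A write must follow the last write AND any reads since that write.
      level = Math.max(level, lastWriteLevel.getOrDefault(fact, 0) + 1);
      level = Math.max(level, lastReadLevel.getOrDefault(fact, 0) + 1);
    }
    rule.level = level;
    for (String fact : rule.reads) {
      lastReadLevel.merge(fact, level, Math::max);
    }
    for (String fact : rule.writes) {
      lastWriteLevel.put(fact, level);
      lastReadLevel.remove(fact); // earlier reads are now "collapsed"
    }
    rules.add(rule);
  }

  // Rules sharing a level have no ordering constraints between them.
  Map<Integer, List<RuleSpec>> levels() {
    Map<Integer, List<RuleSpec>> byLevel = new TreeMap<>();
    for (RuleSpec r : rules) {
      byLevel.computeIfAbsent(r.level, k -> new ArrayList<>()).add(r);
    }
    return byLevel;
  }

  public static void main(String[] args) {
    DependencyLevels dl = new DependencyLevels();
    dl.addRule(new RuleSpec("r1", Set.of(), Set.of("d1")));       // level 1
    dl.addRule(new RuleSpec("r2", Set.of(), Set.of("d2")));       // level 1
    dl.addRule(new RuleSpec("r3", Set.of("d1"), Set.of()));       // level 2
    dl.addRule(new RuleSpec("r4", Set.of("d1"), Set.of("d1")));   // level 3
    dl.levels().forEach((lvl, rs) ->
        rs.forEach(r -> System.out.println("level " + lvl + ": " + r.name)));
  }
}
```

Rules that land on the same level have no ordering constraints between them, which is exactly the parallelism the tree exposes.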

Using a tree constructed as detailed above, nodes in different branches have no order dependency and can be executed in parallel or out of order. Where a single branch diverges, the dependency is forked. Execution based on the dependency tree can then proceed as follows (a sketch follows the steps).

  1. Start at the root node.
  2. Create a different thread for each node at level 1 from left to right.

In each thread

  1. Execute the rule.
  2. Create a different thread for each Subnode[1… n]
  3. Node = Node -> Subnode[0]
  4. Repeat from Step 1 until the current node has no subnodes.
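A sketch of that walk, assuming a Node type holding the rule to run and an ordered list of subnodes (illustrative, not RuleBook internals). The calling thread walks the first branch itself and forks a thread for every other branch it passes.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeExecutor {
  static class Node {
    final Runnable rule;
    final List<Node> subnodes = new ArrayList<>();
    Node(Runnable rule) { this.rule = rule; }
  }

  static void execute(Node node) throws InterruptedException {
    List<Thread> forks = new ArrayList<>();
    while (node != null) {
      node.rule.run();                                 // 1. execute the rule
      for (int i = 1; i < node.subnodes.size(); i++) { // 2. fork Subnode[1..n]
        Node branch = node.subnodes.get(i);
        Thread t = new Thread(() -> {
          try { execute(branch); }
          catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t.start();
        forks.add(t);
      }
      // 3. follow Subnode[0] in this thread; 4. repeat until the branch ends
      node = node.subnodes.isEmpty() ? null : node.subnodes.get(0);
    }
    for (Thread t : forks) t.join(); // complete only when every fork finishes
  }

  public static void main(String[] args) throws InterruptedException {
    Node root = new Node(() -> {});  // root does no work itself
    Node a = new Node(() -> System.out.println("rule A"));
    Node b = new Node(() -> System.out.println("rule B"));
    Node c = new Node(() -> System.out.println("rule C (after A)"));
    root.subnodes.add(a);
    root.subnodes.add(b);
    a.subnodes.add(c);
    execute(root);
  }
}
```

Joining the forked threads at the end is what makes "execution complete" mean that every rule, on every branch, has finished.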

An Example

Let's take a sequential list of rules, each with various dependencies, denoted dn. An asterisk denotes that a particular dependency is changed by the rule; no asterisk means the dependency is read but not changed. Using purely sequential rule chaining, like that used in CoR, each rule must be executed in order, so the time required is t = n.

[Figure: Pure Sequential Execution]

By applying the solution illustrated above, however, we get the following dependency tree.

[Figure: Dependency Tree That Mimics Sequential Execution]

Rules are ordered based on their dependencies and the order in which they are expected to be executed. The blue lines denote the individual threads. And each level in the tree has an execution time t of 1. For the same rules, using this example, the dependency tree solution executes in t = 4, compared with t = 10 for a purely sequential solution. That’s a pretty good improvement. And it’s pretty close to the average case.
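To make the arithmetic concrete, here is one hypothetical assignment of ten rules to three dependencies (illustrative numbers chosen to match the t = 4 result, not the actual data from the figure):

  Level 1: r1 (d1*), r2 (d2*), r3 (d3*)
  Level 2: r4 (d1), r5 (d1), r6 (d2)
  Level 3: r7 (d1*), r8 (d2*)
  Level 4: r9 (d1), r10 (d2)

Each level's rules run in parallel and each level costs t = 1, so all ten rules finish in t = 4 instead of t = 10.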

This [relatively] simple approach allows parallel, and possibly out of order, execution of rules that yields the same results as sequential execution. And rules found later in the chain no longer have to wait for non-dependent rules ahead of them before being scheduled. As shown in the example above, the total number of rules executed wasn't decreased. But by taking advantage of the parallelism available between independent rules, the time required to execute rules, even in the average case, can be improved significantly. And the best part is, to the rule creators, no shift in thinking is required. To them, it's still just as easy as sequential execution. The only difference is a faster average execution time.
