Enumerating test cases by multiplying state machines

Nilesh Trivedi
ClearTax Engineering
6 min readJan 29, 2019

State machines are very useful for cleaning up the code but also very under-utilized by programmers. Richard Clayton wrote a great intro to state machines here. There have been vociferous debates about them on Hacker News. Essentially, if your entity has a property called status or multiple boolean variables like hasStarted, hasSucceededetc, your code can benefit from using the state machine design pattern.

In one of my projects, we had built a B2B SaaS tool with a very typical customer life-cycle (free trial period of 14-days, ability to upgrade to one of the PRO plans and also downgrade at any time including going back to a free plan). If the trial expires, then the customer is automatically moved to a free plan. Here is what the state machine diagram would look like.

And we needed to bill the customer on pro-rated basis at the end of each calendar month — taking into account free trial period, upgrades / downgrades etc.

This creates quite a few scenarios to think about and test when it comes to billing:

  • What if the trial period is still ongoing at month end?
  • What if customer upgraded to a paid plan but also downgraded to a free plan within the same calendar month?
  • What if the trial period for a customer was more than 30 days so it could overlap not two, but THREE calendar months?

If you are trying to score enterprise clients, a billing error would leave a very bad impression. We need to think about this problem in a more systematic way so that any change in requirements doesn’t explode the complexity.

Well, first we need to make the idea of passage of time very explicit. Instead of sprinkling Time.now or new Date() everywhere in our code (which makes our functions impure and hard to test), we need to realize that calendar months too can be modeled as state machine:

And now we see what’s really going on. Two independent machines are transitioning states in parallel and it was the implicit nature of time that was hiding this complexity and making it harder to reason about it.

Now, a few questions arise. What is the list of states the combined system can possibly be in? And what possible transition sequences can lead to any given state or event? It looks like some sort of Cartesian product (in plain language, one of set B for each item in set A) can be used to enumerate all possible transition sequences. We need to define multiplication for two arbitrary state machines.

Let’s talk about visualizing it first. One of the reason why programmers under-utilize state machines is the problem of state explosion. Essentially, adding a new aspect to the state machine multiplies the number of states that need to be modeled, and creates a disproportionately high number of transitions. This is what our combined state machine would look like:

Combining one machine that has 10 transitions with another that has 1 transition has led to a combined machine with 24 transitions!! That’s state explosion for you. This is messy and makes it hard to work with changing requirements (what if trial period becomes 45 days long?). Our goal is to write testcases for all possible scenarios when a MonthChange event happens so that we can perform end-of-the-month billing correctly, but it’s very difficult to enumerate those even using the above picture.

Fortunately, a solution for reducing visual complexity was found decades ago. It’s a visual formalism called Statecharts. Even front-end developers have started using this to design better code for widgets. See this, this, this, and this. David Harel wrote a whole book about modeling reactive systems with statecharts. Seriously, just go read the paper.

Our billing system has parallel state machines, but no hierarchies and no guards so out statechart would simply be this:

While this diagram lets us document the state machines easily, it doesn’t help us in enumerating the possible paths. We’ll need to write some code. Let’s model each state machine as a labeled multidigraph with nodes as states and edges as transitions. One of the states is called the initial state. Here is the FSM for customer’s billing plan changes:

# Here, an FSM is represented as an ordered hash
# the keys of the hash are the names of states
# the values are array of outgoing transition events
# An event has a label and a destination state
# the first key is the initial state
customer_fsm = {
"FreeTrial" => [
{event: "TrialExpires", to: "FreeForever"},
{event: "Upgrade", to: "FreeForever"},
{event: "Upgrade", to: "Basic"},
{event: "Upgrade", to: "Pro"},
],
"FreeForever" => [
{event: "Upgrade", to: "Basic"},
{event: "Upgrade", to: "Pro"},
],
"Basic" => [
{event: "Upgrade", to: "Pro"},
{event: "Downgrade", to: "FreeForever"},
],
"Pro" => [
{event: "Downgrade", to: "FreeForever"},
{event: "Downgrade", to: "Basic"},
]
}

And here’s one for the calendar month change. We only care about the first two months so there’s no outgoing transition from SecondMonth state:

time_fsm = {
"FirstMonth" => [{event: "MonthChange", to: "SecondMonth"}],
"SecondMonth" => []
}

Then we implement the Cartesian product of these two structures. We write a function that generates all the possible sequences of transition events by traversing these two state machine graphs:

def cartesian_product(first_graph, second_graph)
# multiply two graphs
product = {}
first_graph.keys.each do |i|
second_graph.keys.each do |j|
k = i + j # combined state name is concatenation of two names
product[k] = []
first_graph[i].each do |f|
product[k].push({event: f[:event], to: (f[:to] + j)})
end
second_graph[j].each do |g|
product[k].push({event: g[:event], to: (i + g[:to])})
end
end
end
return product
end

We then write a method that generates all paths using Depth-First-Traversal. This method accepts a lambda argument: exclude_edge_if for NOT traversing certain edges in certain conditions:

def find_all_paths(graph, first_edge, exclude_edge_if, path_so_far=[])
outgoing_valid_edges = graph[first_edge[:to]].reject { |e| exclude_edge_if.call(e, path_so_far + [first_edge]) }
if outgoing_valid_edges.empty?
puts (path_so_far + [first_edge]).collect { |e| (e[:event] + "(" + e[:to] + ")") }.join(" -> "), "\n"
return
else
outgoing_valid_edges.each do |edge|
find_all_paths_dfs(graph, edge, select_path_if, exclude_edge_if, path_so_far + [first_edge])
end
end
end

And now we can generate all possible event sequences in the multiplication product graph with:

product_graph = cartesian_product(customer_fsm, time_fsm)find_all_paths(
product_graph,
{event: "start", to: product_graph.keys.first},
lambda { |edge, path| false }
)

Since our FSM has loops, this enumeration ends up generating infinite number of sequences. This is because currently the user is allowed to upgrade and downgrade unlimited number of times within the same calendar month! Our construction of parallel FSMs has already helped us in detecting a pathological case that can cause bugs and headaches. To avoid this, we can put in a restriction (only 1 downgrade allowed in each calendar month). The list of allowed state transition now depends on sequence of events that have happened so far. We use the exclude_edge_if lambda to implement this:

def downgraded_in_current_month?(path)
last = path.reverse.detect { |e| ["MonthChange", "Downgrade"].include?(e[:event]) }
last && last[:event] == "Downgrade"
end
find_all_paths(
product_graph,
{event: "start", to: product_graph.keys.first},
lambda { |path| true }, # select all paths
lambda { |edge, path| ("Downgrade" == edge[:event]) && downgraded_in_current_month?(path) }
)

And that leads to these sequences:

start(FreeTrialFirstMonth) -> TrialExpires(FreeForeverFirstMonth) -> Upgrade(BasicFirstMonth) -> Upgrade(ProFirstMonth) -> Downgrade(FreeForeverFirstMonth) -> Upgrade(BasicFirstMonth) -> Upgrade(ProFirstMonth) -> MonthChange(ProSecondMonth) -> Downgrade(FreeForeverSecondMonth) -> Upgrade(BasicSecondMonth) -> Upgrade(ProSecondMonth)

start(FreeTrialFirstMonth) -> TrialExpires(FreeForeverFirstMonth) -> Upgrade(BasicFirstMonth) -> Upgrade(ProFirstMonth) -> Downgrade(FreeForeverFirstMonth) -> Upgrade(BasicFirstMonth) -> Upgrade(ProFirstMonth) -> MonthChange(ProSecondMonth) -> Downgrade(FreeForeverSecondMonth) -> Upgrade(ProSecondMonth)
....
  • This gives us all the possibilities, which can be used as an input for our test cases.
  • One often overlooked feature of generating exhaustive set of inputs is the ability to go over each variation and confirm that nothing can crash your business logic!
  • You can pick certain edge cases and interesting examples and assert the output.
  • One very exciting thing you can do next is to combine this approach property based testing as well to automatically generate your asserts based on certain invariant business rules.

By using an appropriate construct, we have made our thinking process structured and systematic, instead of haphazard and ad-hoc. The same approach will nicely scale to systems with more numerous and more complex state machines.

Discuss this on Hacker news, /r/programming, or Twitter.

--

--

Nilesh Trivedi
ClearTax Engineering

I like making things. But I also like making things up.