Improved Simulation Performance with Recursive Vectorization in Python
Running a step-by-step simulation can be a time-intensive process if you’re dealing with large datasets, complex modeling logic, slow infrastructure, or all three. Compared to forecasting with regressions or classification, the speed can be exponentially slower.
But that’s just how it is. Simulations are necessary for modeling complex logical steps. And maybe that means switching to faster languages (C++, C#) or throwing hardware at the problem to cut down on processing time.
Yet using a more approachable programming language like Python can still extract decent performance using a trick I came across leveraging recursion.
Usually recursion is slower than looping in Python because of how it references the call stack. It’s usually only meant for problems where the functional logic is inherently recursive, like a Fibonacci sequence, where it keeps the code simple at the expense of performance. Otherwise recursion makes the code more complex.
But when combined with vectorization, which has its own performance benefits, recursion can reap 10 fold faster speeds in some circumstances.
Take for example a simulation that loops through time and calculates a vector for each period. In a classic for loop, the logic would look like this:
output = 
def attr1(vector, i): return 0
def attr2(vector, i): return True
def attr3(vector, i): return “Blue”
def attr4(vector, i): return 17
for i in range(100):
vector = [attr1(vector, i), attr2(vector, i), attr3(vector, i), attr4(vector, i)]
A pretty straightforward way to approach the problem, but this can be slow because of how Python appends lists, especially with big data. It’s seeking the end of the list at every append, and that task gets increasingly large with the size of the data.
There can be many ways to speed up looping in general, see here for some examples, but things like list comprehension aren’t really viable for a simulation where there might be hundreds of steps to process (maybe possible but hard to say).
Using map/reduce logic is a common way of taking advantage of the performance benefits of vectorization, but because each period is based on the one before it, you have to go step by step.
That is, you can’t just say:
F([a, b, c, d], i=100) = [f(a), f(b), f(c), f(d)]
It’s more like:
F([a, b, c, d], i=100) = [f(a, 99), f(b, 99), f(c, 99), f(d, 99)]
And then you’re back to looping again.
Structurally, this is a good example for recursion. It turns out that recursion combined with mapping in an object-oriented structure can take advantage of the performance benefits of vectorization.
So you can do:
_attr1, _attr2, _attr3, _attr4 = 0
def attr1(self, i): return self.attr1(i-1)
def attr2(self, i): return self.attr2(i-1)
def attr3(self, i): return self.attr3(i-1)
def attr4(self, i): return self.attr4(i-1)
def output(self, i):
# using map() to calculate every period simultaneously
return [map(self.attr1, i), map(self.attr2, i), map(self.attr3, i), map(self.attr4, i)]
# get 100 periods of simulation
In essence, it starts from the end, period=100, and then calculates each previous period until it gets to the beginning (period=0). From my testing I was getting a regular 10x speed improvement with this approach compared to looping.
1. This is definitely not complete code here. To avoid infinite recursion, it needs logic to prevent infinite recursion. Depending on how you want your output to look, the output vector may need to be coalesced differently.
2. For more complex logic, lru_cache comes in handy for memoization to avoid repeat method calls.
3. This is assuming the logic can be handled with tail recursion — recursive logic at the end of the method rather than the beginning. I haven’t tested it with head recursion, but I assume you are more likely to have issues with maximum recursion depending on how the code is structured.
4. There may be limitations on the number of periods that this can handle based on max recursion limit in settings. I never hit it, but I can imagine it happening.