cayenne: a Python package for stochastic simulations

srikiran chandrasekaran
Heuro Labs
Published in
4 min readAug 8, 2020

TL;DR; We just released v1.0 of cayenne, our Python package for stochastic simulations! Read on to find out if you should model your system as a stochastic process, and why you should try out cayenne.

Mathematical models breathe structure into our understanding of the world. Oftentimes, they are cast as ordinary differential equations that describe how things change with time. In most cases, the insight from such models is sufficient for the question at hand. However, if you are trying to model the number of people who are going to be infected by SARS-CoV-2, a differential equation may not be the right tool for at least a couple of reasons:

  1. Variables in differential equations are continuous and take values like 0.3, 2.7, etc. This isn’t ideal if you are talking about the number of people, which can only be integers like 1 or 2.
  2. A differential equation always predicts the same output for a given input. However, not everyone who comes in contact with a COVID-19 infected person will develop the disease — the event is actually probabilistic, or stochastic.

Let’s look at this with a concrete example. Infection and recovery from SARS-CoV-2 can be modeled with Susceptible-Infectious-Recovered (SIR) model. In this model, when an infectious person comes in contact with a susceptible person, the susceptible person gets infected and can now infect others. Eventually, the infected person recovers and cannot spread the disease anymore. These are represented by the equation below:

The infection and recovery process

Giving some numbers for the parameters “k1” and “k2”, we can simulate the model as an ordinary differential equation in Python like this

If you plot the result of the simulation, it will look like this:

Simulating the SIR model with differential equations

You can see that i) the number of people is being represented by fractional values & ii) there is no notion of “probability” in the plot, it is just a smooth curve vs. time.

To address these limitations of differential equations, we use the more elegant built-for-purpose mathematical construct called a Markov jump process. Such a process is well suited for something like modeling the number of people. When the number of infected people increases from 5 to 6, there are instantaneous “jumps” between the integer values, instead of the gradual change (e.g. 5.1, 5.2, …, 6) seen in differential equations. Below we see the same SIR model from above, but simulated as a Markov jump process.

Simulating the SIR model with a Markov jump process

The number of people in this Markov process jumps between integer values at random time-points, at a frequency determined the rate constants k1 and k2. The simulation stops when the number of infected individuals is zero, around time= 70 days. Repeating this simulation with a different random number seed gives a different simulation trajectory, but with the same general trend:

Simulating a Markov jump process with a different seed

Here we see that the infection ends quickly (t=25) as the number of infected individuals becomes zero sooner. Since each simulation trajectory is probabilistic, we generally run a large number of them to get an overall picture of the process.

For simulating these Markov jump processes (also known as continuous-time Markov chains), we have developed an accurate, easy-to-use Python package called cayenne. You should be able to install it easily from here. And below is the minimal code for simulating the SIR model 10 times and plotting the simulation trajectories.

Results of simulating the SIR model as a Markov jump process, 10 repetitions, on Cayenne

If you are someone who runs stochastic simulations, we think you will enjoy trying out our package (get it here, examples here and a tutorial here) if you care about:

  1. Accuracy: We tested all our algorithms for accuracy using the stochastic SBML test suite. Other packages that we tested were not as accurate as ours, as we find in our benchmarks.
  2. Fast code: Our backend is written in Cython for a nice balance between speed and ease of writing new algorithms. And different repetitions of the algorithm can be run across multiple CPU cores out of the box.
  3. Quick prototyping: We leverage the cool antimony library to provide an easy and intuitive model writing interface. There is no need to write out the stoichiometric matrices.
  4. The little things: Stochastic simulations usually mean outputs logged at stochastic time points (e.g. at t = 19.3, 19.7). But with cayenne, you can get an accurately interpolated value at the time points of your choice (e.g. at t = 19.5).

We’d love for you to check out our package and let us know what you think! Currently, we are writing up a manuscript.

--

--