Simulating a Parallel Queueing System With SimPy

Aaron Janeiro Stone
Published in The Startup · Nov 2, 2020
A Join the Shortest Queue simulation.

For newer versions of SimPy, particularly those following SimPy 2 (now called SimPy Classic), there are unfortunately few online resources covering advanced queueing systems. Since the newer editions of the package are much easier to use thanks to their adherence to current PEP standards, it makes little sense for new Python adopters to fall back on SimPy Classic just to follow its wider range of examples. In this article, we will instead cover how to replicate the “Several Counters with individual queues” example with an up-to-date SimPy installation, followed by an example of extending it to a load-balancing system.

SimPy Preliminaries

SimPy is a Discrete Event Simulation (DES) package for Python. DES refers to a simulation system which introduces specified events as it runs, “discrete” insofar as each event can be thought of as either having happened or not having happened (yet, or ever).

In SimPy, the DES system is contained within its Environment class. The user specifies a system which the Environment will run upon executing Environment.run().

Within the Environment, the user can define Process and Resource objects which interact dynamically as the DES runs. In particular, Process objects generate the discrete events we will be working with, while a Resource defines an object to be interacted with in this DES “universe”.
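As a minimal sketch of these three pieces working together (the car and pump here are purely illustrative and have nothing to do with the bank example to come):

import simpy

def car(env, pump):
    """A toy Process: request the pump Resource, hold it briefly, then leave."""
    with pump.request() as req:
        yield req                       # wait until the pump is free
        print(f'{env.now:.1f}: pumping')
        yield env.timeout(5)            # occupy the pump for 5 time units

env = simpy.Environment()
pump = simpy.Resource(env, capacity=1)  # a Resource with a single server
env.process(car(env, pump))             # register the Process with the Environment
env.run(until=20)                       # execute the DES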

In Python, generators are distinct from ordinary functions. Whereas a function has the form (for some manipulation foo(a), such as a + 2):

def function(a):
    return foo(a)

generators have the form of:

def generator(a):
    yield foo(a)

where yield instead emits a single output without ending the generator itself. To make this clear, it is possible to have:

def generator(a):
    yield foo(a)
    print('Wait, I forgot this!')
    yield bar(a)

which is useful for an object that exists in our universe for a period of time. A return would instead immediately end the function and output foo(a).

Parallel Queueing System Example

A queueing system with two servers in parallel

This example is a translation of the “Several Counters with individual queues” example from SimPy Classic. In this scenario, we are implementing what is known as the Join the Shortest Queue (JSQ) algorithm, wherein a job will choose the queue with the smallest wait time. In our DES universe, the jobs will be “Customers” and the servers will be “Counters” at a bank. There are two counters currently servicing patrons, and the patrons are smart enough not to join the longer of the two lines.

To start, let us run our system until either 30 customers have been generated and completed their time in the environment, or until some period of time (400 arbitrary time units) has passed. Let us also choose an average time for each customer to spend in the bank and the mean of our arrival process. For simplicity, we will assume both the service and arrival processes to be exponential.
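One way to encode these choices is shown below. The 30-customer and 400-unit limits come from the text above, while the particular service and arrival means and the seed are assumptions chosen only for illustration:

import random

maxNumber = 30      # stop generating customers after 30 arrivals
maxTime = 400.0     # ... or stop the simulation after 400 time units
timeInBank = 12.0   # assumed mean of the exponential service time
ARRint = 10.0       # assumed mean of the exponential inter-arrival time
NUM_COUNTERS = 2    # two counters, each with its own queue
seed = 12345        # assumed RNG seed, for reproducibility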

Now comes our definition of a Customer, a Process which will be generated by an arrival process. The logic for how the Customer will behave in our simulation is:
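A sketch of such a Customer process in modern SimPy, reusing the parameter names assumed above together with a small NoInSystem helper (my own) that counts the jobs at a counter, might be:

def NoInSystem(counter):
    """Jobs at a counter: the one in service plus those waiting."""
    return counter.count + len(counter.queue)

def Customer(env, name, counters):
    """A customer arrives, joins the shortest queue, is served, and leaves."""
    arrive = env.now
    Qlength = [NoInSystem(counter) for counter in counters]
    print(f'{arrive:7.4f} {name}: Here I am. {Qlength}')
    choice = Qlength.index(min(Qlength))            # Join the Shortest Queue
    with counters[choice].request() as req:
        yield req                                   # wait in line
        print(f'{env.now:7.4f} {name}: Waited {env.now - arrive:6.3f}')
        tib = random.expovariate(1.0 / timeInBank)  # random service time
        yield env.timeout(tib)                      # being served
        print(f'{env.now:7.4f} {name}: Finished')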

We begin by logging the arrival time (via env.now), followed by using Qlength to store the current queue sizes. The arriving customer then decides to think out loud, printing their arrival time, name, and the length of each queue to our console (their name will be defined later; we have yet to define our arrival process!)

They then choose the smallest of the queues and wait in line (yield req). When their turn comes, they rudely exclaim to the teller the current time and how long they have waited. A random service time (tib) is then chosen and is modeled by yielding a “timeout” for that duration. Following this, our patron announces the time and that they are finished, leaving the bank for the great unknown.

As for their arrival process, it is defined as:
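A matching sketch of the arrival process, following the structure of the SimPy Classic Source but otherwise my own wording, could be:

def Source(env, number, interval, counters):
    """Generate `number` customers with exponentially distributed gaps."""
    for i in range(number):
        env.process(Customer(env, f'Customer{i:02d}', counters))
        yield env.timeout(random.expovariate(1.0 / interval))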

where number will be filled by our maximum of 30.

Lastly, after initializing our simulation, the program will look like:
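Assuming the parameter names and definitions sketched above, the initialization and run step might read:

print('Bank with multiple queues')
random.seed(seed)

env = simpy.Environment()
counters = [simpy.Resource(env, capacity=1) for _ in range(NUM_COUNTERS)]
env.process(Source(env, maxNumber, ARRint, counters))
env.run(until=maxTime)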

Executing the script should output the following:

Bank with multiple queues
0.0000 Customer00: Here I am. [0, 0]
0.0000 Customer00: Waited 0.000
0.2044 Customer00: Finished
5.3892 Customer01: Here I am. [0, 0]
5.3892 Customer01: Waited 0.000
12.4838 Customer01: Finished
22.8307 Customer02: Here I am. [0, 0]
22.8307 Customer02: Waited 0.000
27.1357 Customer02: Finished
27.4258 Customer03: Here I am. [0, 0]
27.4258 Customer03: Waited 0.000
30.9531 Customer03: Finished
...
370.0944 Customer28: Waited 97.025
373.5097 Customer28: Finished

We see, in this case, that it takes up until t=88 for our queues to begin forming. Moreover, maxTime was not reached, as all 30 customers (0 to 29) were serviced first.

Extension to JSQ(d)

A parallel system with N queues

An important advance in network engineering has been the adoption of routing algorithms which do not inspect all queues when finding the smallest queue to join. Rather, Join the Shortest Queue-(d) (JSQ(d)) will inspect d of the queues and join the shortest of these. In a world where cloud computing networks might have a farm of processors numbering in the hundreds or even thousands, the time needed to inspect each can be costly. This has spurred the study of “load-balancing” algorithms like JSQ(d).

As an example, let us assume the red dots in the following figure represent jobs under JSQ(2).

The job which just arrived is placed in the square box and shall now choose its queue by counting the jobs in 2 randomly selected queues. In the figure, if the blue circles represent the sampled queues, then clearly the optimal choice is the second queue to the right.

JSQ(d) in SimPy

JSQ(d) is the easiest load-balancing method to extend our bank example to. In this case, we technically only need to modify the prior code to include the logic for choosing from a random subset of the queues.

Noting first the inclusion of doPrint, a parameter which can be set to reduce the amount of output in the console, we arrive at the following modification of the job process:
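A sketch of such a modified Customer process, with a sampling parameter d (the number of queues to inspect, a name of my own choosing) added alongside doPrint, might be:

def Customer(env, name, counters, d=2, doPrint=True):
    """JSQ(d): sample d counters at random and join the shortest of those."""
    arrive = env.now
    sampled = random.sample(counters, d)             # inspect only d queues
    Qlength = {counter: NoInSystem(counter) for counter in sampled}
    if doPrint:
        print(f'{arrive:7.4f} {name}: Here I am. {list(Qlength.values())}')
    # pick the sampled counter with the fewest jobs
    choice = [k for k, v in sorted(Qlength.items(), key=lambda a: a[1])][0]
    with choice.request() as req:
        yield req
        if doPrint:
            print(f'{env.now:7.4f} {name}: Waited {env.now - arrive:6.3f}')
        yield env.timeout(random.expovariate(1.0 / timeInBank))
        if doPrint:
            print(f'{env.now:7.4f} {name}: Finished')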

In this case,

choice = [k for k, v in sorted(Qlength.items(), key=lambda a: a[1])][0]

refers to the action of first sorting the items of the dictionary (which is of the form {queue: queue_size for queue in queues}) from smallest to largest queue size, and then selecting the smallest queue, now the first item in the list. Do note that in Python, the 0th element refers to the first item in an iterable object.
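To see the selection in isolation, here is a toy illustration in which letter keys stand in for the counter objects:

Qlength = {'A': 3, 'B': 1, 'C': 2}                 # hypothetical queue sizes
choice = [k for k, v in sorted(Qlength.items(), key=lambda a: a[1])][0]
print(choice)                                      # -> 'B', the shortest queue
# an equivalent one-liner that skips the full sort:
print(min(Qlength, key=Qlength.get))               # -> 'B'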

At last, along with the addition of a noJobCap parameter to allow easy switching between having or not having a maximum number of jobs in the system, we have:
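A sketch of the corresponding driver, reusing the JSQ(d) Customer above, is given below; the monitor process, the choice of 10 queues, and the roughly 80% target utilization are assumptions made purely for illustration:

import itertools
import matplotlib.pyplot as plt

def Source(env, counters, interval, number=maxNumber, noJobCap=True, d=2):
    """Arrival process: unlimited jobs when noJobCap is True, else `number` jobs."""
    jobs = itertools.count() if noJobCap else range(number)
    for i in jobs:
        env.process(Customer(env, f'Customer{i:02d}', counters, d=d, doPrint=False))
        yield env.timeout(random.expovariate(1.0 / interval))

def monitor(env, counters, history, step=1.0):
    """Record every queue's length at regular intervals for plotting."""
    while True:
        history.append((env.now, [NoInSystem(counter) for counter in counters]))
        yield env.timeout(step)

random.seed(seed)
env = simpy.Environment()
counters = [simpy.Resource(env, capacity=1) for _ in range(10)]   # e.g. N = 10 queues
interval = timeInBank / (0.8 * len(counters))                     # ~80% utilization
history = []
env.process(Source(env, counters, interval, noJobCap=True, d=2))
env.process(monitor(env, counters, history))
env.run(until=maxTime)

times = [t for t, _ in history]
for i in range(len(counters)):
    plt.plot(times, [lengths[i] for _, lengths in history], label=f'Queue {i}')
plt.xlabel('Time')
plt.ylabel('Jobs in queue')
plt.legend()
plt.show()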

Running this will give you the following plot:

Clearly, we can see why this would be considered a “load balancing” algorithm; over time, the workload is distributed evenly across all queues.
