Understanding the Performance Characteristics of Systems via Simulations
Malith Jayasinghe and Krishni Hewa
Discrete Event Simulation (DES) has been used extensively in numerous fields such as manufacturing, health care services, computing (e.g. network simulations) and financial investments. DES models the operation of the system using a series of events and the simulation occurs according to computational and mathematical principles. DES constructs a conceptual model of a real system. This conceptual model is an abstract representation of the real system. The term “system” is used to describe this abstract representation (or the conceptual model). The system contains the set of entities (e.g. servers, events) which interact with each other over a period of time. The entities can have properties. For example, a server will have a processing rate (10 requests/second), scheduling discipline (e.g first-come-first-served, first-come-last-served, processor-sharing, etc.), and processing time distribution (more formally the service time distribution). At any given point in time, the system has a state. The states of variables in the system represent the state of the system.
An alternative method to DES is analytical modeling. Analytical modeling involves mathematically modeling the system using modeling techniques such as queueing theory. Once we develop an analytical model, we can directly compute the performance results (under different conditions) by substituting values. Unlike in DES, analytical modeling does not require simulations. In our earlier article, we discussed how to model and analyze the behavior of a server analytically. Although analytical modeling has certain benefits (e.g. does not require you to simulate) there are limitations with the analytical models. Analytical modeling can become difficult when modeling certain types of systems under certain types of traffic. In certain situations, a solution for the system being modeled may not even exist (i.e. analytically not tractable). In such cases, we can simplify the model by making certain assumptions. The accuracy of results obtained using such models will depend on to which extent the assumptions are valid. DES, on the other hand, does not require you to develop complex mathematical models, rather we program the simulation model using one of the DES simulation software packages and obtain the results by simulating it.
There are several DES software packages we can use to perform DES (see here). In this article, we use Simpy. We will show how to simulate and analyze the behaviour of a single server. In particular, we will look at how the arrival rate into the server and processing time (also called service time) of the server will impact the waiting time in the queue and the number of requests in the server.
Simulating a single server
Let us now discuss how to simulate a single server and analyze the results. We focus on M/M/1 server which consists of a queue and a server (processing element). In M/M/1 system requests arrive at the server with a rate lambda (see the figure below). The time between these arrivals (i.e. inter-arrival time) follows an exponential distribution. The server processes the requests on a First-Come-First-Served (FCFS) basis until completion. The processing times (service times) of requests follow an exponential distribution (with a given mean). The following figure illustrates the system.
The following is the Simpy code for the above system.
The implementation provided above uses Simpy pipes which allows us to interconnect different components (processes) in a simulation model. The packet_generator method is responsible for generating a specified number of requests. Here both the time between requests and the service (processing time) of requests are generated using exponential distributions. These requests are then sent to the server (refer to method server(env, in_pipe)) through an out_pipe. The server processes the requests in an FCFS manner until completion.
Analysis of waiting time (in the queue)
First, let us investigate the impact of arrival rate on the waiting time in the queue. The following figure shows how the average waiting time and 99% (99% percentile) waiting time vary with the average arrival rate. Here, we maintain the processing rate of the server at a fixed value of 50 requests/second. This also means that the average (mean) service time is 0.2 seconds.
We notice that the average waiting time increases with the increasing arrival rate. As the arrival rate increases, there is an increase in the number of requests in the queue which results in the queue length to increase (see the analysis presented in the next section). As a result, the average time a request has to wait in the queue also increases.
For example, when we increase the arrival rate from 25 requests/second to 45 requests/second the average waiting time and 99% waiting time increase 9 x and 5 x respectively. As the arrival rate approaches the processing rate (50 requests/second), the waiting time approaches a very large value (i.e. infinity).
The following figure shows the waiting time distributions under different arrival rates (note: processing rate = 50 requests/second, average processing time = 0.2 seconds).
We note that there is a significant increase in the tail waiting times (latencies) under higher arrival rates.
Let us now take a look at how the processing time impacts the average waiting time under a fixed arrival rate. Here we maintain the average arrival rate at a fixed value (25 requests/second) and vary the mean processing time. The following figure shows the behaviour.
As the processing time increases, we notice a significant increase in the waiting time in the queue. The reason is that, as the processing time increases, the time the requests will have to wait in the queue increases and as a result the average and 99% waiting time increase.
For example, when the processing time increases from 0.01 seconds (processing rate = 100 requests/second) to 0.02 seconds (50 requests/second) the average and 99% waiting time increases 4.5 x and 3.3 x respectively.
This also means that if we improve the mean processing time of requests (e.g. by optimizing the code) from 0.02 seconds to 0.01 seconds, then we will get a 4.5 x and 3.3 x improvement in the average waiting time and 99% waiting time respectively. Note that we notice this behaviour under the arrival rate of 25 requests/second. Under different arrival rates, the factor of improvement may vary.
The following figure illustrates the waiting time distribution under different processing times.
Note how the tail waiting times increases as the mean processing time increases.
Analysis of the number of requests in the system
Let us now analyze the behaviour of the number of requests in the system.
The number of requests in the system = number of requests in the queue + number of currently processing requests
The following figure shows how the average number of requests in the system varies with the arrival rate under a fixed processing rate of 50 requests/second (i.e. mean service time of 0.2 seconds)
Note how the average number of customers increases with the arrival rate. Earlier we noticed a similar increase in the waiting time with the arrival rate. When we increase the average arrival rate from 25 requests/second to 45 requests/second the average and 99% number of requests in the system increase 5.8 x and 8.6 x respectively.
The following figure shows the histogram for a number of customers in the system.
We note that the probability of having a higher number of requests in the system is higher under higher arrival rates (note: how the tail grows with the increasing arrival rate)
Let us now look at how the processing time impacts the number of requests (customers) in the system. The following figure shows how the number of requests in the system varies with the increasing mean service time.
We note that when the processing time is small, both, 99% and the average number of requests in the system are small. As the mean processing time increases, the number of customers in the system increases. This is because the increase in the processing time results in an increase in the queue length. If one wishes to reduce the number of requests in the system or the queue-length, he can do it by improving the processing time of requests. For example, we reduce the mean processing time from 0.03 to 0.02, then the average number of customers in the system reduces from 2.7 to 0.5. Note that we notice this behaviour under the arrival rate the arrival rate of 25 requests/second. Under different arrival rates, the factor of improvement may vary.
The following is the histograms of a number of requests in the system. Note the impact of the processing time on the tail of the histogram.
In this article, we provided a brief introduction to discrete event simulation (DES), We discussed in detailed how to simulate a single server and presented the results under different workload conditions (i.e. arrival rates, service times). We are currently using DES simulation to model and analyze the behavior of use cases that involve micro-services, ESBs and API management. Our initial analysis has revealed a lot of interesting findings and we will share the results in our future blog posts and research papers.