Performance Testing with a Think Time
When evaluating the performance of a system, it is important to run the performance tests under realistic conditions, since this allows us to understand the performance characteristics of the system in a production environment. The “think time” plays an important role in performance tests. It is defined as the time between the completion of one request and the start of the next request. When generating requests (using load testing tools such as JMeter), we do not normally add a think time. This, however, may not represent the users’ real behaviour (access patterns) in the system. For example, when accessing a web site, users typically wait between page requests.
Therefore, including a think time makes the performance test more realistic, as it represents the users’ actual behaviour in the system more accurately.
In most scenarios, the think time cannot be represented by a constant value; it is a random value distributed according to a probability distribution such as the exponential distribution.
The load on the system is determined by the number of requests being processed by the server. The number of requests processed by the server depends on the following:
- The number of concurrent users
- The think time.
Under a given system load, an increase in the think time decreases the number of active requests processed by the server and, as a result, increases the number of concurrent users the system can support.
In this blog, I will investigate the effect of think time on the performance (i.e. throughput and latency) of WSO2 API Manager (APIM) under different concurrency levels.
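The relationship between concurrent users, think time and load can be made concrete with a small sketch. It assumes a closed system in which every user repeatedly sends a request, waits for the response and then thinks; in that case Little's law gives a request rate of N / (R + Z) for N users, mean response time R and mean think time Z. The function names and the 20 ms response time are illustrative assumptions, not values from the tests in this post.

```python
def offered_request_rate(concurrent_users: int,
                         mean_response_s: float,
                         mean_think_s: float) -> float:
    """Requests per second generated by a closed population of users.

    Each user cycles through: send request, wait for the response, think.
    One request is issued per cycle, so the rate is N / (R + Z).
    """
    cycle_s = mean_response_s + mean_think_s
    if cycle_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return concurrent_users / cycle_s


def users_for_rate(target_rate: float,
                   mean_response_s: float,
                   mean_think_s: float) -> float:
    """Number of users needed to generate a given request rate."""
    return target_rate * (mean_response_s + mean_think_s)


if __name__ == "__main__":
    # Hypothetical numbers: 100 users and a 20 ms mean response time.
    for think_ms in (5, 25, 50):
        rate = offered_request_rate(100, 0.020, think_ms / 1000)
        print(f"think time {think_ms} ms -> about {rate:.0f} requests/s")
```

With the response time held fixed, a longer think time lowers the request rate produced by the same number of users, which is why the same server can then support more users.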
Think Time in a Queuing Process
Queueing theory provides a stochastic and probabilistic approach to investigate the operation of queues. The following figure shows a basic queueing process.
The basic idea is that incoming requests are placed in a queue and serviced according to a given scheduling policy. (I will discuss how to use queueing-theoretic modeling techniques to model and analyze the performance of systems in a separate blog. Here we simply want to understand a basic queuing process in order to see how the think time is represented in it.) The waiting time of requests serviced by the server (i.e. service node) depends on the following:
Arrival pattern of requests: The arrival pattern describes the distribution of the time between request arrivals. The arrival pattern of a queueing system is typically described in terms of the average time between two successive arrivals or the average number of arrivals per unit of time. Exponential distributions are typically used to model the time between the arrival of requests.
The arrival pattern in a queuing process represents the think time in a performance test.
Service pattern of tasks: The service pattern describes the distribution of service times. The service times of web requests closely follow long-tailed distributions. This means that there is a large number of requests with very small service times and a small number of requests with very long service times.
System capacity: This is the maximum number of requests allowed to enter the system. This quantity is often referred to as the buffer size, and it can be either bounded or unbounded.
Service discipline: Defines how the requests are served (e.g. First-come-first-served (FCFS), Last-come-first-served (LCFS), Shortest remaining processing time (SRPT) and Processor-sharing (PS)).
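To see how these four ingredients interact, here is a minimal simulation sketch of a single-server FCFS queue with an unbounded buffer. It assumes exponential inter-arrival times and, purely for simplicity, exponential service times (real web service times are long-tailed, as noted above). The function name and the rates are illustrative assumptions. The waiting time is updated with the Lindley recursion: the next request waits for whatever work is left when it arrives.

```python
import random


def mean_waiting_time(arrival_rate: float,
                      service_rate: float,
                      n_requests: int = 200_000,
                      seed: int = 1) -> float:
    """Estimate the mean time a request spends waiting in the queue.

    Single server, FCFS, unbounded buffer. Both rates are per second.
    """
    rng = random.Random(seed)
    wait = 0.0    # waiting time of the current request
    total = 0.0   # sum of waiting times seen so far
    for _ in range(n_requests):
        total += wait
        service = rng.expovariate(service_rate)  # service time of this request
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        # Lindley recursion: leftover work when the next request arrives.
        wait = max(0.0, wait + service - gap)
    return total / n_requests


if __name__ == "__main__":
    # Hypothetical server that completes 100 requests/s on average.
    for rate in (20, 50, 80):
        print(f"arrival rate {rate}/s -> mean wait {mean_waiting_time(rate, 100):.4f} s")
```

Shorter gaps between arrivals (that is, a shorter think time) leave more unfinished work in front of each new request, so the waiting time grows as the arrival rate approaches the service rate.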
Poisson Arrival Pattern => Exponential Inter-arrival Times
As pointed out, the exponential distribution can be used to represent the time between arrivals. Therefore, when we create our test plan (in JMeter, etc.), we need to configure it so that it generates the think time using an exponential distribution.
Let us now consider the case where the time between two successive arrivals follows an exponential distribution. The probability density function of the exponential distribution is given by
$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0$$
lambda is the mean (average) arrival rate per unit time.
(1/lambda) is the average inter-arrival time (the time between two successive requests).
For example, if the average arrival rate = 10000 requests/second, then the average time between two successive arrivals = (1/10000) seconds.
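The relationship between the rate and the mean gap is easy to check numerically. The sketch below draws inter-arrival times with Python's standard `random.expovariate`, whose argument is the rate lambda (not the mean), and compares the sample mean with 1/lambda. The helper name, sample size and seed are arbitrary choices for illustration.

```python
import random


def sample_interarrival_times(rate_per_s: float, n: int, seed: int = 42) -> list:
    """Draw n exponentially distributed inter-arrival times, in seconds."""
    rng = random.Random(seed)
    # expovariate takes the rate lambda; the mean of the samples is 1 / lambda.
    return [rng.expovariate(rate_per_s) for _ in range(n)]


# 10000 requests/second, as in the example above.
gaps = sample_interarrival_times(10_000, 100_000)
mean_gap = sum(gaps) / len(gaps)

print(f"sample mean gap: {mean_gap:.6f} s, expected: {1 / 10_000:.6f} s")
```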
There is an important relationship between the exponential and Poisson distributions.
If the time between successive arrivals has an exponential distribution, then the number of arrivals (i.e. requests) in the time interval (0, t] has a Poisson distribution.
The Poisson distribution is a discrete distribution which describes the probability of observing k events in a given interval of time, while the exponential distribution is a continuous distribution which describes the time between successive arrivals. Writing N(t) for the number of arrivals in (0, t], the probability mass function of the Poisson distribution is given by
$$P(N(t) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \quad k = 0, 1, 2, \dots$$
Note that lambda in the above equation also represents the arrival rate per unit time.
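The link between the two distributions can also be checked by simulation. In the sketch below, arrivals are generated with exponential gaps, the number falling in (0, t] is counted, and the observed frequency of k arrivals is compared with the Poisson probability (lambda t)^k e^(-lambda t) / k!. All function names, the rate of 4 per unit time and the number of trials are illustrative assumptions.

```python
import math
import random


def poisson_pmf(k: int, rate: float, t: float) -> float:
    """Probability of exactly k arrivals in (0, t] for arrival rate `rate`."""
    mean = rate * t
    return mean ** k * math.exp(-mean) / math.factorial(k)


def count_arrivals(rate: float, t: float, rng: random.Random) -> int:
    """Count arrivals in (0, t] when gaps between arrivals are exponential."""
    clock = rng.expovariate(rate)  # time of the first arrival
    count = 0
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count


def empirical_pmf(k: int, rate: float, t: float,
                  trials: int = 20_000, seed: int = 7) -> float:
    """Fraction of simulated intervals that contain exactly k arrivals."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if count_arrivals(rate, t, rng) == k)
    return hits / trials


if __name__ == "__main__":
    for k in range(8):
        print(k, round(poisson_pmf(k, 4.0, 1.0), 4), round(empirical_pmf(k, 4.0, 1.0), 4))
```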
JMeter: think time
Apache JMeter is a popular performance testing tool which allows us to test the performance of systems and analyze it under different workload scenarios. When we create a test plan in JMeter, it does not include a think time between the requests by default. A think time can be added to a JMeter test plan by adding a timer, as shown below.
There are different types of timers available in JMeter (e.g. Poisson Timer, Uniform Random Timer, etc.). As pointed out, we would like to generate the think times from an exponential distribution, as it represents the users’ real access patterns more closely.
The following figure shows the Poisson Timer in JMeter, which generates the think times according to an exponential distribution (in the previous section, I explained how the exponential distribution relates to the Poisson distribution).
The lambda in the above figure is the average time between the completion of one request and the start of the next request (i.e. think time) which is generated using an exponential distribution.
Note that the lambda parameter in the exponential distribution (refer to the previous section) represents the average arrival rate, while 1/lambda represents the average time between two successive arrivals.
It is worth mentioning that Lambda in JMeter (Poisson Timer) is not the same as lambda in the exponential/Poisson distribution. The relationship between these two values is as follows:
Lambda in JMeter Poisson Timer = 1 / lambda in the exponential distribution
Increasing the JMeter lambda therefore decreases the arrival rate of requests.
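The conversion between the two lambdas is a one-liner, sketched here under this post's reading of the JMeter lambda as the mean think time in milliseconds. The function name is hypothetical. The result is the rate a single thread would generate if responses were instantaneous, so it is an upper bound per thread; real rates are lower because each request also takes time to complete.

```python
def arrival_rate_per_second(jmeter_lambda_ms: float) -> float:
    """Convert a mean think time in milliseconds to a per-thread rate per second.

    Assumes the JMeter lambda is the mean gap between requests and that
    response times are negligible, so rate = 1 / mean gap.
    """
    if jmeter_lambda_ms <= 0:
        raise ValueError("lambda must be positive")
    return 1000.0 / jmeter_lambda_ms


if __name__ == "__main__":
    # The three lambda values used later in this post.
    for lam_ms in (5, 25, 50):
        print(f"JMeter lambda {lam_ms} ms -> at most {arrival_rate_per_second(lam_ms)} requests/s per thread")
```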
WSO2 API Manager
WSO2 API Manager is a complete solution for designing and publishing APIs, creating and managing a developer community, and securing and routing API traffic in a scalable way. WSO2 API Manager has four main components: 1) API Gateway, 2) API Key Manager, 3) API Publisher and 4) API Store. The API Gateway secures, protects, manages, and scales API calls. The Gateway communicates with the Key Manager to check the validity of tokens, subscriptions and API invocations. The development and management of APIs are done using the API Publisher (which is a web interface). The API Store provides a place for API publishers to host and advertise their APIs and for API consumers to self-register, discover, evaluate, subscribe to and use secured, protected, authenticated APIs. A detailed description of these components can be found here.
Workload Generation and Simulation Setup
The performance results that I present in this blog were obtained by running APIM 2.0 on an 8 GB, 4-core VM (stand-alone setup). Note that I have used a multiple-JVM deployment model in which the Key Manager and the Gateway are deployed on separate JVMs. JMeter is used to generate the HTTP requests. Each performance test is run with an initial warm-up period of 10 min. In each performance test we include a Poisson Timer, which adds a think time according to an exponential distribution.
Performance results: Latency
In a typical client-server (e.g. HTTP server/client) model, the latency of a request is the total round-trip time, i.e. the difference between the time at which the response is received and the time at which the request was started.
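Latency percentiles, which are used in the results that follow, can be computed from raw round-trip times with a few lines of code. The sketch below uses the nearest-rank definition of a percentile; other tools may interpolate and give slightly different values. The sample latencies and the 31 ms limit in the usage part are hypothetical and are not measurements from these tests.

```python
import math


def percentile(samples, p: float):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_qos(samples, limit_ms: float, p: float = 90) -> bool:
    """True if the p-th percentile latency does not exceed limit_ms."""
    return percentile(samples, p) <= limit_ms


# Hypothetical round-trip times in milliseconds.
latencies_ms = [12, 15, 11, 30, 14, 13, 45, 16, 12, 18]

if __name__ == "__main__":
    for p in (50, 90, 99):
        print(f"p{p}: {percentile(latencies_ms, p)} ms")
    print("QoS met:", meets_qos(latencies_ms, limit_ms=31))
```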
The following figures illustrate the behaviour of the latency percentiles under different concurrency levels for different think times (note that Lambda in the figures below is the Lambda parameter of the Poisson Timer in JMeter. As pointed out, this represents the average think time between two successive requests when the think time is generated from an exponential distribution).
The lambda values used in the performance tests are 5 ms, 25 ms and 50 ms (note: there is no particular reason for selecting these values. One should be able to derive these values from the server logs).
Note that the latency is measured in ms.
Under a given concurrency level (e.g. 75), the latency percentile values increase with decreasing lambda (which indicates a degradation in performance). As lambda decreases, the think time between two successive requests decreases. This increases the arrival rate of tasks into the system and, as a result, the latency increases (note: a higher arrival rate results in higher contention and more GC).
It is worth noting that if lambda is high, then the latency of the system can be low even if the number of concurrent users is high.
For example, let’s assume that there is a quality of service (QoS) requirement which states that the 90th percentile latency of the system should not exceed 31 ms. If lambda is 5 ms, then to satisfy this requirement the number of concurrent users accessing the system cannot exceed 25. Note that the 90th percentile latency when concurrency = 25 and lambda = 5 ms is 31 ms.
On the other hand, if lambda is 50 ms, then the system can support 75 concurrent users and still satisfy the QoS requirement. Note that the 90th percentile latency when concurrency = 75 and lambda = 50 ms is 26 ms (< 31 ms).
The latency distribution (probability density function of latencies)
The following figures illustrate the probability density function (PDF) of latencies under two different concurrency levels (i.e. 75 and 100). The PDF of latencies describes the relative likelihood of the random variable (i.e. latency) taking on a given value.
As pointed out, Lambda in the above figures represents the Lambda parameter of the Poisson Timer in JMeter.
Note that as Lambda decreases, the probability density function shifts to the right. The reason for this behaviour is as follows:
As lambda decreases, the think time decreases and, as a result, the arrival rate of requests into the system increases. This results in an overall increase in the latency values, which shifts the distribution to the right.
Performance results: Throughput
The throughput is the number of requests/tasks/messages a system/application can process per unit of time.
The following figure shows the throughput of WSO2 APIM under different concurrency levels and different think times.
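As a rough sketch of how this definition translates into a measurement, the function below counts the requests that complete after a warm-up period and divides by the length of the measurement window. The function name and the timestamps are hypothetical, and load testing tools may compute their reported throughput differently.

```python
def steady_state_throughput(completion_times_s, warmup_s: float, end_s: float) -> float:
    """Requests completed per second in the window (warmup_s, end_s].

    completion_times_s holds the completion time of each request, in seconds
    since the start of the test. Requests finishing during warm-up are ignored.
    """
    if end_s <= warmup_s:
        raise ValueError("end_s must be later than warmup_s")
    completed = sum(1 for t in completion_times_s if warmup_s < t <= end_s)
    return completed / (end_s - warmup_s)


if __name__ == "__main__":
    # Hypothetical completion times: one request every 0.5 s for 4 s.
    times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    print(steady_state_throughput(times, warmup_s=2.0, end_s=4.0), "requests/s")
```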
NOTE: The units of throughput = requests/second (in the graph below)
Note that the above results have been obtained under a stand-alone setup of APIM. Under distributed deployment scenarios (recommended setup), the system will perform significantly better.
We note that under a given concurrency level (e.g. 75), the throughput of the system increases as lambda decreases. As pointed out, a lower lambda value implies a higher number of requests arriving at the system. The system therefore processes a relatively large number of tasks under low lambda values, and as a result we see an improvement in the throughput (note: this assumes that the system is operating in the steady state, which is true for the results presented above).
In this blog, I have discussed the importance of including a think time in load testing. We looked at the parameters of a basic queuing process and noted that the arrival pattern in a queuing process is related to the think time. The think time (in most cases) is a random variable following a particular probability distribution. The most commonly used distribution for generating think time is the exponential distribution. We discussed how to configure JMeter to generate the think time using an exponential distribution. We then looked at the effect of think time on the performance of WSO2 APIM under different concurrency levels.