Benchmark Design: Choosing Between Open- and Closed-Model Tests

Harold Dubnow
IBM Data Science in Practice
7 min read · Nov 17, 2021
Photo by Matt Palmer on Unsplash

When designing a benchmark, one of the most important aspects to determine is how to best represent the use case with a load test tool. The two most common methods derive their names from queueing theory: open model and closed model.

This blog post will answer these two questions about designing benchmarks:

  • Simply put and with commonplace examples, what are open- and closed-model use cases?
  • Are there other considerations when choosing between open- and closed-model tests?

Closed-Model Tests

Closed-model tests are used to evaluate system throughput, or to evaluate how long it takes to process a fixed-size or batch workload.

For example, a closed-model test would be used to answer the following kinds of questions:

  • What is the throughput in a data-entry system given some number of users? Imagine a room full of users, each performing data entry. They get data to process, enter it as fast as they can, and save the data to the server.
  • Given an overnight batch processing window of three hours, how large can a batch grow before it no longer finishes by the time people arrive in the morning to use the data? Imagine a batch system, where the variables are the number and complexity of items in a batch queue and the number of threads of execution. Each thread in the batch system selects a job from the batch queue and processes it as fast as it can. A thread doesn't select another job until processing is completed on the previous job (see the sketch after this list).
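
To make the batch scenario concrete, here is a minimal Python sketch of a closed-model batch run, where process_job is a hypothetical stand-in for real work: a fixed pool of worker threads drains a fixed-size queue, and the elapsed time tells you whether the batch fits its window.

```python
import queue
import random
import threading
import time

def process_job(job_id: int) -> None:
    """Hypothetical stand-in for real batch work (sleep times are made up)."""
    time.sleep(random.uniform(0.001, 0.005))

def run_batch(num_jobs: int, num_threads: int) -> float:
    """Drain a fixed-size batch queue with a fixed thread pool; return elapsed seconds."""
    jobs: "queue.Queue[int]" = queue.Queue()
    for job_id in range(num_jobs):
        jobs.put(job_id)

    def worker() -> None:
        while True:
            try:
                # A thread picks up its next job only after finishing the previous one.
                job_id = jobs.get_nowait()
            except queue.Empty:
                return
            process_job(job_id)

    start = time.monotonic()
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

if __name__ == "__main__":
    elapsed = run_batch(num_jobs=2_000, num_threads=8)
    print(f"batch finished in {elapsed:.1f} s")  # compare against the batch window
```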

The processing in a closed-model system is synchronous — as the number of threads is increased, the result cascades: the number of concurrent requests increases, resulting in an increase in server utilization, and, in turn, longer latencies. The throughput is ultimately limited by the speed of the server, because the latency has increased and each thread can’t start its next request until control is relinquished by the server.
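
This cascade is easy to see in a harness like the following minimal Python sketch, where call_server is a hypothetical placeholder for a synchronous request to the system under test. Note that a sleep-based stand-in has no real contention, so this toy scales almost linearly; against a real server, throughput grows with the number of users only until the server saturates, after which added users just lengthen latencies.

```python
import random
import threading
import time

def call_server() -> None:
    """Hypothetical synchronous request; the ~20 ms sleep stands in for service time."""
    time.sleep(random.expovariate(1 / 0.02))

def run_closed(num_users: int, duration_s: float = 5.0) -> float:
    """Closed model: each user submits its next request only after the previous returns."""
    counts: list[int] = []

    def user() -> None:
        n, deadline = 0, time.monotonic() + duration_s
        while time.monotonic() < deadline:
            call_server()  # the thread blocks here until the server relinquishes control
            n += 1
        counts.append(n)  # list.append is atomic in CPython

    threads = [threading.Thread(target=user) for _ in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / duration_s  # requests per second

if __name__ == "__main__":
    for users in (1, 4, 16):
        print(f"{users:>2} users -> {run_closed(users):6.0f} req/s")
```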

One common type of problem that calls for a closed-model test design is the Coordinated Omission problem, which I describe here. When the workload calls for a closed-model test design, be aware that long queueing delays in the system under test will only affect the latencies of requests that were already submitted and are blocked behind the request that is waiting. If the problem is transient, it can easily produce a misleading outcome in which most requests see acceptable latencies and only a few are affected by the blocking event (a toy simulation of this effect follows the list below). For such a Coordinated Omission problem, the recommendations are to:

  • Look at all results to ensure blocking is not occurring.
  • Consider using an open-model test design, even if the workload calls for closed model, to evaluate blocking and application scalability.
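
Here is a toy Python simulation of that misleading outcome, with a pretend server (call_server, a hypothetical stand-in) that serves in about 10 ms but stalls once for two seconds. Because the closed-model client is blocked during the stall, only a single recorded latency reflects it; every request that would have arrived during those two seconds was simply never sent.

```python
import time

STALL_AT = time.monotonic() + 1.0  # stall roughly one second into the run
stalled = False

def call_server() -> None:
    """Pretend server: ~10 ms service time, plus a single 2-second stall."""
    global stalled
    if not stalled and time.monotonic() >= STALL_AT:
        stalled = True
        time.sleep(2.0)
    time.sleep(0.01)

if __name__ == "__main__":
    latencies = []
    deadline = time.monotonic() + 5.0
    while time.monotonic() < deadline:  # closed model: one request in flight at a time
        start = time.monotonic()
        call_server()
        latencies.append(time.monotonic() - start)

    latencies.sort()
    n = len(latencies)
    slow = sum(l > 0.1 for l in latencies)
    print(f"{n} requests, p50 = {latencies[n // 2] * 1000:.0f} ms, "
          f"max = {latencies[-1] * 1000:.0f} ms, over 100 ms: {slow}")
    # Typical output: hundreds of ~10 ms requests and exactly one ~2010 ms outlier.
```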

The queueing diagram for the closed model for a single service center is shown below. In this scenario, jobs to be processed arrive at the left, wait in a queue, and are processed; when processing is completed, the job is routed back into the queue, modeling the same user immediately submitting its next request.

Closed- and Open-Model Queueing Diagrams

Open-Model Tests

Open-model tests are used to evaluate latencies and resource consumption as a function of request arrival rate. In open-model tests, requests are independent and identically distributed; that is, requests are not related to one another (they are independent of one another), and the statistical distribution of time between requests doesn't change over time (the requests are identically distributed).

Open-model tests typically have an exponentially distributed inter-arrival distribution. Even though the average arrival rate is constant, a characteristic of an exponential inter-arrival distribution is that request arrivals appear in bursts, in turn causing a higher level of contention for shared resources in the system under test, such as CPU or buffers. Increased contention translates to longer, and ultimately more realistic, latencies. In contrast, tests with a uniform request inter-arrival distribution are less realistic, producing less contention and lower latencies.
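
A quick way to see this burstiness is to generate both schedules and count arrivals per second, as in this Python sketch (the rates and durations are arbitrary):

```python
import random
from collections import Counter

def arrival_times(duration_s: float, rate_per_s: float, exponential: bool) -> list[float]:
    """Generate arrival timestamps at a given average rate."""
    t, times = 0.0, []
    while True:
        # Exponential gaps give a Poisson process; uniform gaps give a fixed cadence.
        t += random.expovariate(rate_per_s) if exponential else 1.0 / rate_per_s
        if t >= duration_s:
            return times
        times.append(t)

if __name__ == "__main__":
    random.seed(42)
    for label, exp in (("uniform", False), ("exponential", True)):
        per_second = Counter(int(t) for t in arrival_times(100.0, 100.0, exp))
        counts = per_second.values()
        print(f"{label:11s}: min {min(counts):3d}/s, max {max(counts):3d}/s")
    # The uniform schedule stays pinned near 100/s; the exponential schedule
    # averages 100/s but swings well above and below it in bursts.
```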

Photo by Frame Harirak on Unsplash

For example, imagine a call center with a room full of people answering support telephone calls. They wait for a call, and when one is routed to them, they use the computer to solve the customer’s issue. Once the call is completed, they go back and wait for another call. In this scenario, an open-model test would be used to answer the following kinds of questions:

  • What is the vertical scalability of this application? In other words, if the arrival rate is doubled and the system capacity is doubled by doubling processing resources on the server, what is the effect on latency?
  • What is the horizontal scalability of this application? In other words, if the arrival rate is doubled and the system capacity is doubled by doubling the number of servers, what is the effect of those changes on latency?
  • Can the system handle a transient spike in requests? If the arrival rate is temporarily doubled but system capacity is held constant, is the system able to handle the transient spike in requests without significant blocking and without error?
  • How are system resources affected by arrival rate? If the arrival rate is doubled but system capacity is held constant, what is the rate of change of the utilization or consumption of the various system resources? For example, the CPU used per request could increase as a function of load and concurrency, depending on the algorithm used to manage contention. Calculate this using the formula CPU_per_request = average_CPU_utilization * seconds / number_of_requests, where: 1) average_CPU_utilization is the system metric sampled while the system under test is at steady state, 2) seconds is the duration of the sampling interval, and 3) number_of_requests is the number of requests served during the sampling interval (a worked example follows this list).
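
As a worked example of that formula (all numbers hypothetical): if CPU utilization averaged 40% over a 300-second steady-state sample during which 60,000 requests were served, each request cost about 2 ms of CPU time.

```python
def cpu_per_request(avg_cpu_utilization: float, seconds: float, num_requests: int) -> float:
    """CPU-seconds consumed per request over a steady-state sampling interval."""
    return avg_cpu_utilization * seconds / num_requests

# Hypothetical numbers: 40% average utilization, 300 s window, 60,000 requests.
print(cpu_per_request(0.40, 300.0, 60_000))  # 0.002 CPU-seconds, i.e. ~2 ms per request
```

Repeating the measurement at double the arrival rate shows whether the CPU cost per request stays flat or grows with load.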

The processing in an open-model system is asynchronous — as the number of threads is increased, the requests are handled in parallel and request latency increases as the threads contend and wait for shared resources to become available.
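
A minimal open-model generator can be sketched with asyncio, again with a hypothetical call_server placeholder: requests are launched on a Poisson schedule, and a slow server simply means more requests end up in flight at once.

```python
import asyncio
import random
import time

async def call_server() -> None:
    """Hypothetical async request; the ~20 ms sleep stands in for service time."""
    await asyncio.sleep(random.expovariate(1 / 0.02))

async def open_model(rate_per_s: float, duration_s: float) -> None:
    """Open model: launch requests on a Poisson schedule, independent of completions."""
    latencies: list[float] = []

    async def one_request() -> None:
        start = time.monotonic()
        await call_server()
        latencies.append(time.monotonic() - start)

    tasks, deadline = [], time.monotonic() + duration_s
    while time.monotonic() < deadline:
        tasks.append(asyncio.create_task(one_request()))  # fire without waiting
        await asyncio.sleep(random.expovariate(rate_per_s))  # exponential gap
    await asyncio.gather(*tasks)

    latencies.sort()
    p99 = latencies[int(len(latencies) * 0.99)]
    print(f"{len(latencies)} requests, p99 = {p99 * 1000:.0f} ms")

if __name__ == "__main__":
    asyncio.run(open_model(rate_per_s=200.0, duration_s=5.0))
```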

The queueing diagram for the open model for a single service center is shown above. In this scenario, jobs to be processed arrive at the left, wait in a queue, and are processed; when processing is completed, the job exits the system on the right.

Other Considerations in Benchmark Design

Photo by Brett Jordan on Unsplash

The following are other factors to take into consideration while designing a benchmark test:

  • Simplicity in test design and expediency in test development: With some test tools, the simplest script uses a closed model. For example, in Apache JMeter, the default thread group uses a closed model. This can be worth using even in cases where the most accurate representation of the workload would be an open model; for example, a quick-and-dirty test can be used to evaluate the health of a system, and such a test doesn't need to be perfect.
  • Open-model request inter-arrival distribution: Some test tools provide a mechanism to sustain a given request arrival rate, such as the Apache JMeter Arrivals Thread Group or the k6 constant arrival rate executor, but without regard for the request inter-arrival distribution; they simply submit as many requests as required to sustain the desired arrival rate. These are open-model tests in that they submit requests independently of whether the system under test is ready to process them, but the requests are submitted with a uniform, rather than exponential, inter-arrival distribution (a sketch of one workaround follows this list).
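
One workaround, not specific to any particular tool, is to split the target rate across many virtual users and give each its own exponentially distributed pacing; the superposition of independent Poisson processes is itself a Poisson process at the combined rate. A minimal Python sketch, with a hypothetical submit_request placeholder:

```python
import random
import threading
import time

def submit_request() -> None:
    """Hypothetical fire-and-forget submission to the system under test."""
    print(f"request at t={time.monotonic():.3f}")

def poisson_user(per_user_rate: float, duration_s: float) -> None:
    """One virtual user submitting on its own Poisson schedule."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        time.sleep(random.expovariate(per_user_rate))
        submit_request()

if __name__ == "__main__":
    TARGET_RATE, USERS, DURATION_S = 50.0, 10, 3.0
    threads = [
        threading.Thread(target=poisson_user, args=(TARGET_RATE / USERS, DURATION_S))
        for _ in range(USERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```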

Summary

The key differentiators between closed- and open-model tests are as follows:

  • In closed-model tests, requests are submitted only when the system has completed processing the previous request and is ready to process the next one.
  • In open-model tests, the arrival of requests is independent of the capacity of the system processing them.

This difference also manifests in how latency is measured. Both closed- and open-model tests measure latency from when a request is submitted to when it completes, but in closed-model tests, a request is only submitted when the system under test is ready to process it, while in open-model tests, requests are submitted as soon as they arrive, as discrete events, at the system.

Photo by Alex Shutin on Unsplash

When designing benchmark tests, it's important to keep the differences outlined above in mind. While I discussed Apache JMeter and k6 in this post, these and other tools for load testing are listed here. To learn more about load testing, see this excellent overview here. I hope this post helps clarify your own benchmark design process. Happy testing!
