Use simulations to optimize customer wait time, systems load, and cost
Let’s say your infrastructure is a fleet of M identical machines, each processing a single task at a time. Each new customer is placed in a queue waiting for a machine to become idle. You know how long each processing task takes to complete, but it’s not a fixed number — rather, there’s a minimum duration, and a maximum duration, and some most likely duration. You have a rough idea what the customer influx (customers / second) will be. How many machines do you need to process all these customers — what is the value of M?
Or, you have a fixed budget that allows you to build exactly M machines. How many customers per second can the fleet handle?
Or perhaps each customer needs to be processed through a succession of tasks, each executed on a different cluster, each cluster with different processing times — but all customers need to be processed through the whole succession of tasks. You need to answer the same questions as above.
Or perhaps the different processing tasks, each with a different processing time, compete for resources on the same machine cluster — so tasks for each customer are looping back to the same cluster, until each customer is processed.
There may be resources, such as energy or materials, that are used to complete the tasks, and you may want to optimize the process so the resources are spent in the best way.
And finally, there may be costs associated with running these machines, but also costs associated with customers waiting too long — and you may want to minimize the overall cost.
How to find the right size of infrastructure that makes sure customers are processed quickly, while optimizing resource usage and minimizing costs? Or, given a certain infrastructure size, what are the best outcomes in terms of processing time, cost, resources? In a trivial setup, you could do a ballpark estimate, but if things get complex enough (imagine multiple different tasks competing for machines on the same cluster) it may not be obvious what the best estimate is.
One solution that almost always works for all of the above is simulation. If you can characterize the entire workflow precisely enough, you could simulate the whole thing in software, let the simulation run forward in time, collect the results, and analyze them.
Advantages
If the infrastructure you’re planning to build is big enough, it’s almost always cheaper and faster to run a simulation. It may even be easier to simulate than to run a scaled-down model of the infrastructure.
Since the simulation is cheap, you could run it many times, changing the parameters every time, looking for ways to optimize the outcome. You could even do a systematic search in the parameter space of the simulation to make sure you find the best outcome possible — in terms of cost, time, etc.
Caveats
A simulation is only as good as the data you put in it. Make sure you characterize the infrastructure in detail. It is important to use real-world data in your simulation.
For example: how do you know how long each task is going to take? You could build one machine of each type, run hundreds of tasks, and collect data about task duration. It is likely that you will get some kind of time distribution there, which you will then use in your model.
Some preparation is mandatory. Do not just make up the data, or else the simulation will not be accurate.
Our example
Please refer to the first diagram, the one at the top of this text. We are simulating the simplest example: there is only one cluster, and each customer only generates one task that needs to be processed.
We are not discussing cost or resource optimization in this article. Cost and resource consumption are basically yet more sets of variables (cost per customer, cost per machine, etc) that you will need to track and summarize in your model. As the simulation moves forward in time, keep track of costs, resources, and so on. To keep the example simple and clear, we will not take cost and resources (other than various times and durations) into account here.
We will assume the size of the machine fleet is fixed: M = 100 machines.
The processing time per customer is around 2 minutes, but it is not a fixed duration — rather, it varies, with the shortest time being 30 seconds and the longest 4 minutes. The overall distribution is triangular.
We suspect the influx of customers is less than 1 customer / second, but not much less. There may be surges of customers where the average rate is greatly exceeded for a short time.
We are trying to find out what is the highest rate of customer arrivals we can handle, given all of the above.
Let’s generate the distribution of processing times, and the distribution of customer arrivals in time.
import pandas as pd
import numpy as np
from scipy import stats as st
from matplotlib import pyplot as plt
from matplotlib import ticker
import seaborn as sns
sns.set_theme()
sns_col = sns.color_palette()######## Customer arrival parameters ######### duration of initial wave in seconds
cust_peak_length = 3600
cust_peak_center = cust_peak_length // 2
# number of customers in the initial wave
cust_peak_count = 5000
# val=3 makes sure the edges of the peak are near zero
cust_peak_sigmas = 3cust_peak_dist = st.truncnorm.rvs(
-cust_peak_sigmas,
cust_peak_sigmas,
loc=cust_peak_center,
scale=cust_peak_center / cust_peak_sigmas,
size=cust_peak_count,
)# steady state inflow, customers / second
cust_flow_rate = 0.8
cust_per_min = 60 * cust_flow_rate
cust_per_hour = 60 * cust_per_min
# steady state begin moment
cust_flow_start = cust_peak_center
# steady state end moment
cust_flow_end = cust_flow_start * 10
# total number of customers in the steady state flow
cust_flow_total = int((cust_flow_end - cust_flow_start) * cust_flow_rate)# Generate customer arrival times distribution
cust_flow_dist = np.random.uniform(cust_flow_start, cust_flow_end, size=cust_flow_total)
cust_dist = np.concatenate((cust_peak_dist, cust_flow_dist)).astype(int)######## Infrastructure parameters ######### number of machines
M = 100# processing time central value in seconds
proc_time_center = 120
# processing time minimum
proc_time_min = 30
# processing time maximum
proc_time_max = 240# Generate pre-ordained processing times
proc_time_dist = np.random.triangular(proc_time_min, proc_time_center, proc_time_max, size=cust_dist.shape[0])
Let’s visualize the distributions:
fig, ax = plt.subplots(1, 2, figsize=(16, 5))
label_format = '{:,.2f}'
bins = 50ax[0].hist(proc_time_dist, bins=50, color=sns_col[0])ax[0].set_title("Distribution of processing durations", fontsize=18)
ax[0].set_xlabel("durations of tasks (sec)")
ax[0].set_ylabel("count of tasks")ax[1].hist(cust_dist, bins=50, color=sns_col[1])
ax1_bin_width = cust_flow_end / binsax1_xticks_loc = ax[1].get_xticks().tolist()
ax[1].xaxis.set_major_locator(ticker.FixedLocator(ax1_xticks_loc))
ax[1].set_xticklabels([label_format.format(x / 3600) for x in ax1_xticks_loc])ax1_yticks_loc = ax[1].get_yticks().tolist()
ax[1].yaxis.set_major_locator(ticker.FixedLocator(ax1_yticks_loc))
ax[1].set_yticklabels([label_format.format(y / ax1_bin_width) for y in ax1_yticks_loc])ax[1].set_title("Rate of customer arrivals", fontsize=18)
ax[1].set_xlabel("time (hours)")
ax[1].set_ylabel("customers / second")
plt.show()
As we said, processing times are on a triangular distribution with the peak at 2 minutes.
The customer arrival rate we’re going to try first is 0.8 customers / second, or 48 customers / minute. But initially there will be a surge of 5000 customers, all arriving within 1 hour, on a truncated normal distribution.
We’ve defined the model parameters. Let’s take a look at the code.
The actual simulation code
The simulation is performed entirely in this function:
def avg_wait_time(mn, cust_dist, proc_time_dist, DEBUG=False):
# the state of each machine
# 0 = idle
# 1 = busy
machine = np.full((mn,), 0, dtype=int)
# total number of customers
c = cust_dist.shape[0]# status column:
# -1 = not here yet
# 0 = waiting customer
# 1 = processing customer
# 2 = customer is done
state_dict = {
"pt": proc_time_dist, # pre-destined processing times
"status": np.full((c,), -1, dtype=int),
"machine": np.full((c,), -1, dtype=int), # machine ID assigned to customer
"time_went_in": cust_dist, # arrival time
"time_went_out": np.zeros((c,), dtype=int), # task complete time
"time_wait": np.zeros((c,), dtype=int), # length of wait
}
state = pd.DataFrame(state_dict)
# average wait times of currently waiting customers
# updated every time increment
proc_history = {
'curr_wait_avg': [],
'busy_machines': [],
'waiting_cust': []
}t = 0
while True:
#### update history
# current average wait time for waiting and processing customers
cwt = state[(state['status'] == 0) | (state['status'] == 1)]['time_wait'].mean()
cwt = 0 if cwt != cwt else cwt
proc_history['curr_wait_avg'] += [cwt]
proc_history['busy_machines'] += [machine.sum()]
proc_history['waiting_cust'] += [state[(state['status'] == 0) | (state['status'] == 1)].shape[0]]
if DEBUG and t % 100 == 0:
print(t, cwt, machine.sum(), proc_history['waiting_cust'][-1])
# clock tick for waiting customers
dfmask = (state['status'] == 0) | (state['status'] == 1)
state.loc[dfmask, 'time_wait'] += 1
# customers have just arrived, put them in the queue
dfmask = (state["status"] == -1) & (t >= state["time_went_in"])
state.loc[dfmask, "status"] = 0
# processing has just completed for these customers
# take them out of the machine pool
dfmask = (state["status"] == 1) & (t - state["time_went_in"] >= state["pt"])
state.loc[dfmask, "status"] = 2
state.loc[dfmask, "time_went_out"] = t
machines_go_idle = list(state.loc[dfmask, 'machine'].values)
machine[machines_go_idle] = 0
state.loc[dfmask, "machine"] = -1
# find any idle machines
# if there are any, find waiting customers for them
idle_machines = list(np.where(machine == 0)[0])
dfmask = (state['status'] == 0)
waiting_customers = dfmask.loc[dfmask == True].index.to_list()
if len(idle_machines) > 0 and len(waiting_customers) > 0:
num_changes = min(len(idle_machines), len(waiting_customers))
for i in range(0, num_changes):
cust = waiting_customers[i]
mach = idle_machines[i]
machine[mach] = 1
state.at[cust, "status"] = 1
state.at[cust, "machine"] = mach
if np.all((machine == 0)) and (state['status'] == 2).all():
break
t += 1
return state, proc_history
The parameters of the function are:
- the number of machines
- the distribution of customer arrival times
- the distribution of processing times
The state data frame keeps track of the state of the simulation. Each line is a customer. The various columns keep track of data assigned to each customer:
- their processing time (since it’s random, it can be pre-assigned)
- their status: not arrived yet, arrived but not processing, processing, done
- the machine assigned to the customer
- arrival time, end time, and processing time (the last one is redundant but convenient)
The state data frame will be carried through the whole simulation, and will be updated at each step, depending on outcomes per customer.
proc_history contains several lists where we keep track of various statistics at each step of the simulation — think of it as a log:
- the average wait time at each step of the simulation
- the number of busy machines at each step
- the number of waiting customers at each step
Every time the simulation iterates forward in time, new values are appended to the lists in proc_history. This variable keeps growing in time.
The algorithm:
- we start at t = 0 and iterate forward in time
- at each step we update proc_history
- if a customer is waiting, we update their wait time
- if a customer has just arrived, we change their status to “waiting”
- if a customer is done processing, we update their stats and we update the machine states
- if we find idle machines, and there are customers who are waiting, we start processing them
- we increment time by 1 second at each step
- we stop the simulation if all machines are idle and all customers are done processing
The function returns the state data frame and the proc_history lists. This is typical for a simulation — you may want to return at the end a detailed final state, along with a history of important variables throughout the whole run.
Vector math
You may have noticed there is only one for-loop in that whole function. This is by design.
In fact, there are many loops hidden in the function (most code blocks in there are effectively loops) and there is a way to write that function with many loops in it — iterate through all rows in the data frame and so on. That would be extremely slow. The code as shown takes about 1 minute to run on my laptop. With for-loops it would take at least 10x longer, and maybe much longer than that.
You have to use this coding style — the vectorized form of Numpy and Pandas code — whenever possible, or else the simulation will be slow. In fact, that lone for-loop can probably be eliminated as well.
The code may be hard to read if you’re not familiar with the vector form, but the performance boost is definitely worth it.
Results
Let’s call the function and collect the results.
%%time
final_state, proc_history = avg_wait_time(M, cust_dist, proc_time_dist)
The total run time on my laptop was 1 minute and 4 seconds.
There are many ways we could explore the results. Let’s visualize a few cross-sections of that data:
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 9))
plt.subplots_adjust(hspace = 0.4)ax1.plot(np.convolve(proc_history['curr_wait_avg'], np.ones(100), 'valid') / 100, color=sns_col[0])
ax1.set_title("Current average wait duration for customers", fontsize=18)
ax1.set_xlabel("time (hours)")
ax1.set_ylabel("wait duration (sec)")
ax1_xticks_loc = ax1.get_xticks().tolist()
ax1.xaxis.set_major_locator(ticker.FixedLocator(ax1_xticks_loc))
ax1.set_xticklabels([label_format.format(x / 3600) for x in ax1_xticks_loc])ax2.plot(proc_history['busy_machines'], color=sns_col[1])
ax2.set_title("Current number of busy machines", fontsize=18)
ax2.set_xlabel("time (hours)")
ax2.set_ylabel("busy machines")
ax2_xticks_loc = ax2.get_xticks().tolist()
ax2.xaxis.set_major_locator(ticker.FixedLocator(ax2_xticks_loc))
ax2.set_xticklabels([label_format.format(x / 3600) for x in ax2_xticks_loc])ax3.plot(proc_history['waiting_cust'], color=sns_col[2])
ax3.set_title("Current number of waiting customers", fontsize=18)
ax3.set_xlabel("time (hours)")
ax3.set_ylabel("waiting customers")
ax3_xticks_loc = ax3.get_xticks().tolist()
ax3.xaxis.set_major_locator(ticker.FixedLocator(ax3_xticks_loc))
ax3.set_xticklabels([label_format.format(x / 3600) for x in ax3_xticks_loc])ax4.hist(final_state['time_wait'], bins=50, color=sns_col[3])
ax4.set_title("Final distribution of wait durations", fontsize=18)
ax4.set_xlabel("wait duration (sec)")
ax4.set_ylabel("count")plt.savefig("user_load.png", bbox_inches="tight")
plt.show()
Top-left, the blue line, is the average wait time for all “currently waiting” customers at various moments in time. During the initial surge the wait time is large, but then it decreases close to the average processing time.
Top-right, the orange line, is the number of busy machines. They are mostly busy, most of the time.
Bottom-left, the green line, is the customer queue length. This is in line with what you would expect from the arrival rates.
Bottom-right is the final distribution of wait times across the whole population. Most customers only wait 1 … 4 minutes, per the processing time, but there are a few outliers who wait far longer. Let’s take a closer look at just the outliers:
fig, ax = plt.subplots(figsize=(7.22, 4))
ax.hist(
final_state[final_state["time_wait"] > proc_time_max]["time_wait"],
bins=50,
color=sns_col[3],
)
ax.set_title("Distribution of wait durations for outliers", fontsize=18)
ax.set_xlabel("wait duration (sec)")
ax.set_ylabel("count")
plt.savefig("outlier_wait.png", bbox_inches="tight")
plt.show()
These are mostly customers caught in the initial surge. Some of them wait a fairly long time, over 30 minutes in some cases, but they’re not very numerous.
Explore different scenarios
Let’s run the simulation again. This time we’re increasing the customer influx from 0.8 to 0.9 customers / second. Everything else stays the same.
Apparently, not much has changed. But there’s a warning sign: the current wait times have become spiky. There are moments when, due to random fluctuations, there are significant increases in the wait time. That could indicate we’re close to a threshold.
Let’s increase the customer influx to 1 customer / second.
You’re basically looking at a disaster. All machines are running flat out throughout the simulation, and yet that’s not enough: there are many customers stuck in waiting, and the wait times are bad. A look at the outliers confirms it:
Hundreds of customers wait up to 30 minutes, and there are some that wait several hours. Clearly, we’ve hit a threshold in terms of what this machine cluster can do.
One more thing to try: let’s keep the constant customer influx at 1 customer / second, while eliminating the initial surge. In other words, can the cluster handle this rate of arrivals if we assume there are no large fluctuations?
The answer is no. The queue length just keeps increasing. The initial surge played only a small role in the final outcome. The cluster is simply not capable of handling this influx.
Further analysis
What we’ve done so far is just visualization, and we will leave it at that. This article is intended to be an introduction to basic concepts related to simulations.
But in a real-life scenario, you may have several variables involved: multiple clusters with different sizes, different processing times, various resources being consumed, etc. You may have to take some outcome as the optimization metric (say: average wait times), and systematically explore the parameter space (like GridSearchCV in scikit-learn) looking for the best outcome.
It’s not unusual to run such a simulation many, many times, looking for the best combination of parameters. If you’ve done hyperparameter tuning for a machine learning model, this is conceptually very similar — you’re looking for the best outcome in a parameter space.
If that sounds very open-ended, that’s because it is. In some ways, you have more freedom here than you do with a machine learning model, but some of the same general techniques still apply.
In terms of time (task durations, customer wait time), 100 machines and 5000 customers are the same as 10 machines and 500 customers — but the former model will give results that are less dependent on fluctuations. The central limit theorem works to your advantage. Always run the biggest simulation you could afford, or run a small simulation multiple times, to make sure fluctuations don’t impact the results.
Optimizations
We’ve vectorized most of the code, but Numpy and Pandas are single-threaded, and that’s a fundamental limit with these libraries. For very large simulations, you may need to look into distributed processing libraries such as Apache Spark.
The outermost loop of the simulation is sequential by default, you cannot parallelize it because it reflects the flow of time and the moment t depends on all moments t-1, t-2, … that precede it. But you can definitely parallelize the inner blocks of code.
For an even greater speed boost, drop the data frame, use arrays everywhere and use a GPU-accelerated Numpy equivalent such as Cupy. For small simulations this is not worth the trouble, and in fact it might be slower. For large simulations the speed boost is very large — typically at least one order of magnitude, and it only gets larger as the simulation gets bigger.
Of course, you could write distributed code that runs across multiple GPUs for the ultimate speed boost. But code complexity will be very substantial.
Final words
A notebook with all the code is available here:
All images were created by the author.