Two-Stage Stochastic Staff Planning Optimization

Dec 2, 2019

Uncertainty in the input data of Decision Optimization models can be managed using different techniques. This post illustrates some of them and details how to formulate and solve a two-stage stochastic model.

Uncertainty

In most cases, Decision Optimization is used to prescribe the next best action, i.e. it applies to decisions that take effect in the future.

These prescriptions are based on:

some snapshot of the current data,
some predictions of the data at the time the decisions should apply,
some formulation of the constraints and objective that apply to the decisions.

Any of these three inputs may be inaccurate. One might:

not have a good picture of the current business: the data may be so large that not all of it can be captured, so some approximations have to be made,
not have a good picture of how the data will evolve up to the moment the decisions apply,
not have a complete description of all the relations between data and decisions that should be included in the model formulation.

In this notebook, I give an example of the second case, where part of the data used in the model is uncertain, using a Staff Planning example.

Staff Planning example

The Staff Planning example is one of the basic examples provided for Decision Optimization for Watson Studio. See the documentation.

Imagine the manager of a restaurant or a retail shop. He needs to decide how many employees to hire to cover the expected attendance. Some data is perfectly known, such as the working regulations (how much time an employee can work) or the costs of the different types of employees. But the demand is not known exactly: the restaurant may have more customers depending on the weather, the day of the week, and many other factors.

There are different ways to handle this uncertainty.

Different possible input data distributions.

The simplest way is to estimate an average demand and then use it with a deterministic model. This is what is done in the basic example and in this notebook.
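To see why sizing on the average can be risky, here is a minimal plain-Python sketch with made-up demand figures (the `demands` table and the helper names are hypothetical, not taken from the notebook). A plan that exactly matches the per-period average leaves some demand uncovered in every scenario that is above average.

```python
# Hypothetical demand scenarios: demands[s][t] is the demand in scenario s at period t.
demands = [
    [4, 6, 8],
    [6, 8, 10],
    [5, 10, 9],
]


def average_demand(demands):
    """Average demand per period over equally likely scenarios."""
    n = len(demands)
    n_periods = len(demands[0])
    return [sum(d[t] for d in demands) / n for t in range(n_periods)]


def shortfall(plan, scenario):
    """Total demand left uncovered when staffing `plan` meets `scenario`."""
    return sum(max(0, need - staff) for staff, need in zip(plan, scenario))


avg = average_demand(demands)    # [5.0, 8.0, 9.0]
plan = [round(a) for a in avg]   # deterministic plan sized on the average
for s, scenario in enumerate(demands):
    print(s, shortfall(plan, scenario))
```

Scenario 0 is fully covered, while scenarios 1 and 2 each miss 2 units of demand: the deterministic plan has no way to react once the actual demand is revealed.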

Another way is to consider different possible demand scenarios, and solve what is called a stochastic model.

Two Stage Stochastic Optimization

As mentioned at the beginning, the root of the difficulty is that some decisions that depend on uncertain data have to be taken before the data is known. In the case of your restaurant, you need to hire staff before you know exactly how many people will come to eat.

In many cases however, decisions are split between:

some decisions that need to be taken initially, before the uncertain data is revealed,
some decisions that can be taken later, once the uncertain data is known.

This is referred to as a two-stage stochastic model.
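In generic textbook notation (not taken from the notebook), with first-stage decisions x, second-stage decisions y, cost vectors c and q, and uncertain data ξ, a two-stage stochastic program can be written as below. When the uncertainty is represented by N equally likely scenarios h_1, ..., h_N, the expectation becomes a weighted sum, and the whole problem becomes one larger deterministic model (often called the extensive form) with one copy y_s of the second-stage variables per scenario:

```latex
\min_{x \in X} \; c^\top x + \mathbb{E}_{\xi}\big[ Q(x, \xi) \big],
\qquad
Q(x, \xi) = \min_{y \ge 0} \big\{ q^\top y \;:\; W y \ge h(\xi) - T x \big\}

% N equally likely scenarios, probability 1/N each (extensive form):
\min_{x \in X,\; y_1, \dots, y_N \ge 0} \; c^\top x + \sum_{s=1}^{N} \frac{1}{N}\, q^\top y_s
\quad \text{s.t.} \quad T x + W y_s \ge h_s, \qquad s = 1, \dots, N
```

In the staff planning model, x corresponds to the fixed-employee variables and each y_s to the temp variables of scenario s.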

See the example in the Wikipedia article on multi-stage stochastic optimization.

Two Stage Stochastic Staff Planning Optimization

Since the Staff Planning problem considers two types of employees, it makes sense to assume that:

the fix employees have to be hired before the attendance is known; these are the first-stage decisions, which apply to all scenarios,
the temp employees, on the other hand, can be hired at the last minute, once the attendance is known (this is an assumption); these are the second-stage decisions, specific to each scenario.
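To make the split concrete, here is a small plain-Python sketch with made-up numbers (all names and values are hypothetical). Once the fixed staffing per period is chosen, the second-stage decision in each scenario simply fills the remaining gap with temps. This deliberately ignores shift rules and costs; in the actual model, CPLEX chooses the fixed staffing and all scenario-specific temp decisions together.

```python
# Hypothetical data: three equally likely demand scenarios over four periods.
scenario_demands = {
    "low": [3, 5, 5, 2],
    "medium": [4, 6, 7, 3],
    "high": [5, 8, 9, 4],
}
probabilities = {s: 1 / len(scenario_demands) for s in scenario_demands}

# First-stage decision: fixed staff per period, identical in every scenario.
fixed_work = [4, 6, 6, 3]


def temp_recourse(fixed_work, demand):
    """Second-stage decision: temps hired once the demand is known, to fill the gap."""
    return [max(0, d - f) for f, d in zip(fixed_work, demand)]


# One recourse decision per scenario, then its probability-weighted total.
temp_work = {s: temp_recourse(fixed_work, d) for s, d in scenario_demands.items()}
expected_temp_periods = sum(probabilities[s] * sum(w) for s, w in temp_work.items())
print(temp_work)
print(expected_temp_periods)
```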

Properties for the two types of employees

This is the problem solved in this notebook, using N random demand scenarios, each assumed to have the same probability 1/N. The scenarios are randomly generated using a very simple method.
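The post does not detail its scenario generator. As a hedged illustration, the sketch below assumes one simple approach: each period's base demand is scaled by a random factor drawn uniformly in [1 - spread, 1 + spread]. The function name, the base profile and the spread are hypothetical. It produces rows with scenario, period and demand fields, the same columns the all_demands table uses later (pandas.DataFrame(rows) would turn them into such a table), together with the equal probabilities 1/N.

```python
import random


def generate_scenarios(base_demand, n_scenarios, spread=0.2, seed=0):
    """Hypothetical generator: scale each period's base demand by a random
    factor drawn uniformly in [1 - spread, 1 + spread], rounded to an integer."""
    rng = random.Random(seed)  # seeded for reproducible scenarios
    rows = []
    for s in range(n_scenarios):
        for t, base in enumerate(base_demand):
            factor = rng.uniform(1 - spread, 1 + spread)
            rows.append({"scenario": s, "period": t, "demand": max(0, round(base * factor))})
    # every scenario is equally likely: probability 1/N
    probabilities = {s: 1 / n_scenarios for s in range(n_scenarios)}
    return rows, probabilities


base = [10, 20, 30, 15]
rows, probabilities = generate_scenarios(base, n_scenarios=5)
```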

Decision Optimization Model

The docplex.mp Python package is used to formulate the model, and CPLEX is used to solve it.

Create the decision variables

There are two levels of decision variables.

The fix decisions are taken first, so they are the same for all scenarios and are not indexed by scenario.

The temp decisions are taken after the demand scenario is known, so they can differ from one scenario to another and are indexed by scenario.

For each type, decision variables are created for:

number of employees starting to work at a given period,
number of employees working at a given period,
number of employees working on a given day,
total number of employees working.
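The constraints linking these variables are not shown here. As an illustration only, assuming a hypothetical fixed shift length where an employee who starts at period t works the next shift_length periods, the number working at each period follows from the number starting. In a MIP, the same rule becomes one linear equality per period, work[t] == sum of start[t'] over the preceding shift_length periods, written once for fix and once per scenario for temp.

```python
def working_from_starts(starts, shift_length):
    """Hypothetical linking rule: an employee who starts at period t works
    periods t, t+1, ..., t+shift_length-1 (truncated at the end of the horizon)."""
    n = len(starts)
    work = [0] * n
    for t, count in enumerate(starts):
        for k in range(t, min(n, t + shift_length)):
            work[k] += count
    return work


starts = [2, 0, 1, 0, 0, 3]
print(working_from_starts(starts, shift_length=3))  # [2, 2, 3, 1, 1, 3]
```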

# assumes mdl (a docplex.mp Model) and the periods, days and scenarios key lists are defined earlier in the notebook
# fix_start[t] is the number of fix resources starting to work at period t
fix_start = mdl.integer_var_dict(keys=periods, name="fixed_start")
# temp_start[s, t] is the number of temp resources starting to work in scenario s at period t
temp_start = mdl.integer_var_matrix(keys1=scenarios, keys2=periods, name="temp_start")
# fix_work[t] is the number of fix resources working at period t
fix_work = mdl.integer_var_dict(keys=periods, name="fixed_work")
# temp_work[s, t] is the number of temp resources working in scenario s at period t
temp_work = mdl.integer_var_matrix(keys1=scenarios, keys2=periods, name="temp_work")
# fix_nrd[d] is the number of fix resources working on day d
fix_nrd = mdl.integer_var_dict(keys=days, name="fix_nrd")
# temp_nrd[s, d] is the number of temp resources working in scenario s on day d
temp_nrd = mdl.integer_var_matrix(keys1=scenarios, keys2=days, name="temp_nrd")
# fix_nr is the total number of fix resources working
fix_nr = mdl.integer_var(name="fix_nr")
# temp_nr[s] is the total number of temp resources working in scenario s
temp_nr = mdl.integer_var_dict(keys=scenarios, name="temp_nr")
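Because every second-stage variable is replicated per scenario, the model grows linearly with N, which is worth keeping in mind when choosing how many scenarios to generate. This small helper (hypothetical, written for this illustration) counts the integer variables declared above, split by stage:

```python
def count_variables(n_periods, n_days, n_scenarios):
    """Count the integer variables declared above, per stage."""
    # first stage: fix_start and fix_work (per period), fix_nrd (per day), fix_nr
    first_stage = 2 * n_periods + n_days + 1
    # second stage: the same structure, replicated for every scenario
    second_stage = n_scenarios * (2 * n_periods + n_days + 1)
    return first_stage, second_stage


print(count_variables(n_periods=21, n_days=7, n_scenarios=10))  # (50, 500)
```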

Constraints

Among the constraints, the key business constraint states that the work of fix and temp employees must cover the demand, for each period and each scenario.

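# work vs demand: fix + temp staff must cover the demand of every period in every scenario
for s in scenarios:
    for t in periods:
        mask = (all_demands['scenario'] == s) & (all_demands['period'] == t)
        demand = int(all_demands.loc[mask, 'demand'].iloc[0])  # single matching row
        mdl.add_constraint(fix_work[t] + temp_work[s, t] >= demand)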
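The constraint above must hold for every (scenario, period) pair. As a hedged sanity check, this plain-Python helper (hypothetical, not part of docplex) tests a candidate staffing against the same condition, for instance with values read back from a solution through the `solution_value` property of docplex variables:

```python
def uncovered(fix_work, temp_work, demand):
    """Return the (scenario, period) pairs where fix + temp staff < demand."""
    return [
        (s, t)
        for s, per_period in demand.items()
        for t, need in enumerate(per_period)
        if fix_work[t] + temp_work[s][t] < need
    ]


# Hypothetical candidate solution over two scenarios and two periods.
demand = {0: [3, 5], 1: [4, 7]}
fix_work = [3, 5]
temp_work = {0: [0, 0], 1: [1, 1]}
print(uncovered(fix_work, temp_work, demand))  # [(1, 1)]
```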

Objective

The objective is to minimize the cost:

for fix employees, this is straightforward,
for temp employees, the cost is summed over all scenarios, each weighted by its probability.

The objective is therefore the expected cost over the distribution of scenarios.
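As a worked example of this expected cost, here is a plain-Python computation with hypothetical costs and counts (none of these figures come from the notebook): 10 fix employees at 100 each, and four equally likely scenarios needing 2, 4, 6 and 0 temps at 150 each.

```python
def expected_cost(fix_cost, temp_cost, n_fix, n_temp_by_scenario, probabilities):
    """Expected cost: first-stage fix cost plus probability-weighted temp cost."""
    expected_temp = sum(probabilities[s] * n for s, n in n_temp_by_scenario.items())
    return fix_cost * n_fix + temp_cost * expected_temp


n_temp = {0: 2, 1: 4, 2: 6, 3: 0}
probabilities = {s: 0.25 for s in n_temp}
print(expected_cost(100, 150, 10, n_temp, probabilities))  # 1450.0
```

The expected number of temps is 0.25 × (2 + 4 + 6 + 0) = 3, so the expected cost is 10 × 100 + 3 × 150 = 1450.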

# unit cost of each resource type (all_resources is indexed by resource name)
fix_cost = int(all_resources.cost['fix'])
temp_cost = int(all_resources.cost['temp'])

# expected cost: first-stage fix cost + probability-weighted second-stage temp cost
n_fix_used = fix_nr
n_temp_used = mdl.sum(probabilities[s] * temp_nr[s] for s in scenarios)
total_cost = fix_cost * n_fix_used + temp_cost * n_temp_used

mdl.add_kpi(total_cost, "Total Cost")
mdl.add_kpi(n_fix_used, "Nb Fix Used")
mdl.add_kpi(n_temp_used, "Nb Temp Used")
mdl.minimize(total_cost)