Unlocking Warehouse Efficiency: A Data-Driven Journey through Operational Statistics — Pt 4: Coefficient Estimation with Constrained Optimization

Macklin Fluehr
9 min read · Feb 24, 2024


This is part of a 4-part series on Operational Statistics:

  1. Pt 1: Problem Background
  2. Pt 2: Coefficient Estimation and Sample Size
  3. Pt 3: Coefficient Estimation and Information Density
  4. Pt 4: Coefficient Estimation with Constrained Optimization

Problem Background

Over the last three articles we’ve built up an intuition for the various factors that can impact coefficient estimation in our Linear Regression model. We’ve learned that there is, in fact, an ideal number of training samples relative to the number of categories we are estimating.

While the above theoretical foundations are important for building intuition, they offer limited pathways to actually helping your business problem. Knowing that more information density would help a little is great, but it basically only gets me to “I can or cannot do this task for you” when speaking with business stakeholders. Knowing the relationship between the number of categories and the number of samples required is more actionable: I can group my categories into fewer bins to match the amount of data I have available. Unfortunately, that is a tradeoff between accuracy and granularity.

What if I told you you could get coefficient estimation improvements for free just with some smart modeling choices? As a data scientist, I love this sort of stuff — simply making a smarter algorithm solves your problem.

Constrained Optimization

In the last three articles, I leveraged an out-of-the-box scikit-learn linear regression model to calculate my coefficients. This strategy is super easy. I just show up with the dataset and let the model figure it out by itself. It’s fewer than 10 lines of code to solve the entire problem.

The thing is, you’re letting the model solve the problem entirely by itself, and you might have some valuable domain expertise to add to the solution! If you were to work together, you could probably build a better model. Connecting the dots a bit, I wondered: could I achieve the same performance as the 200:1 sample-to-category ratio I found in article 2 without actually having that much data? How much data could I skimp on?

Enter constrained optimization. Constrained optimization is a pretty old idea in data science. Effectively, you set some rules on how your model weights can behave while fitting for a solution. Several popular, modern machine learning libraries allow for this. For example, both XGBoost and LightGBM let you set monotone constraints on particular features, which more or less require that as a feature value increases on its own, the prediction can only move in one direction. Another type of constraint is to limit the range of values your coefficients can take: scikit-learn’s LinearRegression model takes a positive parameter, which requires that the model coefficients are non-negative (could be very applicable here!).
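
To make that last option concrete, here’s a minimal, made-up sketch of scikit-learn’s positive parameter; the data, feature counts, and coefficient values below are invented for illustration and have nothing to do with the warehouse problem:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: each row counts how many items of three categories were processed,
# and y is the total processing time in minutes
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 3))
true_alpha = np.array([2.0, 4.0, 7.0])          # minutes per item, per category
y = X @ true_alpha + rng.normal(0, 0.5, size=100)

# positive=True constrains every coefficient to be non-negative
model = LinearRegression(positive=True, fit_intercept=False)
model.fit(X, y)
print(model.coef_)  # all three estimates come back >= 0

# XGBoost and LightGBM expose the monotonicity idea through their
# monotone_constraints parameter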

Depending on the software framework you use, the constraints you create can get a lot more creative. I’m going to go through two types of constraints I used in my real-world problem, and then how I simulated them in my experiments.

The Constraints

To give some grounding, I was trying to estimate how long it would take to Wash a particular item type in our warehouse. We had split up our items into the following categories: {small, medium, large, x-large}. I figured with this data I could make two types of assumptions:

  1. The upper and lower bounds for how long each category could take
  2. The relative time a category would take vs another

To make this concrete, these were the constraints I set up:

  1. min_time: 0 minutes
  2. max_time: 15 minutes
  3. small_time < medium_time < large_time < x-large_time

These are kind of no-brainer constraints, but keep in mind that a normal machine learning model has no idea about any of this. It might be able to figure it out, but it might need far more examples than you’d like to give it to learn the ideas you take for granted.

Pyomo and Constrained Linear Regression

So how do we program a constrained linear regression problem? I’m sure there are lots of options to choose from. I like working with pyomo. Whenever I’m working with problems that require constraints, I find I can set up my problem pretty quickly with it.

Since not everybody is as familiar with pyomo (it’s not quite as standard in a data scientist’s toolkit), I’m going to spend this section breaking down the following code block I used for modeling:

from pyomo.environ import *

# Sample DataSet
sample = df.sample(n_samples)

# Fit Model
X = sample.drop("total_time", axis=1)
y = sample["total_time"]

# Create a Pyomo Concrete Model
model = ConcreteModel()

# Get the column names from X
feature_names = X.columns.tolist()

# Define the coefficients as variables
model.alpha = Var(feature_names, within=Reals)

# Define the objective function (least squares)
model.obj = Objective(expr=sum((y - X.dot([model.alpha[feature] for feature in feature_names]))**2))

# Define constraints
model.constraints = ConstraintList()

for feature in feature_names:
    model.constraints.add(model.alpha[feature] >= 0)
    model.constraints.add(model.alpha[feature] <= 15)

for constraint in constraints:
    if constraint[1] == ">=":
        model.constraints.add(model.alpha[constraint[0]] - model.alpha[constraint[2]] >= 0)
    else:
        model.constraints.add(model.alpha[constraint[0]] - model.alpha[constraint[2]] <= 0)

# Create a solver
solver = SolverFactory('ipopt')

# Solve the optimization problem
solver.solve(model)

The first couple lines might be familiar to you from previous articles. We’re just getting the training data set up.

Some of you might cringe at the import * at the top (I normally would), but as I’ve confirmed with multiple other mathematicians, this is pretty common practice for a library that focuses on mathematical modeling and won’t kill you.

# Create a Pyomo Concrete Model
model = ConcreteModel()

We start off by defining a concrete model. pyomo lets you specify two different types of models, Abstract and Concrete. I’ve personally only ever used concrete models, which make more sense to me coming from a data science/python background. Given the problem we are solving we could in theory use either, but to keep things simple, let’s stick with the latter.

Next we set up all of the weights or coefficients we are trying to learn:

# Get the column names from X
feature_names = X.columns.tolist()

# Define the coefficients as variables
model.alpha = Var(feature_names, within=Reals)

Each alpha would be the time to complete each category that we are trying to estimate. In pyomo we store this as a variable and set a constraint that the values we optimize to can only be real numbers.

Next we set up our model objective:

# Define the objective function (least squares)
model.obj = Objective(
    expr=sum((y - X.dot([model.alpha[feature] for feature in feature_names]))**2)
)

Some of you will recognize this as just the ordinary least squares (OLS) objective. Here we want to minimize the sum of squared errors as the model learns better values for each 𝛼.
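
Written out, the objective handed to pyomo is the familiar least-squares expression, where $x_{i,c}$ is the value of the category-$c$ feature in sample $i$:

\min_{\alpha} \; \sum_{i} \Big( y_i - \sum_{c} x_{i,c}\,\alpha_c \Big)^2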

Now for the interesting part, setting our constraints.

# Define constraints
model.constraints = ConstraintList()

# Upper and Lower Bound Constraints
for feature in feature_names:
    model.constraints.add(model.alpha[feature] >= 0)
    model.constraints.add(model.alpha[feature] <= 15)

# Relative Constraints
for constraint in constraints:
    if constraint[1] == ">=":
        model.constraints.add(model.alpha[constraint[0]] - model.alpha[constraint[2]] >= 0)
    else:
        model.constraints.add(model.alpha[constraint[0]] - model.alpha[constraint[2]] <= 0)

Here I set the two types of constraints I described earlier in the article.

In the first for loop, I require that every category_type takes somewhere within 0–15 minutes to Wash (my simulated data is actually 0–10 minutes). If I had stronger opinions on specific categories, I could set different constraints on each one. For simplicity, I consider them all the same here.

In the second for loop I set up my relative constraints (e.g. medium_time < large_time). I haven’t shown it yet, but constraints is a set containing tuples of relative constraints between variables, in the following format:
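
For illustration, using the warehouse category names (in the simulation below these end up being stringified category ids instead), it looks something like this:

constraints = {
    ("small", "<=", "medium"),
    ("medium", "<=", "large"),
    ("large", "<=", "x-large"),
}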

For each tuple, indices 0 and 2 are the variable names and index 1 is the direction of the inequality.

Note how simple it is in pyomo to specify how variables in the model relate to one another.

Lastly, we define a solver and then tell the model to go at it.

# Create a solver
solver = SolverFactory('ipopt')

# Solve the optimization problem
solver.solve(model)

In this case we leverage ipopt, which is a nonlinear programming solver. Pyomo requires that you specify an engine to run your optimization; ipopt and glpk are standard options here. Since my objective function contains a squared term (i.e. a nonlinear expression), I chose ipopt instead of glpk, which is a linear solver.
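
Once the solve finishes, the fitted coefficients live on the model’s variables, and you can read them off with pyomo’s value() helper (this is the same pattern the simulation code below uses):

from pyomo.environ import value

# Time estimates the solver landed on, keyed by category / feature name
fitted_alphas = {feature: value(model.alpha[feature]) for feature in feature_names}
print(fitted_alphas)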

Simulation Code

Now that we understand how the pyomo code works, let’s piece it together in a simulation, similar to the simulations we ran in the last two articles. This time around, rather than changing the number of categories and the sample size, I’m going to fix the number of categories at 20, vary the number of constraints and the sample size, and compare the results against my standard scikit-learn linear regression model.

Here are the functions I defined to generate random relative constraints:

import random

# Generate Constraints

def infer_constraint_type(category_times, key_a, key_b):
    if category_times[key_a] > category_times[key_b]:
        return '>='
    elif category_times[key_a] < category_times[key_b]:
        return '<='
    else:
        # Handle the case when values are equal; choose the constraint type as needed
        return '>='

def create_random_constraint(category_times):
    # Pick two distinct categories and infer which direction the constraint should go
    key_a, key_b = random.sample(list(category_times.keys()), 2)
    constraint_type = infer_constraint_type(category_times, key_a, key_b)
    return key_a, constraint_type, key_b

def create_constraints(category_times, n_constraints: int):

    created_constraints = set()  # keeps track of which pairs we've already constrained
    constraints = set()

    for _ in range(n_constraints):

        keys = list(category_times.keys())
        key_a, key_b = random.sample(keys, 2)

        # Ensure that the constraint is not created twice or on the same variable
        n = 0
        while (key_a, key_b) in created_constraints or (key_b, key_a) in created_constraints or key_a == key_b:
            key_a, key_b = random.sample(keys, 2)
            if n > 50:
                return None  # number of constraints probably not doable
            n += 1

        # Infer the constraint type based on variable values
        constraint_type = infer_constraint_type(category_times, key_a, key_b)

        # Add the constraint to the set of created constraints
        created_constraints.add((key_a, key_b))
        constraints.add((str(key_a), constraint_type, str(key_b)))

    return constraints
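
To make the output format concrete, a call looks roughly like this (category_times is the dict mapping each category id to its true simulated time, set up the same way as in the earlier articles; the exact pairs and directions vary from run to run):

constraints = create_constraints(category_times, n_constraints=3)
# e.g. {("4", ">=", "11"), ("7", "<=", "2"), ("15", ">=", "0")}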

I have also wrapped up the pyomo code I went over before into a function, similar to the get_error() function we defined back in article 2.

from sklearn.metrics import mean_absolute_percentage_error

def get_constrained_error(df: pd.DataFrame, n_samples: int, constraints):

    # Sample DataSet
    sample = df.sample(n_samples)

    # Fit Model
    X = sample.drop("total_time", axis=1)
    y = sample["total_time"]

    # Create a Pyomo Concrete Model
    model = ConcreteModel()

    # Get the column names from X
    feature_names = X.columns.tolist()

    # Define the coefficients as variables
    model.alpha = Var(feature_names, within=Reals)

    # Define the objective function (least squares)
    model.obj = Objective(expr=sum((y - X.dot([model.alpha[feature] for feature in feature_names]))**2))

    # Define constraints
    model.constraints = ConstraintList()

    # Upper and Lower Bound Constraints
    for feature in feature_names:
        model.constraints.add(model.alpha[feature] >= 0)
        model.constraints.add(model.alpha[feature] <= 15)

    # Relative Constraints
    for constraint in constraints:
        if constraint[1] == ">=":
            model.constraints.add(model.alpha[constraint[0]] - model.alpha[constraint[2]] >= 0)
        else:
            model.constraints.add(model.alpha[constraint[0]] - model.alpha[constraint[2]] <= 0)

    # Create a solver
    solver = SolverFactory('ipopt')

    # Solve the optimization problem
    solver.solve(model)

    # Pull the fitted coefficients off the model (value() comes from pyomo.environ)
    estimated_coefficients = {int(c): value(model.alpha[c]) for c in feature_names}

    # Estimate Error (category_times is the dict of true category times from the simulation setup)
    result_df = pd.concat([
        pd.Series(estimated_coefficients, name="estimates"),
        pd.Series(category_times, name="actuals")
    ], axis=1)

    return mean_absolute_percentage_error(result_df["actuals"], result_df["estimates"])
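
As a reminder, get_error() (the unconstrained baseline used below) was defined back in article 2 and isn’t reproduced here. A minimal sketch consistent with how it’s used in this article, assuming it fits a plain scikit-learn LinearRegression and scores the recovered coefficients against category_times, might look like this (article 2’s exact version may differ):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

def get_error(df: pd.DataFrame, n_samples: int) -> float:
    # Sample the dataset and fit an ordinary, unconstrained linear regression
    sample = df.sample(n_samples)
    X = sample.drop("total_time", axis=1)
    y = sample["total_time"]

    model = LinearRegression(fit_intercept=False).fit(X, y)

    # Compare the recovered coefficients against the true category times
    estimates = pd.Series(model.coef_, index=[int(c) for c in X.columns], name="estimates")
    actuals = pd.Series(category_times, name="actuals")
    result_df = pd.concat([estimates, actuals], axis=1)

    return mean_absolute_percentage_error(result_df["actuals"], result_df["estimates"])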

Lastly, we run our simulation similar to before:

error_dict = {
    "n_constraints": [],
    "n_samples": [],
    "constraint_mape": [],
    "ols_coeff_mape": []
}

max_samples = 1000
sample_size_step = 5
n_repeats = 15

print("OLS Calculation")
ols_errors = []
for n_samples in range(sample_size_step, max_samples, sample_size_step):
    errors = [get_error(df, n_samples) for i in range(n_repeats)]
    ols_errors.append(np.mean(errors))

print("Constraint Calculation")
for n_constraints in range(5, 20, 5):

    for n_samples in range(sample_size_step, max_samples, sample_size_step):

        constraints = create_constraints(category_times, n_constraints)

        if constraints is None:
            continue

        constraint_errors = [get_constrained_error(df, n_samples, constraints) for i in range(n_repeats)]

        # Write Results
        error_dict["n_constraints"].append(n_constraints)
        error_dict["n_samples"].append(n_samples)
        error_dict["constraint_mape"].append(np.mean(constraint_errors))

    # Keep the OLS baseline aligned with this constraint level's rows
    error_dict["ols_coeff_mape"].extend(ols_errors)

Results

Below is the plot of our standard LinearRegression (blue) against setting up different numbers of relative constraints.

As you might have guessed, setting up constraints has an outsized impact on the training of the model with fewer data points and has diminishing returns as we get more and more data.

Interestingly, the number of relative constraints had less of an impact on model performance than I would have thought. It seems like setting up bounding constraints on what values each 𝛼 could take was the 80–20 intervention here.

I was curious how quickly the returns from using constraints diminish as we collect more samples. To check, I calculated the % change in MAPE between OLS (our vanilla linear regression model) and our constrained regression model. The results are plotted below.
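
Concretely, the quantity being plotted is just the relative reduction in MAPE at each sample size; a tiny helper (my own, not from the original notebook) makes the definition explicit:

def pct_improvement(ols_mape: float, constrained_mape: float) -> float:
    """Relative reduction in coefficient MAPE from adding constraints, in percent."""
    return (ols_mape - constrained_mape) / ols_mape * 100

# e.g. an OLS MAPE of 0.40 vs a constrained MAPE of 0.28 is a 30% improvement
pct_improvement(0.40, 0.28)  # -> ~30.0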

As you can see, the % improvement you get out of constraints tends towards 0 as we get more samples. As we get closer to that 200:1 ratio (here 4k samples), the % improvement is closer to 5%.

While not a magic bullet, we can see how constraints can make up for a lack of data, especially early on. As the red lines below show, depending on where you are on the curve (its steepness and the separation between lines), you could get the performance of a model with 30% more data just by adding constraints.

Conclusion

Adding simple constraints to our coefficients can dramatically improve the accuracy of our model, especially in low-data environments. I find that operations data is rarely as big as you’d like, and adding some of these smarter modeling techniques can let you avoid making tradeoffs as you try to deliver a solution.


Macklin Fluehr

Macklin Fluehr is an engineer, machine learning specialist, and designer working to transform sustainability with a cross-disciplinary approach