Optimization with PuLP in Python — Getting Started

Self-guided Quarterly Training for Linear Programming

Jeff Goldkamp
Dec 2, 2020 · 6 min read

Data analytics mostly falls in the descriptive realm, with a little spilling into the predictive space, and barely any reaching the prescriptive state. Mathematical optimization presents a powerful method to transform descriptive and predictive inputs into prescriptive decisions. In this post, we’ll explain what linear programming is, how to identify opportunities to apply it, and walk through the Python implementation with a sample scheduling problem.

Image for post
Image for post

Combinatorial Optimization

Combinatorial optimization is a major subclass of mathematical optimization that finds the optimal solution from a finite set of objects. These problems arise in many industries and a surprising amount of everyday situations. In fact, just about everything can be framed this way. As Stephen Boyd eloquently explains:

Everyone in their intellectual life goes through a stage… …Let me describe this stage of intellectual development. You read a couple of books and you wake up at 3:00 in the morning and say oh my god, everything is an optimization problem.

Everyday examples:

  • What combination of clothes should I wear today?
  • What combination of food should I eat this morning?
  • What combination of roads should I take to work?

Not-so-everyday examples:

  • What combination of staff should I schedule next week?
  • What combination of facility locations should I establish?
  • What combination of deliveries should I assign to my fleet?

The common thread here is that there exists a (1) decision to be made (2) with some behind-the-scenes constraints and (3) underlying objective. Put the three together and you have a classical mathematical program to solve!

…Of course, everything is an optimization problem. What you’ll find out quickly is it doesn’t mean anything to say that. It says nothing. What matters is what optimization problem it is, because most optimization problems you can’t solve

So the issue at hand here is identifying problems for what type of optimization problem they are.

Linear Programming and MIPs

Linear programming is the foundational technique to solve combinatorial optimization problems. The main caveat, is that both objectives and constraints must be linear. Such linear structure yields a convex solution space where many readily-available solvers can find an exact solution for relatively quickly. A simple example might look like the following:

Image for post
Image for post

The problem becomes Mixed Integer Programming (MIP) once integer or boolean variables are introduced to a LP. The introduction of integer decision variables creates a non-convex space. For this reason, most MIPs cannot be solved (in reasonable time). As an alternative, MIP solvers generally give us a really good solution in reasonable time.

This problem class is where many real-world applications fall under. For an excellent primer on MIP modeling techniques, head over to the Mosek Modeling Cookbook.


PuLP is an open source Python LP modeler that calls other solvers, both free (CBC, GPLK) or not-free (CPLEX, GUROBI, MOSEK). There is also a LP modeler in SciPy, but the modeling structure is far too rigid with no ability for calling external solvers; making it unsuitable beyond theoretical textbook problems.

Running Python 3.8.6, I ran into just one hiccup during installation. If running sudo pulptest throws any errors along with the annoying JDK popup on your Mac, it is time to finally create that Oracle account and install JDK.

Simple Scheduling Application

Now let’s use PuLP to model a simple scheduling problem.

Problem Definition: You run a 24-hour lemonade stand offering 2 products: iced lemonade and frozen lemonade slushies. You have an idea of how long each product takes to service, along with the expected demand for a given day. How much staff is needed for each hour throughout the day to meet this demand?

Data inputs

# Data inputs
hours = range(0,24)
demand_iced = pd.DataFrame({0: 7, 1: 11, 2: 8, 3: 8, 4: 5, 5: 3, 6: 8, 7: 20, 8: 52, 9: 56, 10: 85, 11: 76, 12: 102, 13: 67, 14: 82, 15: 68, 16: 65, 17: 56, 18: 50, 19: 43, 20: 47, 21: 23, 22: 29, 23: 18}, index=[0])
demand_slushy = pd.DataFrame({0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 38, 10: 84, 11: 93, 12: 82, 13: 93, 14: 75, 15: 70, 16: 62, 17: 22, 18: 27, 19: 17, 20: 22, 21: 0, 22: 0, 23: 0}, index=[0])
processing_time_iced = 2/60
processing_time_slushy = 5/60

Decision Variables: Number of staff needed at each hour (x_i)

Objective: Minimize your staffing cost (sum(cost*x_i))

Constraints: Expected demand must be met for each hour (sum(x_i-(pt_iced*demand_iced_i + pt_slushy*demand_slushy_i)≥0 for all i), and the stand must be staffed at all hours (sum(x_i)>0 for all i).

prob = LpProblem(“Simple Scheduling Application”, LpMinimize)
# Decision Variabls
staff_level_vars = LpVariable.dicts(“staff_needed”, hours, lowBound=0, cat=’Continuous’)
# Objective Function
prob += lpSum([15*staff_level_vars[i] for i in hours]) , “Total cost of staff per hour”
# Constraints
for i in hours:
prob += lpSum([staff_level_vars[i]]) >= 1, (“Minimum staffing “ + str(i))
prob += lpSum([staff_level_vars[i] — (processing_time_iced*demand_iced[i] + processing_time_slushy*demand_slushy[i])]) >= 0, (“Hourly demand “ + str(i))

status = prob.solve()
for v in prob.variables():
print(v.name, “=”, v.varValue)

Image for post
Image for post

We were able to find an optimal solution! However, this is not really telling us much. Taking the expected demand and dividing by the processing time should give us the same thing, with the exception of the minimum staffing constraint.

Let’s make some adjustments to get more insights. Since we do not have an infinite supply of labor at our disposal, some form of labor or capacity constraints are needed.

Additional Constraints: You only have 5 employees available (sum(x_i)≤5*8) and (sum(x_i)≤5 for all i)

But only adding this constraint results in an infeasible solution. We need to either adjust the demand constraint or introduce a variable to represent the overflow or lost sales. We can also change the decision variables to integer to avoid fractional staff.

Additional Decision Variables: Lost sales in iced (lost_iced_i) and slushy (lost_slushy_i)

New Objective: Minimize your cost of staffing lost sales (sum(hourly_wage*x_i-cost_iced*lost_iced_i-cost_slushy*lost_slushy_i))

New Demand Constraint: Expected demand, less missed sales, must be met for each hour (sum(x_i-(pt_iced*(demand_iced_i-lost_iced_i) + pt_slushy*(demand_slushy_i-lost_slushy_i))≥0 for all i)

prob = LpProblem(“Simple Scheduling Application — Staff Constraint”, LpMinimize)
# Decision Variabls
staff_level_vars = LpVariable.dicts(“staff_needed”, hours, lowBound=0, cat=’Integer’)
lost_iced = LpVariable.dicts(“lost_iced”, hours, lowBound=0, cat=’Integer’)
lost_slushy = LpVariable.dicts(“lost_slushy”, hours, lowBound=0, cat=’Integer’)
cost_iced = 3
cost_slushy = 5

# Objective Function
prob += lpSum([15*staff_level_vars[i] + cost_iced*lost_iced[i] + cost_slushy*lost_slushy[i] for i in hours]) , “Total cost of staff per hour and lost sales”
# Constraints
prob += lpSum([staff_level_vars[i] for i in hours]) <= 5*8 , “8 hour workdays”
for i in hours:
prob += lpSum([staff_level_vars[i]]) >= 1, (“Min staffing “ + str(i))
prob += lpSum([staff_level_vars[i]]) <= 5, (“Max staffing “ + str(i))
prob += lpSum([staff_level_vars[i] — (processing_time_iced*(demand_iced[i]-lost_iced[i]) + processing_time_slushy*(demand_slushy[i]-lost_slushy[i]))]) >= 0, (“Hourly demand “ + str(i))
status = prob.solve()
for v in prob.variables():
print(v.name, “=”, v.varValue)

Image for post
Image for post

We once again reach an optimal solution, but this time a little more informative. The optimal staffing schedule is clustered around the peak afternoon hours, and since we only have 5 employees for the entire day, perhaps adjusting the operating hours would make sense.

You can rerun the same model without the minimum staffing constraint to obtain the following recommended schedule!

Image for post
Image for post

Linear programming is a valuable tool for a comprehensive analytics skillset, and presents a clear path to prescriptive analytics. Python and the PuLP modeler offer an accessible environment to start learning and applying these techniques. We can already imagine the wider use-cases from this simple demonstration like:

  • Weekly schedule recommendations
  • Longer-term hiring planning when projected growth numbers are fed in
  • Analyze different metrics and SLAs to optimize
  • Experiment with varying input parameters for sensitivity analysis

Stay tuned for future posts around MIP modeling, multi-objective optimization, and metaheuristic methods like genetic algorithms!

Ro Data Team Blog

Ro Data Team Blog: data analytics, data engineering, data…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store