Lagrangian relaxation can solve your optimization problem much, much faster

Authors: C. Deperrois Grisoni, D. Galley, G. Boussac and P. Van Hentenryck


Foreword: For reasons of confidentiality, the following post is based on a “toy example” the authors constructed to reproduce a situation similar to the one they faced on a real project. The conclusions are based on their real-world experience.

Today’s advanced analytics techniques are providing businesses with new insights and opportunities that can improve decision making through optimization. We were recently presented with one such opportunity when a food retailer asked us to develop machine learning and optimization solutions to help the company establish prices that would optimize profit and reduce waste. After initial experiments and tests in real stores, we decided to scale the solution to large markets, which offered greater rewards, but presented new challenges. The main challenge was that on this larger scale the models to solve were significantly larger, which caused our Gurobi-powered optimizer to struggle. This phenomenon, sometimes referred to as “the optimization curse,” reflects the fact that optimization problems become exponentially harder to solve as they grow, and is described very well here.

In this article, we will explain how a well-known optimization technique, Lagrangian relaxation, helped our team reduce optimization time by a factor of more than 10 while improving the quality of the outcomes. This technique was very helpful to us, and we believe it can help others who also work on optimization topics. In particular, it can be applied to many use cases that require solving large MIPs (mixed-integer programs) with coupling constraints, such as scheduling or stock allocation problems.

Before we jump into content, though, we would like to thank Professor Pascal Van Hentenryck for helping us understand Lagrangian relaxation, as well as for everything he taught us about optimization.

The business problem

A large food retailer we were working with sells perishable fruits, vegetables and grocery items in stores. The retailer can discount products in the five days prior to their expiration date. The higher the discount, the greater the demand for a particular product — but at a lower price. Our goal was to choose the price for each item that would maximize revenue during the week. In addition, the retailer wanted to be environmentally friendly, and was committed to wasting less than 10% of its produce by the end of the week.

To solve this problem, we first developed a machine learning model that, depending on the discount, predicted the daily demand for each item. We then built an optimizer to select the best price path to maximize revenue while adhering to the retailer’s waste-reduction constraint.

The first thing to note is that the objective function (maximizing revenue) is opposed to the waste reduction constraint (selling more than 90% of initial stocks). Indeed, the higher the discount, the more volume sold — but the lower the revenue per item. To meet the waste-reduction constraint, we often had to push some discounts further than the optimum, thereby sacrificing some revenue. The question then became how to make the best trade-offs and choose the right products to discount to minimize revenue loss.

This was when the “optimization curse” kicked in. Maximizing revenue alone was quite easy, since it could be done independently for each product. Adding in the waste-reduction constraint, however, made the problem a great deal more complex, since the optimizer had to find the best trade-off between products. It was at this point that we turned to Lagrangian relaxation.

What is Lagrangian relaxation, and how does it help?

Lagrangian relaxation is an optimization technique made famous in 1971 by Held and Karp when they addressed the travelling salesman problem. The main idea behind the technique is to separate the constraints of a problem into “easy” and “hard” constraints, and then move the “hard” constraints into the objective function, each multiplied by a Lagrangian multiplier λ. The new problem becomes much easier to solve and has some nice properties that help solve the original problem. Two properties are particularly helpful:

  • Lagrangian relaxation provides good-quality upper bounds (in a maximization problem). The bounds from the Lagrangian dual are at least as tight as those resulting from the linear relaxation.
  • While searching for upper bounds, there are several ways to obtain feasible, high-quality solutions along the way.

Mathematical intuition

Typically, we start with the problem (P) below, where f and g are linear functions of the binary variable x:

(P):  Z* = max_x f(x)
      subject to g(x) ≥ 0 (the “hard” constraint)
      and x ∈ X (the “easy” constraints)

We then create the relaxed problem with λ ∈ ℝ⁺:

P_relaxed(λ):  Z*_relaxed(λ) = max_{x ∈ X} { f(x) + λ × g(x) }

Notations: we denote by x_λ* an optimal solution of the relaxed problem, and by Z*_relaxed(λ) the related optimal objective value.

Properties of the new mathematical formulation

If (P) is feasible, and if the set of solutions of P_relaxed(λ) is finite, then max_{x ∈ X} { f(x) + λ × g(x) } is finite for all λ ∈ ℝ⁺. These two conditions are met for our problem, since the solution space of the initial problem is finite, with or without the waste-reduction constraint:

  • P_relaxed(λ) is easy to solve for a given λ, since there are no more “hard” constraints.
  • Solving P_relaxed(λ) provides an upper bound on the problem (P), thanks to the primal-dual inequality:

Z* = f(x*) ≤ f(x*) + λ × g(x*) ≤ max_{x ∈ X} { f(x) + λ × g(x) } = Z*_relaxed(λ)

with x* an optimal solution of (P), which therefore verifies g(x*) ≥ 0, and Z* = f(x*) the corresponding objective value of (P). The first inequality holds because λ ≥ 0 and g(x*) ≥ 0; the second holds because x* belongs to X.
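To make the bound property concrete, here is a tiny self-contained toy of our own (three binary variables, enumerated by brute force; all numbers are made up, not from the project): whichever λ ≥ 0 we pick, the relaxed optimum sits above the true optimum Z*.

```python
from itertools import product

# Toy instance: x_i = 1 means "deeply discount product i", which moves
# s_i extra units but changes profit by p_i (possibly negatively).
p = [5.0, -1.0, -2.0]   # profit impact of discounting product i
s = [0.0, 2.0, 3.0]     # extra units sold if product i is discounted
S = 4.0                 # must move at least S extra units (waste constraint)

def f(x):  # objective of (P)
    return sum(pi * xi for pi, xi in zip(p, x))

def g(x):  # relaxed constraint; x is feasible iff g(x) >= 0
    return sum(si * xi for si, xi in zip(s, x)) - S

# Exact optimum of (P) by enumeration.
z_star = max(f(x) for x in product([0, 1], repeat=3) if g(x) >= 0)

# Z*_relaxed(lambda) is an upper bound on Z* for every lambda >= 0.
for lam in [0.0, 0.5, 1.0, 2.0]:
    z_rel = max(f(x) + lam * g(x) for x in product([0, 1], repeat=3))
    assert z_rel >= z_star
    print(f"lambda={lam}: Z*_relaxed={z_rel:.1f} >= Z*={z_star:.1f}")
```

On this toy, the best achievable bound is about 2.7 while Z* = 2: the remaining distance is the duality gap, which is typical for integer programs.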

The above generalizes quite well to multiple constraints (one multiplier λᵢ per relaxed constraint), as well as to equality constraints (whose multipliers may take any sign). Readers eager for more technical details can refer to the abundant literature on the subject, starting with references [1] and [3].

We will apply this theory to our example below; first, let’s look at the Lagrangian dual.

Properties of the Lagrangian dual

Each Lagrangian problem provides an upper bound; the Lagrangian dual consists in finding the best one, i.e. solving the problem (D):

(D):  min_{λ ∈ ℝ⁺} Z*_relaxed(λ)

The dual problem (D) above also has a nice set of properties:

  • It is the upper envelope of a finite family of linear functions of λ, and therefore has solutions.
  • In addition, it is convex and continuous [1].
  • However, it is not differentiable everywhere, thus requiring specific optimization techniques.

Translating our problem into equations

Note that the problem introduced above can be summarized as: what is the best trade-off that maximizes revenue while selling enough items to minimize waste? The detailed equations can be found in Appendix 2; here we focus on providing the intuition behind the problem formulation.

Notations and decision variables

- a: an article, e.g. we have 1,000 different articles, such as a yoghurt or an ice cream;
- f: a flavour, e.g. each article can have 4 different flavours, such as vanilla or chocolate for an ice cream;
- d: a discount, e.g. each article can be discounted by 10%, 20%, … up to 90%;
- t: a day; we have 5 days in this example.

The Boolean variable x(a, d, t) ∈ {0, 1} indicates whether or not discount d is chosen for article a at time t. Note that all flavours of an article share the same price, and therefore the same discount.

Objective function

We optimize for net sales, i.e. revenue:

net sales = Σ_{a, f, d, t} P(a) × (1 − d) × sales(a, f, d, t) × x(a, d, t)

where P(a) is the price of article a before any discount, and sales(a, f, d, t) are the sales of article a with flavour f at discount d during period t.

We expect the sales sales(a, f, d, t) to increase with the discount d for a given period (higher discounts are more attractive at a given time), and to decrease with time t for a given discount d (as the expiration date approaches, a higher discount is required to maintain the same sales level).

Constraints

There are two types of constraints:

1. Four logical constraints for each SKU:
- Exactly one discount per article at each time period;
- Cannot sell more than the available stock;
- Cannot sell more than the demand;
- For a given period, the available stock factors in sales from previous periods (stock propagation).

2. Two business constraints:
- For each article, the discount cannot decrease over time;
- For all articles, meet the waste-reduction constraint: sell more than 90% of total initial stock by the end of the week to guarantee minimal waste of perishable articles. This is the only constraint that couples the articles together, making it the hardest constraint to satisfy; all the other constraints apply to individual articles. A sketch of the full formulation in code follows below.
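To fix ideas, here is a minimal gurobipy sketch of this formulation. It is our own toy reconstruction, not the production code: the data (P, sales, stock) are placeholders standing in for the ML predictions, and the stock-propagation and demand/stock-cap constraints are omitted for brevity.

```python
import gurobipy as gp
from gurobipy import GRB

# Tiny stand-in data; the real instance has ~1000 articles, 4 flavours,
# 9 discount levels and 5 days. All names below are ours.
articles  = range(2)
flavours  = range(2)
discounts = [0.1, 0.2, 0.3]
days      = range(5)
P     = {a: 10.0 for a in articles}                        # full price of article a
sales = {(a, f, d, t): 5.0 for a in articles for f in flavours
         for d in discounts for t in days}                 # ML-predicted sales
stock = {(a, f): 40.0 for a in articles for f in flavours}

m = gp.Model("discount_pricing")
x = m.addVars(articles, discounts, days, vtype=GRB.BINARY, name="x")

# Objective: net sales (revenue) over the week at discounted prices.
m.setObjective(gp.quicksum(P[a] * (1 - d) * sales[a, f, d, t] * x[a, d, t]
                           for a in articles for f in flavours
                           for d in discounts for t in days), GRB.MAXIMIZE)

# Logical constraint: exactly one discount per article and per day.
# (Stock propagation and the demand/stock caps are omitted for brevity.)
m.addConstrs((x.sum(a, "*", t) == 1 for a in articles for t in days),
             name="one_discount")

# Business constraint: the discount cannot decrease over time.
m.addConstrs((gp.quicksum(d * x[a, d, t] for d in discounts) <=
              gp.quicksum(d * x[a, d, t + 1] for d in discounts)
              for a in articles for t in days[:-1]), name="monotone_discount")

# Waste-reduction constraint: the single coupling constraint, g(x) >= 0.
total_stock = sum(stock.values())
waste = m.addConstr(gp.quicksum(sales[a, f, d, t] * x[a, d, t]
                                for a in articles for f in flavours
                                for d in discounts for t in days)
                    >= 0.9 * total_stock, name="waste_reduction")

m.optimize()
```

The waste_reduction constraint is the one we will relax below; everything else is separable by article.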

Brute-force solving takes a lot of time, even with a really good solver …

Gurobi is not very fast on the above MIP: it takes about 5,000 s to reach an optimality gap of 0.25%. Such solve times are too long for business settings in which the retailer must run the model regularly to review prices. In addition, the lower the MIP gap, the better the solution — and the more revenue!

Without the waste-reduction constraint, the problem can be solved independently for each article, since there are no longer trade-offs between products: Gurobi finds the best solution in 46 s! This is the typical use case for Lagrangian relaxation.

How do we apply Lagrangian relaxation in this specific case?

We’ve already mentioned that the idea behind Lagrangian relaxation is to move the hard constraints into the objective function. We therefore define the relaxed problem P_relaxed(λ) associated with our initial problem (P) as:

P_relaxed(λ):  max_x  net sales(x) + λ × (total sales(x) − 0.9 × total initial stock)

P_relaxed(λ) is subject to all the constraints of (P) except the waste-reduction one.
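Concretely, continuing the gurobipy sketch above (same hypothetical names), the relaxation boils down to removing the coupling constraint and adding λ × g(x) to the objective:

```python
# Sketch of P_relaxed(lambda), reusing the names from the previous snippet.
# g(x) = total_sales(x) - 0.9 * total_stock; the constraint g(x) >= 0 is
# dropped from the model and lambda * g(x) is added to the objective.
total_sales = gp.quicksum(sales[a, f, d, t] * x[a, d, t]
                          for a in articles for f in flavours
                          for d in discounts for t in days)
net_sales = m.getObjective()      # keep the original revenue expression
m.remove(waste)                   # drop the hard coupling constraint
m.update()

def solve_relaxed(lam):
    """Solve P_relaxed(lam); return (x*, Z*_relaxed(lam), g(x*))."""
    m.setObjective(net_sales + lam * (total_sales - 0.9 * total_stock),
                   GRB.MAXIMIZE)
    m.optimize()
    g_val = total_sales.getValue() - 0.9 * total_stock
    return m.getAttr("X", x), m.ObjVal, g_val
```

Once waste_reduction is removed, the model decomposes article by article, which is exactly why each P_relaxed(λ) solves so quickly.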

There are now two things to do with this new problem:

1. Find the best upper bound (and therefore the best Lagrangian multiplier), i.e. solve the dual problem:

(D):  min_{λ ∈ ℝ⁺} Z*_relaxed(λ)

2. Find good feasible solutions of the original problem (P) along the way.


How to solve the new problem (D)?

We propose to solve (D) using subgradient descent, an iterative approach that serves both goals above: finding the best upper bound and collecting good feasible solutions along the way. The best lower bound Z_LB and upper bound Z_UB so far are updated at each iteration k, and the best feasible solution found along the way is stored. The algorithm needs an admissible solution to be initialized, which we typically obtain by running Gurobi for a few seconds, and then follows the loop sketched below. More details can be found in Appendix 1.
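Here is a sketch of that loop under the assumptions of the previous snippets: solve_relaxed comes from above, net_sales_of(x) (a hypothetical helper) evaluates the original objective on a solution, and next_lambda encapsulates the step-size heuristic discussed in the next section.

```python
def subgradient(solve_relaxed, net_sales_of, next_lambda, z_lb0,
                lam0=0.0, max_iter=100, tol=1e-6):
    """Skeleton of the subgradient loop (names are ours).
    solve_relaxed(lam) returns (x_lam, Z*_relaxed(lam), g(x_lam));
    z_lb0 is the value of an initial feasible solution, e.g. obtained
    from a short Gurobi run on the original problem (P)."""
    lam = lam0
    z_lb, z_ub, best_x = z_lb0, float("inf"), None
    for k in range(max_iter):
        x_k, z_rel, g_k = solve_relaxed(lam)
        z_ub = min(z_ub, z_rel)           # every relaxed optimum bounds Z* from above
        if g_k >= 0:                      # x_k also meets the waste constraint,
            z_feas = net_sales_of(x_k)    # so it is feasible for (P)
            if z_feas > z_lb:
                z_lb, best_x = z_feas, x_k
        if z_ub - z_lb <= tol * max(1.0, abs(z_lb)):
            break                         # upper and lower bounds have met
        lam = next_lambda(k, lam, z_rel, z_lb, z_ub, g_k)
    return best_x, z_lb, z_ub
```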

How to choose the step size of the subgradient descent?

There are several heuristics to choose the step size of the subgradient descent, and they are abundantly described in the literature (see Appendix 1 for examples).

Classic approach

The most common way to choose the step at iteration k is:

stepᵏ = cᵏ × (Z*_relaxed(λᵏ) − Z_LBᵏ) / ‖gᵏ‖²,   λᵏ⁺¹ = max(0, λᵏ − stepᵏ × gᵏ)

where cᵏ ∈ ]0, 2]. Whenever the objective upper bound Z_UBᵏ has not improved for more than 10 iterations, c is halved. This heuristic works on most problems, especially when multiple constraints are relaxed.
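As a sketch, this rule can be packaged as a next_lambda callback for the loop above (the state dictionary and parameter names are our own):

```python
def make_classic_step(c0=2.0, patience=10):
    """Classic rule: step_k = c_k * (Z*_relaxed(lam_k) - Z_LB_k) / ||g_k||^2,
    with c_k in ]0, 2]; c is halved whenever the upper bound has not
    improved for `patience` iterations."""
    state = {"c": c0, "best_ub": float("inf"), "stall": 0}

    def next_lambda(k, lam, z_rel, z_lb, z_ub, g):
        if z_ub < state["best_ub"]:
            state["best_ub"], state["stall"] = z_ub, 0
        else:
            state["stall"] += 1
            if state["stall"] > patience:
                state["c"] /= 2.0         # halve c after too many stalls
                state["stall"] = 0
        step = state["c"] * (z_rel - z_lb) / max(g * g, 1e-12)
        return max(0.0, lam - step * g)   # project back onto lambda >= 0

    return next_lambda
```

It plugs into the loop as subgradient(solve_relaxed, net_sales_of, make_classic_step(), z_lb0).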

With only one constraint to relax, there are simpler methods

In our Lagrangian relaxation problem, we relax only one inequality constraint; gᵏ is therefore of dimension 1. We can then use a dichotomy approach, where we define:

- λ_min_feas as the smallest λᵏ visited by the algorithm that leads to a feasible solution.
- λ_max_unfeas as the largest λᵏ visited by the algorithm that leads to an unfeasible solution.
- Initially, λ_min_feas = +∞ and λ_max_unfeas = 0.

At each iteration, we update these values depending on the solution found so far:

- If λᵏ is feasible (i.e., x_λᵏ satisfies the waste-reduction constraint): λ_min_feasᵏ⁺¹ = λᵏ and λ_max_unfeasᵏ⁺¹ = λ_max_unfeasᵏ.
- If λᵏ is unfeasible: λ_max_unfeasᵏ⁺¹ = λᵏ and λ_min_feasᵏ⁺¹ = λ_min_feasᵏ.

We then choose the next multiplier as the midpoint of the current interval:

λᵏ⁺¹ = (λ_max_unfeasᵏ⁺¹ + λ_min_feasᵏ⁺¹) / 2

(while λ_min_feas is still +∞, λ can simply be increased, for instance doubled, until a feasible multiplier is found).

Given that there is only one constraint to relax, we can easily see that if λᵏ is feasible, then λᵏ ≥ λ*, and if λᵏ is unfeasible, then λᵏ ≤ λ* (see Appendix 2). Therefore λ* lies in [λ_max_unfeasᵏ; λ_min_feasᵏ] for any k. Details are found in Appendix 3.

This allows us to iteratively reduce the size of this interval through dichotomy.
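Here is a sketch of this rule as another next_lambda callback for the loop above; the growth step used before any feasible multiplier has been seen is our own bootstrapping choice, not part of the original description:

```python
import math

def make_dichotomy_step(lam_growth=2.0):
    """Dichotomy rule for a single relaxed constraint: track lam_min_feas
    (smallest feasible multiplier seen) and lam_max_unfeas (largest
    unfeasible one), then bisect the interval that contains lambda*."""
    state = {"min_feas": math.inf, "max_unfeas": 0.0}

    def next_lambda(k, lam, z_rel, z_lb, z_ub, g):
        if g >= 0:                                  # lam leads to a feasible x_lam
            state["min_feas"] = min(state["min_feas"], lam)
        else:                                       # lam leads to an unfeasible x_lam
            state["max_unfeas"] = max(state["max_unfeas"], lam)
        if math.isinf(state["min_feas"]):
            # No feasible multiplier seen yet: grow lambda (our assumption).
            return max(lam * lam_growth, 1.0)
        return 0.5 * (state["max_unfeas"] + state["min_feas"])

    return next_lambda
```

Because λ* is always bracketed by [λ_max_unfeas; λ_min_feas], the interval halves at every iteration after bootstrapping, which is why so few iterations are needed.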

Which heuristic to choose for your problem?

Numerous heuristics have been developed to address a wide range of problems. The best one depends on the particular structure of each problem and the constraints that are relaxed. Based on our initial experience, we suggest starting with the classic heuristic, and then working from there to find the heuristic or combination of heuristics that best fits each particular problem.

What about results?

As mentioned above, Lagrangian relaxation worked particularly well for our problem. Optimization time went from 5,000 s down to about 320 s, a reduction factor of more than 15. At the same time, the MIP gap improved from 0.25% to virtually 0%, which translated into more revenue. When scaling up to larger markets we also encountered situations where the initial MIP gap was greater than 5%; Lagrangian relaxation was particularly well suited to solving those cases.

Dichotomy heuristic

With the dichotomy heuristic described above, the problem was solved in five iterations. Because only one constraint is relaxed, the Lagrangian multiplier improves at each iteration, as seen below.

Classic heuristic

When we use the classic heuristic on this problem, the Lagrangian multiplier does not always improve, and the algorithm takes longer to converge: about 100 iterations. These “oscillations” are a typical behavior of Lagrangian relaxation, and can be dampened through the use of other heuristics.

Looking at the evolution of the MIP gap, the dichotomy heuristic achieves more attractive results than the classic one. The first figure, which uses the dichotomy heuristic, shows the gap decreasing at nearly every iteration. The second shows that it takes many more iterations to decrease the gap with the classic approach.

Keep in mind, though, that although dichotomy may perform better, it can only be applied when a single constraint is relaxed. The classic approach is often a good starting point for more complex problems.

Conclusion

In summary, Lagrangian relaxation is a very powerful technique for solving optimization problems with the appropriate structure, i.e. whose constraints can be separated into “hard” and “easy” categories. When there is only one constraint to relax, the dichotomy subgradient heuristic is particularly relevant and provides very interesting results, capable of reducing optimization time by a factor of more than 10!

For more complex problems, interesting results can also be achieved by combining different heuristics — and by designing new ones. There are several examples in the literature of such problems, and we are in the process of trying new heuristics to solve them. We hope to discuss these approaches in a future article.

References

[1] Marshall L. Fisher (1981). The Lagrangian Relaxation Method for Solving Integer Programming Problems. Management Science, Volume 27, Number 1, pp. 1–18. https://pdfs.semanticscholar.org/fa52/c5043af1b2046396c82f0e573084dcdc9d5e.pdf

[2] Bernard Gendron (2016). Master Class on Decomposition. CPAIOR 2016, Banff, Canada. https://symposia.cirrelt.ca/system/documents/000/000/255/Gendron_original.pdf?1464701244

[3] Monique Guignard (2003). Lagrangean Relaxation. TOP, Volume 11, Number 2, pp. 151–228. https://perso.ensta-paris.fr/~diam/ro/online/Monique.Guignard-top11201.pdf

Appendix

More details, including equations, can be found in the appendix.
