Introduction to Mixed Integer Programming

Published in

Efidgy

16 min readMar 20, 2019

I have been working as a software engineer for 15 years and half the time I dedicated to solving optimization problems. During my work I quite frequently faced the situation when the optimization team and the rest part of the company did not understand each other. Those misunderstandings were the results of inflated and unrealistic expectations. Modern IT technologies and power of science are mighty indeed, but they are just assisted methods and only a human can find the right decision and the right way of applying science and research methods. It is always good to know the toolkit you are working with, its features and capabilities, as well as realize its possibilists and limitations. In this and following articles I will try to bridge the gap and help you to understand both your colleagues from optimization team and technologies a bit better.

First of all, let’s see what optimization means.

Generally, “optimization” conception means a process of selection of the best value from various available alternatives.

Be sure, those alternatives are decisions too and you can reach your goal with them, but optimization methods help you to find better decision, the one which will allow you to reach your goal at lower expenses of money, time or any other valuable resource.

Best value is determined by the objective function, and the area of alternatives is specified by conditions. Every time when you make a decision on whatever in your usual life you solve optimization problems. The type of the decision is of no matter. Whether it is an integral decision, for example, when you should set the price of product your factory manufactures, or it is a discrete decision when you need to decide how many containers with this product your factory shall produce. So, if you have multiple options, and all of them are acceptable, but some of them are better than others, then you face an optimization problem. In your own life you deal with optimization problems every day. When you choose the best route to your office, your prime goal is to minimize time. Road network and time of arrival are your conditions. When you decide to buy something, the price of needed goods, product features, your needs and optionally delivery time define your objective function. Availability of the goods on a market is your conditions. In practical sense of everyday life, the optimization can be performed simply and trivially, often by just reviewing and involving your previous experience as in case of the route to your office or goods purchasing, but in certain cases it is hard to make the right decision without complex calculations.

And all those certain cases are highly important in our everyday life, they intangibly influence it and even our money. For example, airline companies shall schedule their aircraft fleet in the most effective and economical way. The right decision saves them a lot of money. But how to find the right decision if solutions space is extremely wide and large which makes our task more complicated? Right this moment computers can rescue us.

We can try to formulate the problem as one of the classical problems and solve it with the generic method.

LP

Linear Programming (LP) is one of the methods of solving optimization problems where objective function and conditions are defined by linear relationships between variables. The word “programming” here means planning rather than programming in a computer language, so do not to be confused here. Let us review a mathematical abstract example first and return to a real, practical example later. A real case usually covers hundreds or even thousands of variables; so to give better understanding of the process through our abstract model we will slightly simplify it and involve just two variables: “x” and “y”. This approach allows us to reflect relative conditions as they shown on Chart 2 (see below). Assume, we have the following objective function:

14 * x + 13 * y

Since our goal is to find and extreme point of target function, we will determine the maximum of that example. It means we need to choose x and y such that the value of our objective would be maximal. Simultaneously, there should be some limitations to make the example meaningful, so let us add some conditions. Here they are:

7 * x + 2 * y >= 16
x + 6 * y >= 8
3 * x - 2 * y <= 14
2.5 * x - 4 * y >= -15.5
7 * x + 6 * y <= 70

And as you can see, both objective function and conditions are linear. No x * x, neither x * y, so this example definitely looks like an optimization problem that could be solved with LP. Please, pay attention that in our example there are more conditions than variables, and this should not surprise you. Since our conditions are not equations, but inequalities, we can establish even more conditions and still there would be some space for the solution.

Let us draw everything we have discussed above on the chart:

Our conditions define the polygon of possible solutions for x and y called the solution space. Any point on the polygon is a valid solution of our problem. The solution space includes better and worse solutions according to the objective function. And there are infinite number of solutions because x and y are real. This polygon is colored using objective function as a reference. Blue tones define smaller values of the objective function, while pink ones define larger values. In our case we want to maximize the objective function which fits our conditions, so it is basically the pinkest point of the polygon. I have already marked the best solution with an asterisk.

Our conditions define feasible task. Now let us review the case when we add a simple condition x ≥ 8. There will be no solution and in this case the problem is called infeasible. On the other hand, our conditions define bounded solution space. Without last two conditions the problem will be unbounded, and the best score will be infinite. Luckily, our problem is bounded and feasible.

Obviously, the solution of our problem is one of the vertices of the polygon. So, the solver could basically inspect them all, but in practice some of the vertices could be excluded because this method speeds up calculations. That is exactly what simplex algorithm does: just finds any vertex of the polygon, and then follows the edges in a better direction until it finds the best solution. There are other methods of solving LPs like the interior-point method. But I would rather not to focus on the details here.

At the end of calculations of our LP example the solver should find the best objective value and values for all variables.

lp = 146.59
x = 4.35
y = 6.59

As you can see now, LP allows us to solve optimization problems where all variables are real.

There is a classical problem called the Stigler Diet. Its task is to find how much of each of foods should be eaten on a daily basis so that the man’s intake of nutrients will be at least equal to the recommended dietary allowances with the cost of the diet being minimal. The cost of food i is defined by Ci. The amount of nutrient j containing in food i is defined by Nij. Recommended amount of nutrient j is defined by Rj. Finally, the amount of food defined by variable Xi. The model looks like the following:

For each nutrient j:
    N1j * X1 + N2j * X2 + ... + Nnj * X1 >= Rj

Minimize:
    C1 * X1 + C2 * X2 + ... + Cn * Xn

Now we discuss the case when we have a discrete problem and we are interested in integer values of variables. That exactly the case when MIP shall be used.

MIP

Mixed Integer Programming (MIP) is a special subtype of LP where some of variables are integers. You may think that there is not so much difference between LP and MIP but, unfortunately, MIP is comparatively harder to solve than similar LP problem. And, as it will be shown below within MIP solving process, a corresponding LP problem will be solved multiple times.

Let us return to our first LP example, but now we make x and y integers. If you have a look at the chart shown below, the polygon still defines possible solutions. But now they must be located on a grid which represents integer values of x and y. Any intersection point is a solution of our problem. To be precise, there are 24 solutions. The best of them is the pinkest point of the polygon located on a grid.

There is a number of different ways to solve MIP problems. The most successful method is called Branch And Bound algorithm. Actually, modern solvers use extra techniques like cutting planes, so the method called Branch And Cut algorithm. But the main idea remains the same.

First of all, the solver removes all of the integrality restrictions. The resulting LP is called the linear programming relaxation of the original MIP. Then this LP is solved with the method surveyed in the previous part of the article.

The best objective value of LP relaxation is 146.59 which tells us that MIP solution cannot be better than this value. This is how the solver can estimate the best possible solution. It is called the best bound. And the value in percentage that shows how much our current solution is worse than the best bound is called the gap.

At the next step the solver picks one of the variables which should be integer but is real. For example, let us suppose our solver picks x which value is 4.35 in our current solution. Different solvers have different strategies of picking variables. Our solver could pick the given x first because this variable contributes more to the objective function than y. Further the solver creates two new copies of the original MIP, and since x could not be real and its value is 4.35, a new condition x ≤ 4 is added to the first copy, and a new condition x ≥ 5 is added to the other. Our x variable is called branching variable here. Obviously if we solve these two new MIPs we can take the better solution and it will be optimal to the original problem.

So, the solver applies the same idea to these MIPs producing the search tree. Each copy of our MIP is called the node of the search tree.

The solver can stop the branching process in three different cases: if it founds the solution, if it meets the infeasibility or if the solver already has solution with better value than the bound of this branch.

Let’s see what will happen if we add x ≤ 4 and x ≥ 5 conditions.

if x <= 4:
    lp = 138.86
    x = 4
    y = 6.38if x >= 5:
    lp = 145.83
    x = 5
    y = 5.83

From this moment the solver is truly sure that the best bound could not be bigger than 145.83. This is how the best bound is updated.

Now the solver may apply the same approach to these two MIP. And use y as a branch variable since x is an integer here.

Below you can see what the whole search tree of our problem looks like.

The way the solver walks through this tree defines how the best bound is updated, how new possible solutions arise and what is the total solution time.

For simplicity our solver will do a trivial breadth first search. Here is the table that demonstrates the search process. Each line represents a single node in a search tree, the result of solving LP relaxation.

 bound    best      lp     gap
146.59          146.59
146.59          138.88
145.83          145.83
145.83     134     134   8.83% *
145.83     134     inf   8.83%
145.83     134     145   8.83%
   145     134     inf   8.21%
   145     135     135   7.41% *
144.67     135  144.67   7.16%
144.67     135     144   7.16%
   144     135     inf   6.67%
   144     136     136   5.88% *
 143.5     136   143.5   5.51%
 143.5     136     inf   5.51%
   136     136     inf      0%

Here is the visualization:

See how the bound is going down, while objective goes up. This is fair for the maximization task.

As you can see the solver has found only three solutions. It did not try to find all of them. In real life the user sometimes expects multiple solutions that are close to the optimal. So he will be able to compare them and verify the correctness. You may use solutions from the solving process, but you are not able to control what they will look like. It might happen that from the user’s perspective all solutions would look pretty similar, so there will be not so many differences for comparisons.

One more thing shall be noticed here. The solver needs some time to find first solution. But before that it already has had some estimation. After the solver founds the best possible solution, it still must be proved to be truly best one and sometimes it takes relatively long time to prove. So, the gap 5.51% does not mean that a better solution exists. It just means that the solution is good enough and there could be better solution.

The solver needs some time to find first solution. But before that it already has had some estimation. After the solver founds the best possible solution, it still must be proved to be truly best one and sometimes it takes relatively long time to prove.

At the end of calculations of our MIP example the solver should find the best objective value and values of all variables.

mip = 136
x = 6
y = 7

Notice that the best MIP solution is geometrically far away from the LP solution. So, it is not that simply as to find an optimal LP and then find the closest grid point to it.

I have not mentioned it, but the solver also tries to reduce initial problem by removing unnecessary conditions and variables. For instance, if we had a condition like x ≥ 0, we could drop because it would not have no effect. Also, some variables could be dropped if they could be expressed through other variables.

Example

When you face an optimization issue in real life you can try to formulate it as an MIP, and it might be tricky in some cases. In fact, that is exactly what an optimization team does. To give a clear example I will solve below a simple problem which is designed to be solved with MIP.

So, suppose we have a product to be delivered from N warehouses to M stores. According to set conditions, for each warehouse i the amount of product shall be Ai and for each store j the demand of product is Dj. The cost of transportation between warehouse i and store j is Cij. Our prime goal is to minimize the total cost.

Let us define variables Xij as the amount of product delivered from warehouse i to store j. These variables are integers since our product is packed into containers and measured by the integer amount.

For our variables we have the following conditions. The total amount of product delivered from the warehouse i must not be greater than the available amount of product located in that warehouse.

For each warehouse i:
    Xi1 + Xi2 + ... + Xim <= Ai

The total amount of product delivered to the store j must not be less than the demand of that store.

For each store j:
    X1j + X2j + ... + Xnj >= Dj

You can ask: Why would we use inequation here? Why not just to add an equality? Because it is better to add some space for the solver and make the field of possible solutions bigger. It is a real case and time spent on solution also matters. Consequently, inequation allows us to receive solutions earlier.

We should minimize the following objective that represents the transportation costs:

Minimize:
      C11 * X11 + C12 * X12 + ... + C1m * X1m +
    + C21 * X21 + C22 * X22 + ... + C2m * X2m +
    + ... +
    + Cn1 * Xn1 + Cn2 * Xn2 + ... + Cnm * Xnm

Total number of variables is N * M, so if you want to draw this problem on a chart you will have to think in N * M dimensions. So, it is better to forget these two-dimensional charts you saw above. Things are a little more complicated. While the whole concept stays the same, the number of dimensions is equal to the number of problem variables.

If you run this task through the solver, you will get the value for each Xij, and the total delivery costs will be minimized.

But what if our solution is infeasible. Let us review the situation when all our stores require more product than the total amount of product in all our warehouses. In this case the solver will give you no solution. And that is all information you will get. It is definitely not what you expect. First of all, you need to know what has happened. Why the solver cannot solve the problem. So, your task as a developer to perform certain checks before running the solver. You have to check if the problem is feasible or not, and your task to tell the user why. Other option is to add penalties to the model. For example, you can add a factory to your model. And this factory may produce necessary amount of product, but this would be more expensive than just to deliver it from a warehouse. You can choose either a real factory or factories, which the business knows about, or a virtual factory (-ies), and you will find out that the product cannot be delivered. Whatever the case, for each store j we define a new variable Yj which will include some amount of product that cannot be delivered from a warehouse. And the cost in such case is P, or, in our real case it is better to say, penalty. The model with penalties looks like this:

For each warehouse i:
    Xi1 + Xi2 + ... + Xim <= Ai

For each store j:
    X1j + X2j + ... + Xnj + Yj >= Dj

Minimize:
      C11 * X11 + C12 * X12 + ... + C1m * X1m +
    + C21 * X21 + C22 * X22 + ... + C2m * X2m +
    + ... +
    + Cn1 * Xn1 + Cn2 * Xn2 + ... + Cnm * Xnm
    + P * Y1 + P * Y2 + .. + P * Ym

If P is too small the solver will give you solutions which store it is cheaper to just apply penalties to than to deliver anything. Again, it won’t do and is not what you want from the solver. If P is too big you will get solutions with a huge gap values, far away from the optimal but actually with a single penalty. It means the gap will give you irrelevant information. The P value must be chosen reasonably. You can ask the user to set this value but I would recommend you to calculate this value from the model data. Or add just a slider for the user between “low” and “high” importance. For example, P can be calculated this way:

P = 2 * (C11 + C12 + ... + Cnm)

At the end of calculations you will get a set of Xij which holds integer values that represent the amount of product delivered from warehouse i to store j and Yj with penalties if any. Again, the total cost will be minimized.

Solvers

The main feature of defining of your optimization problem in a generic way, as MIP, is that you can use generic tools. Although the solving algorithm itself is quite simple, it could be really tricky to write your own implementation. Especially if you want to compete with advanced modern solvers. So usually ready-made solvers are used. There is a wide range of solvers: from free and open source solvers to expensive and proprietary ones. Cloud solvers are also available. Each solver has its own set of parameters, features and capabilities. A good thing here is that regardless their broad variety, you can test your model on different solvers because all of them have more or less similar API and you may find the one that meets your requirements.

Conclusion

It is important to understand the basic concept of MIP algorithm in order to catch its advantages. As it was mentioned above, the algorithm only gives you an estimation of the best solution. Further it is your power to decide how to deal with this information and you might choose to stop the whole process and not waste your time. In most cases you do not need the best solution if your current one is just like 1% worse.

You might choose to stop the whole process and not waste your time. In most cases you do not need the best solution if your current one is just like 1% worse.

You shall be aware of disadvantages of MIP. Since the algorithm works with abstract numbers, conditions and objectives, sometimes it cannot quickly generate the solution, a better one than the user can find just by reviewing and intuitively. But be sure, the user’s decision can never be better than MIP’s one, so activate your patience and wait for results.

The strategy of best solution search supposes to find a feasible solution first and then to improve it gradually and thoroughly. Unfortunately, our capabilities of management of this strategy are limited and it provokes the situation when the user is not always satisfied with the quality of interim solutions and the speed of response.

Your task is to invent the approach to your specific problem. You may divide the problem into smaller paces, use clusterization approach, redefine problem, do pre-calculations, and other tricks which may help to increase the performance. All those tricks are devoted to translating information from a business level of a problem to the uninformed algorithm. Important thing to note here is that generally for the really large-scale problems the MIP solver stops to produce any solution at all. Comparing to other methods, when the size of a problem grows up, raw MIP approach will crash on out-of-memory error while genetic search, for example, will continue to produce some solutions.

From my point of view, in case of large problems, nothing will help you. You must invent an approach, you can combine different methods. MIP solver here could be used just as an auxiliary tool which solves some of your subproblems. MIP, as any other technique, is not able to do all the work for you. It is not like you cram all your data, variables and conditions into MIP and it generates the right decision while you are doing nothing. It is not even designed for this purpose. It is you, not MIP, who shall do the work, but, yes, with the help of MIP.

In the next article I will focus on combinatorial problems and ways of solving them with MIP.

Introduction to Mixed Integer Programming

LP

MIP

Example

Solvers

Conclusion

Written by Vasily