TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Combinatorial Optimization: from theory to code using Google’s OR tools

Mirco Milletarì Ph.D.
Published in TDS Archive · 7 min read · Jan 28, 2018

In this post I will consider a problem from combinatorial optimization that can be understood as constrained optimization on a general graph. This gives me the opportunity to introduce the concepts and language of complex networks in a more general way than is usual for Neural Networks alone. The concepts developed here will be used in a subsequent series on the physics of Neural Network models.

Rather than discussing network theory from the most general point of view, I will focus here on solving a particular problem, from setting up the mathematical model to its numerical implementation using Google's Operations Research (OR-Tools) libraries. Finally, I will discuss a few applications of combinatorial optimization and its connections to statistical mechanics and computer science in general. This post is technical in nature, and I have decided not to shy away from equations but rather to walk you through the math step by step. I hope this will trigger your curiosity!

How to share chocolate bars amongst hungry children with the fewest cuts?

Let us start defining the problem we want to solve. There are m chocolate bars of varying (integer) length and n hungry children who want differing amounts of chocolate (again integer amounts). You can cut the chocolate bars and the goal is to ensure that every child gets the amount they want. Write a program to distribute the chocolate so that you make the minimum number of cuts.

Solution: The way I decided to solve the problem is to map it to a network optimization problem. We have two input vectors, B of length m and C of length n, where m is the number of chocolate bars, each carrying b1, b2, … available chocolate units, and n is the number of children, each wanting c1, c2, … units of chocolate. We define a fully connected (FC) directed network (meaning that connections run only from source to drain nodes). The picture below shows an instance of the problem with B = (2,5,7,3) and C = (3,2,6,1,2).

Note that the connections have an arrow to define their direction. Mathematically, the connections are encoded in a connection matrix X, whose entries X_ij specify the number of units moved from source node i to drain node j. We assign to each connection a cost M_ij. We want perfect matchings (i.e. 2→2 and 3→3 in the example) to be favoured, i.e. to have zero cost. This means that the "distance" between the supply and demand nodes must be zero. Taking the Euclidean distance as the natural metric on the graph, we define the cost of an edge as the squared distance between the supply b_i and the demand c_j, so that M_ij = (b_i − c_j)².
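For the example instance above, this cost matrix takes a couple of NumPy lines to build (the array names are mine):

```python
import numpy as np

B = np.array([2, 5, 7, 3])     # supply: units per chocolate bar
C = np.array([3, 2, 6, 1, 2])  # demand: units wanted per child

# Cost of moving units from bar i to child j: the squared distance
# (b_i - c_j)^2, so a perfect match (b_i == c_j) has zero cost.
M = (B[:, None] - C[None, :]) ** 2

print(M[0, 1], M[3, 0])  # the 2->2 and 3->3 perfect matches cost 0
```

Note how the zero entries of M single out exactly the perfect matchings we want the optimizer to favour.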

At this point we need to define the objective function of the system, or in physicist’s language, the Hamiltonian of the problem:
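With the cost matrix M and the connection matrix X defined above, this can be written as (the notation is my reconstruction, consistent with the vectorized form discussed next):

```latex
H = \sum_{i=1}^{m}\sum_{j=1}^{n} M_{ij}\, X_{ij}
  \;=\; \mathbb{I}_m^{\top}\,\bigl(M \odot X\bigr)\,\mathbb{I}_c ,
```

where ⊙ denotes the element-wise (Hadamard) product.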

The expression in the second equality is the "vectorized" implementation of the Hamiltonian; we have defined I_m as a unit vector (all entries equal to one) of length m, while I_c has length n. Note that the product between the matrices M and X is element-wise. This is the Hamiltonian of the minimum cost problem; by minimising the Hamiltonian we choose the matrix elements of X with the lowest associated cost. We now conjecture that the minimum cut configurations correspond to paths of minimum length (i.e. cost) on the graph. There are two constraints to impose: one expressing "current" conservation, the other requiring the elements of X to be non-negative. The two "currents" moving between the nodes read:
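In index notation (again my reconstruction), the current leaving source i and the current entering drain j are:

```latex
J_i^{\mathrm{out}} = \sum_{j=1}^{n} X_{ij}, \qquad
J_j^{\mathrm{in}}  = \sum_{i=1}^{m} X_{ij} .
```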

If we define the “flux” going through the network as

and a bias vector

then we need to enforce the element-wise "chocolate conservation" constraint:
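One way to write this (a sketch, reading the flux as the vector of drain-side currents and the bias vector as the demand C):

```latex
F_j = \sum_{i=1}^{m} X_{ij}, \qquad
F_j - c_j = 0 \quad \forall\, j \in [1, n] ,
```

i.e. every child receives exactly the amount of chocolate they asked for.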

Finally, we need to enforce that the matrix elements of the connection matrix are all non-negative: X_ij ≥ 0 for all i in [1, m] and j in [1, n]. The problem can be recast in Lagrangian form as:
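A sketch of the Lagrangian, with the multipliers λ enforcing conservation and μ enforcing non-negativity:

```latex
\mathcal{L} = H
  - \sum_{j=1}^{n} \lambda_j \Bigl(\sum_{i=1}^{m} X_{ij} - c_j\Bigr)
  - \sum_{i=1}^{m}\sum_{j=1}^{n} \mu_{ij}\, X_{ij} .
```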

where we have introduced two Lagrange multipliers, lambda and mu, to enforce the constraints. Note that the inequality constraints need to satisfy the Karush-Kuhn-Tucker conditions [2]: X_ij ≥ 0, mu_ij ≥ 0 and the complementary slackness condition mu_ij X_ij = 0.

The solution of the example presented above is shown in the figure below.

Cuts correspond to the number of non-zero elements in each row of X. We count a cut as a bifurcation: in the example below, bar 2 (counting bottom up) splits its original 5 units in two with one cut. So a cut corresponds to having l > 1 non-zero matrix elements in a row of X. You can find the Python code to solve the problem on my GitHub, together with a step-by-step explanation, or just have a look below. I have used Google's constrained optimisation library, which I found easier and more effective to use than Scipy's optimization tools. If you are interested in learning how to use these libraries, the simple problem considered here is a useful tutorial on the way to more complex ones. Also note that OR-Tools already ships a minimum cost flow solver that you can adapt to the problem described above. If you don't feel like doing this from scratch, that is a valid alternative… but where is the fun then?

The minimum cost problem is one of the classic problems in combinatorial optimization; a more traditional formulation of the problem is presented in the picture on the left:

a factory (node 1) produces a certain amount of goods that it needs to ship to its destinations (nodes 4 and 5) via different routes. Each route has an associated cost and a maximum capacity of goods that can be shipped. This problem can be solved following the steps above, together with a further restriction on the matrix elements of the edge connection matrix,
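namely a box constraint of the form (indices mine):

```latex
0 \le X_{ij} \le l_{ij} ,
```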

where l is the maximum capacity of each edge.

Where to go from here

Over the past decades, physicists have been attracted to random combinatorial optimisation problems due to their relation to statistical mechanics; see refs. [1, 3]. These problems are related to a class of models known as spin glasses, which correspond to Ising models with random couplings. These models exhibit several interesting features and have been a source of inspiration for several Neural Network models.

More on this in a following post! For the moment I just want to show how the chocolate assignment problem can be recast as a statistical mechanics problem for random input/output relations. We associate to it a Gibbs measure and a constrained partition function:
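In standard form (a sketch, with 𝒞 the set of connection matrices satisfying the constraints):

```latex
P[X] = \frac{e^{-\beta H[X]}}{Z}, \qquad
Z(\beta) = \sum_{X \in \mathcal{C}} e^{-\beta H[X]} .
```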

Note that the Hamiltonian of this problem is dimensionless, so the inverse "temperature" beta may be thought of as having been rescaled into H. If beta is a random variable, we need to average the free energy over its distribution:
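This is done via the standard replica trick (see refs. [1, 3]):

```latex
\langle \ln Z \rangle_{\beta}
  = \lim_{n \to 0} \frac{\langle Z^{\,n} \rangle_{\beta} - 1}{n} ,
```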

where the last expression is the replicated partition function. How to solve these kinds of problems will be the topic of future posts.

Some useful references:

[1] M. Mezard, A. Montanari, Information, Physics and Computation, Oxford University Press (2009)

[2] C. M. Bishop, Pattern recognition and Machine Learning, Springer (2006).

[3] M. Mezard, G. Parisi, Replicas and Optimization, Journal de Physique Lettres, 1985, 46 (17), pp.771–778.

Theoretical physicist, worked in the field of complex systems and quantum emergent phenomena. I currently work in AI, and am especially interested in Neural Networks, their connection to physics, and related foundational issues.


Originally published at equat10ns.wordpress.com on January 28, 2018.

Written by Mirco Milletarì Ph.D. — Theoretical Physicist, Machine learning scientist @Microsoft, everything Geek.