The Perfect Matching: Part 2

Convexity 🔗 Permutations

Venkat
Math Simplified
9 min read · Jan 22, 2022


In a previous post, we saw how a decision-space blows up in complexity. Insights from computational geometry are required to tame the combinatorial explosion. Convex Optimization tries to tease out the underlying patterns, primarily through the ideas of Convex Sets and Convex Polytopes.

Convexity

To get a feel for the ideas underpinning Convex Optimization, we need to wrap our heads around Convexity. Let’s consider the space of real numbers, ℝ. Let’s conveniently choose a few random natural numbers for the purpose of illustration.

  • 𝕂 = {8, 2, 5, 7, 12, 10}
A closed line segment (green) is a skeleton that holds the set 𝕂 ⊆ ℝ, with two extreme points. Image by Author

They all lie on an infinitely long number line. But for 𝕂, we can always identify two extreme points {2, 12}. All elements of 𝕂 can be expressed compactly with some notion of distance from either extreme. This is usually a ratio, λ (lambda). One such intermediate point is shown in the picture. Picking the first and third elements {8, 5} of the set, their lambdas are:

  • 𝝀(8) = (8−2)/(12−2) = 0.6 | 60% of the distance from 2
  • 𝝀(5) = (5−2)/(12−2) = 0.3 | 30% from 2, i.e., 1 − 0.3 = 70% of the distance from 12

We can easily verify their distance from either extreme:

  • 8 = 2 ⋅ (1–𝝀(8)) + 12 ⋅ 𝝀(8)
  • 5 = 2 ⋅ (1–𝝀(5)) + 12 ⋅ 𝝀(5)
Elements of 𝕂, by construction:

  • lie on a closed line segment between its extreme points
  • with λ ∈ [0, 1]
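These arithmetic checks are easy to script. A minimal sketch (the variable names and the helper `lam` are mine, not from any library):

```python
# Extreme points of K = {8, 2, 5, 7, 12, 10} on the number line.
K = [8, 2, 5, 7, 12, 10]
lo, hi = min(K), max(K)  # the two extreme points: 2 and 12

def lam(x, lo, hi):
    """Ratio of x's distance from lo to the length of the segment [lo, hi]."""
    return (x - lo) / (hi - lo)

for x in K:
    l = lam(x, lo, hi)
    # Reconstruct x as a convex combination of the two extremes.
    assert abs(x - (lo * (1 - l) + hi * l)) < 1e-12
    print(f"x = {x:2d}  lambda = {l:.2f}")
```

Every element reconstructs exactly from its λ, confirming the closed-segment picture.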

In fact, the line segment joining any two distinct points of 𝕂 will lie entirely within that closed segment.

Every element of 𝕂 is a linear combination of {2, 12} with coefficients summing to one.

Let’s introduce two new elements deliberately outside 𝕂.

  • 𝕃 = {1, 17}

It’s obvious that {2, 12} are no longer the extreme points. But a nifty way to tell is to compute their lambdas: they lie outside the range [0, 1]. The idea of “convex” hinges on this observation.

  • 𝝀(1) = (1−2)/(12−2) = −0.1 < 0
  • 𝝀(17) = (17−2)/(12−2) = 1.5 > 1

A convex set contains every pairwise linear combination of points such that their coefficients (0 ≤𝝀 ≤ 1 ) sum to 1.
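The λ computation doubles as a membership test for the segment. A sketch (the function name `in_segment` is my own):

```python
def in_segment(x, lo=2, hi=12):
    """True iff x lies in the closed segment [lo, hi], judged by its lambda."""
    l = (x - lo) / (hi - lo)
    return 0.0 <= l <= 1.0

K = [8, 2, 5, 7, 12, 10]
L = [1, 17]
print([in_segment(x) for x in K])  # every element of K passes
print([in_segment(x) for x in L])  # both fail: lambdas -0.1 and 1.5
```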

But a question still lingers — why is convex associated with this set at all?

This concept can be extended to higher dimensions, where it becomes obvious. Consider 𝕂 in two dimensions (2D).

  • 𝕂 = { (1, 1), (3, 4), (5, 2) }

The ordered pairs of 𝕂 lie on an arbitrary two-dimensional curve. We can generate a multitude of random ordered pairs (like the ones shown in blue) that lie on or within the bounds of this “extreme” set.

Arbitrary curves (green, black) enclose 𝕂. Infinite random pairs (blue) lie in 𝕂. Image by Author
  • Consider the green curved segments BC and CA. Any two distinct points on them can be connected by a line segment. That segment will lie entirely inside the boundary. So it’s convex.
  • The same isn’t true of the green curved segment AB. It’s concave: a line segment connecting distinct points on it will fall outside the boundary, excluding a bunch of points. That problem can be quite easily fixed by the straight line AB (black). This is the simplest way to make the region convex, and the resulting shape is called a simplex. A triangle is a 2D simplex, a tetrahedron is a 3D simplex, and so on.
  • In fact, the triangle (AB, BC, CA) with 𝕂 as vertices is a convex set. Infinitely many random ordered pairs lie on or inside it, and the line segment connecting any two of them is guaranteed to lie within or on (not outside) the perimeter of the triangle.
  • It’s as if the grid were a peg board with pegs placed on the “extreme points” — the ordered pairs of 𝕂. Then if a rubber band were held stretched above the peg board so as to “encase” all the blue points and let go, it would hug the pegs tightly. Its shape would coincide with the simplex.

A set is convex if every point in the set has an unobstructed line of sight to every other point (all points on the line of sight lie in the set).
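In 2D, the role of λ is played by a triple of coefficients. A sketch of a membership test for the triangle with vertices 𝕂, using barycentric coordinates (the helper `barycentric` is my own, not a library call):

```python
def barycentric(p, a, b, c):
    """Coefficients (u, v, w) with p = u*a + v*b + w*c and u + v + w = 1."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return u, v, 1.0 - u - v

A, B, C = (1, 1), (3, 4), (5, 2)   # the 2D set K
inside = barycentric((3, 2), A, B, C)   # all coefficients land in [0, 1]
outside = barycentric((0, 0), A, B, C)  # some coefficient escapes [0, 1]
print(inside, outside)
```

A point is in the triangle exactly when all three coefficients lie in [0, 1], mirroring the 1D λ test.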

The triangle is a convex hull. The coefficients constrain random points within (or on) the hull. Image by Author

We can pick linear combinations at random from the interior (or boundary) points with suitable coefficients 𝝀 = (u, v, w). The only requirement is that they be constrained by (0 ≤ u, v, w ≤ 1) and (u + v + w = 1), as we saw in the 1D case. There are infinitely many 𝝀 that satisfy these criteria. A few are shown (the dancing green line). The code snippet below generated them.

  • A pattern emerges from seemingly random coefficients. The triangle (which is the convex set) forms a secure perimeter like the hull of a spaceship or a boat, containing the chaos within (or without!). In other words, it’s a Convex Hull!
  • The intersection of convex sets is convex, and so is the convex hull of their union. Given a convex hull and a set of new random points outside its perimeter, a new convex hull can be constructed to contain both (see animation below). This property makes convex hulls extremely useful, with a variety of practical applications.
Illustration of the convex hull of a union of convex hulls, which is itself convex. Image by Author
Code to generate the green dancing line (constrained within the convex hull). Code by Author
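The snippet itself is embedded as an image and doesn’t survive the export, so here is a minimal reconstruction of what such a generator could look like (plotting omitted; all names are mine, under the assumption that the endpoints are random convex combinations of the triangle’s vertices):

```python
import random

A, B, C = (1, 1), (3, 4), (5, 2)  # vertices of the convex hull

def random_convex_coeffs():
    """Random (u, v, w) with u, v, w in [0, 1] and u + v + w = 1."""
    r = [random.random() for _ in range(3)]
    s = sum(r)
    return tuple(x / s for x in r)

def combine(coeffs, pts):
    """Convex combination of 2D points with the given coefficients."""
    return tuple(sum(c * p[i] for c, p in zip(coeffs, pts)) for i in range(2))

# Endpoints of one "dancing" segment: two random points constrained to the hull.
for _ in range(5):
    P = combine(random_convex_coeffs(), (A, B, C))
    Q = combine(random_convex_coeffs(), (A, B, C))
    print(P, Q)
```

By construction the coefficients always sum to one, so every generated endpoint stays within (or on) the triangle.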

Of the infinitely many points that make up the edges of the hull, three are special: the vertices. They show up in the picture as red markers. This detail is important, as we will see in the next section. We can take the idea of convex sets further: if they lie in n-dimensional Euclidean space, ℝⁿ, they are called Convex Polytopes.

Permutations

There were three extreme solutions to the optimization problem. Their “optimal locations” can be represented by permutation matrices. Visualizing assignments as one-to-one (one winning bid per task, for instance) is a useful exercise. Once a cell gets committed (snatched and assigned), it becomes necessary to void out the remainder of its row and column. That’s when a permutation matrix emerges.

  • Minimal {A:II, B:III, C:IV, D:I} = {A:IV, B:II, C:III, D:I} = 17 units
  • Maximal {A:I, B:IV, C:II, D:III} = 27 units
Representing Optimal Assignments as Permutation Matrices. Image by Author

It’s a matrix obtained by permuting an identity matrix. We can represent every assignment (all 24 in our example) with different permutation matrices.

An n × n identity matrix has n! (n-factorial) row permutations, each a distinct permutation matrix.
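That count is easy to confirm by permuting the rows of an identity matrix; a sketch using only the standard library (the function name is mine):

```python
from itertools import permutations
from math import factorial

def permutation_matrices(n):
    """All n x n permutation matrices, as row-permuted identity matrices."""
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return [[identity[i] for i in perm] for perm in permutations(range(n))]

for n in (3, 4, 6):
    mats = permutation_matrices(n)
    assert len(mats) == factorial(n)
    print(n, len(mats))  # 3 -> 6, 4 -> 24 (our example), 6 -> 720
```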

Why are permutation matrices worth lingering over? We saw some tantalizing hints: the coefficient triples (u, v, w) = {(0, 0, 1), (0, 1, 0), (1, 0, 0)} pick out the vertices of the convex hull, while the remaining coefficients correspond to points on or inside it. Stack those triples into a matrix and, by construction, the sum of each row is one and the sum of each column is one; matrices with this property are called doubly stochastic. Permutation matrices are a special case of doubly stochastic matrices.
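The doubly stochastic property is easy to verify in code. The sketch below (helper names are my own) checks that an arbitrary convex combination of permutation matrices has unit row and column sums:

```python
def is_doubly_stochastic(M, tol=1e-9):
    """Nonnegative entries, every row-sum and column-sum equal to 1."""
    n = len(M)
    rows_ok = all(abs(sum(row) - 1) < tol for row in M)
    cols_ok = all(abs(sum(M[i][j] for i in range(n)) - 1) < tol for j in range(n))
    nonneg = all(M[i][j] >= -tol for i in range(n) for j in range(n))
    return rows_ok and cols_ok and nonneg

# Three 3x3 permutation matrices and coefficients summing to one.
P1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P2 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
P3 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
coeffs = (0.5, 0.3, 0.2)
M = [[sum(c * P[i][j] for c, P in zip(coeffs, (P1, P2, P3)))
      for j in range(3)] for i in range(3)]
print(M, is_doubly_stochastic(M))
```

Each permutation matrix is itself doubly stochastic, and any convex mix of them stays doubly stochastic, which is one direction of the Birkhoff–von Neumann result discussed below.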

Topologically, the optimal solution(s) of an assignment problem are found at the vertices of a convex hull: an exoskeleton that encases all possible solutions, with the optimal ones occupying its extremities, the vertices. This visual offers an insight into the geometry of such optimization problems.

Convexity 🔗 Permutations

Garrett Birkhoff (1911–1996). Image Source: MathsHistory@St-Andrews

Let’s introduce Garrett Birkhoff. With a long and distinguished career as a mathematician, he is well known for his contributions to Lattice Theory (abstract algebra). In a penetrating insight relevant to this topic, he proved that permutation matrices are intimately linked with doubly stochastic matrices in his paper “Tres observaciones sobre el algebra lineal” (Three Observations on Linear Algebra). John von Neumann is known to have independently rediscovered this relationship in 1953, while working on his seminal Two-Person Zero-Sum Game Theory, but did not publish the proof. In a 1982 interview for Mathematical People, Birkhoff recollects how he got involved in applied mathematics.

Birkhoff: Still later, through the Navy, I became involved in trying to develop theoretical models of the entry of torpedoes into water and “skip-bombing.” But most important of all, I realized that the computer was going to influence profoundly the nature of applied mathematics. Of course, von Neumann was a friend of mine and we discussed informally many problems, especially compressible flows and shock waves around projectiles.

Interviewers: World War II was then very influential.

Birkhoff: Mathematical physics and engineering are very different, and I have been more attracted to engineering mathematics, which is concerned with practical problems of immediate importance, than to mathematical physics. Most of my “applied” interests have been really the result of circumstances.

Conversation excerpt: Mathematical People: Profiles and Interviews, 2nd Edition, by Donald Albers and Gerald L. Alexanderson

Decomposition

The so-called Birkhoff-von Neumann (BvN) decomposition is built on this theorem. Of the many ways it’s stated, here are two that I find most insightful. Both imply the same result, but provide different perspectives.

  • A doubly stochastic matrix is extremal if and only if it’s a permutation matrix.
  • A doubly stochastic matrix is a convex linear combination of permutation matrices.

We saw a dancing green line segment earlier. Let’s call it PQ and suppose that P is the interior point and Q lies on the convex hull, i.e., on any of the edges of the triangle. The convex hull constrains the segment for any combination of coefficients (u, v, w), because u + v + w = 1 and 0 ≤ u, v, w ≤ 1.

Q: When does the segment PQ shrink to a single point?

A: When Q and P overlap.

In the diagram, watch closely as P and Q “merge” at the extreme points, the vertices (A, B, C). When they overlap at a vertex, the point turns red, indicating the merger.

  • (u=v=0, w=1) | (u=w=0, v=1) | (v=w=0, u = 1)

Doubly stochastic matrix is a convex linear combination of permutation matrices

Due to the extreme nature of its coefficient triples {(0, 0, 1) | (0, 1, 0) | (1, 0, 0)}, a permutation matrix cannot be expressed as a convex combination of any other doubly stochastic matrices; it is its own unique combination. Any doubly stochastic matrix can be expressed as a convex combination of permutation matrices. Birkhoff was the first to prove it, in 1946. Over the decades, many insightful proofs have appeared. Permutation matrices play a role analogous to the orthogonal unit vectors (i, j, k) of 3D Euclidean space (ℝ³), or to an orthonormal basis of a Hilbert space (ℂⁿ).

We know the optimal assignments exist as one (or more) permutation matrices. Given a practical (real-life) assignment problem, it begs the question:

How can we express a doubly stochastic matrix as a convex combination of permutation matrices?

We know that given an n×n matrix, there are n! permutation matrices. That’s a big number even for n = 6 (720 permutations). Is there a minimal set whose convex linear combination expresses a doubly stochastic matrix? Let’s look at an example:

  • W = { (0.24, 0.38, 0.38), (0.69, 0.26, 0.05), (0.07, 0.36, 0.57) }

W is a doubly stochastic matrix: every row-sum and column-sum equals one. Decomposition requires us to (a) find coefficients λᵢ, i = 1, …, k, with constraints 0 ≤ λᵢ ≤ 1 and ∑λᵢ = 1, and (b) permutation matrices Pᵢ, i = 1, …, k, such that their convex linear combination equals W. But how many, i.e., k = ? Since the coefficients are constrained, it can be shown that k ≤ n² − n + 1. Here’s a set of five (k = 5) obtained by using this program.

BvN decomposition of doubly stochastic matrix as a convex combination of permutation matrices. Image by Author
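One way to obtain such a decomposition is the classical greedy Birkhoff procedure: repeatedly pick a permutation supported on the positive entries of the residual, peel off a multiple of it, and repeat. The sketch below is my own brute-force version (it enumerates all n! permutations per step, so it’s only suitable for small n; practical implementations find the supporting permutation with a bipartite matching instead). On W it recovers a k = 5 decomposition:

```python
from itertools import permutations

def bvn_decompose(W, eps=1e-9):
    """Greedy Birkhoff decomposition of a doubly stochastic matrix W.

    Returns (coeffs, perms) with W = sum of c * P over permutation
    matrices P (perms[k][i] is the column of the 1 in row i).
    """
    n = len(W)
    R = [row[:] for row in W]  # residual matrix, peeled down to zero
    coeffs, perms = [], []
    while max(max(row) for row in R) > eps:
        # Pick the permutation maximising its minimum residual entry.
        best = max(permutations(range(n)),
                   key=lambda s: min(R[i][s[i]] for i in range(n)))
        c = min(R[i][best[i]] for i in range(n))
        if c <= eps:
            break  # no permutation with positive support remains
        for i in range(n):
            R[i][best[i]] -= c
        coeffs.append(c)
        perms.append(best)
    return coeffs, perms

W = [[0.24, 0.38, 0.38], [0.69, 0.26, 0.05], [0.07, 0.36, 0.57]]
coeffs, perms = bvn_decompose(W)
print(len(coeffs), [round(c, 2) for c in coeffs])  # 5 terms summing to 1
```

The coefficients come out as (0.38, 0.31, 0.19, 0.07, 0.05), summing to one, and recombining them with their permutation matrices reproduces W.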

Deciding, for a given pair (k, W), whether W can be decomposed into at most k permutation matrices is an NP-hard problem. Over the last few decades, the upper bound has tightened impressively to k ≤ n² − 2n + 2.

Decomposing doubly stochastic matrices into permutation matrices gets somewhat gnarly unless somebody comes up with a graphical approach. Here’s a teaser (original source from Peter Lax) for the next installment. Stay tuned!

Quip: In der Mathematik gibt es keinen Königsweg (In mathematics there is no royal road)
Retort: Aber es gibt einen Grafenweg (But there is a graph road) -
Dénes Kőnig
— Quoted by Harold W. Kuhn in his Plenary Talk, 2010, Lisbon

Thank you for your readership and support.

© Dr. VK. All rights reserved, 2022

References

  1. Mathematical People: Profiles and Interviews, 2nd Edition. Donald Albers and Gerald L. Alexanderson, 2008
  2. On an algorithm of G. Birkhoff concerning doubly stochastic matrices. Diane M. Johnson, A. L. Dulmage and N. S. Mendelsohn, 1960
  3. Notes on Birkhoff–von Neumann decomposition of doubly stochastic matrices. Fanny Dufossé and Bora Uçar, 2016
  4. Exact Birkhoff Decomposition Algorithm. V. Valls, G. Iosifidis and L. Tassiulas, April 2021
