A New Family of Diffeomorphisms from the Simplex to the Cube — With Application to Global Optimization

Microprediction · Published in Geek Culture · Mar 3, 2023

Ever feel like a square peg in a triangular hole? Well, not to worry: this post describes a novel style of smooth bijection from the k-simplex to the k-dimensional cube, and I show how easy it is to use in optimization.

On the left, evenly spaced points from the cube mapped to the simplex. On the right, the images of concentric triangles in the simplex after they are mapped to the cube. The invertible smooth mapping introduced in this note interprets simplex points as contest-winning probabilities. All images in this article are by the author (with a little matplotlib help from ChatGPT) unless otherwise acknowledged.

For the avoidance of doubt, the k-simplex lies in (k+1)-dimensional space:

Definition of the k dimensional simplex — Wikipedia
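Written out, the standard definition referenced above is:

$$\Delta^k = \left\{ (s_0, s_1, \ldots, s_k) \in \mathbb{R}^{k+1} \;:\; s_i \ge 0 \text{ for all } i,\ \sum_{i=0}^{k} s_i = 1 \right\}$$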

(I say the simplex. Try “Barycentric coordinates” if yours is different)

Derivative-free optimization on the simplex

My sufficient but not necessary motivation for relating the simplex to the cube comes from optimization. In particular, some derivative-free global methods are more naturally implemented on rectangular domains, and sometimes a search in one fewer dimension with one less constraint is simply easier.

The simplex is a natural domain for:

  • Optimizing a choice of long-only portfolio.
  • Choosing hopefully robust positive-weighted combinations of models.

… and other things. I wanted to lift every optimization method that works on the cube onto the simplex. In particular I wanted the hundred or so derivative-free optimizers marshaled by the humpday package to operate on functions defined on simplexes as well as cubes. I wanted to make optimizers that don’t admit constraints (only rectangular bounds) work on the simplex.

By the time you read this note, this issue will be closed … maybe.

Example of a relatively easy objective function to optimize on the simplex. We’ll get to a harder one.

The other direction might be interesting too, by the way, since some optimization methods prefer triangular tessellations (SHGO, described here, comes to mind), so you could use this trick to move them to the hypercube or the entire plane.

I’m sure there are alternatives to what is presented here, especially in low dimensions. Don’t be shy about pointing them out to a guy who just scraped through his Complex Analysis quals.

Indeed, in low dimensions we can perhaps get by with extremely simple hacks. A map like (s0,s1,s2) -> (s0+s1,s1) gets us halfway, because you can extend the resulting function to the full square by reflection. That extended function can be fed to a standard unconstrained optimizer on a cube. It might search twice the area it really needs to, but maybe you don’t care.

Image of (s0,s1,s2) -> (s0+s1,s1) in the square. Any optimizer admitting rectangular domain functions can be used to optimize a function defined on the 2-simplex in a mildly wasteful manner, by lifting the function to half the cube with this map and then extending it by reflection.
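To make the reflection trick concrete, here is a minimal sketch for the 2-simplex (the helper names and structure are mine, not taken from any package):

```python
import numpy as np

def simplex_to_half_square(s):
    """Map (s0, s1, s2) on the 2-simplex to (s0+s1, s1), landing in the lower triangle 0 <= y <= x <= 1."""
    return s[0] + s[1], s[1]

def lift_by_reflection(f_on_simplex):
    """Extend a 2-simplex objective to the whole unit square by reflecting across the diagonal y = x."""
    def f_on_square(x, y):
        if y > x:                            # fold the upper triangle onto the lower one
            x, y = y, x
        s = np.array([x - y, y, 1.0 - x])    # invert (x, y) = (s0 + s1, s1); entries are >= 0 and sum to 1
        return f_on_simplex(s)
    return f_on_square
```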

However, folding and other maps that distort some dimensions a lot more than others (easy to dream up) can be more wasteful in higher dimensions. I will show you a bijection that takes evenly spaced points on the simplex and maps them to somewhat evenly spaced points on the cube. That way, you don’t double the number of local minima you need to skirt around.

Images of regularly spaced points on the simplex under a new “ability” map.

The ability map from cube to simplex

Here’s the idea.

The new map from the cube to the simplex first interprets each of the k cube coordinates as an “ability” percentile in a population where ability is normally distributed. The corresponding point on the simplex is then given by the winning probability in a contest involving k+1 participants — one extra participant is added whose ability is zero.

In the aforementioned contest, a score for each participant is determined by adding their ability to a single independent draw from the standard normal distribution, with the lowest score winning. (You could imagine computing the winning probabilities by Monte Carlo, but you are better off with convolutions on a lattice.)

To map a point on the cube to the simplex: interpret each point in (0,1) as a relative ability percentile — in a population whose relative abilities are assumed to be normally distributed. Convert to a relative ability, then add a zero-ability contestant to the contest so we have k+1 in total. The second line computes the winning probabilities by convolution on a lattice, using the winning package.

The function std_ability_implied_state_prices taking ability a to winning probability p can be found in the winning package. As for the name, there is a fine distinction to be made between “state prices” and winning probabilities related to ties. (That detail is discussed in the ability transform paper and the treatment of ties is core to the algorithm — though not to us, presently.)
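Since the snippet appears as an image in the original, here is a minimal sketch of the forward map. Only the function name std_ability_implied_state_prices is taken from the winning package as described above; the import path, call signature, and helper name are assumptions:

```python
import numpy as np
from scipy.stats import norm
from winning.std_calibration import std_ability_implied_state_prices  # import path is an assumption

def cube_to_simplex(u):
    """Map a point u in the open k-cube (0,1)^k to a point on the k-simplex (length k+1)."""
    ability = [0.0] + list(norm.ppf(u))                         # percentiles -> relative abilities, plus a zero-ability contestant
    p = np.asarray(std_ability_implied_state_prices(ability))   # winning probabilities via lattice convolution
    return p / p.sum()                                          # normalize to guard against lattice round-off
```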

Here’s a horsified example of this contest model. It generalizes: we can replace the normal performance variation with any other distribution, and the same is true for the map to the simplex.

A simple model for performance, where each participant is assigned a relative ability: the location parameter of their normal performance distribution. The map from relative ability to winning probability is reasonably fast and involves convolutions on a lattice. There is, however, also a fast numerical method for the inverse map that takes winning probabilities to relative abilities, which is how this particular example was calibrated to Kentucky Derby odds.

Speaking of generalizations, we can replace the mapping from the interval to relative ability with something else, too.

The inverse ability map from simplex to cube

But it would seem that the crucial question here is: can the map be inverted in reasonable time in reasonably high dimensions?

The answer is yes, definitively. Again, those interested in volumetrics can refer to the journal (or pdf). That paper also contains other motivations that do, I assure you, extend beyond the pricing of Kentucky Derby quinellas. Short version: the algorithm scales to hundreds of thousands of dimensions, at least. And thus we have:

The inverse map from simplex to cube. Points are interpreted as winning probabilities. Relative abilities are inferred, and shifted so that the first contestant has ability zero. This entrant is ignored, and the rest of the k abilities are converted to (0,1) by the normal cumulative distribution function.

Of course, all the work is done in the first line whose details are here.
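A corresponding sketch of the inverse (the function name here is my guess by symmetry with the forward one; consult the winning package for the exact call, and treat the rest as illustrative):

```python
import numpy as np
from scipy.stats import norm
from winning.std_calibration import std_state_price_implied_ability  # function name and path are assumptions

def simplex_to_cube(p):
    """Map winning probabilities p (a (k+1)-vector on the simplex) to a point in the open k-cube."""
    a = np.asarray(std_state_price_implied_ability(p))  # infer relative abilities from winning probabilities
    a = a - a[0]               # shift so the first ("extra") contestant has ability zero
    return norm.cdf(a[1:])     # drop that contestant; map the remaining k abilities into (0,1)
```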

Properties

Waving my arms a bit here, pending more work.

Computational scale

No issue here. As noted in the paper, this map should be fine from the computational standpoint in very high dimensions. Perhaps a million variables, if you have time.

Coverage and distortion

I’ve shown an example of the image of evenly spaced points on the 2-simplex above, and what they look like on the square. Here is the reverse direction where we look at the image on the 2-simplex of evenly spaced points in the square.

The image on the 2-simplex of evenly spaced lattice points on the square.

Here is how I read that plot, and in particular the seeming loss of mass near the top of the simplex (and also near the red vertex). The red vertex corresponds to the contestant who is added in. It is a little harder for the “extra” horse (the one with zero ability) to have very little chance of winning, or to have a chance very close to unity. For either to happen, both of the other horses need to be very good, or both very bad.

In three dimensions, things are a little tougher to judge this way, although we can project from the simplex surface in 4 dimensions down to a tetrahedron in three, and see whether regularly spaced points on the cube resemble the regularly spaced points on the tetrahedron we’d expect.

The image (or preimage) of evenly spaced points on the cube. Here these are first mapped to the 3-simplex in 4-dim and then moved into a 3-dim tetrahedron for viewing.

One sees the breaking of symmetry. Maybe it helps to compare a tetrahedron with regularly spaced points:

Conversely, one might map those evenly spaced points in the 3-simplex to the cube, to see how well they flesh it out, or fail to, viz:

Evenly spaced points in the 3-simplex are mapped to the cube

Smoothness

Maybe these pictures already convince you the map is moderately well behaved — though obviously some distortion is required to make square things triangular or vice versa. Count vertices!

Modulo numerical quirks and lattice effects, which are pretty minor in my experience, it is intuitively clear that the map from relative ability to winning probability is differentiable, as is its inverse. Actually, we can say that with some generality, provided the same is true of the performance density with respect to an ability parameter a. As noted in the paper, the probability of winning a race is an integral of the style:

Probability of horse i winning the race
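The equation itself appears as an image in the original, but ignoring ties it takes the familiar form, where f(x, a) is the performance density with ability parameter a, F(x, a) its distribution function, and the lowest score wins:

$$P(\text{horse } i \text{ wins}) = \int_{-\infty}^{\infty} f(x, a_i) \prod_{j \neq i} \bigl(1 - F(x, a_j)\bigr)\, dx$$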

All terms are bounded including, let us dictate, f(x,a) which is the performance density of the horse in question. So, let us further assume we can successfully take the derivative inside the integral and get some comfort that way.

Now, as also noted in the paper, the actual implemented map isn’t quite this definition (I implement the discrete horse race problem, not the continuous one), but it is a good approximation. You can easily convince yourself of smoothness in the continuous case with your own Monte Carlo simulation. Here’s an example of how the winning probability varies with the ability of the 51st competitor facing 50 others, whose minimum-score distribution is shown in purple.

Numerically derived histogram of the distribution of winning scores for 50 competitors (purple), set against the probability of winning for the 51st competitor as a function of relative ability (blue), where lower ability is better. The orange line shows the derivative of winning probability with respect to relative ability.
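A minimal Monte Carlo sketch along those lines (a stand-in of mine, not the script that produced the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rivals, n_sims = 50, 200_000

rival_scores = rng.standard_normal((n_sims, n_rivals))   # 50 zero-ability rivals per simulated race
rival_best = rival_scores.min(axis=1)                     # lowest score wins

abilities = np.linspace(-4.0, 2.0, 61)
win_prob = np.array([np.mean(a + rng.standard_normal(n_sims) < rival_best) for a in abilities])
# Plotting win_prob against abilities gives a smooth curve, decreasing in ability
# (lower relative ability is better here), resembling the blue line in the figure.
```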

Boundary issues

The map I’ve provided is quite fast and works fine on the “majority” of the interior of the simplex. However, because it is a lattice calculation, and because points near the cube’s boundary map to arbitrarily large or small relative abilities, there will be boundary effects.

Speed

That depends entirely on your use case. Often derivative-free optimizers are used on problems where the cost function is expensive. In those settings the additional overhead of this transform is not going to be an issue.

One idea that I think is worthy of further investigation, however, is the use of fast approximations to the inverse ability transform — both as a possible replacement for my numerical scheme where speed is crucial and also as a way of tidying up the boundary behavior (as compared with just sending all sufficiently small relative abilities to 0 winning chance, which isn’t invertible).

To this end it might help to avail ourselves of the approximate mapping of Henery. See the paper for a reference to his formula.

An Optimization Example

Okay, let’s put it to work. We’ll use an objective function that is named “difficult” in the code:

Hopefully a difficult optimization problem for the simplex. A 3d slice is shown below.

To find a minimal point on the simplex, we lift the objective to the cube. I’ll write it as a decorator just in case anyone wishes to use it that way.

Decorator to modify a function taking a (k+1) vector argument. The modified function takes a k-vector argument and can be passed to an optimizer that assumes the domain is the k-cube.
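A sketch of what such a decorator might look like, built on the cube_to_simplex helper sketched earlier (the notebook’s actual names may differ):

```python
from functools import wraps

def on_cube(objective_on_simplex):
    """Turn an objective defined on the k-simplex (a (k+1)-vector summing to one)
    into one defined on the k-cube, via the ability map."""
    @wraps(objective_on_simplex)
    def objective_on_cube(u):
        return objective_on_simplex(cube_to_simplex(u))
    return objective_on_cube
```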

Now we can optimize as follows:

A classic optimizer doesn’t do too well
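That snippet isn’t reproduced here, but in the same spirit the lifted objective can be handed to any optimizer that accepts rectangular bounds. A sketch, using a toy stand-in objective (not the notebook’s “difficult” function) and scipy’s differential_evolution as just one choice:

```python
import numpy as np
from scipy.optimize import differential_evolution

@on_cube
def toy_objective(p):
    # Toy stand-in for a simplex objective (p is a (k+1)-vector summing to one).
    return float(np.sum(p * np.cos(7 * np.arange(len(p)) * p)))

k = 4
eps = 1e-3  # stay a little away from the cube boundary (see the "Boundary issues" section above)
result = differential_evolution(toy_objective, bounds=[(eps, 1 - eps)] * k, seed=0)
best_on_simplex = cube_to_simplex(result.x)   # map the winning cube point back to the simplex
```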

Naturally we can also try out dozens of optimizers at the same time, and form some opinions on which are well suited to this particular task.

Trying out many optimizers from different packages

This prints out a leaderboard showing the minimum values found and the optimizers that found them, as well as the actual number of trials used, which is not the same as the number of trials you tell most optimizers to use, by the way.

Minimum value of the function on the simplex found by various optimizers from different Python packages. Who says there’s no such thing as a free lunch?

By all means play with the bottom half of the notebook to investigate for yourself. Incidentally, this is not super surprising if you are at all familiar with my old article Comparing Python Global Optimization Packages or The Python Optimizer Form Guide, although freelunch came along more recently with pretty good Elo ratings (2200+).

(You can find similar canonical usage examples for lots of other optimization packages in this directory.)

A small practical remark. Let’s say we’re happy with freelunch (thanks, Max Champneys) for some purpose. As a practical matter, I would suggest you not call this via humpday but rather steal some code from my wrapping of it here, use the lift I’ve provided in the notebook, and that way free yourself of all the other dependencies in humpday. My package is fluid and intended for choosing optimizers, not playing the bloated middleman.

Closing thoughts

My hope is that a convincing invertible map from simplex (interior) to cube (interior) might free up some people who prefer one domain to the other, either because they are designing optimizers or using them. But there’s also work to do here:

  • Ensuring the map works more reliably near the boundary, or augmenting this with a separate boundary search. I don’t think this is suitable at all, presently, if you expect answers in the corners.
  • Reducing the asymmetry introduced by the (k+1)st player. One idea would be to employ a stochastic ability, rather than simply setting it at zero, or to modify its distribution.
  • Investigating efficacy with different families of performance distributions for the other competitors, perhaps categorized by their extreme value classification, and perhaps admitting clean approximation.
  • Implementing a new test suite of optimizer objective functions suggested by ChatGPT, as I discovered in the process of this work that it is rather good at this!

And so on. Pull requests are welcomed, as is indirect support for my open-source open-AI network pipe-dream.
