Convex Optimization and Interactive Graphics
The other day I was playing around with g9.js, a javascript library for automatically interactive graphics. It lets you declaratively specify what to draw, and the library then figures out how to make it interactive. Try playing with the g9.js gallery to build intuition and then come back here.
Visualizing the problem
The rings example works by converting polar coordinates to cartesian coordinates. Here is its code:
var initialData = {
    radius: 100,
    angle: Math.PI/6
}

function render(data, ctx){
    var sides = 10
    for(var i=0; i<sides; i++){
        var a = data.angle + i/sides * Math.PI * 2
        var r1 = data.radius
        var r2 = r1/2
        ctx.point(r1 * Math.cos( a), r1 * Math.sin( a))
        ctx.point(r2 * Math.cos(-a), r2 * Math.sin(-a), {fill: 'red'})
    }
}

g9(initialData, render)
    .align('center', 'center')
    .insertInto('#demo-rings')
The above code renders these two rings of points using the provided initialData (the polar coordinates) and the render function:
When the user drags any point, “g9 optimizes over the space of possible values for your data to find a set of values that comes closest to creating the change.” That means g9.js somehow figured out another set of polar coordinates that results in the dragged point being in that position! But how?
Let’s visualize it!
The above figure simulates an instance of a user interacting with the g9 canvas. An initial ring configuration is drawn using a radius of 100 and initial angle of 𝜋/6. Then, the user drags the leftmost point to a new position (-39, 34). There must exist another set of parameters r’ and θ’ that results in the desired position for that point.
We can visualize the space of all possible inputs for r and θ:
The above figures show all possible x and y values for the dragged point. The radius and angle we want are somewhere in here! Figuring out what they are is as easy as solving a system of equations: we have two equations and two unknowns, and the polar-to-cartesian map can be inverted in closed form.
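For instance, a minimal sketch of that closed-form inversion in Python (a simplification: it ignores the per-point offset a = angle + i/sides · 2π from the render function and inverts only the basic polar map):

import math

# Closed-form inverse of x = r*cos(theta), y = r*sin(theta).
# Simplification: ignores the per-point offset i/sides * 2*pi.
def polar_from_cartesian(x, y):
    return math.hypot(x, y), math.atan2(y, x)

r, theta = polar_from_cartesian(-39, 34)  # the dragged-point target
print(r, theta)                           # ~51.74, ~2.42 rad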
However, we can only do this because we know the formula. g9.js has no knowledge of the user's render function, so it needs a different formulation.
We are given an initial parameter vector x, an unknown function f that converts those parameters to points in R², and a desired position d.
More generally, our objective is to minimize the distance between the point produced by our guess of the parameters and the desired point. Let's use the Euclidean distance to measure it. To avoid computing a square root, we minimize the squared distance instead (also known as the squared L2-norm).
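Squaring the Euclidean distance between the produced point f(x) and the desired point d gives:

‖f(x) − d‖₂² = (f₁(x) − d₁)² + (f₂(x) − d₂)²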
And that is what we see in point.js in g9:
this.minimize_args(args => {
    var drx = args.cx - (cx + dx)
    var dry = args.cy - (cy + dy)
    return drx*drx + dry*dry
}, affects)
And now the problem we want to solve is an optimization problem, with an objective that is convex in the produced point! Notice that f could be an arbitrary function, not just the one we discussed. Let’s visualize this objective function:
A brute-force search over the space of possible values returns these as solutions, highlighted with green stars in the plot:
brute force search min: 52 1.3744467859455345 X,Y: (-38.18477049065566, -35.297638767712954)
brute force search min: -52 4.516039439535327 X,Y: (-38.184770490655666, -35.29763876771295)
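A minimal sketch of such a brute-force search (the grid bounds and resolution, the target point, and the simplified polar map are assumptions, so the exact numbers it prints will differ from the output above):

import numpy as np

desired = np.array([-39.0, 34.0])  # assumed target for the dragged point

def obj(r, theta):
    # Squared distance from the (simplified) polar point to the target
    p = np.array([r * np.cos(theta), r * np.sin(theta)])
    return np.sum((p - desired) ** 2)

# Exhaustively evaluate the objective on a grid over (r, theta)
rs = np.arange(-200, 201)                 # candidate radii
thetas = np.linspace(0, 2 * np.pi, 2000)  # candidate angles
best = min((obj(r, t), r, t) for r in rs for t in thetas)
print('brute force search min:', best[1], best[2])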
Note that the two minima describe the same point: a negative radius with the angle shifted by π lands in the same place. In the above figure the red dot represents the value obtained from our objective function using the initial data. Our goal is to get the red dot as close as possible to the green star.
There are many algorithms we can use to descend down from the red dot to the green star. But first, let’s condense all this information into a problem statement.
Problem statement
Given an unknown function f that maps a parameter guess x to a 2-d point on the plane, and a desired point d, we want the optimal x’ that produces the desired point.
In other words, we want to minimize the distance between the point our guess produces and the desired point.
We need a measure of how close we are to the desired point for a given input. We’ll define the objective function as the squared distance (the squared L2-norm) between the point obtained from our guess and our desired point d:
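g(x) = ‖f(x) − d‖₂²,  to be minimized over x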
The objective forms a parabola (a bowl in higher dimensions), and minimizing it means finding the bottom, which is also the point where the derivative is zero.
Analysis
We rely on the oracle complexity model: the only information we can get about the unknown function comes from evaluating it. The objective function is minimized when the gradient is zero:
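∇g(x*) = 0

By the chain rule, ∇g(x) = 2 J_f(x)ᵀ (f(x) − d), where J_f is the Jacobian of f; driving this to zero is what every method below is trying to do.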
g9 approximates the gradient by applying the central difference formula in each dimension at the current approximation, reproduced in a simplified form here:
// Simplified approximation of the gradient in d dimensions
function gradient(f, x) {
    var dim = x.length;
    var tempX = x.slice(0), grad = Array(dim);
    var delta = 1e-6;
    for(var i=0; i<dim; i++) {
        // Evaluate the function on either side of x[i]. These could be expensive!
        tempX[i] = x[i] + delta;
        var fPlus = f(tempX);
        tempX[i] = x[i] - delta;
        var fMinus = f(tempX);
        // Central difference formula
        grad[i] = (fPlus - fMinus)/(2*delta);
        // Restore x[i] so later dimensions are evaluated at the original point
        tempX[i] = x[i];
    }
    return grad;
}
To obtain the gradient, g9 invokes the unknown function f, the one that computes the positions of the shapes, 2kd + 1 times, where k is determined by the line search algorithm. If the user provides a function that is expensive to compute, performance suffers. Minimizing the number of calls to the unknown function is essential for performance.
Because we have no insight into the unknown function, the next best thing is an optimization algorithm with good convergence properties (that is, one that finds the solution in few iterations), such as Newton’s method. We want fast convergence because the less time our code needs, the more time is left for the user to draw all sorts of complex, amazing things, as shown in the g9.js gallery.
However, Newton’s method poses its own performance challenge: it requires the inverse of the Hessian, and matrix inversion is expensive. That is why we see g9.js approximating the inverse Hessian with a quasi-Newton method:
var g1 = grad(x1);   // gradient at the new point
var y = sub(g1,g0);  // change in gradient
var ys = dot(y,s);   // curvature y·s, where s is the step just taken
var Hy = dot(H1,y);
// Rank-two BFGS update of the inverse-Hessian approximation H1
H1 = sub(add(H1, mul(ten(s,s), (ys+dot(y,Hy))/(ys*ys))),
         div(add(ten(Hy,s), ten(s,Hy)), ys));
The implementation in g9.js looks like the BFGS algorithm to me. To mimic what the g9 library does, we will also implement BFGS.
Specifically, we’ll use BFGS with a line search method that satisfies the Wolfe conditions, as shown in Numerical Optimization by Nocedal and Wright (pages 140, 177). Convex Optimization by Boyd and Vandenberghe (section 9.5) is also always a good reference.
We’ll compare BFGS with the Gradient Descent method with a backtracking line search to choose the step size. We’ll also visualize the trajectory towards the solution.
There is one more thing we will do differently from g9.js: instead of approximating the gradient, we derive it analytically and use it directly. This brings clarity to what g9.js is doing and simplifies the code. It does mean the Python code here is very specialized to the toy example, but that’s ok! Generalizing it could be a fun side project.
We need the Jacobian because we are working with a multivariate function.
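For the simplified map f(r, θ) = (r cos θ, r sin θ) (as before, ignoring the per-point offset), the Jacobian is

J_f(r, θ) = [ cos θ   −r sin θ ]
            [ sin θ    r cos θ ]

and the gradient of the objective becomes ∇g = 2 J_fᵀ (f(r, θ) − d).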
The appendices of the Numerical Optimization and Convex Optimization books are a good refresher if you are rusty on linear algebra and calculus. And that’s all the math we need! We’ve solved the problem before writing any code. Now it’s time to execute.
The Code
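Below is a minimal sketch of the objective and gradient for the toy example. The names f, g, point, input_radius, input_angle, and desired are assumptions chosen to match the calls further down, and the dragged point is modeled with the simplified polar map, so the exact numbers will differ slightly from the outputs shown:

import numpy as np

input_radius, input_angle = 100, np.pi / 6
desired = np.array([-39.0, 34.0])  # where the user dragged the point

def point(x):
    # Forward map f: parameters (r, theta) -> the dragged point's position
    # (simplified: ignores the per-point offset i/sides * 2*pi)
    r, theta = x
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def f(x):
    # Objective: squared L2 distance between our guess and the desired point
    diff = point(x) - desired
    return diff @ diff

def g(x):
    # Analytic gradient via the chain rule: 2 * J^T (f(x) - d)
    r, theta = x
    J = np.array([[np.cos(theta), -r * np.sin(theta)],
                  [np.sin(theta),  r * np.cos(theta)]])
    return 2 * (J.T @ (point(x) - desired))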
In the above code we defined the objective and gradient methods using the math we derived earlier. We’ll first implement the Gradient Descent method with backtracking line search:
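A sketch of that implementation (the step-size constants and the one-pixel stopping test are assumptions):

def gd(f, g, x0, d, alpha=0.3, beta=0.8, tol=1.0, max_iter=50000):
    # Gradient descent with backtracking line search (Boyd, section 9.2).
    # Stops once the produced point is within tol pixels of the target d.
    x = np.array(x0, dtype=float)
    for i in range(max_iter):
        if np.linalg.norm(point(x) - d) <= tol:
            break
        fx, grad = f(x), g(x)
        t = 1.0
        # Backtrack until the sufficient-decrease (Armijo) condition holds
        while f(x - t * grad) > fx - alpha * t * (grad @ grad):
            t *= beta
        x = x - t * grad
    return x, i

x_gd, n_gd = gd(f, g, np.array([input_radius, input_angle]), desired)
print('GD Iterations:', n_gd)
print('GD Solution:', x_gd)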
GD Iterations: 7739
GD Solution: [53.41018784 1.34293002]
We see that the Gradient Descent method took over 7000 iterations to converge to a solution roughly 1 pixel away from the optimal one. That’s not very performant and will not provide a good user experience in a real-time graphics application. Let’s move on to the BFGS implementation.
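A sketch of the BFGS implementation (the line search here is a simplified backtracking check of the Wolfe conditions; Nocedal and Wright’s algorithms 3.5 and 3.6 give the robust version, and the constants c1 and c2 are conventional assumptions):

def wolfe_step(f, g, x, p, c1=1e-4, c2=0.9):
    # Backtrack until both the sufficient-decrease and curvature
    # (Wolfe) conditions hold, or we give up.
    fx, gp = f(x), g(x) @ p
    t = 1.0
    for _ in range(30):
        armijo = f(x + t * p) <= fx + c1 * t * gp
        curvature = g(x + t * p) @ p >= c2 * gp
        if armijo and curvature:
            break
        t *= 0.5
    return t

def bfgs(f, g, x0, tol=1e-6, max_iter=100):
    x = np.array(x0, dtype=float)
    n = len(x)
    H = np.eye(n)               # inverse-Hessian approximation
    grad = g(x)
    for i in range(max_iter):
        if np.linalg.norm(grad) <= tol:
            break
        p = -H @ grad           # quasi-Newton search direction
        s = wolfe_step(f, g, x, p) * p
        grad_new = g(x + s)
        y = grad_new - grad
        ys = y @ s
        if ys > 1e-12:          # curvature guard: keeps H positive definite
            rho = 1.0 / ys
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, grad = x + s, grad_new
    return x, i

x_bfgs, n_bfgs = bfgs(f, g, np.array([input_radius, input_angle]))
print('BFGS Iterations:', n_bfgs)
print('BFGS Solution:', x_bfgs)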
BFGS Iterations: 6
BFGS Solution: [52.43757995 1.35947141]
BFGS took 6 iterations to converge to a solution that is 1 pixel away from an optimal solution. That is nice! Let’s visualize the trajectories of these two algorithms:
Both implementations took a similar path: first decreasing rapidly along one axis before continuing down the other. The biggest difference is how quickly each arrived at the answer. Let’s visualize the ring example using the r and θ obtained from the BFGS solution.
Both methods clearly arrived at a correct answer. We know that BFGS was faster. How much faster?
Performance analysis
Let’s compare the performance of BFGS versus Gradient Descent method in terms of computation time.
vals = np.array([input_radius, input_angle])
%timeit gd(f, g, vals, desired)
973 ms ± 85.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
vals = np.array([input_radius, input_angle])
%timeit bfgs(f, g, vals)
409 µs ± 9.11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
BFGS provides a huge speedup over gradient descent. Recall that for graphics running at 60 frames per second, all processing must be done within 16 milliseconds. With BFGS we have an extremely comfortable margin for all sorts of other computations. In the case of g9.js, the less time the library itself needs, the more time can be given to the user’s function to draw all sorts of complex things!
Awesome!
We’ll end on a graph of the objective function’s output per iteration for each method:
fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, figsize=(12,6))
ax0.plot(error_gd)
ax0.set_title('GD error per iteration')
ax1.plot(error_bfgs)
ax1.set_title('BFGS error per iteration')
plt.show()