Quadratic interpolation given two points and one derivative
Years ago, while reading up on line search algorithms in nonlinear optimization for neural network training, I came across this problem:
Given a function f(x), find a quadratic interpolant q(x)=ax²+bx+c such that f(x) and q(x) share two points and have the same derivative on the first of them — i.e., the interpolant fulfills the conditions f(x0)=q(x0), f(x1)=q(x1) and f′(x0)=q′(x0). Basically this:
(The tangent looks a bit off, but you get the idea.)
So I took out my scribbling pad, wrote down some equations and then, after two pages of nonsense, decided it really wasn’t worth the hassle. It turns out that the simple system
for
has the solution
Instead of ruining your time on the paper, it can be obtained more easily in Matlab using
Obviously — given that this is a line search problem — the whole purpose of this operation is to find an approximation to the local minimiser of f′(x). This gives
We also would need to check the interpolant’s second derivative q′′(xmin) to ensure the approximated minimiser is indeed a minimum of q(x) by requiring q′′(xmin)>0, with the second derivative given as:
The premise of the line search in minimization problems usually is that the search direction is already a direction of descent. By having 0>f′(x0) and f′(x1)>0 (as would typically be the case when bracketing the local minimiser of f(x)), the interpolant should always be (strictly) convex. If these conditions do not hold, there might be no solution at all: one obviously won’t be able to find a quadratic interpolant given the initial conditions for a function that is linear to machine precision. In that case, watch out for divisions by zero.
Last but not least, if the objective is to minimize φ(α)=f(x⃗ k+αd⃗ k) using q(α) , where d⃗ k is the search direction and x⃗ k the current starting point, such that
then the above formulas simplify to
and, more importantly, the local (approximated) minimiser at αmin simplifies to
If q(α) is required to be strongly convex, then we’ll observe that
for an m>0, giving us that a must be greater than zero (or ϵ , for that matter), which is a trivial check. The following picture visualizes that this is indeed the case: