An intuitive and physical approach to Newton’s method.

Tanbal تنبل
6 min readSep 18, 2017

--

Newton’s method (also known as the Newton-Raphson method) is a very handy way to solve for the roots of a function. Understanding how this technique works is also a great way to understand how and why Gradient Descent makes sense.

Motivation: Say we’d like to find the value of √2. We could set up a polynomial function like the one below:

When plotted the function takes on the following form:

f(x) = x²-2

As we can see, the square root of 2 can be found by solving for the the values of x where the function in intersects the x-axis. In other words, we’re looking for values of x that make the following statement true:

This is equivalent to saying:

Let’s show that this is indeed going to give us the value of the square root of 2 by rearranging things a little:

and taking the root of both sides

et voila! If we can find the values of x at which the function crosses the x-axis, we will have found the value of the square root of 2. But the question remains — how do we solve for this value of x? Let’s choose a point (shown in green) on the graph away from the x-intercept: (x=5,f(x)=23)

f(x) = x² -2 with (x=5, f(x)=23 ) shown in green.

Now the value we are actually interested in is at what value of x the function f(x) crosses the x-axis (shown below in red):

f(x) = x² -2 with (x=?, f(x)=0 ) shown in red.

So how do we get from the green point to the red one? In fact, why does it even make sense to start looking at this problem from this angle?

f(x) = x² -2 with both points shown.

One interesting observation is that the function f(x) is increasing in value between these two poins (red → green). Concretely, as x increases in value, so does f(x). Another way to think about this is to express it as a ratio:

Ratio of change in y values to change in x values from red → green point.

What this is saying is that the change in both f(x) and x is positive, i.e. both f(x) and x are increasing in tandem. Another word you may have heard to describe this is ‘slope’ or ‘rise over the run’ of a function.

Using calculus we can compute the slope of the function, f(x) at a single point. The way to do that is to compute the derivative (we’ll talk about the magic behind derivatives in another post!) of the function f(x), commonly referred to as f’(x) (“f-prime-of-x”). We can then plug in the value of x into f’(x) and the value that is produced will be the slope of the function at x!

Ok, so we know how to compute the slope of a function at a given point — f’(x), and we’ve also worked out an alternate way to express a slope — a ratio of change in y to the change in x: Δy/Δx. Can we then use that information to draw a line passing through a point, let’s say the green point, that has the same slope as the function f(x) at that point? Let’s give it a shot; we know that to specify a line we only need two points, and we have one already — let’s pick another one on the x-axis (shown in orange):

So we have our two points, let us denote the orange and green point as:

The line we are drawing connect the green and orange points should have the same slope as the function at the green point. Let’s express this mathematically:

Bearing in mind that we want to work out where the orange point is on the x-axis (we already know that it has a y value [y2] of 0!), we can rearrange the equation above to solve for x2:

And given that:

We have:

Because we already know the value of x1, we can compute f(x1) and f’(x1) and find the value of x2.

That means we now know the exact location of the two points we need to draw our line:

Now that we know x2 we can then find the value of f(x) at x2. Let’s draw one final point on our graph, (x1',y1') [shown in black]:

If we consider the difference between the values of y1' and y-value at the red point (0) , 5.29–0 = 5.29, we can see that we are not quite yet arrived at the our destination. The idea being, that if we repeat this process enough times, the black point will coincide with the red point, and given that we know the x value of the black point (x1'), then we’ll also have worked out the x-value of the red-point.

So to proceed, all we need to do is begin the process again, but this time setting x1 and y1 to the values of x1' and y1'. Here’s what it looks like in practice:

iteration 0 (x1,y1) = (5,23)
iteration 1 (x1,y1) = (2.7,5.0)
iteration 2 (x1,y1) = (1.72,1.0)
iteration 3 (x1,y1) = (1.441,0.0)
iteration 4 (x1,y1) = (1.414,0.0)

And finally here it is as a handy animation showing 4 iterations of the method:

Animation showing 4 iterations of Newton-Raphson.

I’ve also included a Jupyter notebook with a complete implementation of the algorithm, including all the pretty diagrams. See you next time, when we’ll talk about all about derivatives!

--

--

Tanbal تنبل

Je veux du bonheur! فلسطين، أنت امي، أبي و أختي. Pickled in the brine of life.