C. Hinsley

10 August 2020

Newton's method is an iterative algorithm for quickly finding the roots of a differentiable function. I first encountered this technique when looking for an efficient method for calculating square roots. The idea is that you start with some initial value $a_0$ (it doesn't usually matter too much what that initial value is) and a square $x$, and you repeatedly calculate better estimates $a_{t+1}$ with the following formula.

$$a_{t+1} = \frac12 \left( a_t + \frac{x}{a_t} \right)$$

Because the updates get progressively smaller each time, you can pick some small tolerance $\epsilon > 0$ and stop improving your estimate once $|a_{t+1} - a_t| < \epsilon$; at that point $a_{t+1}$ is within roughly $\epsilon$ of $\sqrt{x}$.
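The update rule and stopping criterion above can be sketched in a few lines of Python (the function name and the default tolerance are my own choices):

```python
def sqrt_newton(x, a0=1.0, eps=1e-10):
    """Estimate sqrt(x) by repeatedly averaging a_t with x / a_t."""
    a = a0
    while True:
        a_next = 0.5 * (a + x / a)
        # Stop once successive estimates differ by less than eps.
        if abs(a_next - a) < eps:
            return a_next
        a = a_next
```

For example, `sqrt_newton(2)` starting from $a_0 = 1$ reaches double-precision accuracy in only a handful of iterations.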

But why does this work? At first glance, it appears we are just taking the arithmetic mean of our current estimate $a_t$ and a complementary estimate $\frac{x}{a_t}$. Indeed, if we were to pick $a_t = \sqrt{x}$, the update step would leave the estimate unchanged; the true square root is a fixed point of the iteration:

$$a_{t+1} = \frac12 \left(\sqrt{x} + \frac{x}{\sqrt{x}} \right)$$

$$a_{t+1} = \frac12 (2\sqrt{x})$$

$$a_{t+1} = \sqrt{x} = a_t$$

Though it may seem plausible that repeatedly taking the midpoint between the current estimate $a_t$ and the complementary estimate $\frac{x}{a_t}$ eventually converges to the square root, it turns out this technique generalizes well beyond square roots.

First, note that Newton's method can be stated as follows for any differentiable function $f(x)$ satisfying mild conditions (in particular, $f'(a_t) \neq 0$ at each step).

$$a_{t+1} = a_t - \frac{f(a_t)}{f'(a_t)}$$
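As a sanity check, applying this general update to $f(a) = a^2 - x$ recovers the averaging rule from earlier:

$$a_{t+1} = a_t - \frac{a_t^2 - x}{2a_t} = \frac{2a_t^2 - a_t^2 + x}{2a_t} = \frac12 \left( a_t + \frac{x}{a_t} \right)$$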

Why is this so? For any estimate $a_t$, draw the tangent line to $f$ at $x = a_t$ and note where that tangent line intersects the $x$-axis. In other words, you're approximating $f(x)$ by a linear function at a point, then treating the root of that linear approximation as a reasonable guess for where the nearest root of $f(x)$ might be. It turns out this works most of the time; try sliding the $a$ slider around in the plot below to see how the new estimate (the purple vertical line) responds. Although an unlucky initial estimate might send you far away, you will usually land close enough to a root that successive estimates converge rather quickly.
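The tangent-line procedure is easy to express generically. A minimal sketch, where the function name, iteration cap, and tolerance are my own assumptions:

```python
def newton(f, f_prime, a0, eps=1e-10, max_iter=100):
    """Find a root of f by repeatedly jumping to the x-intercept
    of the tangent line at the current estimate."""
    a = a0
    for _ in range(max_iter):
        a_next = a - f(a) / f_prime(a)  # root of the linear approximation
        if abs(a_next - a) < eps:
            return a_next
        a = a_next
    return a  # may not have converged; the caller should check
```

For instance, `newton(lambda x: x**3 - 27, lambda x: 3 * x**2, 2.0)` converges to the root $x = 3$ of $x^3 - 27$.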

So how does this help us calculate square roots, cube roots, and so on?

Suppose we have some real number $\beta$. We want to calculate the $p$th root of $\beta$, or $\sqrt[p]{\beta}$, where $p > 1$. We should be able to find a function $f(x)$ in terms of $\beta$ such that $f(\sqrt[p]{\beta}) = 0$, because Newton's method will converge to an $x$ value such that $f(x) = 0$. It turns out this is rather trivial.

$$f(x) = x^p - \beta$$

Proof:

$$f(x) = x^p - \beta = 0 \implies x^p = \beta \implies x = \sqrt[p]{\beta}$$

where we take the positive real root, assuming $\beta > 0$.
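Plugging $f(x) = x^p - \beta$ and $f'(x) = p x^{p-1}$ into Newton's update gives a concrete iteration for $p$th roots, since $a_t - \frac{a_t^p - \beta}{p a_t^{p-1}} = \frac{1}{p}\left( (p-1) a_t + \frac{\beta}{a_t^{p-1}} \right)$. A sketch of that iteration, with function name and defaults of my own choosing:

```python
def pth_root(beta, p, a0=1.0, eps=1e-10):
    """Approximate the positive real pth root of beta (beta > 0)
    via Newton's method on f(x) = x**p - beta."""
    a = a0
    while True:
        # Newton update: a - (a**p - beta) / (p * a**(p - 1)), simplified.
        a_next = ((p - 1) * a + beta / a ** (p - 1)) / p
        if abs(a_next - a) < eps:
            return a_next
        a = a_next
```

Note that $p = 2$ reduces this to the square-root averaging rule we started with.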