Let's forget about fixed-point iterations for a minute, and let's just look at the picture. We have some point on the curve. If we draw a tangent to the curve at this point, we get a straight line, and the equation of that line is as written here: this is y, which depends on x, and the line has a slope given by the derivative of the function at this point. That is just the geometric meaning of the derivative. So the idea is this: we start at some point, we draw a tangent, and we take as the next approximation to the root the crossing of this tangent line with the x-axis. So again, we start at x_0, we draw a tangent, we find where it crosses the axis, and then we go back to the function and call this the next iterate. So in a sense, we use some information about the function, encoded in the derivative, in the slope, in an attempt to reach the root faster. It actually works quite well in certain cases, which I'll talk about, but for now let's look at the nice case. I start at x_0, I go to x_1; that's my new iterate. I draw a tangent at that point, look for the crossing of this tangent with the axis, and I keep going. Even in this cartoon, you can see that the convergence, at least visually, looks quite fast. So if my derivative is not very small, if the function is not very shallow, then the local information encoded in the derivative tells me in which direction the root lies. At least superficially, it looks like I have a chance of reaching the root faster than if I were not using any information about the function. Okay, so what's the rate of convergence? Let's try to put some mathematical rigor into this hand-waving about using local information. At the very beginning, we phrased Newton's method as a fixed-point iteration, and we have a theorem which tells us about the rate of convergence of fixed-point iterations.
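The tangent-line step described above can be sketched in a few lines of Python. The function names and the example f(x) = x^2 - 2 are my own illustration, not from the lecture:

```python
# One Newton step follows the tangent at x_n down to the x-axis:
#   x_{n+1} = x_n - f(x_n) / f'(x_n)

def newton(f, fprime, x0, tol=1e-12, maxiter=50):
    """Iterate Newton's method from x0 until |f(x)| < tol (a minimal sketch)."""
    x = x0
    for _ in range(maxiter):
        x = x - f(x) / fprime(x)
        if abs(f(x)) < tol:
            break
    return x

# Example: the positive root of f(x) = x**2 - 2, i.e. sqrt(2), starting from x0 = 1
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, 1.0)
```

Note that the derivative is supplied analytically, which, as discussed below, is what preserves the fast convergence.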
The rate of convergence of a fixed-point iteration is controlled by the derivative of the iteration function. It is easy to calculate this derivative for the Newton iteration, and what we find by simple algebra is that it goes to zero as we approach the root. So what does this mean? Fixed-point iterations converge faster for smaller derivatives; the maximum value of the derivative controls the iteration. So we can actually expect convergence which is faster than linear, some superlinear convergence, because the method gets progressively better closer to the root. We can formulate a proper theorem about the convergence of Newton's method, and it will be a little messy, but I think I should at least mention it. I'm not going to prove it, just state it. So let's see. Suppose x* is a simple real root of the function f. Suppose the derivative of this function is non-zero in some neighborhood of this root; again, I don't want multiple roots anywhere nearby. Furthermore, suppose that the second derivative is continuous in this neighborhood of the root, and, furthermore, that my starting point is close enough to the root. That is, the distance between my initial point and the root is small enough, where what counts as small is controlled by the maximum value of the second derivative, that's M_2, in this neighborhood, and the minimum value m_1 of the first derivative; again, I need to require that this minimum value of the first derivative is greater than zero. If all these conditions hold, then Newton's method converges, and I have an a priori error bound which says that at the n-th iterate, the distance to the root is bounded by the original distance from the starting point times q to the power 2^n, where q involves the quality of the original starting point and the ratio of the bounds on the second and first derivatives. As I said, it's a little messy.
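A standard way to write this bound is the following; this is my reconstruction of the "messy" statement, with M_2 the upper bound on |f''| and m_1 > 0 the lower bound on |f'| in the neighborhood of the root:

```latex
% Assumptions: x_* a simple root, |f''| \le M_2 and |f'| \ge m_1 > 0 near x_*,
% and x_0 close enough to the root that q < 1.
q = \frac{M_2}{2 m_1}\,\lvert x_0 - x_* \rvert < 1,
\qquad
\lvert x_n - x_* \rvert \;\le\; \frac{2 m_1}{M_2}\, q^{2^n}
\;=\; \lvert x_0 - x_* \rvert \, q^{2^n - 1}.
```

The exponent 2^n, rather than n, is exactly the quadratic convergence discussed next.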
So in plain human language, what I get from this theorem is this: the method converges quadratically, because there is this q to the power 2^n instead of q to the power n as in the usual fixed-point iteration. However, I need to require many things, and the convergence is controlled not only by the function, through its derivatives, but also by the quality of my original estimate of the root. This will have quite severe consequences. But if it works, if the function is nice and I manage to guess the root well enough from the very beginning, the method converges quadratically. Let me try to show a slightly different way of looking at this, slightly less rigorous, but hopefully a little clearer. I will directly check that the convergence of Newton's method is quadratic under all the assumptions of the previous theorem. I will define a few things so that the formulas look simpler. First, I will drop the arguments of the first and second derivatives; I will only evaluate them at the root itself. And I will call delta the distance between the current estimate of the root and the true, unknown root. I assume that all derivatives exist and are continuous, and everything is nice. So if I am close enough to the root, I can Taylor expand my function, and I will keep terms up to delta squared; again, delta is the distance to the root. The derivative I can also expand in a Taylor series, and this expansion, to the same accuracy, will be linear in delta. Then the Newton iteration relates delta at the current step to delta at the previous step, and it involves the ratio of the function to its derivative. I plug in my expansions for the function and the derivative, and what do I have? I have delta minus this ratio.
In the numerator, I have the function, and the value of the function is proportional to delta, the distance to the root, because I am expanding around the root and the constant term is zero. So I take this delta out front, and then I have something linear in delta in the numerator and something linear in delta in the denominator. Then I just collect the terms together. Delta is still out front, the terms with the first derivative cancel, and I am left with something proportional to the second derivative in the numerator: the second derivative times delta, divided by two. Then I can drop the term with delta in the denominator, because it only contributes higher-order corrections. So what I have is that, to lowest order, delta at the current step is indeed proportional to the square of delta at the previous step. Loosely speaking, if I have this many significant figures in my estimate at step n, at the next step I double the number of significant figures. So let's recap our exposition of Newton's method, and let's settle some details about its practical use. First and foremost, to properly use Newton's method, to benefit from the superlinear, quadratic convergence, you need to be able to compute the derivative, and to compute it analytically. You should not use finite differences or the like, because if you approximate the derivative by finite differences, you actually lose the quadratic convergence; you should never do this. Then, assuming you can compute derivatives, you can use this method, but you need the starting point to be close to the root; you need the original approximation to be, in some sense, good enough. The theorem relates the quality of the original estimate to the rate of convergence, so I need to be able to guess the root with good enough accuracy. The main problem is that this method does not honor the localization interval.
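Written out, the expansion argument above goes like this, with f' and f'' evaluated at the root x_* and delta_n = x_n - x_*:

```latex
f(x_n) = f'\,\delta_n + \tfrac{1}{2} f''\,\delta_n^2 + O(\delta_n^3),
\qquad
f'(x_n) = f' + f''\,\delta_n + O(\delta_n^2),
\\
\delta_{n+1} = \delta_n - \frac{f(x_n)}{f'(x_n)}
= \delta_n \left( 1 - \frac{f' + \tfrac{1}{2} f''\,\delta_n}{f' + f''\,\delta_n} \right)
= \delta_n \, \frac{\tfrac{1}{2} f''\,\delta_n}{f' + f''\,\delta_n}
\approx \frac{f''}{2 f'}\,\delta_n^2 .
```

The last step drops the f'' delta_n term in the denominator, which only affects higher orders; the error at each step is proportional to the square of the previous error.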
So you see, on one hand, the nice thing about Newton's method is that it can make large steps. Bisection would blindly halve the interval, whereas Newton's method is able to see that the function is, say, steep enough, and make a large step. But the downside of this is that you lose the ability to keep your root localized. This can have rather drastic consequences, which we'll see in a minute. For now, the main statement I want to make is that one of the main uses of Newton's method in practice is polishing roots. By some method you have found a good or reasonable approximation, and then even a single Newton step doubles the number of significant figures; that's polishing. That is one of the main uses of Newton's method. But you need to be aware of the fact that it can go absolutely haywire, because it does not honor localization intervals. Let's see some examples. I'll take a simple cubic polynomial and actually make several steps of Newton's method. Suppose I start somewhere close to this minimum. Since I am at a minimum, I have violated the assumptions of the theorem, so convergence is not guaranteed, and the failure mode is fairly drastic: I start close to the minimum and land somewhere close to the maximum. If instead I start somewhere to the left of this minimum, say somewhere here, then my next estimate is close to the minimum, and then two things are possible. If my first estimate happens to land slightly on the wrong side of the minimum, everything goes haywire. But if it lands on the right-hand side of it, then the next step brings me very close to the true root, and from that point on everything is nice. So the bottom line is that in a reasonably large range, say from zero to two in this particular case, convergence is not guaranteed.
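The lecture does not give the specific cubic, so here is a hypothetical stand-in with the shape described: f(x) = x^3 - 2x + 2 has one real root (near x = -1.769) plus a local minimum and maximum that can trap the iteration. Starting on the good side converges quickly; starting between the extrema, at x_0 = 0, the iterates cycle between 0 and 1 forever:

```python
# Newton iterates for the hypothetical cubic f(x) = x**3 - 2*x + 2.

def newton_steps(x0, nsteps):
    """Return the list of Newton iterates, starting from x0."""
    f = lambda x: x**3 - 2 * x + 2
    df = lambda x: 3 * x**2 - 2
    xs = [x0]
    for _ in range(nsteps):
        xs.append(xs[-1] - f(xs[-1]) / df(xs[-1]))
    return xs

good = newton_steps(-2.0, 8)  # starts left of the root: converges rapidly
bad = newton_steps(0.0, 8)    # starts between the extrema: cycles 0, 1, 0, 1, ...
```

From x_0 = 0 the step lands exactly on 1, and from 1 exactly back on 0, so the method never converges, illustrating how sensitive the outcome is to the starting point.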
The method can converge or fall apart depending on the tiny details of where exactly I take my initial point; it is very non-uniform. In fact, we can define the basin of attraction of a root: essentially, the set of all initial points from which the method converges to that root. If we look at the basins of attraction of the various roots, their structure in the complex plane is fractal. These are called Newton fractals, and visualizing them is a nice exercise. So for Newton's method, we see that the convergence is local. As I was saying, it works great if you happen to have a good estimate of the root already, and so a nice application is polishing.
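As a sketch of that exercise, here is a minimal basin-of-attraction computation for the classic example z^3 - 1 = 0 in the complex plane; the grid size and tolerances are arbitrary choices of mine:

```python
import cmath

# The three cube roots of unity, the roots of f(z) = z**3 - 1.
roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def basin(z, maxiter=50, tol=1e-8):
    """Index of the root Newton's method reaches from z, or -1 if it fails."""
    for _ in range(maxiter):
        fz = z**3 - 1
        dfz = 3 * z**2
        if dfz == 0:
            return -1
        z = z - fz / dfz
        for k, r in enumerate(roots):
            if abs(z - r) < tol:
                return k
    return -1

# Label a coarse grid of starting points; plotting these labels as colors
# reveals the fractal boundaries between the basins.
grid = [[basin(complex(x, y) / 10) for x in range(-20, 21)]
        for y in range(-20, 21)]
```

Even on a coarse grid, neighboring starting points near the basin boundaries get different labels, which is the non-uniformity of convergence described above.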