So far in this module I've introduced you to the concept of Power Series, and

then shown you how we can use them to approximate functions.

We've also seen how Power Series can then be used to give you some indication

of the size of the error that results from using these approximations,

which is very useful when applying numerical methods.

The last idea that we're going to very briefly look at on this topic

is upgrading the Power Series from the one-dimensional version that we've seen so

far to its more general multivariate form.

Just to recap on the notational options from the previous video.

We saw that we can re-express the Taylor series from a form that emphasises

building the approximation of a function at a point p to a totally equivalent form

that emphasises using that function to evaluate other points

that are a small distance delta x away.

Just to make sure you can see that they're the same, we can write out the first four

terms, one above the other, to make the comparison clear.
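To make that comparison concrete, here is a minimal numerical sketch. The function is an assumed example (f(x) = eˣ, whose derivatives are all eˣ, is not specified in the video): truncating both notations after the first four terms and substituting x = p + Δx gives exactly the same number.

```python
import math

# Hypothetical example: f(x) = e^x, so every derivative at p is e^p.
# Point-emphasis form:  f(x)      ~ sum_n f^(n)(p) (x - p)^n / n!
# Step-emphasis form:   f(p + dx) ~ sum_n f^(n)(p) dx^n / n!
# Substituting x = p + dx makes the two forms term-by-term identical.

def taylor_point_form(x, p, n_terms=4):
    return sum(math.exp(p) * (x - p) ** n / math.factorial(n)
               for n in range(n_terms))

def taylor_step_form(p, dx, n_terms=4):
    return sum(math.exp(p) * dx ** n / math.factorial(n)
               for n in range(n_terms))

p, dx = 1.0, 0.1
print(taylor_point_form(p + dx, p))  # same value from both notations
print(taylor_step_form(p, dx))
```

For this small step the four-term truncation is already close to the true value e^1.1.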

In this video, we are going to continue using the delta x notation

as it's more compact, which will come in handy when we're in higher dimensions.

We won't be working through the multivariate Taylor series derivation

in any detail, as all I really want you to take away from this video is what

a multidimensional Power Series will look like, both as an equation and in a graph.

So keeping in mind our one-dimensional expression, let's start by looking at

the two-dimensional case, where f is now a function of the two variables x and y.

So our truncated Taylor series expressions will enable us to approximate the function

at some nearby point x plus delta x, y plus delta y.

Let's look at the simple two-dimensional Gaussian function, which many of you may

be familiar with from studying the normal distribution in statistics.

It's a nice, well-behaved function with a simple maximum at the point (0,0).
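The video doesn't give the exact formula, so as a working assumption here is a common unnormalised form with its maximum at (0, 0):

```python
import numpy as np

# Assumed form of the 2D Gaussian discussed here (unnormalised):
# f(x, y) = exp(-(x^2 + y^2))
def f(x, y):
    return np.exp(-(x ** 2 + y ** 2))

# The peak sits at (0, 0) and the bell decays in every direction.
print(f(0.0, 0.0))        # height 1.0 at the peak
print(f(1.0, 0.0) < 1.0)  # True: lower on the side of the bell
```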

Now in a one-dimensional analysis, our Power Series will give us an expression for

a one dimensional function.

However, in a two dimensional case our power series would of course now give us

a two dimensional function, which we would more intuitively refer to as a surface.

Just as with the one dimensional case, our zeroth order approximation

is simply the value of the function at that point applied everywhere,

which means in 2D, this would just be a flat surface.

So a zeroth order approximation at the peak would look like this.

And a zeroth order approximation somewhere on the side of the bell

would look a bit more like this.

This is fairly straightforward, but

now let's think about the first order approximation.

Drawing the analogy from the 1D case again,

the first order approximation should have a height and a gradient.

So we're still expecting a flat surface, but

this time it can be at an angle.

Now if we calculated it at the peak, it wouldn't be a very exciting case

as it's a turning point and so the gradient is zero.

However, let's look again at the point on the side of the slope.

Taking the first order approximation here gives us a surface with the same height

and gradient at the point.
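A minimal sketch of that tangent plane, again assuming the Gaussian f(x, y) = exp(-(x² + y²)) and an arbitrary side-of-the-bell point (0.7, 0):

```python
import numpy as np

# Assumed Gaussian from earlier in the video.
def f(x, y):
    return np.exp(-(x ** 2 + y ** 2))

def grad_f(x, y):
    # Analytic partial derivatives: df/dx = -2x f,  df/dy = -2y f
    return np.array([-2 * x * f(x, y), -2 * y * f(x, y)])

x0, y0 = 0.7, 0.0                 # hypothetical point on the side of the bell

def first_order(dx, dy):
    # Flat surface matching the function's height and gradient at (x0, y0).
    return f(x0, y0) + grad_f(x0, y0) @ np.array([dx, dy])

print(first_order(0.0, 0.0))      # equals f(x0, y0): same height at the point
print(abs(first_order(0.01, 0.0) - f(0.71, 0.0)))  # small for a small step
```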

Finally, let's look at the second order approximation for these two points.

We're now expecting some kind of parabolic surface.

However, if I calculate it at the peak, nothing seems to appear.

But that's just because we've created a parabola inside our bell curve, so

we need to look from underneath to even see it.

Finally, if we plot the second order approximation at a point on

the side of the bell curve,

you can see that we end up with some kind of interesting saddle function.

But it's still fairly clear even by eye that the gradient and

the curvature are matching up nicely at that point.

However, this approximation is evidently not useful

outside of a fairly small region around the point.

We're now going to take a look at how we would write expressions for

these functions.

So if we want to build a Taylor series expansion of the two dimensional function

f at the point (x, y), and then use it to evaluate the function

at the point x plus delta x, y plus delta y,

our zeroth order approximation

is just a flat surface with the same height as the function

at our expansion point.

The first order approximation incorporates the gradient information

in the two directions.

And once again, we are thinking about how rise equals the gradient times the run.

Notice here for compactness, I'm using the partial symbol with the subscript

to signify a derivative with respect to a certain variable.
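In that notation, the first order truncated series reads (a reconstruction of the spoken description, with ∂ₓ denoting the partial derivative with respect to x):

```latex
f(x+\Delta x,\; y+\Delta y) \;\approx\; f(x,y)
  \;+\; \partial_x f(x,y)\,\Delta x
  \;+\; \partial_y f(x,y)\,\Delta y
```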

If we look now at the second order approximation, we can see

that the coefficient of one-half still remains as per the one dimensional case.

But now we have three terms, all of which are second derivatives.

The first has been differentiated with respect to x twice.

The last with respect to y twice.

And the middle term has been differentiated with respect to each

of x and y.
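A sketch of that truncated series written out term by term, again assuming the Gaussian f(x, y) = exp(-(x² + y²)) so the derivatives have closed forms. Note the mixed middle term carries a factor of 2, since differentiating in x then y gives the same result as y then x:

```python
import numpy as np

# Assumed Gaussian from earlier in the video.
def f(x, y):
    return np.exp(-(x ** 2 + y ** 2))

def second_order(x, y, dx, dy):
    fx  = -2 * x * f(x, y)               # d f / dx
    fy  = -2 * y * f(x, y)               # d f / dy
    fxx = (4 * x ** 2 - 2) * f(x, y)     # differentiated w.r.t. x twice
    fyy = (4 * y ** 2 - 2) * f(x, y)     # differentiated w.r.t. y twice
    fxy = 4 * x * y * f(x, y)            # differentiated w.r.t. each of x and y
    return (f(x, y)
            + fx * dx + fy * dy                                       # 1st order
            + 0.5 * (fxx * dx ** 2 + 2 * fxy * dx * dy + fyy * dy ** 2))  # 2nd

# At the peak (0, 0) the gradient vanishes, leaving the downward paraboloid
# 1 - (dx^2 + dy^2) hiding inside the bell.
print(second_order(0, 0, 0.1, 0.1))
print(f(0.1, 0.1))  # close to the approximation for this small step
```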

Now there are of course higher order terms, but

we've already got more than enough to play with here.

So let's look again at the first order term.

It's the sum of the products of the first derivatives with the step sizes, but

hopefully this will remind you of our discussion of the Jacobian.

So we can actually re-express this as just the Jacobian multiplied by a vector

containing delta x and delta y, which we can write as delta bold x,

where the bold x signifies a vector containing those two terms.

Finally, if we look at the second order term in the same way, we notice

that these second derivatives can be collected into a matrix,

which we've previously defined as our Hessian.

Then to make the sum we need,

we now have to multiply our delta x vector by the Hessian, and

then again by the transpose of the delta x vector, and that's it.
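The compact form f(x + Δx) ≈ f(x) + JΔx + ½ΔxᵀHΔx can be sketched directly in numpy, once more assuming the Gaussian f(x, y) = exp(-(x² + y²)). The expansion point (1, 0) is a hypothetical choice, picked because the Hessian there has one positive and one negative second derivative, which is exactly the saddle shape seen on the side of the bell:

```python
import numpy as np

# Assumed Gaussian, now taking a 2-vector v = (x, y).
def f(v):
    return np.exp(-(v @ v))

def jacobian(v):
    return -2 * v * f(v)                      # vector of first derivatives

def hessian(v):
    # All the second derivatives collected into a matrix:
    # H = (4 v v^T - 2 I) f(v) for this Gaussian.
    return (4 * np.outer(v, v) - 2 * np.eye(2)) * f(v)

def taylor2(v, dv):
    # f(v) + J dv + 0.5 dv^T H dv
    return f(v) + jacobian(v) @ dv + 0.5 * dv @ hessian(v) @ dv

v  = np.array([1.0, 0.0])                     # point on the side of the bell
dv = np.array([0.05, 0.05])                   # a small step away

print(taylor2(v, dv))                         # close to the true value below
print(f(v + dv))
print(np.diag(hessian(v)))                    # mixed signs: a saddle surface
```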

So we now have a nice compact expression for

the second order multivariate Taylor series expansion, which brings together so

much of our calculus and linear algebra skills, and makes use of the Jacobian and

Hessian concepts which we defined earlier in the course.

And of course, although we've been talking about the two dimensional case

in this video so far, we could actually have

any number of dimensions contained in our J, H, and delta x terms.

So this immediately generalises from 2D to multidimensional hypersurfaces,

which is pretty cool.

See you in the next one.
