Previously, we saw that we can differentiate functions of

multiple variables and that it isn't much

harder than the univariate calculus we met at the start of the course.

In this video, we're going to introduce the Jacobian,

which brings in some of the ideas from linear algebra

to build these partial derivatives into something particularly useful.

The concept of the Jacobian can be applied to a variety of different problems.

But in the context of getting started with optimisation and machine learning,

there is a particular scenario that comes up a lot,

which is the Jacobian of a single function of many variables.

In short, if you have a function of many variables, say

f(x1, x2, x3, ...), then the Jacobian is simply a vector where

each entry is the partial derivative of f

with respect to each one of those variables in turn.

By convention, we write this as a row vector rather than a column vector,

for reasons that will become clear later in the course.
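As a quick aside, this definition translates almost directly into code. Here is a minimal sketch of my own (not code from the video) that approximates the Jacobian of a scalar function numerically with central finite differences:

```python
# A sketch, not from the video: the Jacobian of a scalar function of many
# variables is a row of its partial derivatives. Each partial is
# approximated here with a central finite difference.
def jacobian(f, point, h=1e-6):
    """Return the Jacobian of scalar f at `point` as a row of partials."""
    row = []
    for i in range(len(point)):
        up = list(point); up[i] += h      # nudge the i-th variable up
        down = list(point); down[i] -= h  # and down
        row.append((f(up) - f(down)) / (2 * h))  # df/dx_i
    return row
```

For example, for g(x1, x2) = x1 * x2 at the point (2, 3), this returns approximately [3.0, 2.0], matching the analytic partials (x2, x1).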

Let's start by looking at a nice simple function to

see just how straightforward building a Jacobian can be.

Consider the function

f(x, y, z) = x squared y + 3z.

To build the Jacobian,

we just find each of the partial derivatives of the function one by one.

So, we've got df by dx: differentiating with respect to x,

while treating everything else as a constant, we get 2xy.

df by dy is just x squared, and for df by dz,

the x squared y term is a constant and the z disappears, so we just get the number 3.

Now bringing all of those together,

we end up with the Jacobian

J = (2xy, x squared, 3).
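We can double-check these hand-derived partials symbolically. This is a sketch of my own, assuming the sympy library is available:

```python
import sympy as sp

# Symbolic check of the worked example f = x**2 * y + 3z:
# differentiating with respect to each variable in turn should give
# the Jacobian (2xy, x**2, 3).
x, y, z = sp.symbols('x y z')
f = x**2 * y + 3 * z

J = [sp.diff(f, var) for var in (x, y, z)]  # row of partial derivatives
```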

So what does this tell us?

We now have an algebraic expression for a vector which, when we give it a specific

x, y, z coordinate, will return

a vector pointing in the direction of the steepest slope of this function.

The vector for this particular function has a constant contribution in

the z direction which does not depend on the location selected.

For example, at the point (0, 0, 0),

we can see that our Jacobian is simply

J(0, 0, 0) = (0, 0, 3).

So our Jacobian is a vector of magnitude

3 pointing directly in the z direction.
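This evaluation takes only a couple of lines to confirm in code (a sketch of my own, not code from the video):

```python
import math

# Evaluating the Jacobian J = (2xy, x**2, 3) of f = x**2 * y + 3z
# at the origin, as in the worked example.
def J(x, y, z):
    return (2 * x * y, x**2, 3.0)

v = J(0.0, 0.0, 0.0)                          # (0.0, 0.0, 3.0)
magnitude = math.sqrt(sum(c * c for c in v))  # 3.0: straight along z
```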

Some of the numerical methods that we will discuss later in

this course require us to calculate Jacobians in hundreds of dimensions.

However, even for the 3D example we just solved,

graphical representation is already quite difficult.

So we are now going to drop down to

2 dimensions so that we can actually see what's going on here.

But to keep things interesting,

we're going to look at a particularly complicated but rather attractive function.

So here is the equation that we're going to plot,

but I'm showing it to you briefly just so that you

can see that even though it is a bit of a monster,

I hope you'll agree that, with just the tools we've seen so far,

we really could calculate the partial derivatives of this thing.

But you wouldn't really learn anything new from grinding through this.

Instead, I'm going to show you the results graphically.

So here is this function,

with the x axis horizontal and the y axis

vertical and the colors are indicating the value of z,

with the bright region suggesting

high values and the dark region suggesting low values.

Because of the nature of this particular function,

I know that nothing interesting happens outside of the region shown here,

so we can forget about it for now.

Although this plot is fairly clear,

our intuition about the gradient is a bit lacking in this format.

So let's briefly make things 3D,

where the values of z are now also represented by the height.

So, as we said at the start of the video,

the Jacobian is simply a vector that we can calculate for each location on

this plot which points in the direction of the steepest uphill slope.

Furthermore, the steeper the slope,

the greater the magnitude of the Jacobian at that point.
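The video's function isn't reproduced here, so as a sketch we can verify this uphill property numerically on a smooth stand-in function of my own choosing:

```python
import math

# A stand-in function with hills and valleys (my choice, not the video's).
def f(x, y):
    return math.exp(-(x**2 + y**2)) * math.cos(3 * x)

# Jacobian via central finite differences.
def grad(x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

# A small step along the Jacobian should increase f: it points uphill.
x0, y0 = 0.5, 0.4
gx, gy = grad(x0, y0)
step = 1e-3
went_uphill = f(x0 + step * gx, y0 + step * gy) > f(x0, y0)
```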

Hold the image of this 3D space in your mind as we now go back to 2D.

And rather than showing all of the grid points I used to plot the graph,

let's instead convert to a contour plot,

where just like a map of actual mountains,

we will draw lines along the regions of

the same height, which for us means the same value of z.

This removes a lot of the clutter from the plot

which is useful for the final step that I'd like to show you,

which will be adding lots of Jacobian vectors on top of our contour plot.

However, before I do that,

let's take a look at these four points and see if your intuition is now in

tune by guessing which one will have the Jacobian with the largest magnitude.

Overlaying the Jacobian vector field,

we can see that they are clearly all pointing uphill,

away from the low dark regions and towards the high bright regions.

Also, we see that where the contour lines are tightly packed,

this is where we find our largest Jacobian vectors such as at point A.

Whereas at the peaks of the mountains, in the bottoms of

the valleys, or even out on the wide flat plains,

our gradients, and therefore our Jacobians, are small.
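Again with a stand-in function of my own (a single smooth peak), we can check numerically that the gradient's magnitude is essentially zero at the summit and clearly larger out on the slope:

```python
import math

# Stand-in function (not the video's): a single smooth peak at the origin.
def f(x, y):
    return math.exp(-(x**2 + y**2))

# Magnitude of the Jacobian via central finite differences.
def grad_mag(x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return math.hypot(dfdx, dfdy)

at_peak = grad_mag(0.0, 0.0)   # ~0: flat at the summit
on_slope = grad_mag(0.5, 0.5)  # clearly larger on the hillside
```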

My hope is that this clear two-dimensional example will give you the confidence to

trust the maths when we come up against

much higher dimensional problems later in the course. See you then.