
In this video, let's delve deeper and get an even better intuition about what the cost

function is doing. This video assumes that you're familiar with contour plots. If you

are not familiar with contour plots, or contour figures, some of the illustrations

in this video may not make sense to you, but that's okay. If you end up skipping

this video, or some of it doesn't quite make sense because you haven't seen

contour plots before, that's okay, and you will still understand the rest of this course

without these parts. Here's our problem formulation as usual, with the

hypothesis, parameters, cost function, and our optimization objective. Unlike

the last video, I'm going to keep both of my parameters, theta

zero and theta one, as we generate our visualizations for the cost function. So, same

as last time, we want to understand the hypothesis h and the cost function J. So,

here's my training set of housing prices, and let's pick a hypothesis. You know,

like that one. This is not a particularly good hypothesis, but if I set theta

zero = 50 and theta one = 0.06, then I end up with this hypothesis down here, and that

corresponds to that straight line. Now, given these values of theta zero and theta one,

we want to plot the corresponding cost function on the right. What we

did last time, when we only had theta one, was to draw plots

that look like this, as a function of theta one. But now we have two parameters, theta

zero and theta one, and so the plot gets a little more complicated. It turns out

that when we have only one parameter, the plots we drew had this sort of

bowl-shaped function. Now, when we have two parameters, it turns out the cost function

also has a similar sort of bowl shape. And, in fact, depending on your training set,

you might get a cost function that maybe looks something like this. So, this is a

3D surface plot, where the axes are labeled theta zero and theta one. So

as you vary theta zero and theta one, the two parameters, you get different values of the

cost function J(theta zero, theta one), and the height of this surface above a

particular point (theta zero, theta one), right, that's the vertical axis. The

height of the surface above that point indicates the value of J(theta zero,

theta one). And you can see it sort of has this bowl-like shape. Let me show you

the same plot in 3D. So here's the same figure in 3D, horizontal axis theta one and

vertical axis J(theta zero, theta one), and if I rotate this plot around, you kind of

get a sense, I hope, of this bowl-shaped surface, as that's what the cost

function J looks like. Now, for the purpose of illustration in the rest of this video,

I'm not actually going to use these sorts of 3D surfaces to show you the cost

function J. Instead, I'm going to use contour plots, or what I also call contour

figures (I guess they mean the same thing), to show you these surfaces. So here's an

example of a contour figure, shown on the right, where the axes are theta zero and

theta one. And what each of these ovals, what each of these ellipses, shows is a set

of points that take on the same value for J(theta zero, theta one). So

concretely, for example, take that point and that point and that point.
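To make the idea of several points sharing one value of J a bit more concrete, here is a minimal sketch in Python. The three-example training set is made up for illustration (it is not the lecture's housing data), the function name `cost` is hypothetical, and the 1/(2m) scaling is assumed to match the squared-error cost from the earlier videos. The three parameter pairs were picked by hand so that they land on the same contour.

```python
import numpy as np

# Hypothetical toy training set (not from the lecture): y = 2 + x exactly,
# so the minimum of J is at theta0 = 2, theta1 = 1.
x = np.array([-1.0, 0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])

def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x  # h(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Three different (theta0, theta1) pairs chosen to lie on one contour of J.
points = [(3.0, 1.0), (1.0, 1.0), (2.0, 1.0 + np.sqrt(1.5))]
costs = [cost(t0, t1) for t0, t1 in points]

for (t0, t1), c in zip(points, costs):
    print(f"theta0={t0:.3f}, theta1={t1:.3f}, J={c:.3f}")
```

The three pairs are different hypotheses, yet they produce the same cost up to rounding, which is what it means for them to sit on one ellipse of the contour figure.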

All three of these points that I just drew in magenta have the same value

for J(theta zero, theta one). Okay. These are the theta

zero and theta one axes, but those three points have the same value for J(theta zero, theta one).

And if you haven't seen contour plots much before, imagine, if you

will, a bowl-shaped function that's coming out of my screen, so that the minimum,

the bottom of the bowl, is this point right there, right? The middle of

these concentric ellipses. And imagine a bowl shape that sort of grows out of my

screen like this, so that each of these ellipses, you know, has the same height

above my screen. And the minimum of the bowl, right, is right down there. And so

the contour figure is maybe a more convenient way to

visualize my function J. So, let's look at some examples. Over here, I have a

particular point, right? And so this is with, you know, theta zero equal to maybe

about 800, and theta one equal to maybe -0.15. And so this point, right, this

point in red, corresponds to one pair of values of theta zero, theta one,

and it corresponds, in fact, to that hypothesis, right? Theta zero is

about 800, that is, the line intersects the vertical axis at around 800, and this is a

slope of about -0.15. Now, this line is really not such a good fit to the

data, right? This hypothesis, h(x), with these values of theta zero and

theta one, is really not such a good fit to the data. And so you find that its

cost is a value that's out here, that's, you know, pretty far from the minimum, right?

It's pretty far. This is a pretty high cost, because this is just not that good a fit

to the data. Let's look at some more examples. Now here's a different

hypothesis that's, you know, still not a great fit for the data, but maybe slightly

better. So here, right, that's my point, and those are my parameters theta zero and theta

one. And so my theta zero value, right, that's about 360, and my value for theta

one is equal to zero. So, you know, let's break it out. Let's take theta zero equals

360 and theta one equals zero. And this pair of parameters corresponds to that

hypothesis, corresponds to a flat line, that is, h(x) equals 360 plus zero

times x. So that's the hypothesis. And this hypothesis again has some cost, and

that cost is, you know, plotted as the height of the J function at that point.
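As a rough sketch of how that height could be computed, here is the squared-error cost evaluated for the flat line (theta zero = 360, theta one = 0) and for the earlier line (theta zero = 800, theta one = -0.15). The four training examples are made up for illustration and are not the lecture's housing data, the function name `cost` is hypothetical, and the 1/(2m) scaling is assumed to match the cost function from the earlier videos.

```python
import numpy as np

# Hypothetical training set (made up): house sizes and prices.
sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([200.0, 300.0, 400.0, 500.0])

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x  # h(x) = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

# The flat line from this example and the sloped line from the previous one.
cost_flat = cost(360.0, 0.0, sizes, prices)
cost_sloped = cost(800.0, -0.15, sizes, prices)

print("J(360, 0)      =", cost_flat)
print("J(800, -0.15)  =", cost_sloped)
```

On this made-up data the flat line comes out with the lower cost of the two, which matches the picture of its point sitting closer to the middle of the contour figure.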

Let's look at just a couple more examples. Here's one more. You know, at this value

of theta zero and at that value of theta one, we end up with this hypothesis, h(x),

and again, it's not a great fit to the data, and it's actually further away from the minimum. Last example: this is

actually not quite at the minimum, but it's pretty close to the minimum. So this

is not such a bad fit to the data, where, for a particular value of theta

zero and a particular value of theta one, we

get a particular h(x). And this is not quite at the minimum, but it's

pretty close. And so the sum of squared errors is the sum of squared distances between

my training samples and my hypothesis. Really, that's a sum of squared distances,

right, of all of these errors. This is pretty close to the minimum, even though

it's not quite the minimum. So I hope these figures give you a better

understanding of what the values of the cost function J are, how they

correspond to different hypotheses, and how better hypotheses correspond to points

that are closer to the minimum of this cost function J. Now, of course, what we really

want is an efficient algorithm, right, an efficient piece of software for

automatically finding the values of theta zero and theta one that minimize the

cost function J, right? And what we don't want to do is to, you know, have to

write software to plot out this point and then try to manually read off the

numbers. That is not a good way to do it. And, in fact, we'll see later that

when we look at more complicated examples, we'll have high-dimensional figures with

more parameters. It turns out we'll see, later in

this course, examples where this figure, you know, cannot really be plotted, and

this becomes much harder to visualize. And so, what we want is to have software

to find the values of theta zero and theta one that minimize this function, and

in the next video we'll start to talk about an algorithm for automatically finding

the values of theta zero and theta one that minimize the cost function J.
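For a sense of what sits behind the surface and contour figures in this video, here is a minimal sketch that evaluates J on a grid of parameter values and picks out the lowest grid point. This is only a brute-force illustration, not the algorithm from the next video. The training set, the grid ranges, and all variable names are assumptions made up for this example, and the 1/(2m) scaling is assumed to match the earlier videos.

```python
import numpy as np

# Hypothetical training set (made up for illustration): sizes and prices.
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([200.0, 300.0, 400.0, 500.0])
m = len(x)

# Candidate parameter values; the ranges and resolution are arbitrary choices.
theta0_values = np.linspace(-200.0, 200.0, 41)  # step of 10
theta1_values = np.linspace(0.0, 0.4, 41)       # step of 0.01

# T0[i, j] = theta0_values[i] and T1[i, j] = theta1_values[j].
T0, T1 = np.meshgrid(theta0_values, theta1_values, indexing="ij")

# Predictions for every grid point and every training example: shape (41, 41, m).
predictions = T0[..., np.newaxis] + T1[..., np.newaxis] * x

# J[i, j] is the cost at (theta0_values[i], theta1_values[j]).
J = np.sum((predictions - y) ** 2, axis=-1) / (2 * m)

# Locate the grid point with the lowest cost.
i, j = np.unravel_index(np.argmin(J), J.shape)
best_theta0, best_theta1 = theta0_values[i], theta1_values[j]
print("lowest grid point:", best_theta0, best_theta1, "with J =", J[i, j])
```

The array J is the kind of data a plotting library would turn into a surface or contour figure. A grid like this grows quickly as parameters are added, which is one reason reading the minimum off a plot does not scale to the higher-dimensional cases mentioned above.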