0:15

Earlier, we said that we would like our classifier

to output values that are between 0 and 1.

So we'd like to come up with a hypothesis that satisfies this property, that is,

predictions are maybe between 0 and 1.

When we were using linear regression, this was the form of a hypothesis,

where h(x) is theta transpose x.

For logistic regression, I'm going to modify this a little bit and

make the hypothesis g of theta transpose x.

Where I'm going to define the function g as follows.

G(z), z is a real number, is equal to one over one plus e to the negative z.

0:58

This is called the sigmoid function, or the logistic function,

and the term logistic function,

that's what gives rise to the name logistic regression.

And by the way, the terms sigmoid function and

logistic function are basically synonyms and mean the same thing.

So the two terms are basically interchangeable, and

either term can be used to refer to this function g.

And if we take these two equations and put them together,

then here's just an alternative way of writing out the form of my hypothesis.

I'm saying that h(x) Is 1 over 1 plus e to the negative theta transpose x.

And all I've do is I've taken this variable z,

z here is a real number, and plugged in theta transpose x.

So I end up with theta transpose x in place of z there.

Lastly, let me show you what the sigmoid function looks like.

We're gonna plot it on this figure here.

The sigmoid function, g(z), also called the logistic function, it looks like this.

It starts off near 0 and then it rises until it crosses 0.5 and

the origin, and then it flattens out again like so.

So that's what the sigmoid function looks like.

And you notice that the sigmoid function, while it asymptotes at one and

asymptotes at zero, as a z axis, the horizontal axis is z.

As z goes to minus infinity, g(z) approaches zero.

And as g(z) approaches infinity, g(z) approaches one.

And so because g(z) upwards values are between zero and

one, we also have that h(x) must be between zero and one.

Finally, given this hypothesis representation, what we need to do,

as before, is fit the parameters theta to our data.

So given a training set we need to a pick a value for

the parameters theta and this hypothesis will then let us make predictions.

We'll talk about a learning algorithm later for fitting the parameters theta,

but first let's talk a bit about the interpretation of this model.

3:25

When my hypothesis outputs some number, I am going to treat that number as

the estimated probability that y is equal to one on a new input, example x.

Here's what I mean, here's an example.

Let's say we're using the tumor classification example, so

we may have a feature vector x, which is this x zero equals one as always.

And then one feature is the size of the tumor.

3:52

Suppose I have a patient come in and they have some tumor size and

I feed their feature vector x into my hypothesis.

And suppose my hypothesis outputs the number 0.7.

I'm going to interpret my hypothesis as follows.

I'm gonna say that this hypothesis is telling me that for

a patient with features x, the probability that y equals 1 is 0.7.

In other words, I'm going to tell my patient that the tumor,

sadly, has a 70 percent chance, or a 0.7 chance of being malignant.

To write this out slightly more formally, or to write this out in math,

I'm going to interpret my hypothesis output as.

P of y=1 given x parameterized by theta.

So for those of you that are familiar with probability, this equation may make sense.

If you're a little less familiar with probability,

then here's how I read this expression.

This is the probability that y is equal to one.

Given x, given that my patient has features x, so

given my patient has a particular tumor size represented by my features x.

And this probability is parameterized by theta.

So I'm basically going to count on my hypothesis to give me

estimates of the probability that y is equal to 1.

Now, since this is a classification task,

we know that y must be either 0 or 1, right?

Those are the only two values that y could possibly take on,

either in the training set or for new patients that may walk into my office, or

into the doctor's office in the future.

So given h(x), we can therefore compute the probability that

y = 0 as well, completely because y must be either 0 or 1.

We know that the probability of y = 0 plus the probability of y = 1 must add up to 1.

This first equation looks a little bit more complicated.

It's basically saying that probability of y=0 for

a particular patient with features x, and given our parameters theta.

6:00

Plus the probability of y=1 for that same patient with features x and

given theta parameters theta must add up to one.

If this equation looks a little bit complicated,

feel free to mentally imagine it without that x and theta.

And this is just saying that the product of y equals zero plus the product of y

equals one, must be equal to one.

And we know this to be true because y has to be either zero or one, and so

the chance of y equals zero, plus the chance that y is one.

Those two must add up to one.

And so if you just take this term and

move it to the right hand side, then you end up with this equation.

That says probability that y equals zero is 1 minus probability of y equals 1, and

thus if our hypothesis feature of x gives us that term.

You can therefore quite simply compute the probability or

compute the estimated probability that y is equal to 0 as well.

So, you now know what the hypothesis representation is for

logistic regression and we're seeing what the mathematical formula is,

defining the hypothesis for logistic regression.

In the next video, I'd like to try to give you better intuition

about what the hypothesis function looks like.

And I wanna tell you about something called the decision boundary.

And we'll look at some visualizations together to try to get a better sense of

what this hypothesis function of logistic regression really looks like.