We talked about over-fitting and under-fitting.

I want to revisit it briefly.

This is an example of under-fitting.

The squares are data points: the input features X are on the X axis,

and the response Y, the outcome, is on the Y axis here.

The data looks like it might be kind of linear, but it curves at the end.

So, a straight-line model

doesn't fit it very well,

and this type of behavior is referred to as under-fitting.

Then you have the case you're after, which is the best case.

So, you've got the right amount of data,

you've processed your data properly,

you've done everything well,

and when you test the predictions

against what you actually expect, the error is low.

You get a good fit. Then there's the over-fitting case I alluded to before,

where the model basically just memorizes the data points.

In the over-fitting case,

when you give it your test data,

you'll get very irregular behavior.

You'll get points that are way above or way below the predicted shape of

the curve, and that tells you that you've got a problem with over-fitting.
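
To make the train-versus-test symptom concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data-generating curve (`true_curve`), the noise level, and the two models. The straight line stands in for the under-fitting case, and a 1-nearest-neighbour lookup stands in for a model that "just memorizes data points"; neither is taken from the slide.

```python
import random

def true_curve(x):
    # Hypothetical ground truth: roughly linear at first, curving upward at the end.
    return x + 0.1 * x ** 3

def make_data(xs, rng, noise=0.3):
    # Attach Gaussian noise to the curve at each x.
    return [(x, true_curve(x) + rng.gauss(0.0, noise)) for x in xs]

def fit_line(data):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_memorizer(data):
    """1-nearest-neighbour: return the y of the closest training x."""
    def predict(x):
        return min(data, key=lambda p: abs(p[0] - x))[1]
    return predict

def mse(model, data):
    # Mean squared error of the model's predictions on a dataset.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train = make_data([i * 0.2 for i in range(16)], rng)       # x = 0.0 .. 3.0
test = make_data([i * 0.2 + 0.1 for i in range(15)], rng)  # points in between

line = fit_line(train)
memo = fit_memorizer(train)
for name, model in [("line", line), ("memorizer", memo)]:
    print(f"{name:10s} train MSE = {mse(model, train):.3f}  "
          f"test MSE = {mse(model, test):.3f}")
```

The memorizer reproduces every training point exactly, so its training error is zero, and any error it makes only shows up on the held-out points. The line carries error on both sets, because it cannot follow the curve at all.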

So, with under-fitting, you're potentially trying to

fit a linear model to a dataset that has a non-linear region,

so you may want to go back and revisit the hypothesis function. There are many choices;

linear is just one of them.

Sigmoid is another one that's used.

Several algorithms use sigmoid because, whatever the X axis value is,

you always get a value between 0 and 1, with that S-shaped slope in between.

So, everything gets bounded automatically.

So, as a hypothesis function, I can use, for example, a sigmoid function

of a linear combination of the features.
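
As a rough sketch of what a sigmoid hypothesis function could look like, the snippet below applies the sigmoid to a linear combination of the features, which is the form used in logistic regression. The names `hypothesis`, `weights`, and `bias`, and all the numbers, are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(weights, bias, features):
    """Sigmoid applied to a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights and inputs, chosen only for illustration.
weights, bias = [0.8, -1.5], 0.2
for features in ([0.0, 0.0], [5.0, 1.0], [-40.0, 30.0], [400.0, -300.0]):
    print(features, "->", hypothesis(weights, bias, features))
```

However extreme the inputs get, the output stays within the range 0 to 1, which is the bounding behavior described above.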

Over-fitting tends to show up with

algorithms that have many more parameters, in some cases more than the number of features.

So, there are more knobs to turn than there are features, and that can

result in memorization.
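
One way to see the "more knobs" effect, sketched under assumptions: with a single input feature, a polynomial with as many coefficients as there are training points can pass through every one of them. The example below builds that polynomial in Lagrange form and compares it with a two-parameter line. The data source (`noisy_line`), the noise level, and the point counts are all made up for illustration.

```python
import random

def fit_line(data):
    """Two knobs: intercept a and slope b, fitted by least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_interpolating_polynomial(data):
    """Lagrange form of the polynomial through every training point.
    With n points it has n coefficients: n knobs for one input feature."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(data):
            basis = 1.0
            for j, (xj, _) in enumerate(data):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

def mse(model, data):
    # Mean squared error of the model's predictions on a dataset.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(1)

def noisy_line(x):
    # Hypothetical data source: a straight line plus Gaussian noise.
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)

train = [(float(x), noisy_line(x)) for x in range(8)]          # 8 training points
held_out = [(x + 0.5, noisy_line(x + 0.5)) for x in range(7)]  # points in between

line = fit_line(train)
poly = fit_interpolating_polynomial(train)
for name, model in [("line (2 knobs)", line), ("polynomial (8 knobs)", poly)]:
    print(f"{name:22s} train MSE = {mse(model, train):.4f}  "
          f"held-out MSE = {mse(model, held_out):.4f}")
```

The eight-knob polynomial drives the training error to zero by fitting the noise along with the trend. The held-out column is the one to look at when judging whether that extra flexibility helped.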

So, good machine learning solutions live in

that trade space between simplicity and complexity,

and this is again where human beings are heavily involved.

These machine learning algorithms don't just magically do it all on their own.

There's a tremendous amount of human involvement

to explore sets of algorithms, explore the data,

and find the balance, that magical place

in between simplicity and complexity, and it isn't easy.

It was not easy. We'll see you on Thursday.

I'm not done with mine yet.

I have a lot of work left to do.

So, the data dictates what works well and how.

As I said, you often can't just rely on a single algorithm.

Potentially, you need to try a bunch.

You might get lucky with the first learning algorithm you choose

and go, wow, this is great:

you've measured the error, the error is within what we want,

and we're good to go.
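
The "try a bunch and measure the error" loop can be sketched roughly as follows. The three candidate models, the tiny dataset, and the `max_error` threshold are all hypothetical; in practice the candidates would be whatever learning algorithms are under consideration.

```python
def fit_mean(data):
    """Simplest candidate: always predict the average training y."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Least-squares straight line y = a + b*x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_nearest(data):
    """Memorizing candidate: return the y of the closest training x."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    # Mean squared error of the model's predictions on a dataset.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def select_model(fitters, train, validation, max_error):
    """Fit every candidate, score it on held-out data, and report the best."""
    scores = {name: mse(fit(train), validation) for name, fit in fitters.items()}
    best = min(scores, key=scores.get)
    return best, scores, scores[best] <= max_error

# Hypothetical data: roughly y = 2x + 1 with small fixed offsets as "noise".
train = [(0.0, 1.1), (2.0, 4.9), (4.0, 9.05), (6.0, 12.95), (8.0, 17.0)]
validation = [(1.0, 2.9), (3.0, 7.1), (5.0, 11.05), (7.0, 14.95)]

fitters = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
best, scores, good_enough = select_model(fitters, train, validation, max_error=0.5)
for name, score in scores.items():
    print(f"{name:8s} validation MSE = {score:.4f}")
print("best:", best, "| within target error:", good_enough)
```

Scoring on data the models never saw is the design choice that matters here: ranking candidates by training error would always favor the memorizer.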