0:33

For those of you there are, maybe some are more familiar with linear algebra,

what some students have asked me is,

when computing this Theta equals X transpose X inverse X transpose Y.

What if the matrix X transpose X is non-invertible?

So for those of you that know a bit more linear algebra

you may know that only some matrices are invertible and

some matrices do not have an inverse we call those non-invertible matrices.

Singular or degenerate matrices.

The issue or

the problem of x transpose x being non invertible should happen pretty rarely.

And in Octave if you implement this to compute theta,

it turns out that this will actually do the right thing.

I'm getting a little technical now, and I don't want to go into the details,

but Octave hast two functions for inverting matrices.

One is called pinv, and the other is called inv.

And the differences between these two are somewhat technical.

One's called the pseudo-inverse, one's called the inverse.

But you can show mathematically that so

long as you use the pinv function then this will actually compute

the value of data that you want even if X transpose X is non-invertible.

The specific details between inv.

What is the difference between pinv?

What is inv?

That's somewhat advanced numerical computing concepts,

I don't really want to get into.

But I thought in this optional video, I'll try to give you little bit of intuition

about what it means for X transpose X to be non-invertible.

For those of you that know a bit more linear Algebra might be interested.

I'm not gonna prove this mathematically but if X transpose X is non-invertible,

there usually two most common causes for this.

The first cause is if somehow in your learning problem you have redundant

features.

Concretely, if you're trying to predict housing prices and if x1 is the size of

the house in feet, in square feet and x2 is the size of the house in square meters,

then you know 1 meter is equal to 3.28 feet Rounded to two decimals.

And so your two features will always satisfy the constraint x1

equals 3.28 squared times x2.

And you can show for those of you that are somewhat advanced in linear Algebra, but

if you're explaining the algebra you can actually show that if your two features

are related, are a linear equation like this.

Then matrix X transpose X would be non-invertable.

The second thing that can cause X transpose X to be non-invertable is if you

are training, if you are trying to run the learning algorithm with a lot of features.

Concretely, if m is less than or equal to n.

For example, if you imagine that you have m = 10 training examples

that you have n equals 100 features then you're trying to fit

a parameter back to theta which is, you know, n plus one dimensional.

So this is 101 dimensional,

you're trying to fit 101 parameters from just 10 training examples.

3:44

This turns out to sometimes work but not always be a good idea.

Because as we'll see later, you might not have enough data if you only

have 10 examples to fit you know, 100 or 101 parameters.

We'll see later in this course why this might be too little data to fit

this many parameters.

But commonly what we do then if m is less than n,

is to see if we can either delete some features or

to use a technique called regularization which is something that we'll talk about

later in this class as well, that will kind of let you fit a lot of parameters,

use a lot features, even if you have a relatively small training set.

But this regularization will be a later topic in this course.

But to summarize if ever you find that x transpose x is singular or

alternatively you find it non-invertable, what I would recommend you do is

first look at your features and see if you have redundant features like this x1, x2.

You're being linearly dependent or being a linear function of each other like so.

And if you do have redundant features and if you just delete one of these features,

you really don't need both of these features.

If you just delete one of these features,

that would solve your non-invertibility problem.

And so I would first think through my features and check if any are redundant.

And if so then keep deleting redundant features until they're no

longer redundant.

And if your features are not redundant,

I would check if I may have too many features.

And if that's the case, I would either delete some features if I can bear to

use fewer features or else I would consider using regularization.

Which is this topic that we'll talk about later.

5:24

So that's it for the normal equation and

what it means for if the matrix X transpose X is non-invertable but

this is a problem that you should run that hopefully you run into pretty rarely and

if you just implement it in octave using P and

using the P n function which is called a pseudo inverse function so

you could use a different linear out your alive in Is called a pseudo-inverse but

that implementation should just do the right thing,

even if X transpose X is non-invertable, which should happen pretty rarely anyways,

so this should not be a problem for most implementations of linear regression.