This minimization will result in an optimal value x*,

based on which, if a new point a_new is given to us,

we can form a_new x*, which will give us the new value b_new.

So if a_new is here, we have

found the equation of this line after we find the optimal x*, and

therefore we can use it to find the value b_new.
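As a concrete sketch of this prediction step (the data and the new point are hypothetical; x* is computed with NumPy's least squares solver):

```python
import numpy as np

# Hypothetical 1-D data: each row of A is [a_i, 1], so the model
# a_i x = b_i describes a line with slope x[0] and intercept x[1].
a_vals = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = 2.0 * a_vals + 1.0  # points lying exactly on the line b = 2a + 1

A = np.column_stack([a_vals, np.ones_like(a_vals)])

# x* minimizes ||A x - b||^2
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

# Given a new point a_new, form a_new x* to obtain b_new.
a_new = np.array([5.0, 1.0])
b_new = a_new @ x_star
print(b_new)  # close to 2*5 + 1 = 11
```

Here the fitted line recovers the slope and intercept that generated the data, so the prediction at the new point lands on the same line.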

We can write these equations in matrix vector form.

Our model takes the form Ax = b,

where in vector b we have stacked all the values b_i.

And in matrix A, we stack the values

a_i, which in general are not scalars but row vectors.

Then the least squares objective here is to minimize, with respect to x,

the squared norm ||Ax − b||², and

this is referred to as an L2 norm, as we will see later in the class.
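A small sketch of this objective (with an assumed synthetic A and b), checking that the closed-form normal-equations solution x* = (AᵀA)⁻¹Aᵀb does attain the minimum of ||Ax − b||²:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # 20 stacked row vectors a_i
b = rng.standard_normal(20)        # 20 stacked values b_i

# Closed-form minimizer of ||A x - b||^2 via the normal equations.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

def objective(x):
    return np.linalg.norm(A @ x - b) ** 2

# Any perturbation of x* should not decrease the objective,
# since the least squares objective is convex in x.
for _ in range(100):
    x_pert = x_star + 0.1 * rng.standard_normal(3)
    assert objective(x_pert) >= objective(x_star)
```

The check exploits convexity: x* is the unique global minimizer whenever A has full column rank, so every perturbed point has a residual at least as large.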

In a number of applications,

there are sparse outliers in the data, like the ones shown here.

So these are clearly outliers; they don't look like the rest of the data, but

they're sparse:

there are just a few of them.

Otherwise, this would be the data.

The question now is how well a line can describe this data

if we follow the least squares approach that we described on the previous slide.

Following a least squares approach to solve the linear regression problem, we

obtained the blue line here as the optimal line for fitting the available data.
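A small illustration of this sensitivity (hypothetical data; a few points are deliberately corrupted to act as sparse outliers), comparing the least squares fit with and without them:

```python
import numpy as np

a_vals = np.linspace(0, 9, 10)
b_clean = 2.0 * a_vals + 1.0        # data on the line b = 2a + 1
b = b_clean.copy()
b[[2, 3]] += 30.0                   # two sparse outliers

A = np.column_stack([a_vals, np.ones_like(a_vals)])

x_clean, *_ = np.linalg.lstsq(A, b_clean, rcond=None)
x_outl, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_clean)  # close to [2.0, 1.0], the true slope and intercept
print(x_outl)   # noticeably different: the outliers pull the line away
```

Because the squared residual grows quadratically, the two corrupted points dominate the objective and visibly skew the fitted line, which is exactly the failure mode motivating more robust alternatives.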