0:07

restricted to real-valued outcomes and real-valued predictors,

so take as a possible example that y looks something like this.

y looks like a function over let's say time from time zero to one.

And x also looks like a function, over time from say zero to one.

So let's consider the space L^2[0,1], which is the space of square

integrable functions on [0,1], where the inner product of f and

g on the space is the integral from zero to one of f(t) g(t) dt.
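To make this concrete, here's a small numerical sketch of that inner product (the grid size and the sanity-check functions are my own illustrative choices, not from the lecture):

```python
import numpy as np

# Approximate the L^2[0,1] inner product <f, g> = integral_0^1 f(t) g(t) dt
# by sampling on a fine grid and applying the trapezoid rule.
t = np.linspace(0.0, 1.0, 10_001)
dt = t[1] - t[0]

def inner(f, g):
    """Trapezoid-rule approximation of the L^2[0,1] inner product."""
    h = f * g
    return dt * (h[0] / 2 + h[1:-1].sum() + h[-1] / 2)

# Sanity checks: <1, 1> = 1 and <t, t> = 1/3.
print(inner(np.ones_like(t), np.ones_like(t)))  # ≈ 1.0
print(inner(t, t))                              # ≈ 0.3333
```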

So, what we want to do is consider regression through the origin:

we want to explain y as a scalar multiple of this function x.

1:15

So we can just do the same thing we did before.

We write the norm of y minus beta x, quantity squared, as the norm of

y minus beta hat x, plus beta hat x minus beta x, quantity squared,

and we can expand that out just using the standard expansion of squares and

get the norm of y minus beta hat x squared,

plus twice the inner product of y minus beta hat x

with beta hat x minus beta x,

plus the norm of beta hat x minus beta x squared.

Now, this last quantity is nonnegative, so if we get rid of it, the expression can only get smaller.

2:07

And I'd like for you to go through the notes, or do it as homework,

to show that the cross term is zero,

so that if we plug in an arbitrary value of beta,

we're always going to be at least as big as if we plug in the specific value beta hat.

So what we see is that the least squares equation works, in this case,

in this generalized space.
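Here's a numerical sketch of that result (the grid and the example functions y and x are illustrative assumptions on my part): the minimizer is beta hat = the inner product of y and x over the inner product of x with itself, and the cross term vanishes at beta hat.

```python
import numpy as np

# Regression through the origin in L^2[0,1], approximated on a grid.
t = np.linspace(0.0, 1.0, 10_001)
dt = t[1] - t[0]

def inner(f, g):
    """Trapezoid-rule approximation of integral_0^1 f(t) g(t) dt."""
    h = f * g
    return dt * (h[0] / 2 + h[1:-1].sum() + h[-1] / 2)

# Illustrative example functions y(t) and x(t).
x = np.sin(2 * np.pi * t) + 1.0
y = 2.0 * x + 0.3 * np.cos(2 * np.pi * t)

# Least squares through the origin: beta_hat = <y, x> / <x, x>.
beta_hat = inner(y, x) / inner(x, x)

def loss(beta):
    r = y - beta * x
    return inner(r, r)  # squared L^2 norm of the residual function

# The cross term <y - beta_hat x, x> vanishes, so any other beta does worse.
print(abs(inner(y - beta_hat * x, x)) < 1e-8)  # True
print(loss(beta_hat) <= loss(beta_hat + 0.1))  # True
```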

2:28

So if we want to explain an observed outcome, the function y,

in terms of the other function x, as a scalar multiple of x,

then we get the same answer.

2:52

Suppose now x is the constant function one, which of course has a finite integral on that range.

Then the beta hat works out to be the inner product of y and x

over the inner product of x with itself.

The denominator is just one; the numerator is the integral from zero to one of y(t)

times one, dt.

So we could just call that y bar, the average of

the function y over the domain [0,1].
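A quick numerical sketch of that point (the choice y(t) = t^2 is an illustrative assumption): regressing a function on the constant function recovers its average over [0,1].

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10_001)
dt = t[1] - t[0]

def inner(f, g):
    """Trapezoid-rule approximation of integral_0^1 f(t) g(t) dt."""
    h = f * g
    return dt * (h[0] / 2 + h[1:-1].sum() + h[-1] / 2)

J = np.ones_like(t)  # the constant function 1 on [0, 1]
y = t ** 2           # illustrative outcome function

# beta_hat = <y, J> / <J, J> = integral_0^1 y(t) dt = y_bar
y_bar = inner(y, J) / inner(J, J)
print(y_bar)  # ≈ 1/3
```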

Let me call J the function that's constant at one between zero and one.

So we could define, say, y tilde as the centered version of y:

y minus y bar times J.

Now, with linear regression in mind, let x tilde

be x minus x bar times J. These are the centered versions.

4:01

Let me define the covariance between two functions y and

x as the integral from 0 to 1 of

(y(t) - y bar J(t)) times (x(t) - x bar J(t)) dt.

So I'm going to define that as the covariance, and

then define the variance of, say, y as the covariance of y with itself.

So now we've just extended things like the covariance and

the variance, and of course we can then also extend the correlation and

the standard deviation to functions in this square integrable space.
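Those definitions can be sketched numerically like so (the grid and the example functions are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Functional covariance, variance, standard deviation, and correlation,
# all built from the L^2[0,1] inner product.
t = np.linspace(0.0, 1.0, 10_001)
dt = t[1] - t[0]

def inner(f, g):
    h = f * g
    return dt * (h[0] / 2 + h[1:-1].sum() + h[-1] / 2)

J = np.ones_like(t)  # the constant function 1

def mean(f):
    return inner(f, J) / inner(J, J)

def cov(f, g):
    # integral_0^1 (f(t) - f_bar J(t)) (g(t) - g_bar J(t)) dt
    return inner(f - mean(f) * J, g - mean(g) * J)

def var(f):
    return cov(f, f)

def sd(f):
    return np.sqrt(var(f))

def cor(f, g):
    return cov(f, g) / (sd(f) * sd(g))

x = np.sin(2 * np.pi * t)
y = 3.0 * x + 1.0  # an exact linear function of x
print(round(cor(y, x), 6))  # 1.0
```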

Now let's suppose we wanted to minimize the norm of

our function y minus beta naught times J minus beta one times x, squared.

We can do our same old trick: if we hold beta one fixed

and think of y minus beta one x as a single function,

then the solution for beta naught, as it depends on beta one,

has to be equal to y bar minus beta one x bar.

Plugging that back in,

we then just get a regression-through-the-origin case again,

and we get that beta one hat is equal to the correlation

between the function y and the function x, times the functional

standard deviation of y divided by the functional standard deviation of x.

5:43

In addition, we then get that beta naught hat,

plugging beta one hat back in, is y bar minus beta one hat times x bar.
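Putting the pieces together, here's a numerical sketch of functional simple linear regression (the grid and the example functions are illustrative assumptions); a good check is that the fitted residual function is orthogonal to both J and x:

```python
import numpy as np

# Functional simple linear regression on L^2[0,1], approximated on a grid.
t = np.linspace(0.0, 1.0, 10_001)
dt = t[1] - t[0]

def inner(f, g):
    h = f * g
    return dt * (h[0] / 2 + h[1:-1].sum() + h[-1] / 2)

J = np.ones_like(t)  # the constant function 1

def mean(f):
    return inner(f, J) / inner(J, J)

def cov(f, g):
    return inner(f - mean(f) * J, g - mean(g) * J)

def sd(f):
    return np.sqrt(cov(f, f))

def cor(f, g):
    return cov(f, g) / (sd(f) * sd(g))

# Illustrative functions y(t) and x(t).
x = np.sin(2 * np.pi * t) + t
y = 1.5 + 2.0 * x + 0.2 * np.cos(4 * np.pi * t)

# The same formulas as ordinary simple linear regression:
beta1_hat = cor(y, x) * sd(y) / sd(x)
beta0_hat = mean(y) - beta1_hat * mean(x)

# The residual function is orthogonal to both J and x.
r = y - beta0_hat * J - beta1_hat * x
print(abs(inner(r, J)) < 1e-8, abs(inner(r, x)) < 1e-8)  # True True
```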

So more than anything, what I wanted to show is that we can extend

the equations that we've written out for linear regression, and

the same thing is true for multivariable regression.

We can extend them to these more complex spaces.

This is an example of a so-called Hilbert space, and you can have linear

regression and multivariable regression in a general Hilbert space. You basically

just need the inner product to define concepts like the correlation and

the covariance like this, and all of the results basically turn out to be the same.