
So, welcome back.

The final thing we're going to do in today's lecture is to talk about

the profile likelihood, which is a method

for creating univariate likelihoods from multivariate likelihoods.

So in this case, we're going to look at the normal distribution, which has two parameters, mu and sigma.

And we're going to figure out how to get likelihoods for mu alone,

and you could equivalently do it to get likelihoods for sigma alone.

And here's the idea.

The multivariate

likelihood is a bivariate surface.

It has mu on one axis, sigma on another axis, and the likelihood surface above that plane.

And to obtain a likelihood for mu, profiling is

basically this: imagine we took a lamp and shone it along

the sigma direction and looked at the shadow that the

likelihood cast on the plane defined by the mu direction.

And that's exactly what it gives, so its

name is exactly indicative of the technique.

And now we'll go through how

you actually execute the mathematics to do that.

So, in other words, we want to shine the light on

this bivariate likelihood and get the function that

it traces onto, say, the wall where the shadow occurs, okay?

So let's pick a particular value of mu, call it mu 0, and we want to

know what's the value of this curve in the shadow at mu 0.

Well basically, the light will pass through all values above the likelihood

and will get stopped anywhere at or below it, up until the maximum value.

And so what

we basically do is we maximize the joint

likelihood for sigma with mu fixed at mu 0.

And then, this process is repeated for lots of values of mu 0.

So, let's actually go through it.
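As a concrete sketch of that recipe, here is a small numeric version. The data are made up and the inner maximization over sigma uses scipy's bounded optimizer, which are my choices, not the lecture's:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0.7, 1.2, 1.9, 0.1, 2.4, 1.5])  # made-up sample
n = len(x)

def neg_loglik(sigma, mu0):
    # Normal log-likelihood (up to an additive constant) with mu fixed at mu0
    return n * np.log(sigma) + np.sum((x - mu0) ** 2) / (2 * sigma ** 2)

def profile_loglik(mu0):
    # Maximize over sigma with mu0 held fixed -- the "height of the shadow" at mu0
    res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), args=(mu0,), method="bounded")
    return -res.fun

# Repeat for lots of mu0 values to trace out the profile
mu_grid = np.linspace(0, 3, 101)
prof = np.array([profile_loglik(m) for m in mu_grid])
print(mu_grid[np.argmax(prof)])  # lands near the sample mean, 1.3
```

The grid point that maximizes the traced-out profile sits at the sample mean, which is exactly the property discussed later in the lecture.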

So the joint likelihood with mu fixed at mu 0 is just the Gaussian density.

And then we have independent data so we take a product out front.

So it's a product of terms: sigma squared to the minus one half, times e to the

minus (xi minus mu 0) squared divided by 2 sigma squared, with mu fixed at mu 0.

And collect all the terms and you get the next line.

With mu 0 fixed, you can derive the maximum likelihood estimator

for sigma squared, and you can go through this:

log the likelihood, take derivatives, and solve for

sigma squared. And maybe, just so

you don't accidentally take the derivative with respect

to sigma, replace sigma squared by, say, theta,

so that you remember that you're treating sigma

squared as the parameter, not sigma.

If you accidentally take derivatives with respect to sigma,

you'll get the square root of this answer, of course.

So then, you wind up with the summation from i

equals 1 to n of (xi minus mu 0) squared, divided by n.

That's actually a nice result, right?

If you fix mu at a particular value then your MLE for sigma squared

is the sample variance.

But instead of plugging in the sample mean and subtracting deviations around

the sample mean, you're subtracting deviations

around that specific value of the mean.
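To fill in the derivative step that was sketched verbally, writing theta for sigma squared as suggested (a worked version of the lecture's outline, with the additive constant of the log-likelihood dropped):

```latex
% log-likelihood with \mu fixed at \mu_0, in terms of \theta = \sigma^2
% (additive constant -\tfrac{n}{2}\log 2\pi omitted)
\log L(\theta) = -\frac{n}{2}\log\theta - \frac{1}{2\theta}\sum_{i=1}^n (x_i - \mu_0)^2

% set the derivative in \theta to zero
\frac{\partial}{\partial\theta}\log L(\theta)
  = -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^n (x_i - \mu_0)^2 = 0

% solve for \theta
\hat{\sigma}^2 = \hat{\theta} = \frac{1}{n}\sum_{i=1}^n (x_i - \mu_0)^2
```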


So, it's a nice little result.

So, anyway, with mu 0 fixed, the maximum likelihood estimator

for sigma squared is this generalization of the variance right there.

So that's the peak of our likelihood, all right?

That's the point where the light

switches from being blocked by the likelihood to the point

right above it, where the light

actually passes over the likelihood.

And that's that point.

That's that point that gets shadowed onto the wall at mu 0.

And so, we want to plug this back in

to the likelihood and we get this function right here.

Summation (xi minus mu 0) squared over n, raised to the minus n over 2 power,

and then e to the minus n over 2.

And this e to the minus n over 2 is irrelevant because it doesn't involve mu 0.

So that's for one mu 0 and if we did that for every mu 0, we would get a function.

And so here's our profile likelihood: it's this function,

summation (xi minus mu) squared, raised to the negative n over 2 power.

That function is our profile likelihood.

And then again, this function is clearly maximized at mu equals x bar.

You can,

of course, solve it.

But in general, one nice property of

the profile likelihood is that the maximizer of

the profile likelihood, the maximum profile likelihood

estimate is also your MLE for the parameter.

So, in this case, the maximum of the profile likelihood for mu is

going to be x bar, the same as the maximum likelihood estimate from the full bivariate likelihood.

So if

we wanted to divide this by its peak value, we would simply divide it

by the same expression with x bar plugged in for mu.

And that would normalize this function so it tops out at 1.

So, let's actually go through the R code to generate this function of

mu for the sleep data. So, our muVals, we're going to go from, say,

zero to three and do a thousand of them, and we plot, as a function of the thousand

mu 0 values, our likelihood values.

And then it would just be the sum of (xi minus mu) squared,

so that's this term right

here, raised to the negative n over 2 power.

But I want it to be maxed out at

one, so normally I'd create the likelihood and then divide it by its maximum value.

But in this case, I know exactly what the maximum value is.

It's when you replace mu by the sample mean.

So instead, I divide by the likelihood evaluated at the mean right here, and this sapply

is just a loop: it says loop over the mu values and apply this function.

And then I'll plot the values and connect them together with type equals l, and then I'll

draw horizontal reference lines at one eighth and

one sixteenth, and then I get this plot.
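Those steps can be rendered as a self-contained sketch. This is written in Python rather than the lecture's R, with the sleep-data paired differences (group 2 minus group 1, from R's built-in dataset) copied in so it runs on its own:

```python
import numpy as np

# Paired differences from R's built-in sleep data (group 2 minus group 1)
g1 = np.array([0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0])
g2 = np.array([1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4])
diffs = g2 - g1
n = len(diffs)

mu_vals = np.linspace(0, 3, 1000)  # grid of mu values, zero to three, a thousand of them

def lik(mu):
    # (sum of (xi - mu)^2)^(-n/2), divided by the same expression evaluated
    # at x bar so the curve tops out at 1
    return (np.sum((diffs - mu) ** 2) / np.sum((diffs - diffs.mean()) ** 2)) ** (-n / 2)

lik_vals = np.array([lik(m) for m in mu_vals])  # plays the role of R's sapply loop

# In R one would now do plot(mu_vals, lik_vals, type = "l") and add the
# horizontal reference lines at 1/8 and 1/16; here we just locate the peak.
print(mu_vals[np.argmax(lik_vals)], diffs.mean())
```

The printed peak location agrees with the mean of the differences, as the maximizer property predicts.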

So that is my profile

likelihood for mu.

That is the function that I get if I take

the bivariate likelihood for mu and sigma, place a light along

the direction of the sigma axis and look at the

shadow on the wall, this is the outline of that shadow.

And that's called the profile likelihood.

And there are many theoretical properties of the profile likelihood.

But, most importantly, you can more or less treat it as

if it were a standard univariate likelihood.

So, you would treat this just like a regular likelihood for mu, the

higher values are better supported, the peak

is where the maximum likelihood estimate occurs.

And you could draw horizontal lines to get likelihood-based intervals for mu.
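For example, with some made-up numbers (not the lecture's data), a grid search reads such a likelihood-based interval straight off the curve:

```python
import numpy as np

x = np.array([0.7, 1.2, 1.9, 0.1, 2.4, 1.5])  # made-up sample
n = len(x)

def profile_lik(mu):
    # Normalized profile likelihood: (SS(mu) / SS(xbar))^(-n/2), peak value 1 at x bar
    return (np.sum((x - mu) ** 2) / np.sum((x - x.mean()) ** 2)) ** (-n / 2)

# Drawing a horizontal line at 1/8 and keeping the mu values whose
# profile likelihood sits above it gives a likelihood-based interval.
mu_grid = np.linspace(0, 3, 2001)
inside = mu_grid[np.array([profile_lik(m) for m in mu_grid]) >= 1 / 8]
print(inside.min(), inside.max())  # endpoints of the 1/8 likelihood interval
```

The resulting interval brackets the sample mean, symmetrically in this case because the profile likelihood depends on mu only through (mu minus x bar) squared.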


Well, that's the end of today's lecture.

We gave you many ways to create confidence intervals.

We gave you methods for creating T confidence intervals.

We gave you a method for creating a confidence interval for

a variance, maybe not the most useful one, but we did it.

We also showed you lots of really neat techniques, well known in statistics

circles but not generally well known, for

generating likelihoods when you have Gaussian data.

And so all these techniques you could use in practice.

If you have data, and you're willing to assume that it's Gaussian,

all of the techniques would apply.

The specific technique of the T confidence interval is a very robust interval;

as long as your data looks roughly

mound-shaped, you're going to be okay.

The last thing I'll mention is a question I

always get about the T confidence interval, which is basically this: the T

confidence interval and the standard normal confidence interval look the same,

except that one uses a T quantile where the other uses a standard normal quantile.

And people always ask me at what point, at what sample size,

do I switch between a T confidence

interval and a standard normal confidence interval.

But the point is that the T

confidence interval limits to the standard normal confidence interval.

So my answer to that is just always do a T confidence interval.

Just never do a standard normal confidence interval.

And then you don't even have to worry about it, because if your

sample size is big enough, that T quantile looks like a normal quantile anyway.
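That convergence is easy to check numerically. A quick sketch using scipy's distribution functions (my choice of tool, not the lecture's):

```python
from scipy import stats

z = stats.norm.ppf(0.975)  # standard normal 97.5th percentile, about 1.96
for df in (5, 20, 100, 1000):
    t = stats.t.ppf(0.975, df)
    # The gap between the T quantile and the normal quantile shrinks as df grows
    print(df, round(t, 4), round(t - z, 4))
```

So the T interval is always a bit wider, but by a vanishing amount for large samples, which is why "just always do a T interval" costs nothing.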

So hopefully that answers that question.

And I look forward to seeing you next time where we'll expand

on confidence intervals for more general settings where we have multiple groups.
