As with simple regression, the left-hand side, the function of our outcome, depends on what type of variable the outcome of interest is.

So, for continuous measures, the left-hand side, the function we estimate, will be the mean of the outcome y for a given set of xs, and the regression type is linear regression.

For binary outcomes, we start with one/zero variables and turn them into proportions, which we turn into odds. Well, the computer does. Ultimately, the function we model as a linear function of our predictors is the log odds of the outcome: the log of p over one minus p, where p is the probability that the outcome occurs, that y equals one. The regression type for this is multiple logistic regression.
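As a quick illustration, here is a minimal sketch (in Python, with made-up probabilities) of the probability-to-log-odds conversion that logistic regression models:

```python
import math

def log_odds(p):
    """Convert a probability p into log odds, the quantity that
    logistic regression models as a linear function of the xs."""
    return math.log(p / (1 - p))

# A probability of 0.5 gives odds of 1, so the log odds are 0:
print(log_odds(0.5))
# Probabilities above 0.5 give positive log odds:
print(log_odds(0.8))
```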

For time-to-event outcomes, we have two choices. When the individual event times and censoring times are not known, we can do Poisson regression; if they are known, we can still do Poisson regression, but we can also take the individual-level information into account without grouping it, which we'd have to do for Poisson, by using Cox proportional hazards regression.

As with everything else we've done thus far in this course, we will only be able to estimate the regression equation from a sample of data, so to indicate that these are estimates, we put hats on the intercept and all the subsequent slopes.

So, of course, that ultimately means that we'll have to deal with the uncertainty in these estimates and do things like put confidence intervals on them and get p-values when interested.

So, the right-hand side, Beta naught hat plus Beta one hat x_1 plus Beta two hat x_2, et cetera, includes the predictors of interest, the xs, x_1 through x_p. These can represent binary, categorical, or continuous predictors.

Then each slope estimates the difference in the left-hand side, as we saw before, for a one-unit difference in the corresponding x. But now that we have multiple predictors in the model, it's adjusted for the other x variables, the other predictors, in the model.

We'll drill down on this in detail with examples for each type of regression.

What the intercept is going to estimate is the left-hand side when all of our xs are zero.
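To make these interpretations concrete, here is a small sketch, with made-up estimates, of the right-hand side as a function of the xs; changing one x by one unit shifts the result by that x's slope, and setting all xs to zero leaves just the intercept:

```python
def linear_predictor(b0_hat, slope_hats, xs):
    """Right-hand side of the estimated equation:
    beta0 hat + beta1 hat * x_1 + ... + betap hat * x_p."""
    return b0_hat + sum(b * x for b, x in zip(slope_hats, xs))

# Made-up estimates for a model with two predictors:
b0_hat, slope_hats = 2.0, [0.5, -1.2]

# A one-unit difference in x_1, holding x_2 fixed, shifts the
# left-hand side by the first slope (0.5 here):
diff = (linear_predictor(b0_hat, slope_hats, [3, 4])
        - linear_predictor(b0_hat, slope_hats, [2, 4]))

# With all xs equal to zero, only the intercept remains:
at_zero = linear_predictor(b0_hat, slope_hats, [0, 0])
print(diff, at_zero)
```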

So, let's just do a generic example to give a little more illustration of what I just defined.

Suppose we estimate a multiple regression with three predictors. It's a study on intravenous drug users in four cities, and we have some outcome, which could be binary, continuous, or time-to-event, that we want to model as a function of our xs. The reason we have five xs with three predictors is that one of the predictors is multi-categorical.

So, the first x is just the indicator of the biological sex of the participant: one for female, zero for male.

The second predictor is nominal categorical. There are four cities that this study takes place in, and so we need three indicator variables. The four cities are Baltimore, London, Delhi, and Cape Town, so this is truly a global study. We'll make Baltimore the reference and have indicators for the other three cities. Then our final predictor, x_5, is the age of the participant measured in years.
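As a sketch of how these five xs could be coded for one participant (the function name and coding scheme are just for illustration), with Baltimore as the reference city:

```python
def encode_predictors(sex, city, age):
    """Build [x_1, ..., x_5] for one participant: x_1 = 1 for female,
    x_2 through x_4 indicate London, Delhi, and Cape Town (all zero
    means the reference city, Baltimore), and x_5 is age in years."""
    non_reference_cities = ["London", "Delhi", "Cape Town"]
    x1 = 1 if sex == "female" else 0
    city_indicators = [1 if city == c else 0 for c in non_reference_cities]
    return [x1] + city_indicators + [age]

# A 30-year-old female participant from the reference city:
print(encode_predictors("female", "Baltimore", 30))   # [1, 0, 0, 0, 30]
# A 45-year-old male participant from Delhi:
print(encode_predictors("male", "Delhi", 45))         # [0, 0, 1, 0, 45]
```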

So, how would we generically interpret these slopes? Well, the first x is for sex, a one for female and a zero for male. So, just like we saw in simple linear regression, this slope is going to be the difference in the left-hand side for females compared to males, but that's where we'd stop. In simple regression, we wouldn't qualify it any further.

But now that we have other predictors in here, including the city the person was from and their age, this is now a difference in the left-hand side for females compared to males, adjusted for, or taking into account, those other two characteristics.

So, we talked about adjusted estimates in the lecture set on confounding, how they were useful and what they meant conceptually, and now we're seeing a way to operationalize this pretty painlessly by using multiple regression techniques.

Similarly, Beta five is the difference in the value of the left-hand side for subjects who differ by one year in age, adjusted for sex and the city the participant is from.

If we look at the slopes for the xs that make up the multi-categorical predictor of city, Beta two, Beta three, and Beta four are the respective differences in the left-hand side between London and Baltimore, Delhi and Baltimore, and Cape Town and Baltimore. These are now differences adjusted for the sex and age of the participants.

So, if there are different sex and age distributions across the different sites, there's the potential for confounding. Now we have adjusted estimates that we can compare to their unadjusted counterparts from a simple regression where city is the only predictor.

How do we get confidence intervals and p-values for individual slopes and intercepts? Well, these will be computed generically as we saw before: we'll take our estimated intercept or slope and add or subtract two estimated standard errors, which of course will come from the computer. Hypothesis testing for any individual slope or intercept will be done the same way as always.

We'll compute the standardized distance of our estimate from what we'd expect it to be under the null of no association, which is zero. So, we convert that distance into standard errors, figure out how far our result is from zero in standard errors, and judge whether it's far or not by turning to the p-value: the chance of getting a result as far or farther from zero if our null about that particular quantity is true.
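Here is a minimal sketch of these generic calculations for a single slope, using a hypothetical estimate and standard error; it uses the rough estimate-plus-or-minus-two-standard-errors interval described above and a normal approximation for the two-sided p-value:

```python
from statistics import NormalDist

def wald_summary(estimate, se):
    """Rough 95% CI (estimate +/- 2 SE) and two-sided p-value for
    the null that the true intercept or slope equals zero."""
    ci = (estimate - 2 * se, estimate + 2 * se)
    z = estimate / se  # distance from zero, in standard errors
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return ci, z, p

# Hypothetical slope estimate of 0.5 with a standard error of 0.2:
ci, z, p = wald_summary(0.5, 0.2)
print(ci, z, p)
```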

So, we're going to introduce a slightly new concept here, just conceptually; I'll point it out and explain how to interpret the resulting p-value.

When we estimate a multiple regression model that includes a multi-categorical predictor (in this case, we have three xs to represent city), then technically speaking, in order to test formally whether that predictor is associated with the outcome, the left-hand side, after adjusting for the other things in the model, we can't just look at the p-values for each of the individual slopes. We need to test all three slopes at once, testing what's called the joint null: that all three together are equal to zero.

So, why is that?

Well, just think about this in the context of what we've done here.

We said that Beta two hat is the difference in the left-hand side between London and Baltimore.
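To sketch what such a joint test could look like, here is a likelihood ratio test comparing the full model (with the three city indicators) to a reduced model without them; the log-likelihood values are made up for illustration, and the statistic is compared to a chi-square distribution with three degrees of freedom:

```python
import math

def chi2_sf_df3(x):
    """Survival function of a chi-square distribution with 3 degrees
    of freedom (a closed form exists for odd degrees of freedom)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Joint test of the three city slopes: twice the log-likelihood
    difference is chi-square with 3 df under the joint null."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_df3(stat)

# Made-up maximized log-likelihoods for the two fitted models:
stat, p_value = likelihood_ratio_test(-520.4, -526.1)
print(stat, p_value)
```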