A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

81 ratings

From this lesson

Introduction and Module 1A: Simple Regression Methods

In this module, a unified structure for simple regression models will be presented, followed by detailed treatises and examples of both simple linear and logistic models.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this lecture section, we'll look at estimating risk and

functions of risk using the results from logistic regression.

So while the results from logistic regression can be easily interpreted in

terms of odds and odds ratios after exponentiation,

and they're frequently presented as such in papers, for

prospective cohort studies risk can actually be estimated.

With a little bit of work, the results from logistic regression can be

converted to probabilities, proportions or risks and presented on this scale.

So in the last several sections,

we've explored how to relate a binary outcome to a predictor, whether binary,

categorical (either ordinal or nominal), or continuous, via simple logistic regression.

We've shown how to translate the results into estimates of odds and

odds ratios.

The results from logistic regression can also be used to get estimated risks and

functions of risk if the study design allows for risk estimates, and for the most

part we've only considered study designs of that nature in this class.

The one exception would be case control studies and

I'll comment a little bit more about that when we wrap things up.

So just to work this out.

Recall the estimated odds of a binary outcome is given by odds equals p

hat, the proportion or risk of having the outcome, divided by the proportion or

risk of not having the outcome, which is one minus the risk of having it.

This expression can actually be algebraically solved in terms of p hat, so

I'll let you do it yourself if you're interested in the algebra, but

the end result is that if you solve for p hat in this equation,

you get p hat equals odds over 1 plus odds, which is

one version of the logistic regression equation we looked at when we first introduced it.
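
For readers who want the algebra spelled out, here is one way to solve the odds expression for p hat. It is a sketch of the standard derivation, using only the definition of odds stated above; the last line simply substitutes the exponentiated log odds for the odds.

```latex
\begin{align*}
\text{odds} &= \frac{\hat{p}}{1-\hat{p}} \\
\text{odds}\,(1-\hat{p}) &= \hat{p} \\
\text{odds} &= \hat{p}\,(1+\text{odds}) \\
\hat{p} &= \frac{\text{odds}}{1+\text{odds}}
        = \frac{e^{\log(\text{odds})}}{1+e^{\log(\text{odds})}}
\end{align*}
```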

So, we can actually use the results from logistic regression to

estimate the log odds of an outcome given the x-values and

translate that into a proportion or risk estimate via that formula.

So let's look at our example with respiratory failure and gestational age.

We had a reference category of 37 to 40 weeks, and then we had an indicator for

being a low gestational age of 34 weeks,

another indicator if the infant was 35 weeks gestational age, and

another indicator if the infant was 36 weeks.

So let's compute the probability or proportion of infants with respiratory

failure for the reference group, children born full term, 37 to 40 weeks.

Well, if we do this based on the previous example the log

odds is equal to that estimated intercept of negative 5.5.

We can turn this into an estimated odds, p hat

over 1 minus p hat, just by taking the antilog,

raising e to the negative 5.5 power, and this will give us the odds for this group.

And we could now turn that odds estimate into a probability by

our formula taking the estimated odds over 1 plus the odds.

Or in this case, 0.004 over 1 plus 0.004, i.e., 1.004.

And this is roughly equal to 0.004.

So in this case, the risk is really low and hence, the odds and

the risk are close.

But this estimates that 0.4% of the children

born full term experience respiratory failure.

So less than 1%.
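
The reference-group calculation can be checked with a few lines of Python. This is a minimal sketch: the intercept of negative 5.5 is the rounded value quoted in the lecture, and the helper name `odds_to_risk` is made up for illustration.

```python
import math


def odds_to_risk(odds: float) -> float:
    """Convert odds to a risk (probability): p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Estimated log odds for the reference group (37 to 40 weeks),
# i.e. the rounded intercept quoted in the lecture.
log_odds_ref = -5.5

odds_ref = math.exp(log_odds_ref)  # antilog of the log odds
risk_ref = odds_to_risk(odds_ref)

# When the risk is very small, the odds and the risk nearly coincide.
print(f"odds = {odds_ref:.4f}, risk = {risk_ref:.4f}")
```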

If we do this for the group with the highest relative odds compared to this reference,

the gestational age of 34 weeks, we're going to see a different story.

So the log odds for this group, using the results from the previous equation, is

estimated to be negative 5.5, the starting log odds for

the reference group, plus the difference in log odds or

the slope for the group with 34 weeks which was equal to 3.4.

So if we add those together, we get a log odds of negative 2.1 for

the group with gestational age of 34 weeks.

So, we translate this into an odds by taking the antilog or

exponentiating using e.

And we get an odds of about 0.122.

And we can translate this into an estimated probability or proportion for

this gestational age group of 34 weeks by taking the estimated

odds over 1 plus that value: 0.122 over 1 plus 0.122, or 1.122,

which is equal to about 0.11, or 11%.

So this is a very different story.

Whereas we estimated that less than 1% of the children

with full term births experienced respiratory failure after birth,

we estimate 11% among those at the higher risk, those with

a gestational age of 34 weeks.

So this gives us some grounding as to what that relative odds, or odds ratio, we got from

exponentiating the slope of 3.4 means in terms of a risk differential for

this outcome between these two groups.
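
To see the odds ratio and the two estimated risks side by side, the comparison can be sketched in Python. The coefficients are the rounded values quoted in the lecture (intercept negative 5.5, slope 3.4 for the 34-week indicator). The slopes for the 35- and 36-week groups are not given in this section, so they are left out, and the function name is hypothetical.

```python
import math

INTERCEPT = -5.5       # log odds for the reference group (37 to 40 weeks)
SLOPE_34_WEEKS = 3.4   # difference in log odds, 34 weeks vs. reference


def risk_from_log_odds(log_odds: float) -> float:
    """Inverse logit: 1 / (1 + e^(-log_odds)), which equals odds / (1 + odds)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


risk_ref = risk_from_log_odds(INTERCEPT)
risk_34 = risk_from_log_odds(INTERCEPT + SLOPE_34_WEEKS)

odds_ratio = math.exp(SLOPE_34_WEEKS)   # relative odds, 34 weeks vs. reference
risk_difference = risk_34 - risk_ref    # same comparison on the absolute risk scale

print(f"odds ratio      = {odds_ratio:.1f}")
print(f"risk, reference = {risk_ref:.3f}")
print(f"risk, 34 weeks  = {risk_34:.3f}")
print(f"risk difference = {risk_difference:.3f}")
```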

And we could compute the risks for

the other two gestational age groups, as well and compare them all to each other.

So let's look at our example relating the risk of obesity to HDL cholesterol levels.

So this was the equation we got.

Where we estimated the log odds of obesity related to

HDL was negatively associated with HDL cholesterol level.

So we can use this to estimate the risk or proportion of obesity

amongst the persons in the population with HDL measurements of 75.

So how would we do this?

Well, we're going to use this equation to get an estimate of the log odds for

this group within our larger population.

You take the intercept of negative 0.05 and add the slope of negative 0.033

multiplied by 75.

And if you do the math on this, we get a log odds of negative 2.53.

If we exponentiate that,

we get an odds estimate of 0.08, approximately.

And so then if we convert this to an estimated probability,

it's 0.08 over 1 plus that, or 1.08,

which is approximately 0.074.

So in the population from which the overall sample was taken, we

estimate that roughly 7.4% of the persons who have HDL levels of 75 are obese.
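
With a continuous predictor the steps are the same: plug the x-value into the equation, exponentiate, then convert odds to risk. A minimal Python sketch, assuming the rounded coefficients quoted in the lecture (intercept negative 0.05, slope negative 0.033 per mg/dL); the function name is made up for illustration.

```python
import math

INTERCEPT = -0.05          # rounded intercept quoted in the lecture
SLOPE_PER_MG_DL = -0.033   # change in log odds of obesity per 1 mg/dL of HDL


def estimated_obesity_risk(hdl_mg_dl: float) -> float:
    """Estimated risk of obesity at a given HDL level."""
    log_odds = INTERCEPT + SLOPE_PER_MG_DL * hdl_mg_dl
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


log_odds_75 = INTERCEPT + SLOPE_PER_MG_DL * 75
risk_75 = estimated_obesity_risk(75)

print(f"log odds at HDL 75 = {log_odds_75:.3f}")
print(f"risk at HDL 75     = {risk_75:.3f}")
```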

So now we can ask and answer questions like what is the estimated risk

difference and, say, relative risk of being obese for persons with HDL

levels of 100 milligrams per deciliter versus persons with HDL of 75.

So in the previous slide,

we estimated the risk of obesity to be 7.4% amongst those with HDL of 75

milligrams per deciliter. By the same approach (and you can verify my math here),

if we did this for those with HDL equal to 100,

the estimated risk turns out to be about 3.4% or 0.034.

So the relative risk, for example of obesity for

those with HDL levels of 100 versus 75 is equal

to 0.034, or 3.4%, over the 0.074

we got among those with 75 milligrams per deciliter,

which is about equal to 0.46, so those with the higher

HDL level have about a 54% lower risk of obesity.

And the risk difference we could estimate would be 0.034

minus 0.074, which would be negative 0.04, a

4% reduction on the absolute scale.
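
The relative risk and risk difference for HDL of 100 versus 75 can be verified the same way. This sketch again assumes the rounded coefficients from the lecture and uses hypothetical names. Because it carries full precision rather than the rounded risks, its results may differ from the hand calculation in the third decimal place.

```python
import math

INTERCEPT = -0.05          # rounded intercept quoted in the lecture
SLOPE_PER_MG_DL = -0.033   # rounded slope quoted in the lecture


def estimated_obesity_risk(hdl_mg_dl: float) -> float:
    """Estimated risk of obesity at a given HDL level (inverse logit)."""
    log_odds = INTERCEPT + SLOPE_PER_MG_DL * hdl_mg_dl
    return 1.0 / (1.0 + math.exp(-log_odds))


risk_75 = estimated_obesity_risk(75)
risk_100 = estimated_obesity_risk(100)

relative_risk = risk_100 / risk_75     # ratio of the two estimated risks
risk_difference = risk_100 - risk_75   # difference on the absolute scale

print(f"risk at HDL 100 = {risk_100:.3f}")
print(f"relative risk   = {relative_risk:.2f}")
print(f"risk difference = {risk_difference:.3f}")
```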

So with a little bit of work, we were able to translate things into a risk

context and not have to rely just on odds and odds ratios.

Something that you can do in a paper-based presentation,

which is kind of nice if you were trying to summarize these results and

ground them in terms of the general risk in the sample

as an estimate for the population, is present a graphic like this,

where you actually plot the predicted risks for a given HDL level as

a function of HDL level to show what the resulting starting log odds and

odds ratios convert to on the risk scale.

And we can see clearly that the probability of

being obese drops relatively quickly with increases in HDL cholesterol.

Recall the results about breast feeding and age in the resulting equation using

the sample of 192 Nepali children 12 to 36 months old.

So we were looking at the relationship between breastfeeding and age and

we came up with an equation that looks like this.

The log odds of being breast fed is equal to the intercept of

7.30 plus negative 0.24 times the age.

And so we said well, what is the estimate, what does this estimate for

24 month old children?

Well, what we need to do is plug 24 into our equation.

And take 7.30 plus negative 0.24 times 24.

And we get a log odds estimate of 1.54.

If we were to convert this to a corresponding odds,

we'd exponentiate, resulting in an odds of 4.66, and

then to translate that into a probability or p hat, it's 4.66 over 1 plus 4.66, or about

82%. So we estimate that

82% of children who are 24 months old are being breast fed.

What if we wanted to do this for 16 month olds?

Those that are younger.

Well, I'll get us started.

If we plug in the number here, the estimated log odds is 3.46.

We actually convert that to the odds by exponentiating 3.46.

We get an odds estimate of about 31.8, a very large odds.

And if we translate this into a probability or

proportion, it's 31.8 over 32.8, which is roughly equal to 0.97.

So roughly 97% of 16 month olds are breast fed.

Okay. So what is the estimated relative risk of

being breast fed for 24 month olds to 16 month olds?

Well, we could just compare in ratio format their

respective estimated probabilities and the relative risk is 0.85.

On the relative scale, 24 months olds have a 15% lower risk of

being breast fed as compared to 16 month olds.

And this is one example, the only one we've looked at,

where, just by how the numbers worked out, the estimated risk

difference is equal to the reduction on the relative risk scale.

So if we look at the risk difference, we take 82% minus 97%.

This is an absolute reduction in the proportion being breast fed of 15%.
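
The breastfeeding comparison can be reproduced in Python. This is a sketch using the coefficients quoted in the lecture (intercept 7.30, slope negative 0.24 per month of age), with hypothetical names. It also prints the relative reduction, 1 minus the relative risk, next to the risk difference to show that the two happen to be close in this example.

```python
import math

INTERCEPT = 7.30          # log odds of being breast fed at age 0 (extrapolated)
SLOPE_PER_MONTH = -0.24   # change in log odds per additional month of age


def risk_breastfed(age_months: float) -> float:
    """Estimated probability of being breast fed at a given age in months."""
    log_odds = INTERCEPT + SLOPE_PER_MONTH * age_months
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


risk_24 = risk_breastfed(24)
risk_16 = risk_breastfed(16)

relative_risk = risk_24 / risk_16        # 24 month olds vs. 16 month olds
risk_difference = risk_24 - risk_16      # absolute scale
relative_reduction = 1.0 - relative_risk # relative scale

print(f"risk at 24 months  = {risk_24:.2f}")
print(f"risk at 16 months  = {risk_16:.2f}")
print(f"relative risk      = {relative_risk:.2f}")
print(f"risk difference    = {risk_difference:.2f}")
print(f"relative reduction = {relative_reduction:.2f}")
```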

And this following graphic here shows essentially the estimated risk or

probability of being breastfed as a function of child's age.

So you can see at 16 months, if we trace that up,

we get, not a perfect alignment, but we're close to one.

That was that 97%.

If we go up to 24 months, it's bringing us down into the low 80s territory.

This graph shows that for the youngest children,

it's almost certain that they're being breast fed, but the probability drops off

relatively quickly,

especially at the older ages, down to about 20% of all children by 36 months.
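
The shape of that curve can be tabulated without a plotting library. The sketch below evaluates the estimated probability every 4 months across the sampled age range of 12 to 36 months, using the coefficients quoted in the lecture. The 4-month spacing and the function name are arbitrary choices for illustration.

```python
import math

INTERCEPT = 7.30          # intercept quoted in the lecture
SLOPE_PER_MONTH = -0.24   # slope per month of age quoted in the lecture


def risk_breastfed(age_months: float) -> float:
    """Estimated probability of being breast fed (inverse logit of the log odds)."""
    log_odds = INTERCEPT + SLOPE_PER_MONTH * age_months
    return 1.0 / (1.0 + math.exp(-log_odds))


ages = list(range(12, 37, 4))   # 12, 16, 20, ..., 36 months
risks = [risk_breastfed(age) for age in ages]

for age, risk in zip(ages, risks):
    print(f"{age:>2} months: {risk:.2f}")
```

The same list of ages and risks could be passed to any plotting tool to draw the curve.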

So in summary, for most types of studies,

the only exception being case control studies among the designs we've talked about,

and you may recall this from Statistical Reasoning 1,

where we showed that you cannot estimate risk directly in case control studies,

the results from logistic regression

can be used to estimate risks, that is, probabilities or

proportions, and hence risk differences and relative risks.

And in turn, the results can be presented to some degree on

these scales, and that might be helpful for informing readers about

the implications of the association, giving it a little more grounding than just

reporting the results in terms of odds ratios.