So, in this section we'll take on estimating risk and functions of risk, in other words, p-hats, from logistic regression results. While the results from logistic regression can be interpreted in terms of odds and odds ratios after exponentiation, we shouldn't be limited to those measures. We shouldn't only be able to compute an odds ratio as a measure of association, for example, and we shouldn't only be able to compute odds for individual groupings. If we're dealing with data from prospective cohort studies and other non-case-control studies, we should be able to estimate risks, or probabilities, or proportions, as well. So, with a little bit of work, the results from logistic regression can be converted to probabilities or proportions or risks (these are all synonymous) and presented on that scale as well. In the last several sections we've explored how to relate a binary outcome to a predictor, whether the predictor is binary, ordinal or nominal categorical, or continuous, via simple logistic regression. We showed how to translate the results into estimates of odds and odds ratios. The results from logistic regression can also be used to get estimated risks and functions of risk in any study design that allows for risk or probability estimates. So, recall the estimated odds for a single group: for any given group, the estimated odds of an outcome occurring is the estimated probability of the outcome occurring divided by one minus that probability, so p-hat over one minus p-hat. That's how we estimate the odds of an outcome for any single group. With a little bit of algebra, we can solve this equation for p-hat, so that we can express p-hat in terms of the estimated odds: p-hat is equal to the estimated odds divided by one plus the estimated odds.
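The "little bit of algebra" mentioned above can be written out as follows. This is only a sketch of the standard rearrangement; the symbol \widehat{\text{odds}} is notation introduced here for the estimated odds and does not come from the lecture slides.

```latex
\begin{align*}
\widehat{\text{odds}} &= \frac{\hat{p}}{1-\hat{p}} \\
\widehat{\text{odds}}\,(1-\hat{p}) &= \hat{p} \\
\widehat{\text{odds}} &= \hat{p}\,\bigl(1+\widehat{\text{odds}}\bigr) \\
\hat{p} &= \frac{\widehat{\text{odds}}}{1+\widehat{\text{odds}}}
\end{align*}
```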
So, this means that since we can estimate the log odds for any single group from logistic regression based on its x value, we can estimate the odds for that group, and hence estimate the proportion or probability as a function of the odds. So, the recipe is as follows: the regression equation can be used to estimate the log odds of an outcome for any given x value. Then, to get the odds from the log odds, we simply exponentiate that result. And then, to estimate the proportion or probability of that outcome occurring for a group with the given x value, we take that estimated odds over one plus itself, and that will give us the estimated proportion with the outcome. So, remember our example with respiratory failure and gestational age: we had four gestational age groups. Even though the gestational age categories are ordinal, the authors did not want to assume that the relationship between the log odds of respiratory failure and gestational age category was necessarily linear, so they created indicator variables for the gestational age categories, and ultimately we had four different groups. The reference group was kids who were full-term, 37 to 40 weeks gestational age. Then, we had three other groups, each with its own indicator variable: the group that was 34 weeks gestational age, with indicator x1; the group that was 35 weeks, with indicator x2; and the group that had a gestational age of 36 weeks, with indicator x3. So, again, the resulting logistic regression for these data was the log odds of respiratory failure equal to this equation here. Again, even though we have three slopes and three x's, we're ultimately only estimating four quantities based on this equation. We had shown before that we can compute the odds for any group by plugging in its respective x values, cranking out the log odds, and exponentiating to get the odds, and we'd done that for the reference group, where the estimated log odds was the intercept of the equation.
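The three-step recipe above (log odds, then odds, then probability) can be sketched in a few lines of Python. The function names and argument names here are hypothetical and chosen only for illustration; they are not from the lecture or from any particular statistics package.

```python
import math


def log_odds_to_probability(log_odds: float) -> float:
    """Convert an estimated log odds to an estimated probability."""
    odds = math.exp(log_odds)   # step 2: exponentiate the log odds to get the odds
    return odds / (1.0 + odds)  # step 3: odds divided by one plus itself


def predicted_probability(intercept: float, slopes: list[float], x_values: list[float]) -> float:
    """Step 1: plug the x values into the regression equation, then convert."""
    log_odds = intercept + sum(b * x for b, x in zip(slopes, x_values))
    return log_odds_to_probability(log_odds)


if __name__ == "__main__":
    # A log odds of zero corresponds to odds of 1, i.e. a probability of one half.
    print(log_odds_to_probability(0.0))
```

The same helper works for indicator variables and for a continuous predictor, since in both cases the regression equation simply produces a log odds for the chosen x values.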
So, in this example, to compute the estimated risk (or probability or proportion, again all synonymous) of respiratory failure for the reference group, children who are full-term, all the x's are zero. So, the log odds for this group, when you plug in zeros for all the x's, is simply equal to the intercept. It's negative 5.5. To turn this into an odds, we exponentiate that intercept of negative 5.5. This gives us an odds of 0.0041. Again, an odds in and of itself is sometimes hard to interpret, so it'd be nice if we could transform this into a proportion or probability. Well, we can do that now. We can estimate the proportion of children with full-term gestational ages who experienced respiratory failure by taking the odds estimate we have there and dividing by one plus that same estimate. So, 0.0041 over 1.0041 is approximately equal to 0.004, or 0.4%. So, from this logistic regression equation, originally estimated on the log odds scale, we've done some computation, and now we can state that the risk of respiratory failure for the full-term children was on the order of 0.4%. Now suppose we wanted to do this for the group with a gestational age of 34 weeks. We already showed that the relative odds of respiratory failure for this group compared to the full-term reference was large. The log odds ratio is 3.4. When we exponentiated, we got an odds ratio of roughly 30. But what does this mean in terms of the risk for this group compared to the estimated risk for the reference group? To compute the estimated risk, probability, or proportion of respiratory failure for this group with gestational age of 34 weeks, we start with the x values for this group: x1 is equal to one, because that's the indicator of having a gestational age of 34 weeks, and the other two indicators are equal to zero.
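As a check on the reference-group arithmetic above, here is a minimal Python sketch. The intercept of negative 5.5 comes from the lecture; the variable names are made up for illustration.

```python
import math

# Intercept from the lecture: estimated log odds of respiratory failure
# for the full-term reference group (all indicators x1, x2, x3 equal to zero).
INTERCEPT = -5.5

log_odds_ref = INTERCEPT                   # all x's are zero, so only the intercept remains
odds_ref = math.exp(log_odds_ref)          # exponentiate to get the estimated odds
risk_ref = odds_ref / (1.0 + odds_ref)     # odds over one plus itself

print(f"estimated odds: {odds_ref:.4f}")
print(f"estimated risk: {risk_ref:.2%}")
```

Because the odds are so small here, the odds and the risk are nearly identical, which is typical for rare outcomes.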
So when we plug in this x1 value and work it out in the equation, we get that the log odds of respiratory failure for gestational age of 34 weeks is equal to the intercept of negative 5.5 (that's our starting point for these computations, the log odds for the reference group) plus the slope for 34 weeks, 3.4. The sum of these two things gives us a log odds estimate for the gestational age of 34 weeks of negative 2.1. We can convert that to an estimated odds for this group by exponentiating that result, and that gives us an estimated odds of 0.122. To translate that into an estimated probability or proportion, we can now take this estimated odds and divide it by one plus itself. So, 0.122 over 1.122 is approximately equal to 0.11, or 11%. So, based on the results of this regression, we estimate that the risk of respiratory failure in the 34-week gestational age group is much higher than that of the reference group. We knew that based on the odds ratio, but we didn't know what the actual proportions were. We see it's 11% for the group with gestational age of 34 weeks, compared to 0.4% for the full-term group. Let's look at the breastfeeding status and child's age example and do similar computations for it. The resulting logistic regression equation for this analysis is the log odds of being breastfed, given the child's age in months (remember, age ranged from 12 months to 36 months in our sample), equal to an intercept of 7.29 plus a slope of negative 0.24 times the age. Certainly we see that the log odds, and hence the odds once we exponentiate the slope, of being breastfed decrease with each additional month of age. But to give us some context as to what proportion of children are being breastfed, as opposed to just that relative comparison, it might be nice to look at some estimates for different ages. So, let's look at the estimated probability, or risk, or proportion, of being breastfed among 24-month-old children.
So, for this group, we can plug the value of 24 for the age variable into the equation. We get the intercept of 7.29 plus the slope of negative 0.24 times our x value of 24. When all the dust settles, this is equal to an estimated log odds of being breastfed for 24-month-olds of 1.53. We can easily convert that to an odds estimate by exponentiating: e to the 1.53 is equal to an odds estimate of 4.6. To turn that into an estimated probability or proportion of being breastfed in this age group, we can take that estimated odds of 4.6 and divide it by one plus the estimated odds. We get 4.6 divided by 5.6, which gives us an estimated probability or proportion of 0.82, or 82 percent. So, this gives us some more context by giving us a risk estimate for a single group. We already knew that 24-month-olds would be less likely to be breastfed than 12-month-olds and more likely than 28-month-olds, for example, but this gives us a sense of the order of magnitude we're working with on the risk scale. I can certainly do the same calculations at different ages. So for example, suppose I wanted to estimate the probability or risk of being breastfed among younger children, 16-month-old children, and see how that compares to the proportion among 24-month-old children. We know it'll be larger, because these children are younger and the odds of breastfeeding decrease with increasing age, but we don't know how much larger the estimated probability will be unless we do the computation. So if we do this on the log odds scale for 16-month-olds, we take the intercept of 7.29, add the slope of negative 0.24 times 16, the age we're dealing with, and do the math: the result is 3.45. That's the estimated log odds of being breastfed at 16 months.
If we exponentiate that to get the odds, we get a large odds: the odds of being breastfed at 16 months according to our results is 31.5. But let's now contextualize that by turning it into an estimated proportion of 16-month-old children who are breastfed. We take that estimated odds of 31.5 and divide by one plus itself, so 31.5 over 32.5, and we get an estimated proportion of 97 percent. Roughly 97 percent of 16-month-old children are estimated to be breastfed, based on our logistic regression results. Since we've estimated both of these probabilities, at 24 months and at 16 months, we could certainly go back and compute the relative odds of being breastfed at 24 months compared to 16 months, but we can also compute the relative risk, because we now have these two risk estimates. We can take their ratio to get the risk ratio: the 82 percent who were breastfed at 24 months divided by the 97 percent among those who were 16 months old gives a relative risk of being breastfed of 0.85 for those two groups being compared. So, children who are 24 months old have 15 percent less risk of being breastfed than children who are 16 months old, on the relative scale. We could also compute the risk difference. In this case (and as we've seen from the first term, and will continue to see, this is not necessarily always the case) it's actually equal to a 15 percent reduction on the absolute scale as well, 82 percent minus 97 percent. So this is a rare case where the reduction on the relative risk scale and the reduction on the risk difference scale are the same number: a 15 percent reduction on the relative risk scale, and a 15 percentage point absolute reduction on the risk difference scale. But this gives us two more ways of quantifying the result, in addition to the odds ratio, by working with the results from logistic regression.
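The 24-month and 16-month computations, together with the resulting risk ratio and risk difference, can be reproduced with a short Python sketch. The intercept and slope are the values quoted in the lecture; the function and variable names are hypothetical.

```python
import math

# Coefficients quoted in the lecture for log odds of being breastfed vs. age in months.
INTERCEPT = 7.29
SLOPE_PER_MONTH = -0.24


def breastfeeding_probability(age_months: float) -> float:
    """Estimated probability of being breastfed at a given age in months."""
    log_odds = INTERCEPT + SLOPE_PER_MONTH * age_months  # regression equation
    odds = math.exp(log_odds)                            # log odds -> odds
    return odds / (1.0 + odds)                           # odds -> probability


p24 = breastfeeding_probability(24)
p16 = breastfeeding_probability(16)

risk_ratio = p24 / p16        # relative risk, 24-month-olds vs. 16-month-olds
risk_difference = p24 - p16   # absolute difference in risk

print(f"p(24 months) = {p24:.2f}, p(16 months) = {p16:.2f}")
print(f"risk ratio = {risk_ratio:.2f}, risk difference = {risk_difference:.2f}")
```

The relative and absolute reductions nearly coincide here only because the comparison risk at 16 months is close to 1; with a smaller baseline risk the two would differ.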
If I were presenting this in a journal, I might present the results of the logistic regression, with the odds ratio and confidence interval for the per-month change in the odds of being breastfed with increasing age. But in order to contextualize it and give a sense of the magnitude of the risk of being breastfed as a function of age, you could also present a graph like this, where you use the computer to back-compute the proportion or probability of being breastfed for each of the ages in the age range that we have in our data, simply by repeating the computations we did for every age between 12 months and 36 months. Again, a computer can automate this. Then we can plot a graphic that shows, based on the results of this logistic regression model, the estimated probability of being breastfed as a function of age. We certainly know that the odds ratio relating breastfeeding to age was such that the odds were decreasing with age, so that results in a decrease in the proportion or probability of being breastfed with age. On the log odds scale, the function we estimated was a linear decrease. When it's transformed back to the probability scale, we see clear evidence of this decrease, but it is not linear on the probability or proportion scale, because that's a double transformation of the original log odds scale. So, this is helpful for giving context. We can now see that among 12-month-olds a very high proportion are being breastfed, close to 100 percent, and that the proportion declines slowly until about 20 months, when the decline begins to accelerate. By the time we're looking at 36-month-olds, three-year-olds, only about an estimated 20 percent are still being breastfed.
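The back-computation behind such a graph can be sketched as below, using only the intercept and slope quoted in the lecture. The names are hypothetical, and the sketch prints a table rather than drawing the figure; the resulting values could be handed to any plotting tool.

```python
import math

# Coefficients quoted in the lecture for log odds of being breastfed vs. age in months.
INTERCEPT = 7.29
SLOPE_PER_MONTH = -0.24


def breastfeeding_probability(age_months: float) -> float:
    """Estimated probability of being breastfed at a given age in months."""
    odds = math.exp(INTERCEPT + SLOPE_PER_MONTH * age_months)
    return odds / (1.0 + odds)


# Repeat the same computation for every whole month in the observed age range.
curve = {age: breastfeeding_probability(age) for age in range(12, 37)}

# Month-to-month drops in probability: constant on the log odds scale,
# but not constant on the probability scale.
drops = {age: curve[age] - curve[age + 1] for age in range(12, 36)}

for age, p in curve.items():
    print(f"{age:2d} months: {p:.3f}")
```

Comparing the entries of `drops` shows the non-linearity described above: the early month-to-month decreases are small, and they grow as age increases through the middle of the range.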
So in summary, for most types of studies (the only exception is case-control studies), the results from logistic regression can be used to estimate the risk, probability, or proportion of the outcome for different values of the predictor, and hence we can get risk differences and relative risks, above and beyond the odds and odds ratio estimates we get initially from exponentiating the results of the logistic regression equation. The recipe is as follows. For a specific value of x1, or, if we have a multi-categorical predictor, for a specific combination of the x values for our different groupings, the logistic regression equation can be used to calculate the estimated log odds of the binary outcome of interest. So, we plug our x value into the equation to get the estimated log odds. From that, we can estimate the odds for the group with the given x value by exponentiating the estimated log odds. And then ultimately we can estimate the proportion or probability of persons or objects in that group with the outcome by taking the group-specific odds divided by one plus itself. So, this is useful, and it's a reminder that despite the fact that the initial results we get are on the log odds scale, and we can exponentiate to get to the odds and odds ratio scale, we're not necessarily constrained or limited to only presenting results and computations on that scale. Now, these estimated proportions are a complex, double-transformation function of some linear combination of the intercept and the slope, or a multiple of the slope. We can get confidence intervals for them, but we'll defer to the computer for that.
I wouldn't be able to do these by hand myself, even given the appropriate information, and I wouldn't expect you to be able to do so. But I want you to know that because these are estimates based on imperfect data, there's some uncertainty to them, and we could get confidence limits for these estimated proportions. They would involve the uncertainty from both the intercept and the slope, carried through that whole transformation process.
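For readers curious about what the computer might be doing, here is a sketch of one common approach for a single continuous predictor x. This is an assumption, since the lecture does not say which method the software uses: build a 95% interval on the log odds scale, where the estimate is a linear combination of the intercept and slope, and then transform its endpoints. The symbols \hat{\eta}, \hat{\beta}_0 and \hat{\beta}_1 are notation introduced here, not taken from the lecture.

```latex
\begin{align*}
\hat{\eta}(x) &= \hat{\beta}_0 + \hat{\beta}_1 x \\
\widehat{\mathrm{SE}}\bigl(\hat{\eta}(x)\bigr) &=
  \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)
      + x^{2}\,\widehat{\mathrm{Var}}(\hat{\beta}_1)
      + 2x\,\widehat{\mathrm{Cov}}(\hat{\beta}_0,\hat{\beta}_1)} \\
\eta_{L},\ \eta_{U} &= \hat{\eta}(x) \mp 1.96\,\widehat{\mathrm{SE}}\bigl(\hat{\eta}(x)\bigr) \\
\hat{p}_{L} &= \frac{e^{\eta_{L}}}{1+e^{\eta_{L}}}, \qquad
\hat{p}_{U} = \frac{e^{\eta_{U}}}{1+e^{\eta_{U}}}
\end{align*}
```

The covariance term is why the standard errors of the intercept and slope alone are not enough to do this by hand from a typical regression table.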