Okay, so we're continuing with our week four discussion of different applications of the techniques that we've been learning about. In this lecture we're going to talk about how to compare means in two paired samples, so we'll be looking at an example of comparing means in samples of data that are correlated with each other.

For our example, we're going to look at some NHANES data, where researchers want to make sure that measures of blood pressure collected in the NHANES are reliable across different subgroups. It turns out that each NHANES respondent has two different measures of blood pressure collected, just to assess the reliability of these measurements and come up with an overall average for a given individual. So the research question that we're attacking today concerns female Hispanic adults living in the US in 2015-2016. That's our specific population of interest. Did these two measures of systolic blood pressure differ significantly from each other? Our expectation, if these are two reliable measures of blood pressure coming from the same individual, is that they would not differ systematically.

We're going to consider a few different inference approaches to address this research question. First, we're going to form a confidence interval for the mean difference. Second, we're going to perform a paired samples t-test for the mean difference. And third, we're going to make sure to check the assumptions that underlie the use of these different techniques.

So, first approach: we're going to form a confidence interval for this mean difference. What steps do we need? First of all, we compute the difference in the systolic blood pressure measurements for each of the women in our sample.
So the difference is going to be defined as the second systolic blood pressure measurement, SBP2, minus the first systolic blood pressure measurement, SBP1. If we do that calculation for each of these women and take the average of all those differences, the resulting mean is -0.977 and the standard deviation of those differences is 4.848. And we have a total of 911 women from this population in our sample. So our best point estimate of that mean difference is -0.977 mmHg.

So, how do we interpret that? We would say that in 2015-2016, our estimate of the mean difference in systolic blood pressure measurements for all female Hispanic adults was -0.977 mmHg. That seems like a big difference. What it means is that, on average, the first measurement of systolic blood pressure was larger by nearly one unit, one mmHg, which is fairly large for what we expect to be very similar measurements.

So let's examine the data more and check some assumptions underlying the techniques that we're going to use. Here we see some graphs. The first graph is a histogram showing the distribution of the SBP2 minus SBP1 differences that we computed for the women. You can see that it looks roughly centered at zero, but it does seem to be shifted a little bit to the left of zero in terms of the overall central tendency. Second, you see a normal quantile-quantile plot. If the distribution of differences were normal, we would expect all the data points to lie on the 45-degree line that we see in this plot. We see that the distribution of points deviates a little bit from that 45-degree line, the line that would indicate a normal distribution. So we have some slight concerns about deviations from normality.
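To make the difference calculation and the quantile-quantile check concrete, here is a minimal sketch using only the Python standard library. The blood pressure readings are made up for illustration and are not NHANES values, the variable names are hypothetical, and the plotting positions `(i + 0.5) / n` are just one common convention among several.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical paired readings in mmHg (made up for illustration; not NHANES values).
sbp1 = [118, 122, 130, 110, 126, 140, 116, 124]
sbp2 = [116, 122, 128, 112, 124, 138, 114, 126]

# Difference as defined in the lecture: second reading minus first reading.
diffs = [second - first for first, second in zip(sbp1, sbp2)]

mean_diff = mean(diffs)   # point estimate of the mean difference
sd_diff = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)

# Points for a normal quantile-quantile plot: each sorted difference is paired
# with the quantile of a normal distribution that has the same mean and SD.
# If the differences were normal, these pairs would fall close to a 45-degree line.
n = len(diffs)
reference = NormalDist(mean_diff, sd_diff)
theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
qq_points = list(zip(theoretical, sorted(diffs)))

print(f"mean difference = {mean_diff:.3f} mmHg, SD = {sd_diff:.3f} mmHg")
for expected, observed in qq_points:
    print(f"expected {expected:7.3f}   observed {observed:7.3f}")
```

With only eight made-up readings the plot points are very coarse. The same steps applied to all of the differences would give the kind of picture described in the lecture.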
With a large sample size like we have in this case, with 911 women, we can usually rely on the central limit theorem, and the assumption of normality is not as critical. Nonetheless, we're still going to check what would happen if we allow this assumption of normality to be violated. We're going to check the sensitivity of our results to relaxing that assumption.

Another way to examine the data is to simply generate a scatter plot showing systolic blood pressure measure two on the y axis and systolic blood pressure measure one on the x axis, and see if these two measures are in fact correlated with each other. Recall that we're using procedures for paired data, so there is an assumption that the values on these two variables are in fact correlated. Looking at the scatter plot, we see strong evidence of a correlation between these two measures. The Pearson correlation coefficient is actually 0.966, so there's a very strong linear association between these two paired measures. There's clear evidence that the two measures are actually paired, and this supports the use of paired sample procedures like the paired samples t-test for this data.

So let's continue with approach one, where we form a confidence interval for that mean difference. Recall that in forming a confidence interval, we take the best point estimate and we add or subtract a margin of error to form the interval. And how do we form that margin of error? It's a few estimated standard errors, and we'll talk more about what "a few" means in a second. Our sample mean of the 911 differences in the blood pressure measurements was, as we saw earlier, -0.977 mmHg. That's our best estimate. Our sample standard deviation of the 911 differences was 4.848 mmHg.
So to translate that into an estimated standard error, recall that we take the standard deviation of the differences and divide by the square root of the sample size, which again in this case was 911. The result of that calculation is 0.161 mmHg as the standard error of this estimated mean difference. Recall that the standard error describes the standard deviation of the sampling distribution. So if we had gone out and taken many samples of size 911 and calculated the estimated mean difference for each of those samples, 0.161 would be the standard deviation of that distribution of estimated mean differences. If we look at the sample mean difference relative to its standard error, the sample mean difference is actually quite large, about six standard errors away from zero. And again, that gives us some concern that this is a fairly large mean difference, even allowing for sampling variability.

Putting all that together, we form a 95% confidence interval for the population mean difference in systolic blood pressure for all female Hispanic adults living in the US in 2015-2016 as (-1.292, -0.662). And remember that "few" standard errors: we multiplied the standard error of the mean difference by about 1.96 to form the 95% confidence interval. So we add or subtract 1.96 times the estimated standard error to arrive at this result.

So what does this 95% confidence interval mean? Well, the interval does not include zero. Remember, under our null hypothesis that there is no difference in these means, the mean difference would in fact be zero. The fact that the 95% confidence interval does not include zero suggests that there is in fact a significant difference. Plausible values for this mean difference, allowing for sampling variability, range from -1.292 to -0.662, and clearly that interval does not include zero.
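The arithmetic behind the standard error and the 95% confidence interval can be sketched directly from the summary statistics quoted in the lecture. The variable names are only illustrative, and the multiplier of 1.96 is the approximate value mentioned above.

```python
import math

# Summary statistics quoted in the lecture (mmHg).
n = 911              # number of paired differences
mean_diff = -0.977   # sample mean of the SBP2 - SBP1 differences
sd_diff = 4.848      # sample standard deviation of the differences

# Estimated standard error: SD of the differences over the square root of n.
se = sd_diff / math.sqrt(n)

# Margin of error: "a few" standard errors, about 1.96 for a 95% interval.
multiplier = 1.96
margin = multiplier * se

lower = mean_diff - margin
upper = mean_diff + margin

print(f"standard error = {se:.3f} mmHg")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
print(f"interval contains zero: {lower <= 0.0 <= upper}")
```

Rounded to three decimals, this gives the standard error of 0.161 and the interval from -1.292 to -0.662 quoted in the lecture.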
So our inference from this result would be that we have evidence that the first measure tends to be significantly larger than the second measure for this specific subpopulation. In line with this evidence, we might ask why this would be, and this might go back to the NHANES data collection procedure. Maybe there were some reasons that the first measurement tended to be larger than the second measurement. But we would need to pursue the reasons for this result.

So let's consider a different approach, using a paired samples t-test. Our null hypothesis in the case of this t-test is that the population mean difference in the measurements is zero. In other words, the two measurements are identical to each other on average. The alternative hypothesis is that the population mean difference is not zero, or in other words, the two measurements are different on average. So the alternative allows the first measurement to be either greater than or less than the second measurement on average, and this supports the use of a two-tailed test of the null hypothesis. For this hypothesis test, we're again going to use a significance level of 5%, to be consistent with our calculation of the 95% confidence interval. That means our type I error rate would be 0.05, and for any p-value less than 0.05, we would generally have evidence against the null hypothesis.

What assumptions are we making? First of all, we assume that the sample of differences is a simple random sample. Second, we assume that there's a normal distribution of differences in blood pressure. Again, this is not as critical given that we have a large sample size and can rely on the central limit theorem in this case. But we're still going to examine the data and see if the paired measures are in fact correlated.
So recall the scatter plot that we looked at a couple of slides back: that graph definitely supported this assumption. These two measures of systolic blood pressure are certainly correlated.

So here's the result of that paired samples t-test. We talked about the mechanics of this kind of paired samples t-test in an earlier lecture. Our result under the stated assumptions is a t-statistic of -6.082. The degrees of freedom for this t-statistic are 910, which is just the sample size minus one for estimation of the mean difference. And the resulting p-value for that t-statistic is less than 0.001. So it would be extremely unlikely to see a t-statistic this extreme if the null hypothesis were actually true. Given that the p-value is less than 0.05, we would reject the null hypothesis, and this supports the idea that the population mean difference in the systolic blood pressure measurements is not equal to zero. So again, taking the t-test approach, we have evidence that the first systolic blood pressure measure tends to be significantly different from the second systolic blood pressure measure on average for this particular population.

Now, recall we had some slight concerns about deviations from normality in the distribution of the differences that we computed for these women. As an analyst, if you're not convinced that the differences follow a normal distribution, you can consider what's called a non-parametric test, which does not assume that the differences follow a normal distribution. We're not making parametric assumptions about the distribution of the differences. The non-parametric analogue of the paired samples t-test is called the Wilcoxon signed rank test. What this test does is use the median to examine the location of the distribution of the differences, as opposed to, again, assuming that the differences follow a normal distribution.
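As a rough sketch of how the two tests discussed here might be run in Python, the example below uses `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` on simulated paired readings. This is not the lecture's accompanying code: the data are simulated rather than taken from NHANES, and the sample size, means, and spreads are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Simulated paired readings in mmHg (hypothetical values; not the NHANES data).
rng = np.random.default_rng(0)
sbp1 = rng.normal(loc=125.0, scale=18.0, size=200)
sbp2 = sbp1 - 1.0 + rng.normal(loc=0.0, scale=5.0, size=200)

# Differences defined as in the lecture: second reading minus first reading.
diffs = sbp2 - sbp1

# Paired samples t-test (two-sided by default); tests whether the mean difference is zero.
t_result = stats.ttest_rel(sbp2, sbp1)

# The same t-statistic by hand: mean difference divided by its estimated standard error.
t_by_hand = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(diffs.size))

# Wilcoxon signed rank test on the differences (two-sided by default);
# it does not assume that the differences are normally distributed.
w_result = stats.wilcoxon(diffs)

print(f"paired t-test: t = {t_result.statistic:.3f}, p = {t_result.pvalue:.4f}")
print(f"t computed by hand = {t_by_hand:.3f}")
print(f"Wilcoxon signed rank: W = {w_result.statistic:.1f}, p = {w_result.pvalue:.4f}")
```

Comparing the two p-values is one simple way to see whether a conclusion depends on the normality assumption.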
With the Wilcoxon signed rank test, we're not making a parametric assumption about the distribution of these differences. So here's the result of this test. And again, there's accompanying Python code to see how to generate these results. The Wilcoxon signed rank test result has a p-value that is, again, very small, less than 0.001. So in the case of this non-parametric test, we would reject the null hypothesis that both measures have identical medians. Again, we are referring to the median rather than the mean of the differences. So our conclusion that the two measures do in fact have significantly different means is robust to potential violations of normality.

So we have consistent evidence, really no matter how we look at it, that the two measures of systolic blood pressure differ significantly for this population. And this is regardless of the assumptions made and the approach used for making that inference. It really does appear on multiple fronts that the two measures are not in fact reliable and that there is a significant difference in the values of the two measurements.

So what are we going to talk about in the next lecture? Well, now we're going to think about how to compare two proportions based on independent samples. We're going to shift our applications from a comparison of means to a comparison of proportions in two samples that are independent of each other.