In this section, we'll talk about issues in identifying comparison groups that can help you test a hypothesis or relationship. So, as we said before, a specific epidemiologic hypothesis posits a relationship between an exposure and an outcome. So to test a hypothesis, we either want to compare the incidence of the outcome in those with and without the exposure, and this is the type of comparison we'll mostly focus on in this module, or we want to compare the frequency of the exposure in those with and without the outcome. This second type of comparison is commonly called a case-control study. Selection of appropriate groups to compare is the key to correctly testing a hypothesis. Inappropriate comparisons lead to what is called bias, and bias means a systematic difference between the observed and the true effect. So, this is different from random error. So, for instance, we could have a situation where I got sick and my brother got sick, but none of the women in my family got sick, and that could just be bad luck. Or we could have a situation where an apparent difference between men getting sick and women getting sick had something to do with the design of our study and our selection of comparison groups, and that led us to an incorrect conclusion.
So, what makes a good comparison group? First, and perhaps most importantly, the groups must be selected without regard to what is being compared between groups. If we're comparing incidence in the exposed and unexposed, groups must be selected without regard to the outcome, because we're going to be counting outcomes in the exposed and unexposed groups, and we don't want to be accidentally putting more outcomes in one group than the other. If we're comparing exposure in cases and controls, group members must be selected without regard to the exposure. That is, our selection process can't make the exposure more likely in one group than the other.
Selecting based on the exposure or the outcome when you don't want to, and introducing a bias as a result, is called selection bias. We also don't want to be unintentionally comparing on a factor that causes the outcome when we define groups. So, this can happen when some factor other than the one we're looking at is associated with both our exposure of interest and the outcome. When it occurs, it's called confounding.
So, let's look a little bit more deeply at these different kinds of bias. Suppose we have the hypothesis that people who get cholera are more likely to be vegetarians than those who do not. So we decide to compare people who are hospitalized for watery diarrhea who test positive for cholera with those who are hospitalized for diarrhea who test negative for cholera, and this is called the test-negative design. So, to think about the population we're doing this in, look at the picture to the left. We have two populations, meat eaters on top and vegetarians on bottom. The people in green are people who did not get sick. The people in red in the lower right of each population are people who got sick with cholera and went to the hospital, and the people in pink, also in the lower right next to the cholera cases, are people who got sick with some other disease, in this case E. coli, and went to the hospital, but didn't have cholera. So, if you look at these populations, you see that meat eaters are more likely to be hospitalized with acute watery diarrhea than vegetarians. That's because meat is a common carrier of E. coli. So, in this case, 4 out of the 40 meat eaters develop E. coli and are hospitalized, whereas only 1 out of the 40 vegetarians develops E. coli and is hospitalized. So, if you think about the people we're actually looking at when we do this test-negative design, we have four people with cholera, two of whom are meat eaters, and five people who don't have cholera, all of whom have E. coli, four of whom are meat eaters. That means we incorrectly conclude that cholera cases are 2.5 times as likely to be vegetarians as non-cholera cases, because half of the cholera cases were vegetarian, and only one-fifth of the non-cholera cases were vegetarian. But this has nothing to do with vegetarians being more likely to have cholera. It has to do with meat eaters being more likely to have E. coli, hence it's a selection bias.
For an example of selection bias playing out in real life, we can go to a classic example of coffee and pancreatic cancer. So, there was a case-control study done in 1981 that was looking at the association between coffee drinking, along with other exposures, and pancreatic cancer. To run the study, the investigators found cases of pancreatic cancer and then selected controls from the other patients of the doctors who diagnosed the pancreatic cancer. They were trying to get rid of other differences in the populations by doing this. But what they didn't realize at the time was that many of these other patients had gastrointestinal problems, because of the types of doctors they were seeing, and they had been advised to avoid coffee. So, the controls consumed less coffee than the cases. So when the study was originally run, it was incorrectly concluded that coffee was associated with pancreatic cancer.
Now, let's think about the process of confounding. Imagine we're studying a cholera outbreak, and we hypothesize that people with low BMI are more likely to get cholera than people with normal BMI.
Comparison of these groups leads us to conclude that the risk of cholera in the low BMI group is 1.8 times that in the normal BMI group. This is illustrated on the left, where we have the low BMI group on the top and the normal BMI group on the bottom. People without cholera are colored green and people with cholera are colored red, and we have the nine people in the low BMI group with cholera on the right of that group, and we have the five people who developed cholera in the normal BMI group on the right, colored red. So, as I said, comparison of these two groups, nine out of 40 versus five out of 40, makes us conclude that the low BMI group is 1.8 times as likely to have cholera as the normal BMI group. But people with higher BMI are better off and more likely to have private wells to get their water. So, this is leading to a difference in the distribution of cholera cases in the low BMI group and the normal BMI group, confounding our relationship. So, if we just compare among people who have city water, we see that there is no increase in risk for low BMI: two-tenths of those people are infected in both groups. The relationship that we originally measured was confounded by where people get their water.
So, confounding plays out in real life in the story of John Snow that we've been coming back to, and the person it happened to is William Farr, who was the chief statistician for London and a contemporary of John Snow. So, William Farr initially believed the miasma theory of cholera transmission. That is, that emanating gases coming from, perhaps the Thames, perhaps the sewers, were coming up through the air and making people sick with cholera. His evidence for this was that people had lower rates of cholera if they lived at higher elevations. So, those living at low elevations at the bottom of the hill were getting more cholera than those living at the higher elevations, and he thought this was powerful evidence for his miasma theory.
But what he didn't realize at first is that people living at higher elevations were richer and paid for cleaner water, so that this comparison between high and low elevations was confounded by economic status and where people of different economic status got their water. Farr was eventually convinced by John Snow's work, went back and re-analyzed his data, realized the mistake he had made, and became a believer in the idea that water and direct contagion transmitted cholera.
So, to go over some key points for this section: selecting appropriate comparison groups is the key to testing a hypothesis. Bias can occur if you pick a bad comparison group. Accidentally picking groups based on the exposure or the outcome, depending on the study you're doing, can lead to selection bias. Picking groups that differ by some factor associated with the disease, other than what is being tested, can lead to confounding.
So, as an exercise, let's go back to your family dinner. You think it's the sweets that are causing the diarrhea. Pick an appropriate comparison group to the sweet eaters to determine if the sweets are the cause. Then do the same thing for comparing cases and non-cases. Why did you pick what you picked? Are there specific biases you are trying to avoid?
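
If you want to check the arithmetic in the two worked examples from this section, the cholera and vegetarian comparison and the BMI comparison, here is a minimal sketch in Python. It only uses the counts stated in the lecture. The variable and function names are made up for illustration, and the city-water risks are entered as the stated proportion of two-tenths, since the lecture doesn't give the number of people in that subgroup.

```python
from fractions import Fraction

# --- Selection bias: test-negative design (counts as stated in the lecture) ---
# Hospitalized people who test positive for cholera: 4 total, 2 meat eaters.
cholera_cases = {"meat_eaters": 2, "vegetarians": 2}
# Hospitalized people who test negative (all E. coli): 5 total, 4 meat eaters.
non_cholera = {"meat_eaters": 4, "vegetarians": 1}


def proportion_vegetarian(group):
    """Fraction of a hospitalized group that is vegetarian."""
    total = group["meat_eaters"] + group["vegetarians"]
    return Fraction(group["vegetarians"], total)


# Ratio of "proportion vegetarian" in cholera cases vs. non-cholera cases.
# This is the misleading number produced by the biased comparison.
selection_ratio = proportion_vegetarian(cholera_cases) / proportion_vegetarian(non_cholera)

# --- Confounding: low BMI vs. normal BMI ---
# Crude comparison: 9 of 40 low-BMI people vs. 5 of 40 normal-BMI people.
crude_risk_ratio = Fraction(9, 40) / Fraction(5, 40)

# Restricted to people on city water: the lecture states two-tenths are
# infected in both groups (subgroup sizes are not given, so only the
# proportions are used here).
city_water_risk_ratio = Fraction(2, 10) / Fraction(2, 10)

print("Vegetarian ratio, cases vs. test-negatives:", float(selection_ratio))
print("Crude risk ratio, low vs. normal BMI:", float(crude_risk_ratio))
print("Risk ratio among city-water users:", float(city_water_risk_ratio))
```

The first number reproduces the 2.5 from the test-negative example, the second reproduces the 1.8 from the crude BMI comparison, and the third shows the ratio dropping to 1 once we compare only people with the same water source.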