So to think more rigorously about testing hypotheses and looking for associations, we need to learn about some basic measures of association, and we'll start with the risk difference. Before we talk about the risk difference, we need to go over a few measures of disease occurrence. Incidence, which is what we'll be focusing on here, is the number of new cases of a disease that occur in a defined population in a defined period of time. Some measures of incidence include the reported cases, which is the raw number of cases reported over some period of time. There's the cumulative incidence, sometimes called the cumulative incidence rate, which is the total number of cases that occur in a given population over a span of time. This is sometimes called the attack rate, particularly when we're talking about outbreak investigation. And then probably the most principled one is the incidence rate, which is the number of cases that occur per some unit of population per unit of time. So it is expressed as cases per person-time. So for example, if there were three cases for every ten people for every month, we would say there were three cases per ten person-months. This all contrasts with prevalence, which is the number of people who are infected with the disease in a population at a given time. And we're not going to talk much about prevalence here, except to keep in mind that it is not the same thing as incidence for most diseases. So the first measure of association we'll talk about is the risk difference. The risk difference is the difference in the incidence rate, or cumulative incidence, of a disease between groups. So it's simply taking the incidence rate in population A and subtracting from it the incidence rate in population B. This measures a population difference in rates of disease in populations with different exposures. And for risk differences, zero means no difference. So let's look at some populations below. Here we have two populations.
We're looking at cases of acute watery diarrhea in one month. In population A, 3 out of 25 people get acute watery diarrhea. In population B, 1 out of 25 people get acute watery diarrhea. So the risk difference here is 2 cases per 25 person-months. We usually standardize this to some nice round number, so we might say 8 cases per 100 person-months. To think more about how to calculate the risk difference, let's talk about the classic risk difference calculation using two-by-two tables. If we observe people for the same amount of time, we can calculate the risk difference based on cumulative incidence. So we have a table with four quadrants. We have people who have some exposure and are sick in quadrant a, people who have the exposure and are not sick in quadrant b, people who do not have the exposure, the unexposed, and are sick in quadrant c, and people who do not have the exposure and are not sick in quadrant d. So to calculate the risk difference, we take the number of people in quadrant a divided by the number of people in quadrants a and b together. We subtract from that the number of people in quadrant c divided by the number of people in quadrants c and d together, and that gives us the risk difference. We can apply this to John Snow's investigation of cholera. So we compare people who got their water from the Southwark and Vauxhall company, which drew its water from downstream of London, to people who got their water from the Lambeth company, which drew its water from upstream of London. In households in the Southwark and Vauxhall group, there are 1,263 deaths and 38,783 people surviving, whereas in the Lambeth group there are 98 deaths and 26,009 people surviving. If we work this through our formula, we get a risk difference of 2.8 deaths per 100 people. That is, among people getting their water from Southwark and Vauxhall, we see 2.8 more deaths per 100 people observed than among people getting their water from Lambeth. So it's not always the case that we see all people for the same amount of time.
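The two-by-two calculation just described can be sketched in a few lines of Python. This is only an illustration: the function name `risk_difference` and its argument names are made up for this example, and the Snow counts are treated as individual people, as in the lecture.

```python
def risk_difference(a, b, c, d):
    """Risk difference from a two-by-two table of counts (hypothetical helper).

    a: exposed and sick        b: exposed and not sick
    c: unexposed and sick      d: unexposed and not sick
    """
    risk_exposed = a / (a + b)      # cumulative incidence among the exposed
    risk_unexposed = c / (c + d)    # cumulative incidence among the unexposed
    return risk_exposed - risk_unexposed


# Acute watery diarrhea: 3 of 25 in population A, 1 of 25 in population B.
rd_diarrhea = risk_difference(3, 22, 1, 24)
print(f"{rd_diarrhea * 100:.1f} cases per 100 person-months")

# John Snow's cholera data: Southwark and Vauxhall versus Lambeth.
rd_snow = risk_difference(1263, 38783, 98, 26009)
print(f"{rd_snow * 100:.1f} deaths per 100 people")
```

Multiplying by 100 is just the standardization step mentioned earlier, so the two results read as 8.0 per 100 person-months and 2.8 per 100 people.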
So if we're seeing people for different amounts of time, we need to do our risk difference calculation based on person-time. If different individuals are observed, that is, at risk, for different amounts of time, we need to base the calculation on the total time observed. So we set up our two-by-two table a little bit differently. There we have cell a being the number of people who are sick and exposed. And then cell b has the total time observed among people who had the exposure. And then cell c is the number of people who got sick who are unexposed. And cell d is the total time observed among those unexposed people. And then the risk difference is just a divided by b, minus c divided by d, or the difference in incidence rates. So we can consider a different study to look at the person-time-based risk difference calculation. We can look at the risk of coronary artery disease among participants in the Nurses' Health Study. Among women currently using hormones in that study when it was conducted, a couple of decades ago, there were 259 cases of coronary artery disease in 265,203 years of follow-up. Among those who never used hormones, there were 662 cases of disease in 358,135 years of follow-up. And just to be clear, if it's not evident, these years of follow-up are person-years of follow-up. That means if we're seeing two women for 10 years each, that's 20 years of follow-up. So obviously we can't really watch people for 265,000 years. So we can plug these numbers into the risk difference calculation, and we get an incidence rate difference of negative 8.7 cases per 10,000 person-years. That is, for every 10,000 person-years, there are 8.7 fewer cases of coronary heart disease among women who currently use hormones, compared to those who never used hormones.
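The person-time version of the calculation can be sketched the same way. Again, this is just an illustration under stated assumptions: the function name `incidence_rate_difference` and its parameter names are invented for this example, and the inputs are the Nurses' Health Study figures quoted above.

```python
def incidence_rate_difference(cases_exposed, time_exposed,
                              cases_unexposed, time_unexposed):
    """Difference in incidence rates, a / b - c / d (hypothetical helper).

    cases_exposed (a):   number of cases among the exposed
    time_exposed (b):    total person-time observed among the exposed
    cases_unexposed (c): number of cases among the unexposed
    time_unexposed (d):  total person-time observed among the unexposed
    """
    rate_exposed = cases_exposed / time_exposed
    rate_unexposed = cases_unexposed / time_unexposed
    return rate_exposed - rate_unexposed


# Current hormone users versus never users, with follow-up in person-years.
ird = incidence_rate_difference(259, 265203, 662, 358135)
print(f"{ird * 10000:.1f} cases per 10,000 person-years")
```

The negative sign says the rate is lower in the exposed group, which matches the 8.7 fewer cases per 10,000 person-years among current hormone users.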