As we move toward a future where we start to individualize therapy based on genetic testing, or other kinds of personalized treatments, we have to think, each time we implement one of those personalized treatments, about what the statistical basis and the evidence basis is for treating one patient differently from all the rest. My intent here is not to review all of biostatistics. But over this module and the next module I want to review some of the very basic principles that we use when we start to evaluate large data sets in the area of personalized medicine. The most important principle is garbage in, garbage out. What I mean by that is that if we're going to act on data, we really have to be quite convinced that the data are high quality. If the data are garbage, then the conclusions have to be garbage. So one of the first and most important steps in the analysis of any data set is to ensure very, very high quality, whether that's a genetic test, or a biostatistical analysis, or defining a disease. All of those have to be done to the highest quality standards, and that's a very important part of the analysis of all genetic and other kinds of large data sets. The standard way we have of evaluating a test is to look at its effects in a large population, often a large population of patients that we have split into two groups. Those groups can be split by nature, so they could be patients who have a disease and patients who don't, or patients who take a drug and patients who don't. Or sometimes we take a large population and we randomly assign them to get a drug or to not get a drug; that's a randomized clinical trial. That's the fundamental basis of deciding whether a test or a drug is actually going to be useful in long-term treatment. So the idea is to compare the group on the right to the group on the left. Now as we do that, we have to make sure that the groups are chosen appropriately and comparable in other ways.
So you have to make sure, for example, that the patients who take a drug are the same as the patients who don't take a drug with respect to how their disease was determined, when their disease was determined, and how long they had the disease. Whether they're African American or European can make a huge difference. You don't want all women in one group and all men in another group. You don't want one group to have all the diabetics and the other group none of the diabetics. You don't want one group to take one treatment and the other group to take another treatment. The time-honored way of assigning patients to one of two groups is a randomized trial. Sometimes nature does the randomization for us, though. For example, if we're examining the effect of a genetic variant, we don't have control over that. So we compare patients with a genetic variant to patients without the genetic variant, and we try to ensure that all those other factors, what we would call comorbidities or cotherapies, are similar in the two groups of patients. So that's a very important concept in trial design and trial evaluation. So here we are with two groups that we're going to compare. Now pretend that both groups have 50 patients in them; I counted them as I made this slide. In the group on the left, 30 out of the 50 patients have a particular trait, and in the group on the right, 15 have the trait. A common way of expressing the difference between those two groups is odds ratios or relative risks. The odds ratio is the ratio of the odds in the two groups. The odds in one group are 1.5: 30 out of 50 patients have the trait and 20 out of 50 don't, so 30 divided by 20 is 1.5. The odds in the other group are 15 divided by 35, and that's 0.43. The ratio of those two is 3.5. So the odds ratio for that particular test, to distinguish between one group and another, is 3.5. Relative risks are calculated in an analogous way.
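The odds ratio and relative risk arithmetic above can be sketched in a few lines of Python, using the counts from the slide (30 of 50 patients with the trait in one group, 15 of 50 in the other):

```python
# Counts from the example: 30 of 50 patients in the left group have
# the trait, 15 of 50 in the right group.
def odds(events, total):
    # Odds = events / non-events
    return events / (total - events)

def risk(events, total):
    # Risk = events / total
    return events / total

odds_ratio = odds(30, 50) / odds(15, 50)     # (30/20) / (15/35) = 1.5 / 0.43 = 3.5
relative_risk = risk(30, 50) / risk(15, 50)  # 0.6 / 0.3 = 2.0

print(odds_ratio, relative_risk)
```

Note that the odds ratio (3.5) and the relative risk (2.0) summarize the same two groups quite differently; the two measures only converge when the trait is rare.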
The risk in one group is 0.6. The risk in the other group is 0.3. And so the relative risk is 2.0. So odds ratios and relative risks are common ways of expressing the impact, for example, of a genetic variant on a trait in a population. And here are some examples for you to work on at home. Now, one of the other questions we would ask ourselves in an example like this: suppose there are 20 patients with a trait out of 50 in one group and 15 patients with the trait out of 50 in the other group. Or pretend that we were evaluating a drug, and 20 patients in one group had a beneficial response and 15 patients in the other group had a beneficial response. Are those two frequencies really different in a believable kind of way? And how about these two? Here's an example where 49 out of 50 patients have a particular characteristic, a response to a drug or a genetic trait, and only 1 in 50 in the other group. It's pretty obvious that those two are different, but how do we express that, how do we quantify the differences, and what kind of cutoffs do we use for evaluating those kinds of things? The fundamental way of approaching that is the probability, or the P value. An easy way to think about P values is rolling dice. Everybody's familiar with the idea that if you have dice that are not loaded, that are fair dice, and you roll them many, many times, the probability that a one will appear on the face of an unloaded die is one time in six. It might be 10 times in 60, or 100 times in 600, but it's one time in six. What if we rolled two dice? Then the probability that both come up ones is one-sixth squared, and that turns out to be 0.0278. And if we rolled three dice, the probability that we get three ones all together is even smaller: it's 0.00463, about 5 times in 1,000. And if we rolled 12 dice, the probability that all 12 come up ones is 4.6 x 10 to the -10.
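These dice probabilities are easy to verify exactly with Python's `fractions` module, which keeps the arithmetic as exact ratios rather than rounded decimals:

```python
from fractions import Fraction

def p_all_ones(n_dice):
    # Probability that every one of n fair dice shows a one: (1/6)**n.
    return Fraction(1, 6) ** n_dice

print(float(p_all_ones(2)))    # 1/36  = 0.0278...
print(float(p_all_ones(3)))    # 1/216 = 0.00463...
print(float(p_all_ones(12)))   # about 4.6e-10

# "Very high or very low": two ones OR two sixes with a pair of dice.
p_extreme = Fraction(2, 36)
print(float(p_extreme))        # about 0.056, roughly 1 in 18
```

The last value, 2/36, is the one the lecture uses to motivate the conventional 0.05 cutoff.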
So that's a vanishingly small probability. If you took 12 dice, rolled them, and got that result, you would say, gee whiz, maybe those dice are fixed. Or maybe it's a fluke, because boy, that is not likely to happen by chance. That's likely to happen only if there's something strange about the experiment. So let me go back to this: two ones, 1 in 36, 0.0278. Now, we might ask the question, what are the chances when we roll the dice that we're going to get a very high value or a very low value? So, what are the chances that we would get two ones or two sixes? The chances of that happening are shown on this slide: it's 2 in 36, or 0.056. So it's around 1 time in 18, close to 1 time in 20. In conventional medical statistics, if we do an experiment and we observe a P value of less than 0.05, we say that the chances are less than 1 in 20 that the result is a fluke, that it could have happened by chance. So if we see this result, we'd say, boy, the dice must be loaded, or at least we think the dice might be loaded. So how about comparing the group where 20 out of 50 people have a beneficial drug response or a particular genetic marker to the group where 15 out of 50 do? The frequencies of those two can be compared using a statistic that will calculate the probability. The probability there is 0.29. So if we did an experiment where we took 100 people, randomly assigned 50 to treatment one and 50 to treatment two, and saw a beneficial response in 20 out of 50 in one group and in 15 out of 50 in the other group, we would say that the chance of that happening by chance is 0.29. And that's pretty much by chance; we wouldn't say that those are very different. On the other hand, suppose we did exactly the same experiment and saw a beneficial response in 3 out of 50 patients in one group and 15 out of 50 patients in the other group, assuming those two groups are comparable in other ways.
They're balanced with regard to ancestry, gender, age, and comorbidities. The probability of seeing that difference by chance is less than 0.05; it's actually 0.0018. So we would say that that effect looks real, and not just a chance distribution effect. So that's probability. Now, one of the things that sometimes happens in experiments to look at the effect of drugs in a population, or to look at the effect of a genetic marker on the risk of a disease or an unusual response, is that people look more than once. So I want to close this module with an example that highlights the risks of doing that. Here's the example that I gave you before: a beneficial response in 20 out of 50 in one group and 15 out of 50 in the other group, a probability of 0.29. An investigator would look at that and say, well, there is no difference in treatment outcomes between the two groups. Now, the investigator might be very invested in this drug. He might say, well, let me ask the question a little bit differently, and let me count outcomes in a little bit different way. So I make a second comparison, and now the P value is 0.2. Suppose I was testing a high blood pressure medicine, and in the first comparison I asked, how low did the blood pressure go? No difference. In the second comparison, I asked, what was the average change in the blood pressure, not the biggest change in the blood pressure? Still no difference. Well then, let me ask the question, what was the lowest blood pressure on Fridays? Wow, boy, that's pretty interesting, because the lowest blood pressure on Fridays has a P value of 0.053. So maybe this drug does something special on Fridays. So let me now ask, what about the effect of the drug on Fridays in women? Man, look, I have a P value of 0.017, and I've already told you that with a P value of less than 0.05, I'm going to say that's not a chance event, that's a real drug effect.
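The lecture doesn't say which statistic the slide used to get these p values; a two-proportion z-test with a pooled standard error (and no continuity correction) is one standard choice that reproduces both quoted numbers. A minimal sketch:

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    # Two-sided p-value for H0: the two response rates are equal
    # (z-test with pooled standard error, no continuity correction).
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(x1 / n1 - x2 / n2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail area

# 20/50 vs 15/50 responders: consistent with chance.
print(round(two_proportion_p(20, 50, 15, 50), 2))   # 0.29

# 3/50 vs 15/50 responders: unlikely to be chance alone.
print(round(two_proportion_p(3, 50, 15, 50), 4))    # 0.0018
```

A chi-squared or Fisher's exact test on the same 2x2 tables would be equally standard here; with counts this size the z-test gives essentially the same answers.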
So now I'm going to say I have a real drug effect: this drug lowers blood pressure well on Fridays in women. Well, that's stupid, obviously, for many reasons, one of which is that nobody would think a drug ordinarily operates differently on Fridays than on any other day. But the main point of showing this is that the more comparisons you make, the more likely it is that you're going to find a set of differences that are different by chance alone. And unless you recognize that you're making multiple comparisons, you won't appreciate that those differences arose by chance. Now, this is an example where we made four comparisons. In some of the genetic testing that I'm going to talk about, when we get to genome-wide association studies, for example, we end up testing 500,000 or a million times. So you really have to accept the idea that if you do a million tests at a 0.05 threshold, 50,000 of those are going to be positive by chance alone. We have to take this multiple comparisons question into account.
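The multiple-comparisons point is easy to demonstrate by simulation: draw both groups from the same distribution, so any "significant" difference is by construction a false positive, and count how often p comes out below 0.05. A sketch under assumed, arbitrary settings (group size 50, a 30% response rate in both groups):

```python
import math
import random

def two_proportion_p(x1, n1, x2, n2):
    # Two-sided p-value for equal proportions (pooled z-test).
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation at all: nothing to test
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(x1 / n1 - x2 / n2) / se
    return math.erfc(z / math.sqrt(2))

random.seed(1)
n, p, trials = 50, 0.3, 2000

false_positives = 0
for _ in range(trials):
    # Both groups truly respond at the same 30% rate: the null is true.
    x1 = sum(random.random() < p for _ in range(n))
    x2 = sum(random.random() < p for _ in range(n))
    if two_proportion_p(x1, n, x2, n) < 0.05:
        false_positives += 1

# Roughly 1 in 20 null comparisons comes out "significant" by chance,
# which is why a million tests yield on the order of 50,000 false positives.
print(false_positives / trials)
```

Corrections such as Bonferroni (divide the 0.05 threshold by the number of tests) or genome-wide significance thresholds exist precisely to handle this.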