0:42

The most important principle is garbage in, garbage out.

And what I mean by that is that if we're going to act on data,

we really have to be quite convinced that the data are high quality.

If the data are garbage, then the conclusions have to be garbage.

So one of the first and most important steps in the analysis of any data set

is to ensure very, very high quality, whether that's a genetic test, or

a biostatistical analysis, or defining a disease.

All of those have to be done to the highest of quality standards, and it's

a very important part of analysis of all genetic and other kind of large data sets.

1:24

The standard way we have of evaluating a test is to look at its

effects in a large population, and

often in a large population of patients that we have split up into two groups.

Those groups can be split up by nature, so

they could be patients who have a disease and patients who don't have a disease,

patients who take a drug, patients who don't take a drug.

Or sometimes we take a large population and we randomly assign them

to get a drug or to not get a drug, that's a randomized clinical trial.

That's the fundamental basis of deciding whether a test or

a drug is actually going to be useful in long term treatment.

So the idea is to compare the group on the right to the group on the left.

Now as we do that, we have to make sure that the groups

are chosen appropriately and comparable in other ways.

So you have to make sure, for example,

that the patients who take a drug are the same as the patients who don't

take a drug with respect to how their disease was determined,

when their disease was determined, how long they had the disease.

Whether they're African American or European can make a huge difference.

You don't want all women in one group and all men in another group.

You don't want one group to have all the diabetics and

one group not to have the diabetics.

You don't want one group to take one treatment and

another group to take another treatment.

The time-honored way of doing that,

of assigning patients to one of two groups, is a randomized trial.

Sometimes nature does the randomization for us, though.

For example, if we're examining the effect of a genetic variant,

we don't have control over that.

So we compare patients with a genetic variant to patients without

a genetic variant, and we try to ensure that all those other what we would call

comorbidities, or cotherapies, are similar in the two groups of patients.

So that's a very important concept in trial design and trial evaluation.

So here we are with two groups that we're going to compare.

Now suppose that both groups have 50 patients in them.

I've counted them and I made this slide.

So in the group on the left,

30 out of the 50 patients have a particular trait.

And in the group on the right, 15 out of 50 have the particular trait.

A common way of expressing the difference between those two groups is odds ratios or

relative risks.

So the odds ratio is the ratio of the odds in the two groups.

The odds in one group is 1.5, so 30 out of 50 patients have the trait,

20 out of 50 don't, so 30 divided by 20 is 1.5.

That's the odds in one group.

The odds in the other group are 15 divided by 35, and that's about 0.43.

The ratio of those two is 3.5.

So the odds ratio for that particular test,

to distinguish between one group and another, is 3.5.

The relative risks are calculated in a similar way.

The risk in one group is 0.6.

The risk in the other group is 0.3.

And so the relative risk is 2.0.

So odds ratios and relative risks are common ways of expressing the impact for

example of a genetic variant on a trait in a population.
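As a quick sketch, the arithmetic above can be written out in Python (the function names here are just for illustration):

```python
# Sketch of the example above: 30/50 patients with the trait in one group,
# 15/50 in the other.

def odds_ratio(a, n1, b, n2):
    """Odds ratio for a/n1 with the trait in group 1 vs b/n2 in group 2."""
    odds1 = a / (n1 - a)   # 30 / 20 = 1.5
    odds2 = b / (n2 - b)   # 15 / 35 ≈ 0.43
    return odds1 / odds2   # ≈ 3.5

def relative_risk(a, n1, b, n2):
    """Relative risk: the ratio of the trait proportions in the two groups."""
    return (a / n1) / (b / n2)  # 0.6 / 0.3 = 2.0

print(odds_ratio(30, 50, 15, 50))     # ≈ 3.5
print(relative_risk(30, 50, 15, 50))  # ≈ 2.0
```

Note that the odds ratio (3.5) is larger than the relative risk (2.0) for the same data; the two measures only approximate each other when the trait is rare.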

5:19

And how about these two,

here's an example where 49 out of 50 patients have a particular characteristic,

a response to a drug or a genetic trait, and only 1 in 50 in the other group.

It's pretty obvious that those two are different, but how do we express that and

how do we quantify the differences?

What kind of cutoffs do we use for evaluating those kinds of things?

So the fundamental way of approaching that is the probability, or the P value.

An easy way to think about P values is rolling the dice.

So everybody's familiar with the idea that if you have dice that are not loaded,

that are fair dice, and you roll one many, many, many times,

you'll get a one on about one out of every six rolls.

Or it might be 10 times in 60, or 100 times in 600, but one time in six,

the number one will appear on the face of an unloaded die.

6:16

What about if we rolled two dice?

Then the probability is one-sixth squared,

and that turns out to be 0.0278.

And if we roll three dice, the probability that we get three ones,

all together, is even smaller, it's 0.00463.

That's about 4 times in 1000.

So if we rolled 12 dice,

we'd get that the probability of rolling all

12 ones is 4.6 x 10 to the -10.

So that's a vanishingly small probability.

So if you took 12 dice and rolled them and

got that result, you would say, gee whiz, maybe those dice are fixed.

Or maybe that's a fluke, because boy, that is not likely to happen by chance.

That's likely to happen only if there's something strange about the experiment.
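A minimal sketch of the dice arithmetic above:

```python
# Probability of rolling all ones with n fair dice: (1/6) raised to the n.
def prob_all_ones(n):
    return (1 / 6) ** n

print(prob_all_ones(1))   # ≈ 0.167, one time in six
print(prob_all_ones(2))   # ≈ 0.0278
print(prob_all_ones(3))   # ≈ 0.00463
print(prob_all_ones(12))  # ≈ 4.6e-10, vanishingly small
```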

7:30

So, what's the chances that we would get two ones or two sixes?

The chances of that happening are shown on this slide, and it's around 0.056.

So it's around 1 time in 20; more precisely, it's 1 time in 18.

In conventional medical statistics, if we do an experiment over and

over and over again, and we observe a p value of less than 0.05,

we say that the chances are less than 1 in 20 that that's a fluke,

that that could happen by chance.

So if we see this result, we'd say boy, the dice must be loaded, or

we think the dice might be loaded.
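The two-dice case can be checked by brute-force enumeration (a sketch):

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice and
# count how many are double ones or double sixes.
outcomes = list(product(range(1, 7), repeat=2))
hits = [o for o in outcomes if o in ((1, 1), (6, 6))]
p = len(hits) / len(outcomes)
print(p)  # 2/36 ≈ 0.056, i.e. 1 time in 18
```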

8:11

So how about comparing this group, where 20 out of 50 people have a beneficial

drug response or a particular genetic marker, to this group, where 15 out of 50 do?

And the fact is that the frequencies of those two can be

compared using a statistic that will calculate the probability.

The probability there is 0.29.

So if we did an experiment where we took 100 people,

randomly assigned 50 to treatment one and 50 to treatment two.

We saw a beneficial response in 20 out of 50 in one group and

a beneficial response in 15 out of 50 in the other group.

We would say that the chances of that happening by chance are 0.29.

And that's pretty much consistent with chance.

We wouldn't say that those are very different.

On the other hand, if we did exactly the same experiment and

saw a beneficial response in 3 out of 50 patients in one group and

15 out of 50 patients in the other group.

Assuming those two groups are comparable in other ways.

They're balanced with regard to ancestry, gender, age, comorbidities.

The probability of seeing that difference by chance is less than 0.05,

it's actually 0.0018,

so we would say that that looks like that effect is real.

And it's not just a chance distribution effect.
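The transcript doesn't say which statistical test produced these p values; as one illustration, a Pearson chi-square test on the 2x2 table of responders versus non-responders (sketched here without continuity correction, with a hypothetical helper name) reproduces both quoted numbers:

```python
import math

def chi2_p_2x2(a, b, c, d):
    """P value from a Pearson chi-square test (1 degree of freedom,
    no continuity correction) on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square tail probability reduces
    # to the complementary error function.
    return math.erfc(math.sqrt(chi2 / 2))

# 20/50 responders vs 15/50 responders: consistent with chance.
print(round(chi2_p_2x2(20, 30, 15, 35), 2))  # 0.29
# 3/50 responders vs 15/50 responders: unlikely to be chance.
print(round(chi2_p_2x2(3, 47, 15, 35), 4))   # 0.0018
```

In practice a library routine (for example, SciPy's chi-square or Fisher's exact test) would be used rather than a hand-rolled formula; the point here is only that the quoted p values follow from the counts.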

9:35

So that's probability.

Now one of the things that sometimes happens in experiments to look

at the effect of drugs in a population or

to look at the effect of a genetic marker on risk of a disease or

an unusual response is that people look more than once.

So I want to close this module with

an example that highlights the risks of doing that.

Here's the example that I gave you before.

Beneficial response in 20 out of 50 in one group and

15 out of 50 in the other group, a probability of 0.29.

An investigator would look at that and say well,

there is no difference in treatment outcomes between the two groups.

10:33

Let me count outcomes a different way.

And now I've made a second comparison.

And now the P value is 0.2.

Suppose I was testing a high blood pressure medicine,

and in the first comparison I asked, how low did the blood pressure go?

No difference.

In the second comparison, I asked, what was the average change in the blood

pressure, not the biggest change in the blood pressure?

Still no difference.

Well then, let me ask the question,

what was the lowest blood pressure on Fridays?

11:28

Man, look, I have a P value of 0.017, and

I've already told you that with a P value of less than 0.05,

I'm going to say that's not a chance event, that's a real drug effect.

So now I'm going to say, now I have a real drug effect.

This drug lowers blood pressure well on Fridays in women.

Well, that's stupid, obviously, for many reasons.

One of which is that nobody would think a drug operates differently on Fridays

than on another day, usually.

But the main point of showing this is that the more comparisons you make,

the more likely it is you're going to find a difference that arises purely by chance.

But unless you recognize that you're making multiple comparisons,

you won't realize that the result may just be a chance finding.
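This danger can be quantified: if each comparison independently has a 5% chance of a false positive, the chance of at least one spurious "significant" result grows quickly with the number of looks (a sketch, assuming independent tests):

```python
# Chance of at least one false positive across k independent comparisons,
# each tested at significance level alpha.
def prob_at_least_one_fluke(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 4, 20):
    print(k, round(prob_at_least_one_fluke(k), 3))
# With 4 comparisons, the chance of a spurious "finding" is already ≈ 0.185,
# nearly 1 in 5; with 20 comparisons it is ≈ 0.64.
```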

Now this is an example where we use four comparisons.

In some of the genetic testing that I'm going to talk about