0:00

In this lesson we work on the same research question on the effectiveness

of RU-486 as a morning after pill that we introduced in the previous lesson.

However, this time we answer the question using a Bayesian approach.

Let's start with a quick reminder of the framework.

We had decided on considering only the 20 total pregnancies,

four of which occur in the treatment group.

And the question we're asking is,

how likely is it that four pregnancies occur in the treatment group?

Also remember that we had decided that if the treatment and control are equally

effective, and the sample sizes for the two groups are the same,

then the probability that the pregnancy comes from the treatment group is 0.5.

0:38

Within the Bayesian framework, we will also start by setting our hypotheses,

which we can think of as the models that the data come from.

We begin by delineating each of the models we consider plausible.

We know that p, the probability that a pregnancy comes from the treatment group

can take on any value between zero and one.

However, we'll start slowly, and instead of considering a continuous parameter space

for p, we will assume it is plausible that the chance that a pregnancy comes

from the treatment group is 10% or 20% or 30% or 40%, all the way up to 90%.

Hence, we're considering nine models,

not just one model, as was the case for the classical frequentist paradigm.

Let's pause for a second and think about what it means for p to be equal to 20%.

This means that, given a pregnancy occurs, the odds are two to eight,

or one to four, that it occurred in the treatment group rather than the control group.

1:35

Next, we need to specify the prior probabilities we want to

assign to these hypotheses.

The prior probabilities should reflect

our state of belief prior to the current experiment.

They should incorporate the information

learned from all relevant research up to the current point in time,

but should not incorporate information from the current experiment.

Suppose my prior probabilities for

each of the nine models are presented in this table.

2:03

I placed a prior probability of 52% on the model p = 0.5 and

divided the remaining probability equally among the other eight models.

This equal distribution implies that our prior belief about the benefit of

the treatment is symmetric: the treatment is equally likely to be better or

worse than the standard treatment.

And the 52% prior at p equals 0.5 implies that we believe

that there's a 52% chance that there is no difference between the treatments.
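As a minimal sketch, these priors can be set up in R as follows; the variable names p and prior are my own choices, not necessarily those used in the course code:

```r
# Nine plausible models for p, the probability that a
# pregnancy comes from the treatment group
p <- seq(from = 0.1, to = 0.9, by = 0.1)

# Prior: 52% on p = 0.5, and the remaining 48% split
# equally (6% each) among the other eight models
prior <- c(rep(0.06, 4), 0.52, rep(0.06, 4))

sum(prior)  # the priors sum to 1
```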

One natural question that you might have at this point is how

did you come up with those priors?

We will discuss prior specification in detail later in the course, so for

now let's stick with the chosen priors and work through

the mechanics of calculating the posterior probabilities and making a decision.

Now we're ready to calculate the probability of observed data,

given each of the models that we're considering.

This probability is called the likelihood.

In this example, this is simply the probability of the data given the model,

which can be written as the probability that k is equal to 4,

given that n is equal to 20, for the various values of p we decided to

consider as plausible models, 10% through 90%.

As we did in the previous video,

we can express the probability of a given number of successes

in a given number of independent trials with a binomial distribution.

We consider a sequence of probabilities of success from 10% to 90%,

increasing in steps of 10%; we assign a 52% prior probability to p equals 0.5,

and 6% probabilities to all other models.

We won't actually use these prior probabilities in the calculation of

the likelihood, but they will become relevant for

the calculation of the posterior in the next slide.

Finally, we can calculate the likelihood as a binomial with four successes and

20 trials, when p is equal to the variety of values we're considering.
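As a sketch of this step in R, the likelihoods can be computed in one call to dbinom, evaluated at each plausible value of p:

```r
p <- seq(from = 0.1, to = 0.9, by = 0.1)

# Likelihood: P(k = 4 | n = 20, p) for each plausible model
likelihood <- dbinom(4, size = 20, prob = p)

round(likelihood, 4)
# e.g. 0.0898 at p = 0.1, 0.2182 at p = 0.2, 0.0046 at p = 0.5
```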

The results are summarized in this table.

The header row lists the models that we're considering, and in the next row,

the priors we discussed earlier are shown.

The last row of the table lists

the likelihood calculated using the binomial distribution.

The number of successes and the number of trials are the same for

each of these likelihoods, four and 20 respectively.

However, each likelihood listed uses a different probability of success,

based on the model it corresponds to.

4:30

Once the models are delineated, the priors are specified, and the data are collected,

we can use Bayes' rule to calculate the posterior probability.

In other words, the probability of the model given the data.

So here's a reexpression of the Bayes' rule for model and data.

The probability of the model, given the data,

is the probability of the model and the data, divided by the probability of the data.
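In symbols, Bayes' rule for a model and the data, summed over the nine models we are considering, reads:

```latex
P(\text{model} \mid \text{data})
  = \frac{P(\text{model and data})}{P(\text{data})}
  = \frac{P(\text{model}) \, P(\text{data} \mid \text{model})}
         {\sum_{i=1}^{9} P(\text{model}_i) \, P(\text{data} \mid \text{model}_i)}
```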

5:08

We can once again do all of these calculations in R.

The numerator is simply the product of the vector of prior probabilities we defined

earlier and the likelihood of the data given each model that we calculated.

The denominator is simply the sum of the probabilities for

the various models in the numerator.

This mimics the calculation based on probability trees that we've seen before,

where the denominator sums over the probabilities of

all the models the data might be coming from.

We also check to make sure that the posterior probabilities add up to one,

which they do.
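Putting the pieces together, the posterior calculation just described might look like this in R (variable names are my own):

```r
p     <- seq(from = 0.1, to = 0.9, by = 0.1)
prior <- c(rep(0.06, 4), 0.52, rep(0.06, 4))

# Likelihood of the observed data (4 successes in 20 trials)
likelihood <- dbinom(4, size = 20, prob = p)

# Bayes' rule: posterior = prior * likelihood / P(data)
numerator   <- prior * likelihood
denominator <- sum(numerator)
posterior   <- numerator / denominator

sum(posterior)  # check: the posteriors add up to 1
```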

The posterior probabilities are summarized in this table.

We can see that the posterior probability is highest at p is equal to 0.2.

So this model is the most likely model, based on the observed data.

The posterior probability at p is equal to 0.2 is 42.48%.

Even though we had assigned a low prior to this model,

the incorporation of the data gave this model a high probability.

This shouldn't be surprising, since four successes in 20 trials is basically 20%.

So the calculation of the posterior incorporated the prior information and

the likelihood of the observed data; the concept of data

at least as extreme as the observed data plays no part in the Bayesian paradigm.

Finally note that the probability that p is equal to 0.5,

dropped from 52% in the prior to about 7.8% in the posterior.

This demonstrates how we update our beliefs based on observed data.

7:04

The Bayesian paradigm, unlike the frequentist approach,

also allows us to make direct probability statements about our models.

For example, we can calculate the probability that RU-486,

the treatment, is more effective than the control

as the sum of the posteriors of the models where p is less than 0.5.
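Continuing the sketch from before, that direct probability statement is a one-line sum over the posterior vector:

```r
p     <- seq(from = 0.1, to = 0.9, by = 0.1)
prior <- c(rep(0.06, 4), 0.52, rep(0.06, 4))
likelihood <- dbinom(4, size = 20, prob = p)
posterior  <- prior * likelihood / sum(prior * likelihood)

# Posterior probability that the treatment is more effective
# than the control: sum over the models with p < 0.5
sum(posterior[p < 0.5])  # just over 0.92 under these priors
```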