0:00

Hi, my name is Brian Caffo and this is Mathematical Biostatistics Boot Camp,

Lecture five on Conditional Probability. So in this lecture, we are going to talk

about conditional probabilities and then the associated density functions for

calculating conditional probabilities, basically just called conditional

densities, and we'll talk about conditional mass functions for discrete random

variables. We'll talk about Bayes' rule and then

briefly talk about an example of Bayes' rule using diagnostic tests, and then we'll

talk a little about the so-called diagnostic likelihood ratios.

So let me give you some brief motivation for conditional probabilities.

I think we kind of internally do these things pretty easily.

So, imagine rolling a standard die, and we're assuming that the probability of

each face is one-sixth. Suppose you didn't know the outcome of the

die, but someone gave you the information that the die roll was odd.

So it had to be a one, a three, or a five; a two, four, or six is not possible, given

this extra information. Conditional on this new information,

everyone would probably agree that each of those probabilities is now one-third.

And all we're going to do in the next couple of slides is mathematically develop

these ideas a little bit more completely. So let's develop the notion of conditional

probability, just generically talking about events.

So let's let B be any event such that the probability of B is greater

than zero. This condition is kind of important because

it doesn't make any sense to condition on an event occurring when

that event cannot occur. So it doesn't make any sense to talk about

the probability of A given B occurred if the probability that B occurred is exactly

zero. Just to put this in words it makes no

sense to talk about the probability that a coin is heads given that the coin is on

its side if you're not going to allow for the possibility for the coin to land on

its side. So the definition of a conditional

probability of an event is the probability of the event A occurring given that the

event B has occurred, is the probability of the intersection divided by the

probability of B. Now notice, if A and B are independent,

then in the probability of A given B, the numerator component, the probability of the

intersection A intersect B, factors into the product of the two probabilities,

the probability of A times the probability of B.

The probability of B cancels out of the numerator and denominator and you're left

with the probability of A. So this actually makes a lot of sense that

if the events A and B are independent, then the probability of A, given that B

has occurred is simply the probability of A without knowledge of whether or not B

has occurred. That is the information about whether B

has occurred is irrelevant to the calculation of the probability of A.
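As a quick numerical check of that cancellation, here's a small sketch of my own (not from the slides): for two independent fair dice, conditioning on the second roll should leave the probability for the first roll unchanged.

```python
# Sketch: for two independent fair dice, conditioning on the second roll
# should not change the probability for the first, since P(B) cancels.
from fractions import Fraction
from itertools import product

p = Fraction(1, 36)                         # 36 equally likely ordered pairs
outcomes = list(product(range(1, 7), repeat=2))

# B: second die shows a six.  A: first die shows a one.
p_B = sum(p for (x, y) in outcomes if y == 6)              # P(B) = 6/36
p_AB = sum(p for (x, y) in outcomes if x == 1 and y == 6)  # P(A ∩ B) = 1/36

print(p_AB / p_B)   # 1/6, the same as the unconditional P(A)
```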

This matches our intuition as to what independence means and it's nice that the

mathematics works out that way. In fact in some probability texts, this is

their definition of independence as opposed to the definition that we gave

earlier. So let's just work through the formula

given our example with the die roll just to convince ourselves that it's actually

working. We want the probability of a one, given

that the die roll is odd, so in this case B is a one, three or a five.

A is just a one, so the probability of A given that B has occurred is the

probability of the intersection. And in this case A is the set containing

one, B is the set containing one, three, and five.

So A is a subset of B. So when you intersect the two you just get

A by itself. So it works out to be the probability of A

divided by the probability of B. The probability of A by itself is one

sixth, the probability of B by itself is three sixths, and we get one third.

Exactly the answer that our intuition told us.
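The same calculation can be checked by direct enumeration; here is a small sketch of my own, using exact fractions.

```python
# Sketch: P(roll = 1 | roll is odd) via the definition
# P(A | B) = P(A ∩ B) / P(B), with each face having probability 1/6.
from fractions import Fraction

p = Fraction(1, 6)
A = {1}                    # the roll is a one
B = {1, 3, 5}              # the roll is odd

p_B = len(B) * p           # 3/6
p_A_and_B = len(A & B) * p # A is a subset of B, so this is just P(A) = 1/6

print(p_A_and_B / p_B)     # 1/3
```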

Okay. So that ends our very brief discussion of

basic conditional probability calculations using standard events, and generic

discussion of probability. Next, we're going to talk about

conditional densities, which will be our mathematical formulation for conditional

probabilities for our continuous random variables.

So welcome back, troops. We're going to be talking now about

conditional densities now that we know a little bit more about conditional

probabilities. So conditional densities or mass functions are exactly densities or

mass functions that govern the behavior of a random variable conditional on the value

that another random variable took.

So just to tie this down a little bit, let's let f(x, y) be a bivariate density

or mass function and it governs the probabilistic behavior of the random

variables, capital X and capital Y. Now, I'm going to abuse notation slightly

and let the letter f be the joint density and f(x) be the marginal density

associated with x and f(y) be the marginal density or mass function associated with

y. And it's probably not the best notation to

use f for the joint density, f for the two marginals when they're all referring to

different things. So, you know, just keep in mind that the

arguments are kind of differentiating what I'm talking about here.

This is admittedly very sloppy notation, but I'm using it anyway.

So just to remind you the marginal density f(y) is the joint density f(x, y)

integrated over x or, if the random variables happen to be discrete,

then f(y) is the joint mass function f(x, y) summed over x.

So, in other words, if you want to know, regardless of what happened with respect

to x, what the probabilistic behavior of the random variable y is, you have to

integrate over the random variable x, over all the potential values it can take

and with what probabilities, and then you get the marginal behavior of the random

variable y. Similarly, you get the marginal for x.

f(x) is the integral of the joint density over y or the sum of the joint mass

function over y. Well, the conditional density is exactly,

say for example, the density of x given y: the joint density or mass function f(x, y) divided

by the marginal f(y). It follows actually directly from the

definition of conditional probabilities that we just gave you a couple slides ago

and that we sort of all agreed on made a lot of sense.
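To make that definition concrete in the discrete case, here's a small sketch of my own with a made-up joint mass function on a 2-by-2 grid (the table is hypothetical, not from the lecture): the marginal comes from summing, and the conditional from dividing.

```python
# Sketch with a made-up joint mass function f(x, y) for x, y in {0, 1}.
from fractions import Fraction

F = Fraction
joint = {(0, 0): F(1, 8), (1, 0): F(3, 8),
         (0, 1): F(2, 8), (1, 1): F(2, 8)}

def marginal_y(y):
    """f(y): the joint mass function summed over x."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def conditional(x, y):
    """f(x | y) = f(x, y) / f(y)."""
    return joint[(x, y)] / marginal_y(y)

print(conditional(0, 0))                       # (1/8) / (4/8) = 1/4
# A conditional mass function must sum to one over x:
print(sum(conditional(x, 0) for x in (0, 1)))  # 1
```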

7:01

Let me elaborate on that point. In fact, in the discrete case, where x

can only take so many values, one, two, three, four, then this definition of

conditional probability is exactly the definition that we used for events, where A

is the event that X = x, and B is the event that Y = y.

So there's no confusion. It exactly agrees with our definition of

conditional probability. The continuous one is a little bit harder

to kind of motivate why this is the definition.

The event that x takes on a specific value or y takes on a specific value has

probability zero for continuous random variables and so, that kind of fails our

basic premise from conditional probability associated with events, that the

event that we're conditioning on has to have probability

greater than zero. Now, note we're not talking about

conditional probabilities; we're talking about the construction of the conditional

densities which govern the behavior of conditional probabilities.

So, we haven't violated that rule from earlier but it still kind of seems to

break the spirit of the rule and how do we get at this idea?

How can we have a meaningful definition of the probabilistic behavior of a random

variable, given that another random variable takes on a specific value?

Well, here's the motivation that I like. So, imagine if you define the event A,

that the random variable X is less than or equal to a specific value little x, and the

event B, that the random variable Y lies in the interval from y to y plus some

small amount, say epsilon. Now A and B are events that have

positive probability. And we can apply our standard definition

of conditional probability to talk about the probability of the event A given that

the event B has occurred, right? That would just follow from our standard

definition. So, actually let's formulate this.

So, the probability of A given B is the probability of X being less than or equal to

little x, given that Y is in the interval from y to y + epsilon.

And then now in this case, nothing has probability zero.

We can just directly apply the probabilistic formula.

And I don't think this is terribly important for this class.

I just wanted this argument to be here for those who want to see it.

But then you can just follow through the arithmetic,

it's not quite calculus here, and get that basically this construction

yields the conditional distribution function associated with X,

given that Y = y, as we let epsilon get smaller and smaller.
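That limiting argument can be checked numerically. Below is a Monte Carlo sketch of my own, using the joint density f(x, y) = y e^(-xy - y) that appears later in this lecture, so that Y is a standard exponential and, given Y = y, X is exponential with rate y. The probability of X ≤ x given that Y falls in a small interval (y, y + epsilon) should approach the conditional distribution function 1 - e^(-xy).

```python
# Monte Carlo sketch: P(X <= x0 | y0 < Y < y0 + eps) -> F(x0 | y0)
# for the joint density f(x, y) = y * exp(-x*y - y), x, y > 0.
import math
import random

random.seed(0)
y0, x0, eps = 1.0, 0.5, 0.05
hits = total = 0

for _ in range(1_000_000):
    y = random.expovariate(1.0)      # marginally, Y ~ Exponential(1)
    if y0 < y < y0 + eps:            # conditioning event B
        total += 1
        x = random.expovariate(y)    # given Y = y, X ~ Exponential(rate=y)
        if x <= x0:
            hits += 1

estimate = hits / total
exact = 1 - math.exp(-x0 * y0)       # conditional CDF F(x0 | y0)
print(estimate, exact)               # the two should be close
```

Shrinking epsilon (with correspondingly more samples) tightens the agreement, which is exactly the limit the argument above describes.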

So as the conditioning event gets closer and closer to conditioning on Y being

the specific value y, we limit to the conditional distribution

function associated with X. And then, remember that density functions

are derivatives of distribution functions so if we just take the derivative of this,

then we get the conditional density function.

So we can see right here that if we differentiate this conditional

distribution function, we get exactly the definition of the conditional density that

we gave you before, f(x, y) / f(y). So if you're interested in this at this

level, then you can go through those arguments carefully, and to be fair, these

only cover the definition in the continuous case when

we have differentiable distribution functions.

But this is more than enough for our case. If you're interested in it at a deeper

level even than this, where you have mixed continuous and discrete densities, then

you can take an advanced probability course somewhere; but, for our purposes,

this is enough. And so just to summarize, we have the

conditional probability definition associated with events that kind of

governs all of our thinking about conditional probabilities and that's the

probability of A given B is the probability of A intersect B divided by

the probability of B. And then, when you are talking about random variables,

where we want to talk about the probabilistic behavior of a random variable X, given that the random

variable Y has taken on a specific value, it's the joint density or mass function

divided by the marginal. And it has a nice sort of parallel with

the probability associated with events and here we've gone through the arguments to

show how we get from these statements about events to this definition for mass

functions and density functions. So conditional densities actually have a

very nice geometric interpretation. So if you have a joint density f(x, y)

that's a surface: f yields the z value, and (x, y) lies in the plane.

So f(x, y) is a joint density. It's a surface, and the volume under the

surface has to be one for it to be a joint density.

Well, what does it mean to get the conditional density of x given that y

takes a particular value? The event that y takes a particular value,

that's sort of like a plane at the point, let's say y is five, at the point y equals

five, that's a plane, and that plane slices through this surface and yields a

function. That function is just f(x, y) evaluated at

the point five, f(x, five), okay? So we have this surface.

We have this plane. The y = five plane that cuts through the

surface and then we have the function that is on that plane at f(x, five).

And that is exactly the conditional density, with the exception that now it

doesn't integrate to one. So we have to normalize it by something

so that it integrates to one. Well, that's exactly what we divide by

there, f(5). Let's go through a specific example.

We have f(x, y) = y e^(-xy - y), for x and y both greater than zero.

Now the marginal density associated with y, let's just perform the integral.

We integrate from zero to infinity the joint density function over x, because

we want the marginal associated with y. And you can perform the integral.

It works out to be e^(-y). And then our conditional density f(x)

given y is the joint density f(x, y) / f(y).

So just churn through the calculations and you get y e^(-xy). And so if you

wanted to know what's the conditional density, the governing behavior of the

random variable x, given that y is, say, three, then that density

would be 3 e^(-3x). Okay, so you just plug in y = three.

So, now, if you plug in any possible value of y, this function will

now give you the associated density function for the random variable x

conditioning on the information that y takes on that specific value.
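As a final numerical sanity check, here's a sketch of my own using a plain midpoint Riemann sum (truncating the infinite integral at 50, where the tail is negligible): the marginal should match e^(-y), and the conditional density at y = 3 should integrate to one.

```python
# Sketch: verify f(y) = exp(-y) and that f(x | y = 3) integrates to 1
# for the joint density f(x, y) = y * exp(-x*y - y).
import math

def joint(x, y):
    return y * math.exp(-x * y - y)

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

y0 = 3.0
marginal = integrate(lambda x: joint(x, y0), 0.0, 50.0)
print(marginal, math.exp(-y0))       # both approximately exp(-3)

conditional_mass = integrate(lambda x: joint(x, y0) / marginal, 0.0, 50.0)
print(conditional_mass)              # approximately 1.0
```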