0:00

>> We're now going to review some of the basic concepts from probability.

We'll discuss expectations and variances, we'll discuss Bayes' theorem, and we'll

also review some of the commonly used distributions from probability theory.

These include the binomial and Poisson distributions as well as the normal and

log normal distributions. First of all, I just want to remind all of

us what a cumulative distribution function is.

A CDF, a cumulative distribution function, is F of x; we're going to use F of x to

denote the CDF, and we define F of x to be equal to the probability that the random

variable X is less than or equal to little x.

Okay. We also, for discrete random variables,

have what's called a probability mass function.

Okay. And a probability mass function, which

we'll denote with little p, it satisfies the following properties.

P is greater than or equal to 0, and for all events, A, we have that the

probability that x is in A, okay, is equal to the sum of p of x over all those

outcomes x that are in the event A. Okay.

The expected value of a discrete random variable, x, is then given to us by this

over here. So, it's the sum of the possible values of

the random variable x. These are the xi's, weighted by their

probabilities, p of xi. So, that's the expected value of x.

Let me give you an example. Suppose, for example, I toss a die.

So, it takes on 6 possible values: 1, 2, 3, 4, 5, and 6.

Okay. And it takes on each of these values with

probability, so that's what "wp" stands for, with probability 1/6, with probability 1/6,

and all the way down to 1/6. So, in this case, for example, the

probability that X is greater than or equal to 4 is equal to, well, it's 1/6

for 4, 1/6 for 5, and 1/6 for 6, so that's equal to 1/6 plus 1/6 plus

1/6, which equals 1/2. Likewise, we can compute the expected

value of X. In this case, it is equal to 1/6 times 1

plus 1/6 times 2, and so on, plus 1/6 times 6.

And that comes out to be 3 and a half. Okay.
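The die calculation above can be sketched in a few lines of Python; this is not part of the lecture, just a check of the arithmetic using exact fractions:

```python
from fractions import Fraction

# Fair six-sided die: each outcome 1..6 occurs with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# P(X >= 4): sum p(x) over the outcomes in the event {4, 5, 6}.
p_ge_4 = sum(p for x, p in pmf.items() if x >= 4)

# E[X]: sum of each possible value weighted by its probability.
expected_x = sum(x * p for x, p in pmf.items())

print(p_ge_4)      # 1/2
print(expected_x)  # 7/2, i.e. 3 and a half
```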

So, we also have the variance of a random variable.

It's defined as the expected value of the quantity x minus the expected value of x, all

squared. And if you expand this quantity out, you

can see that you'll also get this alternative representation, so that the

variance of x is also equal to the expected value of x squared minus the

expected value of x, all squared. Okay.
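The two expressions for the variance can be checked numerically. Here is a small sketch, reusing the fair-die pmf from the earlier example:

```python
from fractions import Fraction

# Fair-die pmf again: values 1..6, each with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())

# Definition: Var(X) = E[(X - E[X])^2]
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Alternative representation: Var(X) = E[X^2] - (E[X])^2
var_alt = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

print(var_def, var_alt)  # both 35/12
```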

So, there we have discrete random variables, probability mass functions, and so on.

So, let's look at a couple of distributions.

The first distribution I want to talk about is the binomial distribution.

We say that a random variable x has a binomial distribution, and we write it as

x tilde binomial, or bin n, p, if the probability that x is equal to r, is equal

to n choose r times p to the r times 1 minus p to the n minus r.

And for those of you who have forgotten, n choose r is equal to n factorial divided

by r factorial times n minus r factorial. So, the binomial distribution arises, for

example, in the following situation. Suppose we toss a coin n times, and we

count the number of heads. Well then, the total number of heads has a

binomial distribution and we're assuming here that these are independent coin

tosses so that the result of one coin toss has no impact or influence on the, the

outcome of other coin tosses. The mean and variance of the binomial

distribution are given to you by these quantities here.

So, the expected value of x equals np, the variance of x equals np times 1 minus p.
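As a quick sketch (not from the lecture), we can spot-check these mean and variance formulas by summing over the binomial pmf directly; the parameters n = 10 and p = 0.3 are just an arbitrary choice:

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = C(n, r) * p^r * (1 - p)^(n - r)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.3
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = sum(r * q for r, q in enumerate(probs))
var = sum(r ** 2 * q for r, q in enumerate(probs)) - mean ** 2

# mean should come out at n*p = 3.0, variance at n*p*(1-p) = 2.1
print(mean, var)
```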

Now, there's actually an interesting application of the binomial distribution

to finance. And it actually arises in the context of

analyzing fund manager performance. We'll actually return to this example

later in the course. But let me just give you a little flavor

of it now. So, suppose, for example, a fund manager

outperforms the market in any given year, with probability p.

And that she underperforms the market with probability 1 minus p.

So, we're assuming here that the fund manager either outperforms or

underperforms the market, only two possible outcomes.

And that they occur with probabilities p and 1 minus p respectively.

Suppose this fund manager has a track record of ten years, and that she has

outperformed the market in eight of these ten years.

Moreover, let's assume that the performance, the fund manager performance

in any one year is independent of the performance in other years.

So, a question that many of us would like to ask is the following.

How likely is a track record as good as this outperforming eight years out of ten,

if the fund manager had no skill? And, of course, if the fund manager had no

skill, we could assume maybe that p is equal 1 half.

Okay. So, actually, we can answer this question

using the binomial model, or the binomial distribution.

So, let x be the number of outperforming years.

If the fund manager has no skill, then, since there are ten years, the total number

of outperforming years x, is then binomial, with n equals 10, 10 years, and

p equals a half, okay? So, we can then compute the probability

that the fund manager does at least as well as outperforming in eight years out

of ten, by calculating the probability that X is greater than or equal to 8.

So, what we're doing here is calculating the probability that the fund manager

would have 8, 9, or 10 years out of 10 in which she outperformed the market.

And that is given to us by the sum of these binomial probabilities here.

So, these were the original binomial probabilities from the earlier slide, and we summed

them from r equals 8, to n. And n, in this case, of course, is 10,

okay? So, that's one way to try and evaluate

whether the fund manager has just been lucky or not.

One can compute this probability and if it's very small, then you might conclude

that the fund manager was not lucky and that she had some skill.
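The probability just described can be computed directly; here is a short sketch of the sum of binomial probabilities from r = 8 to 10, with n = 10 and p = 1/2 as in the no-skill assumption:

```python
from math import comb

# Track record: 8 or more outperforming years out of n = 10 under the
# no-skill assumption p = 1/2.
n, p = 10, 0.5
p_at_least_8 = sum(comb(n, r) * p ** r * (1 - p) ** (n - r)
                   for r in range(8, n + 1))

print(p_at_least_8)  # (45 + 10 + 1) / 1024 ≈ 0.0547
```

So a no-skill manager would show a record this good only about 5.5% of the time.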

But actually, this opens up a whole can of worms.

There are a lot of other related questions that are very interesting.

Suppose there are M fund managers, how well should the best one do over the

ten-year period if none of them had any skill?

So, in this case, you don't have just one fund manager as we had in this example so

far, we now have M of them, okay? And it stands to reason that even if none

of them had any skill, then as M gets large, you would expect at least one of

them or even a few of them to do very well.

Well, how can you analyze that? Again, you can use the binomial model and

what are called order statistics of the binomial model to do this.

And we'll actually return to this question later in the course.
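As a rough sketch of why many managers change the picture (the lecture's order-statistics treatment comes later, so this is only an illustrative calculation under the independence assumption): if each of M no-skill managers independently has probability q of an 8-plus-year track record, then at least one of them achieves it with probability 1 minus (1 minus q) to the power M.

```python
from math import comb

# q: chance a single no-skill manager outperforms in 8+ of 10 years.
q = sum(comb(10, r) * 0.5 ** 10 for r in range(8, 11))

# With M independent no-skill managers, the chance that at least one
# of them shows such a track record is 1 - (1 - q)^M.
for M in (1, 10, 100):
    print(M, 1 - (1 - q) ** M)
```

Even with no skill anywhere, for large M it becomes very likely that someone posts an impressive record.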

Okay. So, let's talk about another distribution

that often arises in finance and financial engineering, that is the Poisson

distribution. We say, that x has a Poisson lambda

distribution so lambda is the parameter of the distribution.

If the probability that x equals r is equal to the lambda to the power of r

times e to the minus lambda, divided by r factorial.

And for those who have forgotten factorials, we also used them in the binomial

model a moment ago. R factorial is equal to r times r minus 1

times r minus 2, all the way down to 2 times 1.

Okay. So, this is the Poisson distribution.
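A small sketch (not part of the lecture) that evaluates the Poisson pmf and checks numerically that the mean and variance both come out at lambda; the value lambda = 2.5 is an arbitrary choice:

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    # P(X = r) = lam^r * e^(-lam) / r!
    return lam ** r * exp(-lam) / factorial(r)

lam = 2.5
# Truncate the infinite sum; for lam = 2.5 the terms beyond r = 100
# are negligibly small.
mean = sum(r * poisson_pmf(r, lam) for r in range(100))
var = sum(r ** 2 * poisson_pmf(r, lam) for r in range(100)) - mean ** 2

print(mean, var)  # both approximately lambda = 2.5
```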

The expected value and the variance of a Poisson random variable are identical and

equal to lambda. So, for example, we'll actually just show

this result here. It's very simple and the mean is

calculated as follows. We know that the expected value of x is

equal to the sum of the possible values of x, so these are the r's, times the

probability that x is equal to r and r runs from 0 to infinity.

We can calculate that as follows. So, we have the summation of r and the

probability that X equals r. We know from up here, okay, and we can

substitute that down in here and now, we just evaluate the sum.

The first thing to notice is that when r equals 0, this term in the sum is equal to

0. So, we can actually ignore the 0, the

first element, the 0 element and replace the summation running from r equals 1.

So then, we get this quantity here. We can cancel this r out with the first r

up here and write, this is r minus 1 factorial.

We can also pull one of these lambdas out here leaving us with a lambda to the r

minus 1. And now, if we look at this quantity here,

this summation here, we see that this is the same as changing the lower limit from r

equals 1 to r equals 0, and replacing r minus 1 with r, and r minus 1 factorial

with r factorial here. This total, we see, is equal to the sum of

the probabilities. These are the probability that x equals r,

so this is the sum of the probabilities that x equals 0, x equals 1, x equals 2,

so this is equal to 1. The total sum of probabilities must be

equal to 1, so this is equal to lambda. Okay, let's talk a little bit now about

Bayes' theorem. Let A and B be two events for which the

probability of B is nonzero, then the probability of A given B, and this is

notation we'll use throughout the course, this vertical line means it's a

conditional probability. So, it's the probability of A given that B

has occurred, well, this is equal to the probability of A intersection B divided by

the probability of B. Alternatively, we can actually write this,

this numerator, the probability of A intersection B, as being the probability of B given A

times the probability of A. So, this is another way to write Bayes'

theorem. And finally, if we like, we can actually

expand the denominator here, the probability of B, and write it as the

summation of the probability of B given Aj, times the probability of Aj,

where we sum over all the Aj's. Here, the Aj's form a partition of the

sample space. What do I mean by partition?

Well, I mean the following. So, Ai intersection Aj is equal to the

null set, for i not equal to j, and at least one Ai must

occur. And, in fact, because Ai intersection Aj

is equal to the null set, for i not equal to j, I can actually replace this

condition with the following, exactly one Ai must occur.

Okay. So, that's Bayes' theorem.

Let's look at an example. So, here's an example where we're going to

toss 2 fair 6-sided dice. So, Y1 is going to be the outcome of the

first toss, and Y2 would be the outcome of the second toss.

X is equal to the sum of the two, and that's what we plotted in the table here.

So, for example, the 9 here comes from the 5 on the first toss and 4 on the second

toss. So, 5 plus 4 equals 9.

So, that's X equals Y1 plus Y2. So, the question we're interested in

answering is the following. What is the probability of Y1 being

greater than or equal to 4, given that x is greater than or equal to 8?

Well, we can answer this using this guy here on the previous slide.

So, this is equal to the probability that Y1 is greater than or equal to 4 and X is

greater than or equal to 8, divided by the probability that X is greater than or

equal to 8. Okay.

So, how do we calculate these two quantities?

Let's look at the numerator first of all. So, we need two events here.

Y1 must be greater than or equal to 4 and X being greater than or equal to 8.

Okay. So, the first event is clearly captured

inside this box here, okay, because this corresponds to Y1 being greater than or

equal to 4. So, all of these outcomes correspond to

that event. The event that X is greater than or equal

to 8 corresponds to this event or these outcomes.

So therefore, the intersection of these two outcomes, where Y1 is greater than or

equal to 4 and X is greater than or equal to 8, is this area here, which is very

light, so let me do it a little bit darker.

So, it's this area here. Now, each of these cells is equally

probable and occurs at probability 1 over 36.

There are a total of 3 plus 4 plus 5, so that's 12 cells here.

So, the numerator occurs with probability 12 over 36.

And the, the denominator, the probability that X is greater than or equal to 8,

well, that's what we highlighted in the red here.

And the probability of that occurring, well, there's 12 plus these 3 additional

outcomes, which equals 15 outcomes, so that's 15 over 36. So, the answer is 12 over 36

divided by 15 over 36, which is 12 over 15, and that is equal to 4 over 5. So, that's our application of Bayes'

theorem. Okay.
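The whole two-dice example can be verified by brute-force enumeration of the 36 equally likely outcomes; here is a short sketch:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (y1, y2) of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count the intersection {Y1 >= 4 and X >= 8} and the event {X >= 8},
# where X = Y1 + Y2.
num = sum(1 for y1, y2 in outcomes if y1 >= 4 and y1 + y2 >= 8)
den = sum(1 for y1, y2 in outcomes if y1 + y2 >= 8)

p_cond = Fraction(num, den)  # (12/36) / (15/36) = 12/15
print(p_cond)  # 4/5
```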

So, let me talk a little about continuous random variables.

We say a continuous random variable x has a probability density function, or a PDF,

f. If f of x is greater or equal to 0, and

for all events, A, the probability that x is in A, or the probability that A has

occurred is the integral of the density, f of y, dy over A.

The CDF, cumulative distribution function, and the PDF are related as follows: F of x

is equal to the integral from minus infinity to little x of f of y dy.

And, of course, that's because we know that F of x, by definition, is equal to

the probability that X is less than or equal to x, so this, of course, is equal

to the probability that minus infinity is less than or equal to X, is less than or

equal to little x. So, this is our event A here and this

definition here. So, A is now the event that minus infinity is less than or equal to

the random variable X, which is less than or equal to little x, and that's what

we have over here.

So, it's often convenient to recognize the following, that the probability that x is

in this little interval here, from x minus epsilon over 2 to x plus epsilon over 2.

Well, that's equal to this integral, x minus epsilon over 2 to x plus epsilon

over 2 times f of y dy, okay? And if you like, we can draw, something

like this. So, this could be the density, f of x.

This is x here, maybe we've got some point here which is little x, and this is x

minus epsilon over 2. This is x plus epsilon over 2.

So, in fact, what we're saying is that the probability is this shaded area, and it's

roughly equal to this value, which is f of x times epsilon, which is the width of

this interval here, okay? And, of course, the approximation clearly

works much better as epsilon gets very small.

Okay. So, those are continuous random variables.

Let me talk briefly about the normal distribution.

We say that X has a normal distribution or write X tilde N mu sigma squared if it has

this density function here. So, f of x equals 1 over root 2 pi sigma

squared times the exponential of minus x minus mu, all squared, divided by 2

sigma squared. The mean and variance are given to us by

mu and sigma squared respectively. So, the normal distribution is a very

important distribution in practice. Its mean is at mu, and its mode, the highest point

in the density is also at mu and approximately 95% of the probability

actually lies within plus or minus 2 standard deviations of the mean.

So, this is approximately equal to 95% for a normal distribution.

Okay. So, this is a very famous distribution.

It arises an awful lot in finance. It certainly has its weaknesses and we'll

discuss some of them as well later in the course.
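The "approximately 95% within plus or minus 2 standard deviations" figure can be checked with the error function; a one-line sketch:

```python
from math import erf, sqrt

# For X ~ N(mu, sigma^2), P(|X - mu| <= 2*sigma) = P(|Z| <= 2) for a
# standard normal Z, which equals erf(2 / sqrt(2)).
p_within_2sd = erf(2 / sqrt(2))
print(p_within_2sd)  # ≈ 0.9545
```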

A related distribution is the log-normal distribution.

And we say that x has a log-normal distribution with parameters mu

and sigma squared if the log of x, is normally distributed with mean mu and

variance sigma squared. The mean and variance of the log-normal

distribution are given to us by these two quantities here, and again, the log-normal

distribution plays a very important role in financial applications.
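The transcript doesn't show the two quantities from the slide, but the standard formulas are: mean equal to exp of mu plus sigma squared over 2, and variance equal to exp of sigma squared minus 1, times exp of 2 mu plus sigma squared. Here is a Monte Carlo sketch (parameter values are arbitrary) checking the mean formula and the defining property that log of X is normal with mean mu:

```python
import random
from math import exp, log

random.seed(42)
mu, sigma = 0.1, 0.2

# If log(X) ~ N(mu, sigma^2), then X = exp(Z) with Z ~ N(mu, sigma^2).
xs = [exp(random.gauss(mu, sigma)) for _ in range(200_000)]

sample_mean = sum(xs) / len(xs)
formula_mean = exp(mu + sigma ** 2 / 2)  # standard log-normal mean

# log(X) should be approximately normal with mean mu.
log_mean = sum(log(x) for x in xs) / len(xs)

print(sample_mean, formula_mean, log_mean)
```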