0:09
And it's a continuous random variable, and
we'll need it when we simulate the NCAA tournament, NFL playoffs, et cetera,
trying to figure out the chance of a football or
basketball team winning the game, given the point spread.
0:41
And heights pretty much follow a normal random variable.
And what you need to know is the mean and the standard deviation.
So the mean height of an NBA player is about 79 inches, or 6 foot 7.
And the standard deviation is around 3.6 inches.
1:06
And so given a mean and standard deviation,
a normal random variable has what's called the probability density function.
And I'm using an add-in called @RISK here.
So you probably don't want to try this at home.
But let me show you what the density function looks like.
I type in the mean and the standard deviation.
Okay, now I go to [INAUDIBLE], which is for [INAUDIBLE] purposes.
Okay.
This is the density function of the height of NBA players.
So you can see, what do we know about a density function?
We'll write this down in a second.
But the height of the density,
gives the relative likelihood of the random variable being around there.
So the most likely height for a normal is the mean, 79 inches.
It also happens to be the 50th percentile.
1:59
The mean, the median, and the most likely value, called the mode, are all in the same place;
the median is the 50th percentile.
Okay, next, I should talk about area under this density function.
It's called the density function because it shows how dense the probability is.
Area under the density function is probability.
So say I ask, what's the chance somebody's between 72 inches tall, that would be six feet,
and 80 inches tall, that would be six feet eight?
2:36
Okay, and that would be the area under the curve.
About 58% of all NBA players should be between six feet and six feet eight.
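For anyone following along without the add-in, here is a minimal Python sketch of the same area calculation. It assumes the lecture's mean of 79 and standard deviation of 3.6, and it uses Python's standard-library NormalDist as a stand-in for the spreadsheet; the variable names are just illustrative.

```python
from statistics import NormalDist

# Heights of NBA players in inches, using the lecture's mean and standard deviation.
heights = NormalDist(mu=79, sigma=3.6)

# Area under the density between 72 and 80 inches:
# area to the left of 80 minus area to the left of 72.
p_between = heights.cdf(80) - heights.cdf(72)
print(f"P(72 <= height <= 80) = {p_between:.3f}")
```

The result should land near the 58% quoted in the lecture.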
Okay.
And the density function is what we call symmetric for the normal.
Okay, so area under the normal curve is probability.
Height is likelihood.
Total area under this curve is 1, and it's symmetric about the mean.
Around the mean of 79 inches, the density looks the same to the left of the mean
as it does to the right. In other words, if I would go five inches
below 79, to 74 inches, the height of the density would
be the same as if I go five inches above, which would be 84 inches.
That would mean there are as many six foot two players as there are seven footers.
Okay.
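The symmetry claim is easy to check numerically. This is a small sketch, again assuming the 79 and 3.6 from the lecture and using Python's standard library rather than Excel; it compares the height of the density five inches below and five inches above the mean.

```python
from statistics import NormalDist

heights = NormalDist(mu=79, sigma=3.6)

# Height of the density five inches below and five inches above the mean.
density_74 = heights.pdf(74)
density_84 = heights.pdf(84)
print(f"density at 74: {density_74:.5f}, density at 84: {density_84:.5f}")

# The density is highest at the mean, and half the area lies to the left of it.
peak_is_at_mean = heights.pdf(79) > density_74
area_left_of_mean = heights.cdf(79)
print(peak_is_at_mean, area_left_of_mean)
```

Remember that these are heights of the curve (relative likelihood), not probabilities; probabilities come from areas.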
4:06
So, that's the PDF, for probability density function, of a
normal with the mean 79 inches and
the standard deviation 3.6 inches.
Okay, so how can I view normal probabilities in Excel?
In other words, I want to know the chance a player
is less than or equal to six feet, five inches tall.
4:39
There's a function NORMDIST for this, and all of this is in Excel.
There's NORM.DIST, or NORMDIST; it doesn't really matter which you use.
So you say NORMDIST of 77 inches, which is six foot five;
the answer should be less than a half because that's less than the mean.
Somewhere over here.
Okay, and the mean was 79.
The standard deviation was 3.6, and you need the word TRUE here
to get the chance somebody is less than or equal to six foot five.
And you get 29%.
So 29% of the players in the NBA are six foot five or shorter, according to this.
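If you want to reproduce that 29% outside of Excel, one rough equivalent is sketched below. The helper name normdist_cumulative is made up for this example; it simply wraps the standard-library cdf so that the arguments line up with NORMDIST(x, mean, sd, TRUE).

```python
from statistics import NormalDist

def normdist_cumulative(x, mean, sd):
    """Rough stand-in for Excel's NORMDIST(x, mean, sd, TRUE): area to the left of x."""
    return NormalDist(mean, sd).cdf(x)

# Chance a player is six foot five (77 inches) or shorter.
p_77_or_shorter = normdist_cumulative(77, 79, 3.6)
print(f"P(height <= 77) = {p_77_or_shorter:.3f}")
```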
5:27
Now, what about the chance a player is at least seven feet tall?
Well, you have to do one minus the probability they're less than or
equal to seven feet tall.
And you know the chance they're exactly seven feet tall is zero,
because to be exactly seven feet tall, your height would have to be 84.000 inches.
5:54
Okay, so greater than or equal to seven feet comes out to about eight percent.
This would estimate that about 8% of NBA players are at least seven feet tall.
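The "one minus" step can be sketched the same way. This assumes the same mean and standard deviation as before; since the chance of being exactly 84 inches is zero for a continuous random variable, "at least seven feet" and "more than seven feet" give the same number.

```python
from statistics import NormalDist

heights = NormalDist(mu=79, sigma=3.6)

# P(height >= 84) = 1 - P(height <= 84); a single point carries no probability.
p_seven_feet_or_taller = 1 - heights.cdf(84)
print(f"P(height >= 84) = {p_seven_feet_or_taller:.3f}")
```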
Okay, so
again, the functions I use here are in Excel 2013, and you can see these functions.
6:09
I can just copy that. Okay, so
you also sometimes want to get percentiles.
In other words, five percent of all
NBA players are taller than blank.
So that's the 95th percentile: what number has 95% to the left of it and
five percent to the right?
You can use NORMINV here.
So NORMDIST gives normal
probabilities.
NORMINV gives percentiles.
Then we'll get to a rule of thumb for the normal.
Okay, so I would say NORMINV, and I'd say the 95th percentile.
Say .95, and then the mean was what?
79.
Standard deviation, 3.6.
7:19
So five percent of the players are taller than around seven foot one.
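Here is a short sketch of the percentile lookup in Python. It treats the standard library's inv_cdf as the counterpart of NORMINV(0.95, 79, 3.6), which is an assumption about how you might port the spreadsheet, and then converts the answer to feet and inches.

```python
from statistics import NormalDist

heights = NormalDist(mu=79, sigma=3.6)

# 95th percentile: the height with 95% of the area to its left.
p95 = heights.inv_cdf(0.95)

# Convert inches to feet and leftover inches for readability.
feet, inches = divmod(p95, 12)
print(f"95th percentile: {p95:.2f} inches, about {int(feet)} ft {inches:.1f} in")
```

Plugging the result back into the cdf should return 0.95, which is a handy sanity check that the two functions are inverses of each other.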
And again, if I want to show that formula, okay.
Now, a really important thing for us: we need to find out where the idea of an outlier,
which we talked about with [INAUDIBLE], comes from.
7:40
Okay, and that comes from the normal random variable.
So let's take the basketball
players.
In other words, the chance a player's height, and
this would work for any normal random variable,
is within two standard deviations of the mean is about 95%.
To be more precise, it's 1.96 standard deviations, but we'll use 2.
So you take the area from 0 to 2 standard deviations above the mean and double it by the symmetry, okay?
8:12
So, you could take NORMDIST, okay?
Sorry, let's just do it directly.
To find the probability somebody's within two standard deviations of the mean,
you take the probability they're less than or
equal to two standard deviations above the mean,
minus the chance
that they're less than or equal to two standard deviations below the mean.
In other words, two standard deviations above the mean would be what, 86.2 inches.
Two standard deviations below the mean is 71.8 inches.
And so you subtract the probabilities to get the probability between them.
So I would take the mean plus two times the standard deviation.
9:02
And we have a mean of 79 and a standard deviation of 3.6, comma TRUE.
So that's the chance of being less than or equal to that value; we're taking the probability
to the left of two standard deviations above the mean.
And again, it doesn't matter if I put the dot in NORM.DIST there.
And then I go two standard deviations below the mean.
9:47
So we've got 79 plus 2 standard deviations.
And we've got 79 minus 2 standard deviations.
Oh, that's 3.6, sorry about that.
And there we go, 95.5%.
So that's the chance of being within two standard deviations of the mean for
a normal random variable, it's about 95%.
And that's where the idea of an outlier comes from: being more than two standard
deviations away from the mean.
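The subtraction just described can be wrapped in a small helper. This is only a sketch: prob_within is a hypothetical name, and the numbers are the lecture's 79 and 3.6, though the answer is the same for any normal random variable.

```python
from statistics import NormalDist

def prob_within(k, dist):
    """Chance a normal random variable lands within k standard deviations of its mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    # Area to the left of the upper cutoff minus area to the left of the lower cutoff.
    return dist.cdf(upper) - dist.cdf(lower)

heights = NormalDist(mu=79, sigma=3.6)
within_2 = prob_within(2, heights)        # cutoffs at 71.8 and 86.2 inches
within_1_96 = prob_within(1.96, heights)  # the more precise multiplier
print(f"within 2 sd: {within_2:.4f}, within 1.96 sd: {within_1_96:.4f}")
```

Using 2 gives roughly 95.5%, while 1.96 gives almost exactly 95%, which is why 1.96 is the "more precise" figure.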
Now, the chance of being within one standard deviation of the mean:
10:38
Take NORMDIST, take the mean plus one standard deviation.
Take the mean, take the standard deviation.
That's the area to the left of one standard deviation above the mean.
And take away the area to the left of
one standard deviation below the mean.
[INAUDIBLE] TRUE there.
If you use the word FALSE, then you get the height of the bell curve, not the area.
Okay, 79 - 3.6, 79.
There's the standard deviation.
And so that's the area to the left of 82.6 inches minus the area to the left of 75.4.
That should be a 79; let's check that I didn't screw that up.
79 minus 3.6, okay, so
that should be 68% or so, and that is right.
Okay, the chance for
a normal random variable being within one standard deviation of the mean is 68%.
Within two standard deviations, 95%.
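To make the TRUE versus FALSE distinction concrete, here is a hedged Python sketch. It assumes cdf plays the role of NORMDIST with TRUE (area to the left) and pdf plays the role of NORMDIST with FALSE (height of the curve); the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=79, sigma=3.6)
upper = 79 + 3.6   # 82.6 inches, one standard deviation above the mean
lower = 79 - 3.6   # 75.4 inches, one standard deviation below the mean

# Like TRUE: areas to the left, subtracted to get the area in between.
area_within_1_sd = heights.cdf(upper) - heights.cdf(lower)

# Like FALSE: the height of the bell curve at a point, which is not a probability.
curve_height_at_upper = heights.pdf(upper)

print(f"area within one sd: {area_within_1_sd:.4f}")
print(f"curve height at 82.6: {curve_height_at_upper:.4f}")
```

The area comes out near 68%, while the curve height is a much smaller number that only has meaning relative to other heights.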
Okay, so to close this out, and we'll come back to the normal random variable a lot
when we try and figure out odds of teams winning the NCAA Tournament.
It turns out the performance of a team in the NCAA Tournament is normally
distributed: the outcome of the game is normally distributed with the mean equal
to the point spread and a standard deviation of about 11 points.
And in the NBA,
the standard deviation about the point spread is about 12 points.
And that lets us really simulate with what we know about Excel and
the NORMINV function.
We'll see how to simulate normal random variables using NORMINV of the RAND
function.
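As a preview, here is a rough Python sketch of the NORMINV-of-RAND idea: feed a uniform random number between 0 and 1 into the inverse of the normal distribution to get a simulated game margin. The 7-point spread, the seed, and the function name simulate_margin are all assumptions made for illustration; the 11-point standard deviation is the NCAA figure quoted above.

```python
import random
from statistics import NormalDist

def simulate_margin(point_spread, sd, rng):
    """One simulated final margin, in the spirit of NORMINV(RAND(), point_spread, sd)."""
    u = rng.random()
    while u == 0.0:              # inv_cdf needs a value strictly between 0 and 1
        u = rng.random()
    return NormalDist(point_spread, sd).inv_cdf(u)

rng = random.Random(0)           # fixed seed so the sketch is repeatable
spread, sd = 7, 11               # hypothetical 7-point favorite; sd from the lecture

margins = [simulate_margin(spread, sd, rng) for _ in range(10_000)]
simulated_win_rate = sum(m > 0 for m in margins) / len(margins)

# Direct calculation: the favorite wins when the margin is above zero.
exact_win_prob = 1 - NormalDist(spread, sd).cdf(0)

print(f"simulated: {simulated_win_rate:.3f}, direct: {exact_win_prob:.3f}")
```

The simulated win rate should sit close to the direct calculation, which is the point: the point spread plus a standard deviation is enough to estimate the chance of winning.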
13:19
Okay, let's do a quick example with some basketball.
So let's assume possessions don't alternate, but on every possession,
you could lose by 3 points, lose by 2 points, lose by 1 point, tie, or win by 1, 2, or 3 points.
13:44
And so we'll put a probability on this.
Let's say 50% of the time nothing happens.
Nobody scores.
That's a shocker.
Let's suppose 18% of the time you lose by 2 points.
20% of the time you win by two points.
So about 5%, one free throw.
5% here.
Okay, so those probabilities are 55, 73, 78.
Those add up too high.
So make this 0.4.
45, 60, 70.
Suppose 16% of the time, we make a three.
36 will give you [INAUDIBLE].
This is too high, so I put 16% here.
Okay so we're going to have to make these probabilities drop a bit,
let's make sure they add to 1.
14:46
Okay, so 1.2, so make this 0.3 and
we'll make this 0.08, okay?
Okay, now the expected margin of victory on each of these possessions:
again, I just weight the outcomes by the probabilities.
15:11
So I'd win by 0.24 points per possession, which is an awful lot.
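Weighting the outcomes by the probabilities looks like this in code. The final probability table from the spreadsheet isn't fully recoverable from the transcript, so the probabilities below are made up for illustration (they are chosen only to add to 1), and the resulting expected margin is not meant to match the 0.24 from the lecture.

```python
# Hypothetical per-possession outcomes (points won or lost) and probabilities.
# These are illustrative values, not the lecture's actual table.
outcome_probs = {
    -3: 0.08,
    -2: 0.18,
    -1: 0.05,
     0: 0.30,
     1: 0.05,
     2: 0.20,
     3: 0.14,
}

total_probability = sum(outcome_probs.values())

# Expected margin: each outcome times its probability, added up.
expected_margin = sum(outcome * p for outcome, p in outcome_probs.items())

print(f"probabilities add to {total_probability:.2f}")
print(f"expected margin per possession: {expected_margin:.2f} points")
```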
Okay, so let's just assume there are 200 possessions here,
160 possessions for a game.
15:58
Okay, and I can make this look like it's random, there we go.
And so the random variable on each possession looks like this,
now that's not a bell curve.
But let's add up all the outcomes of these possessions and
see how much I win the game by.
16:17
And let's run that 1,000 times and you'll see it'll look like a bell curve.
It's almost magical, I mean.
Clearly, each of the rows here doesn't look like a normal random variable.
Okay, here I won by like 91, or I won by 58, okay, so I'm a good team here.
16:56
It'll take it a second here.
But what the add-in will do is play out that cell 1,000 or
5,000 times and draw a histogram, or a graph, of the results.
And you'll see, it should be pretty much a bell curve.
And the fact that a basketball game is made up of,
let's say, 150 to 200 possessions explains why the total margin of victory,
relative to the point spread, follows pretty much a bell curve. Okay.
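Without the add-in, you can play the game out many times with a few lines of Python. This is a sketch under stated assumptions: the per-possession probabilities are hypothetical (not the lecture's table), possessions are treated as independent, and the game length of 200 possessions and the 1,000 runs follow the numbers mentioned above.

```python
import random
from collections import Counter
from statistics import mean, stdev

# Hypothetical per-possession outcomes and probabilities (illustrative only).
outcomes = [-3, -2, -1, 0, 1, 2, 3]
probs = [0.08, 0.18, 0.05, 0.30, 0.05, 0.20, 0.14]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def play_game(n_possessions=200):
    """Final margin: the sum of independently drawn possession outcomes."""
    return sum(rng.choices(outcomes, weights=probs, k=n_possessions))

margins = [play_game() for _ in range(1000)]
avg, sd = mean(margins), stdev(margins)

# If the sums are roughly normal, about 68% and 95% of games should fall
# within one and two standard deviations of the average margin.
within_1 = sum(abs(m - avg) <= sd for m in margins) / len(margins)
within_2 = sum(abs(m - avg) <= 2 * sd for m in margins) / len(margins)
print(f"mean {avg:.1f}, sd {sd:.1f}, within 1 sd {within_1:.2f}, within 2 sd {within_2:.2f}")

# Crude text histogram with 20-point-wide bins; each '#' is about 10 games.
bins = Counter((m // 20) * 20 for m in margins)
for start in sorted(bins):
    print(f"{start:>4} to {start + 19:>4}: " + "#" * (bins[start] // 10))
```

Each single possession is a lumpy seven-outcome variable, yet the printed histogram of game margins should already look like a bell.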
19:11
Right there.
And you can find probabilities for the sum of the random variables
by using the mean and standard deviation of the sum,
basically assuming that the sum of the random variables
is normal even if the individual ones are not.
And that's again called the central limit theorem.
It explains why a lot of things in the real world look like a bell curve.
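Here is a sketch of using the mean and standard deviation of the sum directly. It assumes independent possessions, so the mean of the sum is the number of possessions times the per-possession mean and the variance of the sum is the number of possessions times the per-possession variance. The probability table is hypothetical, not the lecture's.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical per-possession outcomes and probabilities (illustrative only).
outcome_probs = {-3: 0.08, -2: 0.18, -1: 0.05, 0: 0.30, 1: 0.05, 2: 0.20, 3: 0.14}

# Mean and variance of a single possession.
mu = sum(x * p for x, p in outcome_probs.items())
var = sum((x - mu) ** 2 * p for x, p in outcome_probs.items())

# Sum over a game, assuming independent possessions:
# the mean scales with n, the standard deviation with sqrt(n).
n_possessions = 200
game = NormalDist(n_possessions * mu, sqrt(n_possessions * var))

# Central limit theorem approximation: treat the final margin as normal.
p_win = 1 - game.cdf(0)
print(f"margin mean {game.mean:.1f}, sd {game.stdev:.2f}, P(win) about {p_win:.3f}")
```

This gives a quick answer without running a simulation, at the cost of leaning on the normal approximation and the independence assumption.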
Well, we'll use the normal random variable a little bit in the next video when we talk
about the hot hand and whether teams are streaky.
Does momentum exist?
The answer is usually it does not.
And in particular, when we come back to try and simulate the NCAA tournament, we'll
talk about point spreads and the probability of winning games:
how you take the point spread and figure out the chance one team will win the game,
and connect that to the money line in gambling.
It'll be very important that we understand the normal random variable.
So that's where we'll pause here, and
then we'll talk about the hot hand and streaks next video.