Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.


A course from the University of Houston System

Math behind Moneyball

35 ratings


From this lesson

Module 5

You will learn basic concepts involving random variables (specifically the normal random variable, expected value, variance, and standard deviation). You will learn how regression can be used to analyze what makes NFL teams win and decode the NFL QB rating system. You will also learn that momentum and the “hot hand” are mostly a myth. Finally, you will use Excel text functions and the concept of Expected Points per play to analyze the effectiveness of a football team’s play calling.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay, in this video we'll talk about probably the most important

random variable, the normal random variable.

It's a continuous random variable, and

we'll need it when we simulate the NCAA tournament, NFL playoffs, et cetera,

when we're trying to figure out the chance of a football or

basketball team winning the game, given the point spread.

So basically a normal random variable is continuous, so it can assume any

value, and a good example here would be the height of an NBA player.

So I've looked at a lot of NBA players.

And the heights pretty much follow a normal random variable.

And what you need to know is the mean and the standard deviation.

So the mean height of an NBA player is about 79 inches or 6 foot 7.

And the standard deviation is around 3.6 inches.

And so given a mean and standard deviation,

a normal random variable has what's called a probability density function.

And I'm using an add-in called @RISK here.

So you probably don't want to try this at home.

But let me show you what the density function looks like.

I type in the mean and standard deviation.

Okay now I go to [INAUDIBLE] which is for [INAUDIBLE] purposes.

Okay.

This is the density function of the height of NBA players.

So you can see, what do we know about a density function?

We'll write this down in a second.

But the height of the density

gives the relative likelihood of the random variable being around there.

So the most likely height for a normal is the mean, 79 inches.

It also happens to be the 50th percentile on this.

The mean, the median, and the most likely value, called the mode, are in the same place;

the median is the 50th percentile.

Okay, now what I should say next is about the

area under this density function.

It's called the density function because of how dense the probability is.

Area under the density function is probability.

So say I ask, what's the chance somebody's between 72 inches and 80 inches tall?

That would be between six feet and six feet eight.

Okay, and that would be the area under the curve.

About 58% of all NBA players should be between six feet and six feet eight.
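
The 58% figure can be checked outside Excel. As a minimal sketch, assuming heights are exactly normal with the lecture's mean of 79 inches and standard deviation of 3.6 inches, Python's standard-library `statistics.NormalDist` gives the area under the density between 72 and 80 inches:

```python
from statistics import NormalDist

# Assumption: NBA heights are normal with the lecture's mean and standard deviation.
heights = NormalDist(mu=79, sigma=3.6)

# The area under the density between two heights is the difference of
# two cumulative probabilities (areas to the left of each height).
p_between = heights.cdf(80) - heights.cdf(72)
print(f"P(72 <= height <= 80) = {p_between:.3f}")
```

The printed value should round to about 0.58, in line with the number read off the chart.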

Okay.

And the density function is what we call symmetric for the normal.

Okay, so area under the normal curve is probability.

Height is likelihood.

Total area under this curve is 1, and it's symmetric about the mean.

Around the mean of 79 inches, the density looks the same to the left of the mean

as it does to the right. In other words, if I would go five inches,

let's say, below 79, to 74 inches, the height of the density would

be the same as if I go five inches above, which would be 84 inches.

So in other words, the height at 74 inches matches the height at 84 inches.

That would mean there are as many six foot two players as there are seven footers.

Okay.

That may not be exactly true, but

we'll assume again the height of NBA players is normally distributed.
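
The three properties just listed (height is likelihood, total area is 1, symmetry about the mean) can be illustrated with a short sketch. It assumes the same normal model for heights as the lecture; the variable names are only for illustration.

```python
from statistics import NormalDist

# Assumption: heights are normal with mean 79 inches and standard deviation 3.6 inches.
heights = NormalDist(mu=79, sigma=3.6)

# pdf() returns the height of the density (a relative likelihood), not a probability.
density_at_74 = heights.pdf(74)    # five inches below the mean
density_at_84 = heights.pdf(84)    # five inches above the mean
density_at_mean = heights.pdf(79)  # the peak of the bell curve

# Symmetry also shows up in areas: the area to the left of 74 inches
# equals the area to the right of 84 inches.
area_below_74 = heights.cdf(74)
area_above_84 = 1 - heights.cdf(84)

print(density_at_74, density_at_84, density_at_mean)
print(area_below_74, area_above_84)
```

The two densities should agree, and both should be lower than the density at the mean.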

Okay. So maybe it would help if I

copied that picture.

Put in summary.

Copy.

Let's see if I can paste that in there.

Okay.

So that's a normal random variable, with its mean and sigma.

So that's the PDF, for probability density function, of a

normal with the mean 79 inches and

the standard deviation 3.6 inches.

Okay, so how can I view normal probabilities in Excel?

In other words, I want to know the chance a player

is less than or equal to six feet, five inches tall.

So that's 77 inches tall.

There's a function NORMDIST for this, and all of this is in Excel.

So I would say NORMDIST.

There's NORM.DIST, or NORMDIST; it doesn't really matter.

So you say NORMDIST of 77 inches, and the answer

should be less than a half because that's less than the mean.

Somewhere over here.

Okay, and the mean was 79.

The standard deviation was 3.6, and you need the word TRUE here

to get the chance somebody is less than or equal to six foot five.

And you get 29%.

So 29% of the players in the NBA are six foot five or shorter, according to this.
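
For readers without Excel, the same cumulative probability can be sketched in Python. This assumes the normal model above; `NormalDist.cdf` plays the role of `NORM.DIST(77, 79, 3.6, TRUE)`, returning the area to the left of 77 inches.

```python
from statistics import NormalDist

# Assumption: heights are normal with mean 79 inches and standard deviation 3.6 inches.
heights = NormalDist(mu=79, sigma=3.6)

# Chance that a player is 77 inches (six foot five) or shorter.
p_at_most_77 = heights.cdf(77)
print(f"P(height <= 77) = {p_at_most_77:.3f}")
```

The result should be a little under 0.29, below one half because 77 inches is below the mean.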

What's the chance somebody is at least seven feet tall?

Well, you have to do one minus the probability they're less than or

equal to seven feet tall.

And you know the chance they're exactly seven feet tall is zero,

because to be exactly seven feet tall, your height would have to be 84.000 inches.

So the chance of being less than or

equal to seven feet tall would be the area to the left of 84 inches.

We've got a mean of 79,

a standard deviation of 3.6, and then we've got the TRUE value.

Okay, so the chance of being at least seven feet tall is about eight percent.

This would estimate that about one in 12 players is at least seven feet tall.
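
The "one minus" step can be written out as a small sketch under the same normal assumption. Because the variable is continuous, "at least 84" and "more than 84" have the same probability.

```python
from statistics import NormalDist

# Assumption: heights are normal with mean 79 inches and standard deviation 3.6 inches.
heights = NormalDist(mu=79, sigma=3.6)

# P(height >= 84) = 1 - P(height <= 84), the area to the right of 84 inches.
p_at_least_84 = 1 - heights.cdf(84)
print(f"P(height >= 84) = {p_at_least_84:.3f}")
```

The result should come out near 0.08, the eight percent quoted in the lecture.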

Okay, so

again, the functions I'm using here are in Excel 2013; you can see these functions.

I can just copy that. Okay, so

you also sometimes want to get percentiles.

In other words, five percent of all

NBA players are taller than blank.

So that's the 95th percentile: what number has 95% to the left of it and

five percent to the right?

You can use NORM.INV here.

So NORM.DIST gives normal

probabilities.

NORM.INV gives percentiles.

Then we'll get to the rule of thumb for the normal.

Okay, so I would say NORM.INV, and I'd say the 95th percentile.

Say .95 and then the mean was what?

79.

Standard deviation, 3.6.

And the answer is about seven foot one.

So five percent of the players are taller than about seven foot one.
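
A percentile lookup like this is the inverse of the cumulative probability. As a sketch under the same normal assumption, `NormalDist.inv_cdf` stands in for Excel's `NORM.INV(0.95, 79, 3.6)`:

```python
from statistics import NormalDist

# Assumption: heights are normal with mean 79 inches and standard deviation 3.6 inches.
heights = NormalDist(mu=79, sigma=3.6)

# Height with 95% of the area to its left and 5% to its right.
p95_inches = heights.inv_cdf(0.95)

# Convert inches to feet and inches for readability.
feet, inches = divmod(p95_inches, 12)
print(f"95th percentile: {p95_inches:.2f} in = {int(feet)} ft {inches:.1f} in")
```

The value should land just under 85 inches, which is the "about seven foot one" from the lecture.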

And again, if I want to show that formula, okay.

Now, a really important thing for us: you know, we need to understand the outliers

we talked about with regression.

So we said that anything that's more than two standard deviations away from the mean

is called an outlier.

Okay, and that comes from the normal random variable.

So, I mean, let's take the basketball

players.

In other words, take the chance that a player's height, and

this would work for any normal random variable,

is within two standard deviations of the mean.

Actually, to be more precise, 95% corresponds to 1.96, but we'll use 2.

So you could take the area from 0 to 2 and double it by the symmetry, okay?

So, you could take NORM.DIST, okay?

Sorry, let's just do it directly.

To find the probability somebody's within two standard deviations of the mean,

you take the probability they're less than or

equal to two standard deviations above the mean,

minus the chance

that they're less than two standard deviations below the mean.

In other words, two standard deviations above the mean would be what, 86.2 inches.

Two standard deviations below the mean is 71.8 inches.

And so you subtract the probabilities to get the probability between them.

So I would take the mean plus two times the standard deviation.

And we have a mean of 79 and 3.6, comma TRUE.

So that's the chance of being less than or equal; we're taking the probability

to the left of two standard deviations above the mean.

And again, it doesn't matter if I put the dot in NORM.DIST there.

And then I go two standard deviations below the mean.

So we've got something wrong there.

279, okay, I didn't put the standard deviation there.

So we've got 79 plus 2 standard deviations.

And we've got 79 minus 2 standard deviations.

Oh, that's 36 instead of 3.6, sorry about that.

And there we go, 95.5%.

So that's the chance of being within two standard deviations of the mean for

a normal random variable, it's about 95%.

And that's where the idea of an outlier comes from: being more than two standard

deviations away from the mean.

The chance of being within one standard deviation of the mean

is around 68%, and we can check that out.

Take NORM.DIST, take the mean plus one standard deviation.

Take the mean, take the standard deviation.

That's the area to the left of one standard deviation above the mean.

And take away the area to the left of

one standard deviation below the mean.

[INAUDIBLE] TRUE there.

If you use the word FALSE, then you get the height of the bell curve, not the area.

Okay, 79 - 3.6, 79.

There's the standard deviation.

And so that's the area to the left of 82.6 inches minus the area to the left of 75.4.

That should be a 79, let's check that I didn't screw that up.

79 minus 3.6, okay, so

that should be 68% or so and that is right.

Okay, the chance of

a normal random variable being within one standard deviation of the mean is 68%.

Within two standard deviations, 95%.
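
The 68% and 95% rules of thumb can be reproduced with the same subtraction of areas. This is a minimal sketch under the lecture's normal model; the helper `prob_within` is a name made up for this illustration.

```python
from statistics import NormalDist

MEAN, SD = 79, 3.6  # lecture's values for NBA heights, in inches
heights = NormalDist(mu=MEAN, sigma=SD)

def prob_within(k):
    """Chance of landing within k standard deviations of the mean."""
    upper = heights.cdf(MEAN + k * SD)  # area to the left of the upper limit
    lower = heights.cdf(MEAN - k * SD)  # area to the left of the lower limit
    return upper - lower

within_one = prob_within(1)  # between 75.4 and 82.6 inches
within_two = prob_within(2)  # between 71.8 and 86.2 inches
print(f"within 1 SD: {within_one:.3f}, within 2 SD: {within_two:.3f}")
```

The same two numbers come out for any mean and standard deviation, which is why the rule works for every normal random variable.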

Okay, so to close this out, and we'll come back to the normal random variable a lot,

when we try and figure out odds of teams winning the NCAA Tournament.

It turns out the performance of a team in the NCAA Tournament is normally

distributed: the outcome of the game is normally distributed with the mean equal

to the point spread and a standard deviation of about 11 points.

And in the NBA,

the standard deviation about the point spread is about 12 points.

And that lets us really simulate with what we know about Excel and

the NORM.INV function.

We'll see how to simulate normal random variables using NORM.INV of the RAND

function.
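
The idea of feeding a uniform random number into the inverse normal function can be sketched as follows. The 11-point standard deviation is the lecture's NCAA figure; the 7-point spread, the seed, and the function name `simulate_margin` are assumptions made up for this illustration.

```python
import random
from statistics import NormalDist, mean

def simulate_margin(spread, sd, rng):
    """Draw one game margin, mimicking NORM.INV(RAND(), spread, sd)."""
    u = rng.random()
    while u == 0.0:  # inv_cdf needs a probability strictly between 0 and 1
        u = rng.random()
    return NormalDist(mu=spread, sigma=sd).inv_cdf(u)

rng = random.Random(0)  # fixed seed so the run is repeatable
SPREAD, SD = 7.0, 11.0  # hypothetical 7-point favorite, NCAA standard deviation

margins = [simulate_margin(SPREAD, SD, rng) for _ in range(10_000)]
simulated_win_share = sum(m > 0 for m in margins) / len(margins)

# Exact chance the favorite wins: area to the right of a margin of zero.
exact_win_prob = 1 - NormalDist(mu=SPREAD, sigma=SD).cdf(0)

print(f"average simulated margin: {mean(margins):.2f}")
print(f"simulated win share: {simulated_win_share:.3f}, exact: {exact_win_prob:.3f}")
```

Under these assumptions the favorite should win roughly 74% of the time, and the simulated share should sit close to the exact value.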

But why does the normal random variable so often

occur in the real world?

And the answer is something called the central limit theorem.

So if you add up lots of random variables, usually, say, greater than or equal to 30,

and we say independent random variables, meaning the value of one doesn't affect

the value of the other,

then even if each one is not normal, the sum will be approximately normal.

Okay, let's do a quick example with some basketball.

So let's assume possessions don't alternate, but on every possession,

you could lose by 3 points, lose by 2 points, lose by 1 point.

Win by 1 point, win by 2 points, win by 3 points.

And so we'll put a probability on this.

Let's say 50% of the time nothing happens.

Nobody scores.

That's a shocker.

Let's suppose 18% of the time you lose by 2 points.

20% of the time you win by two points.

So about 5%, one free throw.

5% here.

Okay, so those probabilities are 55, 73, 78.

Those add up too high.

So make this 0.4.

45, 60, 70.

Suppose 16% of the time, we make a three.

36 will give you [INAUDIBLE].

This is too high, so I put 16% here.

Okay so we're going to have to make these probabilities drop a bit,

let's make sure they add to 1.

Okay, so 1.2, so make this 0.3 and

we'll make this 0.08, okay?

Okay, now for the expected margin of victory on each of these possessions,

again, I just weight the outcomes by the probabilities.

So I'd win by 0.24 points per possession, which is an awful lot.
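
Weighting outcomes by probabilities can be sketched in a few lines. The final spreadsheet values are not fully recoverable from the transcript, so the probabilities below are an assumption: they reuse the figures mentioned in the lecture where possible and were chosen to add to 1 and to give the 0.24 points per possession quoted above.

```python
import math

# Margin on one possession, from losing by 3 to winning by 3.
outcomes = [-3, -2, -1, 0, 1, 2, 3]
# Assumed probabilities (illustrative, not necessarily the lecturer's table).
probs = [0.08, 0.18, 0.05, 0.30, 0.05, 0.18, 0.16]

# Probabilities must add to 1 before they can be used as weights.
total_prob = sum(probs)

# Expected value: each outcome weighted by its probability.
expected_margin = sum(x * p for x, p in zip(outcomes, probs))

# Variance: probability-weighted squared distance from the expected value.
variance = sum((x - expected_margin) ** 2 * p for x, p in zip(outcomes, probs))
sd_per_possession = math.sqrt(variance)

print(f"total probability: {total_prob:.2f}")
print(f"expected margin per possession: {expected_margin:.2f}")
print(f"standard deviation per possession: {sd_per_possession:.2f}")
```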

Okay, so let's just assume there are 200 possessions here,

160 possessions for a game.

And again, I'm using an add-in here.

But I just want to show you each possession, my margin of victory,

will not be normally distributed, right?

It'll be this [INAUDIBLE] and I can model that.

Okay, I take the values and the probabilities; again, this uses an add-in,

but it's worth seeing how easy this is to do.

Okay, and I can make this look like it's random, there we go.

And so the random variable on each possession looks like this,

now that's not a bell curve.

But let's add up all the outcomes of these possessions and

see how much I win the game by.

And let's run that 1,000 times and you'll see it'll look like a bell curve.

It's almost magical, I mean.

Clearly, each of the rows here doesn't look like a normal random variable.

Okay, here I won by like 91, or I won by 58, okay, so I'm a good team here.

So I'll make this an output cell, as it's called and

I'll run this 1,000 times to see how much I'd win by.

Okay.

We'll try 5,000.

Okay.

So now this will run here.

It'll take it a second here.

But what the add-in will do is play out that cell 1,000 or

5,000 times and draw a histogram or a graph of the results.

And you'll see, it should be pretty much a bell curve.
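
The add-in's simulation can be approximated without Excel. This is a rough sketch, not the lecturer's workbook: the per-possession probabilities are assumed (chosen to add to 1 and average 0.24 points), possessions are treated as independent, and the seed is arbitrary.

```python
import random
from statistics import mean, pstdev

OUTCOMES = [-3, -2, -1, 0, 1, 2, 3]                  # margin on one possession
PROBS = [0.08, 0.18, 0.05, 0.30, 0.05, 0.18, 0.16]   # assumed, sums to 1
POSSESSIONS = 200                                    # possessions per game
GAMES = 5_000                                        # simulated games

rng = random.Random(42)  # fixed seed so the run is repeatable

# Each game's margin is the sum of 200 independent possession outcomes.
game_margins = [
    sum(rng.choices(OUTCOMES, weights=PROBS, k=POSSESSIONS))
    for _ in range(GAMES)
]

avg_margin = mean(game_margins)
sd_margin = pstdev(game_margins)

# If the sums are roughly bell shaped, about 68% and 95% of games should
# fall within one and two standard deviations of the average.
within_one_sd = sum(abs(m - avg_margin) <= sd_margin for m in game_margins) / GAMES
within_two_sd = sum(abs(m - avg_margin) <= 2 * sd_margin for m in game_margins) / GAMES

print(f"average margin: {avg_margin:.1f}, standard deviation: {sd_margin:.1f}")
print(f"within 1 SD: {within_one_sd:.3f}, within 2 SD: {within_two_sd:.3f}")
```

Checking the one and two standard deviation shares is a quick numerical stand-in for eyeballing the histogram.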

And this explains why, when the basketball game is made up of,

let's say, 150 to 200 possessions, the total margin of victory,

as opposed to the point spread, follows pretty much a bell curve. Okay.

So it's working here, we'll give it a second.

Okay, so it should be coming back to me in a minute.

It's trying here.

Okay.

So it's still coming over here, and

then I should definitely get a graph here of what happened.

It's trying.

Pretty slow here, but it's working on it.

Still working, but essentially the central limit theorem says you can find.

Okay, there we go.

Sorry about that.

But, as the central limit theorem says, you can see that looks like a bell curve.

Okay?

Which is a normal random variable.

Right there.

And you can find probabilities for the sum of the random variables

by using the mean and standard deviation of the sum,

basically assuming that the sum of the random variables

is normal even if the individual ones are not.

And that's again called the central limit theorem.

It explains why a lot of things in the real world look like a bell curve.
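
Using the mean and standard deviation of the sum can be sketched as below. The per-possession probabilities are assumed for illustration (they add to 1 and average 0.24 points), possessions are assumed independent, and ties are ignored because the normal approximation is continuous.

```python
import math
from statistics import NormalDist

OUTCOMES = [-3, -2, -1, 0, 1, 2, 3]                  # margin on one possession
PROBS = [0.08, 0.18, 0.05, 0.30, 0.05, 0.18, 0.16]   # assumed, sums to 1
POSSESSIONS = 200

# Mean and variance of a single possession.
mu = sum(x * p for x, p in zip(OUTCOMES, PROBS))
var = sum((x - mu) ** 2 * p for x, p in zip(OUTCOMES, PROBS))

# For a sum of independent possessions, means add and variances add.
game_mean = POSSESSIONS * mu
game_sd = math.sqrt(POSSESSIONS * var)

# Central limit theorem: treat the game margin as approximately normal.
game_margin = NormalDist(mu=game_mean, sigma=game_sd)
p_win = 1 - game_margin.cdf(0)  # chance the total margin is above zero

print(f"game mean: {game_mean:.1f}, game SD: {game_sd:.2f}, P(win): {p_win:.3f}")
```

With these assumed inputs the team is a heavy favorite, winning roughly 96% of the time.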

We'll use the normal random variable a little bit in the next video when we talk

about the hot hand and whether teams are streaky.

Does momentum exist?

The answer is usually it does not.

And in particular, when we come back to try to simulate the NCAA tournament, we'll

really talk about point spreads and the probability of winning games:

how you take the point spread and figure out the chance one team will win the game

and connect that to the money line in gambling.

It'll be very important that we understand the normal random variable.

So that's where we'll pause here, and

then we'll talk about the hot hand and streaks in the next video.