Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve player and lineup selection as well as in-game strategy.

A course from University of Houston System

Math behind Moneyball

34 ratings

From this lesson

Module 3

You will learn how Monte Carlo simulation works and how it can be used to evaluate a baseball team’s offense and the famous DEFLATEGATE controversy.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay, in the next couple of videos we're going to introduce you to the exciting

topic of resampling.

So many of you have probably taken a statistics class in the past, and

you did something called hypothesis testing.

Like you might try and test whether two populations have the same mean or

different means.

And there are so many tests.

There's the two-sample z test.

There's the two-sample t test, matched pairs,

unequal variances, equal variances.

And I am sure most of you, including professors,

don't remember how these tests work.

Okay, hypothesis testing is really for the PhDs to remember this stuff.

But there is a really simple unified approach to

testing hypotheses called resampling, okay?

And it doesn't make any assumptions.

Like, for instance, some of these tests assume populations are normally

distributed. And we can use this in a couple of videos to analyze Deflategate,

which has been a big sports and math problem.

Or we could use it, you'll see, in the next spreadsheet,

which may be this video or the following video, to ask whether a training

technique actually significantly improves the performance of a player.

Okay, well, let's start with a non-sports example.

Let's suppose we have 12 people who tragically

have very advanced, let's say prostate cancer.

Okay, and so with the old treatment, the prior treatment,

only two of six people survived. And then we gave a new drug, or

a new treatment to six people who really we think are identical in pretty

much how the cancer had spread and thankfully five out of six survived.

What is the probability that the new treatment is better than

the old treatment?

And you might recognize this as a test of the difference between proportions, and I mean

I don't want to do it that way because every test, I've gotta teach you a new set

of math, but resampling fits in well with simulation, and it's really simple.

Okay, here's what you do.

You generate a new set of six people who took the new treatment by sampling with

replacement from the six people, five of whom survived and one, unfortunately, did not.

And then you generate a new set of six people from the old treatment.

And then basically, using the old treatment [INAUDIBLE].

Then you look at the number of survivors in the resampled data for

the new treatment and the old treatment, take the difference.

The probability that the new treatment is better is simply the fraction of, let's say,

1,000 iterations on a data table

in which the number of survivors with the new treatment beats the old treatment.
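
The procedure just described can be sketched outside the spreadsheet as well. What follows is only a minimal Python sketch, not the spreadsheet from the lecture: the survival counts (five of six, two of six) come from the example, while the function name `prob_new_beats_old`, the `seed` parameter, and the choice of Python's standard `random` module are assumptions made for illustration.

```python
import random

# 1 = survived, 0 = did not survive (counts taken from the lecture example).
new_treatment = [1, 1, 1, 1, 1, 0]  # five of six survived
old_treatment = [1, 1, 0, 0, 0, 0]  # two of six survived


def prob_new_beats_old(new, old, iterations=1000, seed=None):
    """Estimate the chance that the new group has more survivors than the old.

    Hypothetical helper: each iteration resamples both groups with
    replacement and compares the resampled survivor counts.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(iterations):
        # Sampling with replacement, so the same person can be drawn twice.
        new_survivors = sum(rng.choices(new, k=len(new)))
        old_survivors = sum(rng.choices(old, k=len(old)))
        if new_survivors - old_survivors > 0:
            wins += 1
    return wins / iterations


estimate = prob_new_beats_old(new_treatment, old_treatment, iterations=1000, seed=0)
print(f"Estimated P(new treatment is better): {estimate:.3f}")
```

Each pass through the loop plays the role of one row of the data table, and the final division plays the role of the count of positive differences divided by the number of iterations.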

Okay, so we could resample.

We just need the numbers one through six.

So we can resample the new.

So we can take a RANDBETWEEN, 1 through 6,

and we might get the same person twice, that's okay.

So I think that's six people.

Notice I got three 6s there, beware of that, watch The Omen.

Okay, so the actual result.

So I could use the index function.

I could say =INDEX, see how versatile this function is,

in this array, pick off this row in the first column.

So when this is a five, I pick off the fifth person.

When this is a three, I pick off the third person.

So that 2 means I pick off the second person, like that.

Okay, and so only when I pick off the sixth person do I get a 0.

So now I could do the same thing, I could resample the old treatment.

So I could do a RANDBETWEEN, 1 through 6.

And for the actual result, again, I

do an INDEX function on these guys, using the F4 key to dollar-sign the range.

This is the row, and this is the column.

So when this is a 2, I get a 1.

When it's anything higher than 2, I get a 0.

Okay, now so let's look at total survivors.

So I would just add this column.

Well six survived, that's great.

And then I'd copy this over here.

That's one and I take the difference.

So I take the new survivors minus the old survivors.

And I'll run it 1,000 times and

see what fraction of the time the new treatment is better, that is,

the number of new survivors minus old survivors is greater than 0.

Oh, let's run it 5,000 times, oh, well, 1,000 times.

So I'll make that a 1.

You should know by now, fill series.

And you go 1 through 1,000 in columns.

And the output cell,

from the resampling is the new survivors minus the old survivors.

So you select the data table range.

Data, What-If Analysis.

It's not going to be.

And we'll do a column input cell, that's a blank cell.

And we should count how many times, so

the probability that the new treatment is better.

You should count, how many of those thousand do we get a positive number?

So it's a COUNTIF [INAUDIBLE] dollar sign.

And I can say greater than 0 and divide by 1,000.

And I get about 95%, it's going to be close to that, 95%, 94%.

I run this 1,000 times.

So there's around a 95% chance the new treatment is better.

And usually in statistics we want a 95% level of proof.

And this is amazing, because we've only got 12 people.

Okay, or you could couch this in terms of sports.

If I get five out of six free throws, and you get two out of six,

what's the chance I'm the better foul shooter?

And it's around 95%.

There's only about 5%.

Here's 93%.

I think it would come out, if I ran more iterations, a little less than 95%.

But I never had to talk about a binomial random variable, or

any distributional assumptions, which is really nice.
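
As a cross-check only, the resampling answer for this example can be computed exactly. The lecture deliberately avoids the binomial, so this is an aside rather than part of the method: drawing six people with replacement from a group in which five of six survived gives a survivor count that is binomial with n = 6 and p = 5/6, and likewise p = 2/6 for the old group. The function names below are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_prob_new_beats_old(n=6, p_new=5 / 6, p_old=2 / 6):
    """Exact chance that the resampled new count exceeds the resampled old count."""
    total = 0.0
    for new in range(n + 1):
        for old in range(new):  # only the cases where old < new
            total += binomial_pmf(new, n, p_new) * binomial_pmf(old, n, p_old)
    return total


exact = exact_prob_new_beats_old()
print(f"Exact P(new beats old) under resampling: {exact:.4f}")
```

The result comes out a little under 95%, which agrees with the remark that more iterations would settle slightly below 95%.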

Okay, so we're on a roll here.

So let's do the next example here, which is training.

Okay, so let's suppose you've got a training technique.

Are people stronger after they worked on your training technique for,

let's say, a month, okay?

So, what we've got here is a strength rating on each person before

they entered the training program and after they entered the training program.

Okay, so sometimes they got worse.

This guy got worse, the higher number is better.

This guy got better.

He went from 4 to 9, and so the question is,

is the result after significantly better than the results before?

Okay.

And so what we can do is

look at the differences. We want to know what's the chance

the mean strength after is greater

than the mean strength before.

And if it's above 95%,

we sort of consider that we've proven that this is a really good training

technique.

Okay, so we gotta do, let's move this a little bit.

Okay, so

we're going to do a random number based on how many people we've got there, 14.

Okay, we're going to resample.

So I do a RANDBETWEEN, 1 through 14.

I can repeat people, like this.

So notice I'll probably repeat somebody.

There are two 10s there.

So we then pick off the difference, we could use a VLOOKUP here.

We've got this on the left.

Or I could use the INDEX function, but let's do VLOOKUP.

Look up the resampled number, and

it's in this range, and I would take the fourth column.

It doesn't matter if I say true or false here.

Okay, so when I get a 1, the difference is 0,

when I resample the third person the difference is 0, so

I'll get some positive differences and I'll get some negative differences.

There's a -1 like I picked the fourth person here, and

that actually made the person less strong afterwards.

Notice these results are paired because the only difference is after minus before,

the people were similar.

So now I would take the average, what's the mean resampled improvement?

Let's say we run this 1,000 times and see what's the chance it's positive.

Oh, let's run it 5,000 times this time.

So I will put a 1 there, and let's go to Home.

Fill > Series >

Columns.

1 through 5000.

And the average improvement.

Do a data table, Ctrl+Shift+Right Arrow, Ctrl+Shift+Down Arrow.

Data, What-If Analysis.

Column input cell is blank.

And let her rip there.

Okay, now in how many of those is

the resampled improvement positive?

And it's really high.

I can COUNTIF, how many of these guys are greater than zero?

So greater than zero, and I divide by 5,000.

99% chance, and if I hit F9 again,

I think it's pretty close to 99, looks like a 99% chance

that the training program makes people stronger.
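
The paired version of the resampling can be sketched the same way. The spreadsheet's actual ratings are not reproduced in this transcript, so the `before` and `after` lists below are HYPOTHETICAL; only a few details are taken from the lecture (14 people, differences of 0 for the first and third person, a -1 for the fourth, and one person going from 4 to 9). The function name and the `seed` parameter are likewise assumptions for illustration.

```python
import random
from statistics import mean

# HYPOTHETICAL strength ratings for 14 people (higher is better).
# These are not the lecture's spreadsheet values.
before = [5, 6, 7, 8, 4, 6, 5, 7, 6, 5, 8, 6, 7, 5]
after = [5, 8, 7, 7, 9, 7, 7, 6, 9, 6, 8, 8, 8, 7]

# Paired data: one after-minus-before difference per person.
differences = [a - b for a, b in zip(after, before)]


def prob_mean_improvement_positive(diffs, iterations=5000, seed=None):
    """Fraction of resamples in which the mean difference is above zero."""
    rng = random.Random(seed)
    positive = 0
    for _ in range(iterations):
        # Resample people (their differences) with replacement.
        resample = rng.choices(diffs, k=len(diffs))
        if mean(resample) > 0:
            positive += 1
    return positive / iterations


p = prob_mean_improvement_positive(differences, iterations=5000, seed=0)
print(f"Estimated P(mean improvement > 0): {p:.3f}")
```

Resampling the differences rather than the before and after columns separately is what keeps the data paired: each draw carries one person's own change.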

Okay, so my phone's ringing, so I think we'll stop the video right here,

and we'll talk about Deflategate in the next video.