Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

A course from University of Houston System

Math behind Moneyball

44 ratings

From this lesson

Module 5

You will learn basic concepts involving random variables (specifically the normal random variable, expected value, variance, and standard deviation). You will learn how regression can be used to analyze what makes NFL teams win and decode the NFL QB rating system. You will also learn that momentum and the “hot hand” are mostly myths. Finally, you will use Excel text functions and the concept of Expected Points per play to analyze the effectiveness of a football team’s play calling.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

How many times have you watched a basketball game and

the announcer will say, the player is red hot?

How many times have you watched Sports Center and the announcer will say,

this team is red hot, has lots of momentum going into the playoffs?

Well, basically most of the time, players don't get the hot hand and

the teams don't have significant momentum.

How can we approach this problem mathematically?

Well, let's look at a sequence of wins and losses.

Let's look at the 2013 Red Sox, who closed quickly,

closed strong at the end of the season and basically made the playoffs and

won the world championship, which was great because of the Boston Marathon tragedy.

Boston probably needed that more than any other city, whether or

not you're a Red Sox fan.

But let's take a sequence of wins and losses.

Or it could be shots made or shots missed.

But how could you tell if you have excessive streakiness?

Well let's suppose a team went 81-81.

If you saw 81 wins followed by 81 losses

that would indicate streakiness, right?

That's extreme streakiness.

That's momentum: they got red hot, and then they got really cold.

Okay, but if you had win, loss, win, loss for the whole season,

that would be like no streakiness.

No momentum.

How can you sort of come up with a measure of how streaky a team's behavior was?

Well, you look at the sequence of runs.

A run is not a run in baseball; it's a sequence of consecutive wins or

losses, or made shots or missed shots.

And you can use statistics, I think it's called the Wald-Wolfowitz

runs test, to see what a normal number of runs is, given how many wins and

losses are in your data set, okay?

And if there is no streakiness, the mean number of runs should be two times

the number of wins times the number of losses, divided by N, the number of

games, plus one.

Okay, and then the standard deviation would be the mean minus

one times the mean minus two, divided by games minus one.

The square root of that whole thing.

Okay.
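Written out as formulas, the spoken description above reads roughly as follows. The symbols W (wins), L (losses), and N = W + L (games) are shorthand introduced here, not notation from the lecture.

```latex
\mu = \frac{2WL}{N} + 1, \qquad
\sigma = \sqrt{\frac{(\mu - 1)(\mu - 2)}{N - 1}}
```

Here \mu is the expected number of runs and \sigma its standard deviation when wins and losses are arranged at random.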

And now how do we tell if something is unusually high or

low on the number of runs?

Again, a low number of runs indicates streakiness.

A high number of runs indicates sort of anti-streakiness.

Well, we use what we know about the normal random variable.

The number of runs usually comes close to following a normal random variable.

So anything that's more than two standard deviations above expected,

or two standard deviations below expected,

would indicate something significant without getting really too technical.

That's called the Z-score.

When you take a random variable, the value minus the mean,

and you divide by the standard deviation, that's called a Z-score.

And essentially a lot of statistics is about, when you compute the Z-score,

is it greater than plus 2 or less than minus 2, and that sort of indicates significance.

Again, that comes from the normal random variable, the fact that 95% of the time,

we're basically within two standard deviations.
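As a quick check on the two-standard-deviation rule of thumb, here is a minimal Python sketch. The function names `prob_within` and `z_score` are made up for illustration; the only library call is the standard `math.erf`.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a standard normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations by which value exceeds the mean."""
    return (value - mean) / sd


if __name__ == "__main__":
    # Share of a normal distribution within two standard deviations of the mean.
    print(f"P(|Z| <= 2) = {prob_within(2):.4f}")
```

The probability comes out slightly above 95%, which is why a cutoff of plus or minus 2 is a convenient approximation.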

So let's apply this idea to the 2013 Red Sox.

So from Baseball Reference, we've got, for every game, whether they won or lost.

And near the end of the season, well they had a 5-game winning streak.

They lost one, won two.

Lost one, won three, lost two.

So, I mean, they got fairly hot at the end of the season.

Okay, so let's see how many runs there were in the entire sequence,

consecutive sequences of wins and losses.

What was the expected number and the standard deviation?

Get a Z-score and then we'll see there wasn't a significant result based on

the plus two standard deviations or minus two standard deviations threshold.

And then we'll talk a little bit about the hot hand in basketball, a classic paper by,

again, the great late psychologist Amos Tversky and his colleagues.

Okay, so how many wins are there in this sequence?

So if I called this results, I could give it a range.

I think I gave it a range name already, Results.

Okay, so I count how many Ws there are, so I could use COUNTIF on Results.

How many wins, oops,

my keyboard is failing.

97.

Now the losses must be 65 because there are 162 games.

But let's try, quote L.

Okay, now what's the mean number?

Okay, the mean number is going to be 2 times the wins times the losses and

I gave these range names divided by, in this case 162 plus 1.

So that'd be 79, we'd expect to see 79 runs, or

basically a run again is a consecutive sequence of Ws or Ls.

Now the standard deviation is the square root of mean minus one,

times mean minus two,

divided by 161 in this case.

So that's about six.

So we would need, roughly, 91 runs on the upside, if you add 12 to this, or

66 on the low side to be significant.

Okay.
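The same arithmetic can be reproduced outside the spreadsheet. This is a minimal Python sketch, not the lecture's workbook; `expected_runs` is a hypothetical helper name, and the inputs 97 and 65 are the win and loss counts quoted above.

```python
import math
from typing import Tuple


def expected_runs(wins: int, losses: int) -> Tuple[float, float]:
    """Mean and standard deviation of the number of runs if results are in random order."""
    games = wins + losses
    mean = 2 * wins * losses / games + 1
    sd = math.sqrt((mean - 1) * (mean - 2) / (games - 1))
    return mean, sd


if __name__ == "__main__":
    mean, sd = expected_runs(97, 65)
    # Two-standard-deviation band used as the rough significance threshold.
    low, high = mean - 2 * sd, mean + 2 * sd
    print(f"mean={mean:.2f} sd={sd:.2f} band=({low:.1f}, {high:.1f})")
```

The unrounded mean is a little under 79 and the standard deviation a little over 6, which is where the rough thresholds of 66 and 91 runs come from.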

Now how many actual runs are there?

Well, here we have to use an IF statement.

So we'd start out, there's one run, whether it's W or L.

Now how do we know how many runs we have?

Well, basically, if this matches the outcome before, then

we have the same number of runs as before; otherwise we have one more run.

Okay.

So that would be one run.

That's two runs,

three runs, four runs, five runs, that's still five runs because it's a win.

6 runs, 6 runs.

And I go down to the bottom I see 86 runs.
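The running count described above (start at one, add one whenever the outcome differs from the previous game) can be sketched in Python. The function name `count_runs` and the use of "W"/"L" characters are assumptions for illustration, not the spreadsheet's actual formula.

```python
from typing import Sequence


def count_runs(results: Sequence[str]) -> int:
    """Count maximal blocks of identical consecutive outcomes, e.g. 'WWLWLL' has 4."""
    if not results:
        return 0
    runs = 1  # the first game always starts the first run
    for previous, current in zip(results, results[1:]):
        if current != previous:
            runs += 1  # outcome changed, so a new run begins
    return runs


if __name__ == "__main__":
    print(count_runs("W" * 81 + "L" * 81))  # extreme streakiness
    print(count_runs("WL" * 81))            # strict alternation
```

The two calls correspond to the 81-81 extremes mentioned earlier: all wins followed by all losses gives the minimum of 2 runs, and strict alternation gives the maximum of 162.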

Okay, now, how much above or below average is that in standard deviations?

So we saw 86 runs and the expected was 79.

We take the difference and divide by that standard deviation.

And we get 1.17.

So what we saw, was actually anti-streakiness

because there weren't less runs, there were more runs than were expected.

But it wasn't that unusual.

It was about 1.2 standard deviations above average.
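As a worked equation, using the unrounded mean and standard deviation implied by 97 wins and 65 losses (about 78.84 and 6.10, computed here from the formulas rather than read off the lecture's spreadsheet):

```latex
z = \frac{86 - 78.84}{6.10} \approx 1.17
```

Since 1.17 is below 2, the observed count of 86 runs falls inside the two-standard-deviation band.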

So there wasn't significant streakiness here and

you rarely will see significant streakiness.

Even though teams may win 10 or 12 games in a row,

it's just the human brain sees streaks or sees clusters when they don't exist.

And that's an explanation perhaps for

a lot of people who think they live in a cancer cluster,

because there's going to be some part of the country with a higher

than average cancer rate, just as there could be a part of the country

with a lower than average cancer rate.

So let's talk a little bit about the hot hand in basketball and

there's been some recent research saying maybe there really is the hot hand.

You ask anybody who plays basketball,

do you think you're more likely to make the next shot if you made the last shot? And

when I ask my students this, 90% of the hands go up: of course, I'm getting hot.

Well, it just turns out, I think, not to be true.

So in this classic hot hand in basketball paper from the journal Cognitive

Psychology in 1985, okay, here's an example of some of the data.

They use more sophisticated tests than runs tests.

Okay, but basically, look at some famous NBA players' free throw shooting. So

if you've got the hot hand, if you made the first foul shot you should be more

likely to make the second foul shot than if you missed the first foul shot.

So for example, Larry Bird, when he missed the first foul shot,

he made the second one 91% of the time.

When he made the first foul shot, he made the second one 88% of the time.

And you just don't see much difference here.

And the differences really aren't significant for any of these players,

given the number of foul shots that they took, and you basically see this a lot.
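One simple way to check whether two such percentages differ significantly is a pooled two-proportion z-test, sketched below in Python. This is a stand-in technique chosen for illustration, not necessarily the test used in the paper. The function name is hypothetical, and the shot counts in the example call are made up, because the transcript gives only percentages.

```python
import math


def two_proportion_z(made_a: int, shots_a: int, made_b: int, shots_b: int) -> float:
    """Z-score for the difference between two success rates, using a pooled standard error."""
    rate_a = made_a / shots_a
    rate_b = made_b / shots_b
    pooled = (made_a + made_b) / (shots_a + shots_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    return (rate_a - rate_b) / std_err


if __name__ == "__main__":
    # HYPOTHETICAL counts, not data from the paper:
    # second shot made 176 of 200 times after a make, 45 of 50 times after a miss.
    z = two_proportion_z(176, 200, 45, 50)
    print(f"z = {z:.2f}")
```

With these illustrative counts the z-score is well inside plus or minus 2, the same threshold used for the runs test.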

Matter of fact, there might be an anti hot hand that you're more likely

to miss the next shot if you made the last shot.

Why might that be?

Because you might think you're in the zone so you're red hot, so

you take a worse shot than you should've and then you're more likely to miss it.

Okay, so be wary when you hear the announcer say, this player's red hot,

although Steph Curry looks like he's always red hot, to be honest.

Okay, that's because he just makes every shot it seems to me.

Okay, but basically when your local baseball or

basketball announcer says, wow, our team is red hot, we've got the momentum,

if you go ahead and do the runs test, I bet you'll find out that basically this

streakiness is well within the bounds of randomness.