Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.


A course from the University of Houston System

Math behind Moneyball

From this lesson

Module 8

You will learn how to use game results to rate sports teams and set point spreads. Simulation of the NCAA basketball tournament will aid you in filling out your 2016 bracket. Final 4 is in Houston!

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay, in the last video we learned some cool stuff about how to predict each team's final score in an NFL game based on past performance.

Let's suppose we try to apply this to soccer.

Okay, we could run into a big problem.

They don't score much in soccer, so let's suppose the mean is 1.5 goals per team in a soccer game.

And let's suppose the home edge is 0.5 goals.

And let's suppose we have the US playing Germany. We're not this bad, I hope, but let's suppose these ratings are relative to the top 100 countries.

If we run the model in the last spreadsheet that we worked on, the US is minus 0.6 goals on offense and Germany is minus 0.8 goals on defense.

On defense, the US doesn't have a great goalie like Howard anymore, so suppose we're plus 0.1 goal, and Germany is 0.8 goals better than average on offense.

So let's predict the final score of this game, and you can see there's going to be a big problem, notably in predicting the US.

The game's in Germany.

So we start out with one and a half goals for the US.

We lose 0.25 for the home edge.

We lose 0.6 because of our bad offense, and we lose 0.8 for Germany's good defense.

So we predict -0.15 goals for the US.

I don't need to go any further.

I don't think we've ever seen a soccer game where a team scored negative goals,

so the additive model has a problem.
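The arithmetic above can be checked in a few lines. This is a minimal sketch using the lecture's illustrative numbers; the function name `additive_away_goals` is hypothetical, and splitting the 0.5-goal home edge evenly (+0.25 home, -0.25 away) follows the walk-through rather than any spreadsheet formula shown in the video.

```python
# Additive model: US (away) at Germany, using the lecture's illustrative numbers.
MEAN_GOALS = 1.5        # average goals per team per game
HOME_EDGE = 0.5         # split as +0.25 for the home team, -0.25 for the away team
US_OFFENSE = -0.6       # US scores 0.6 fewer goals than average
GERMANY_DEFENSE = -0.8  # Germany allows 0.8 fewer goals than average

def additive_away_goals(mean, home_edge, away_offense, home_defense):
    """Predicted goals for the away team: every adjustment is added."""
    return mean - home_edge / 2 + away_offense + home_defense

us_goals = additive_away_goals(MEAN_GOALS, HOME_EDGE, US_OFFENSE, GERMANY_DEFENSE)
print(round(us_goals, 2))  # a negative number of goals
```

Because every adjustment is added, enough negative adjustments can push the prediction below zero, which is exactly the problem described here.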

So what we want is a multiplicative model. This is a common issue in forecasting: do you treat seasonal indices as additive or multiplicative?

For example, you could say Amazon sells two billion more in December than in an average month, or you could say that in December they sell 50% more.

And the multiplicative model usually turns out to be better.

In the limited studies I've done, the multiplicative model seems to forecast better than the additive model. You'll see in one of the homework or test questions that, for the 2013 Super Bowl, it would have predicted a closer game than the additive model, because a good defense cancels out a good offense more in the multiplicative model.

We'll come back to that after we talk about the multiplicative model.

All right, so what are the changing cells in the multiplicative model?

There'll be an average number of goals a team scores.

There'll be a home edge, and this home edge is a little tricky; you'll see how it works.

The way I do it, a home edge of 1.1 doesn't mean the home team scores 10% more goals than the away team.

It really means they score about 20% more goals than the away team, roughly double that.

And then there's an offense and a defense factor for each team.

For instance, Arsenal's offense factor here would mean Arsenal scored 43% more goals than the average team.

And the average of these multiplicative factors should be one, not zero as it was in the additive model.

So if one team scores 50% more goals than average, some other team should score 50% fewer goals than average.

So, as in this US-Germany game, we'll make up some parameters and see how you forecast the goals.

Then we can use Solver to pick the parameters that best forecast the scores of the games.

So here we have the Premier League, 2012 I guess it is. Let's look at the dates here.

Yeah, it looks like we've got the Premier League 2012 [INAUDIBLE] scores: the home team, the away team, the home team's goals, and the away team's goals. We'll fit this model to these scores and then predict the outcome of a game.

Okay, but let's suppose we were doing World Cup teams, and suppose again the average is 1.5 goals.

Let's suppose the home edge is 1.1.

What that means is that I multiply the home team's predicted goals by 1.1 and the away team's predicted goals by one divided by 1.1.

So if you think about it on a percentage basis, the home edge is 1.1 divided by (one divided by 1.1), which equals 1.1 squared, or 1.21.
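Written as equations, the model reads as follows. The symbols are our own shorthand, not the course's notation: $\mu$ is the mean goals, $h$ the home edge, and $O$, $D$ a team's offense and defense factors. The last line shows why a home edge of 1.1 works out to about a 21% advantage for two otherwise average teams.

```latex
\begin{aligned}
\hat{g}_{\text{home}} &= \mu \, h \, O_{\text{home}} \, D_{\text{away}} \\
\hat{g}_{\text{away}} &= \frac{\mu}{h} \, O_{\text{away}} \, D_{\text{home}} \\
\left.\frac{\hat{g}_{\text{home}}}{\hat{g}_{\text{away}}}\right|_{O = D = 1} &= \frac{\mu h}{\mu / h} = h^{2} = 1.1^{2} = 1.21
\end{aligned}
```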

Okay.

And so let's suppose again we've got the U.S. and Germany, and we'll make up some offense and defense ratings for them and predict the score of the game.

So suppose the US scores 70% as many goals as an average team on offense, and on defense they give up 5% fewer goals than average.

Germany scores 40% more goals than average on offense and gives up maybe three quarters of the average number of goals.

And the game's in Germany.

So what's the US prediction?

You start with the average of one and a half and divide it by 1.1, that is, multiply it by one divided by 1.1.

Then the US scores 70% as many goals as average, and Germany gives up 75% as many goals as average.

So a reasonable prediction would be 1.5 times one divided by 1.1, times 0.7, times 0.75.

After fixing a couple of typos in my spreadsheet (the divisor should be 1.1, and Germany's defense should be 0.75), I get the US scoring 0.7 goals.

Then the question would be: what's the chance they score one goal, or two goals?

That's a difficult issue.

If you can figure out, from each team's predicted goals, the probability distribution of the score of a soccer game, I think you could make [INAUDIBLE] money betting.

Some people use the Poisson random variable.

I'm not sure that's right.

Now, the German prediction.

You start with that 1.5. They're at home, so they get multiplied by 1.1.

Their offense is 40% better than average, but at least the US has a decent defense, giving up 95% of the average.

And if I compute that, I predict 2.2 to 0.7.

Germany wins.
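Here is the same US-at-Germany calculation as a short sketch, using the made-up ratings from the lecture (US offense 0.70 and defense 0.95, Germany offense 1.40 and defense 0.75). The variable names are illustrative only.

```python
# Multiplicative model: US (away) at Germany, with the lecture's made-up ratings.
MEAN_GOALS = 1.5
HOME_EDGE = 1.1  # home team's goals multiplied by 1.1, away team's by 1/1.1

us_offense, us_defense = 0.70, 0.95            # scores 70% of average, allows 95%
germany_offense, germany_defense = 1.40, 0.75  # scores 140% of average, allows 75%

# Away team: mean / home edge * own offense * opponent's defense
us_goals = MEAN_GOALS / HOME_EDGE * us_offense * germany_defense
# Home team: mean * home edge * own offense * opponent's defense
germany_goals = MEAN_GOALS * HOME_EDGE * germany_offense * us_defense

print(f"Germany {germany_goals:.1f}, US {us_goals:.1f}")
```

Since every factor is a positive multiplier, no combination of ratings can produce negative goals, which is the point of switching from the additive model.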

Now of course, what you want to know is the chance that Germany wins the game, the U.S. wins the game, or it's a draw, because that's what you would bet on.

The other thing you bet on, I think, is whether the total goals in the game go over or under two and a half.

So you want to figure out those probabilities.

Now that gets tricky, and I don't think we're going to go that far in this class.

But having the prediction for the expected goals, you should be able to translate it into probabilities, and we'll talk a little more about how you do that for NCAA basketball and NFL football later in the course.

So now, how would we figure out the ratings for the Premier League?

The changing cells are the offense and defense ratings for each team, a home edge, and a mean.

For the home prediction, that is, how many goals we predict the home team will score, we take the average and bump it up by the home edge.

We use VLOOKUP to look up the home team's offense in the second column and multiply by the away team's defense from the third column.

That's the predicted number of goals for the home team.

For the away team, we start with the mean and divide by the home edge; if it's 1.1, that's multiplying by one over 1.1.

We look up the home team's defense and the away team's offense.

Then we minimize the sum of squared errors, and that'll be our target cell.
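The spreadsheet's prediction formulas and target cell can be sketched as two functions. This is a hedged sketch, not the course workbook: a Python dict stands in for the VLOOKUP table, the names `predict` and `sum_squared_errors` are hypothetical, and we assume the target sums squared errors for both the home and away goals of every game. The usage example uses two hypothetical teams with made-up scores.

```python
from typing import Dict, List, Tuple

Ratings = Dict[str, Tuple[float, float]]  # team -> (offense, defense); replaces VLOOKUP

def predict(mean: float, home_edge: float, ratings: Ratings,
            home: str, away: str) -> Tuple[float, float]:
    """Predicted (home goals, away goals) under the multiplicative model."""
    home_off, home_def = ratings[home]
    away_off, away_def = ratings[away]
    home_goals = mean * home_edge * home_off * away_def
    away_goals = mean / home_edge * away_off * home_def
    return home_goals, away_goals

def sum_squared_errors(mean: float, home_edge: float, ratings: Ratings,
                       games: List[Tuple[str, str, int, int]]) -> float:
    """Target cell: squared errors of both teams' goals, summed over all games."""
    total = 0.0
    for home, away, home_goals, away_goals in games:
        pred_home, pred_away = predict(mean, home_edge, ratings, home, away)
        total += (home_goals - pred_home) ** 2 + (away_goals - pred_away) ** 2
    return total

# Usage with hypothetical teams and made-up scores (not Premier League data).
ratings = {"A": (1.0, 1.0), "B": (1.0, 1.0)}
games = [("A", "B", 2, 1), ("B", "A", 1, 1)]
print(sum_squared_errors(1.5, 1.0, ratings, games))
```

Solver's job is to change `mean`, `home_edge`, and every entry of `ratings` so that this total is as small as possible.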

Now, it turns out that with a multiplicative forecast model, because we're multiplying changing cells together, the ordinary GRG Nonlinear Solver we've been using does not work very well.

You might have multiple peaks and valleys, and Solver has trouble finding, in this case, the lowest valley.

So we need what's called GRG Multistart; I'll show you how it works.

And you need bounds on the changing cells: upper and lower bounds.

What Multistart does is try, say, a hundred different starting points for all your changing cells, find the best answer from each starting point, and take the best of the best.

This is a very powerful technique that very few people know about, but it really works well.
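The multistart idea can be illustrated on a toy problem. This sketch is not how Excel implements Multistart internally: it uses a made-up one-variable function with two valleys, plain gradient descent clamped to the bounds as the local solver, and random starting points drawn within those bounds. The names `objective`, `local_descent`, and `multistart` are hypothetical.

```python
import random

def objective(x: float) -> float:
    # Toy function with two valleys: a shallow one near x = +1, a deeper one near x = -1.
    return (x * x - 1) ** 2 + 0.3 * x

def derivative(x: float) -> float:
    return 4 * x * (x * x - 1) + 0.3

def local_descent(x: float, lo: float, hi: float,
                  step: float = 0.01, iters: int = 2000) -> float:
    """Gradient descent kept inside [lo, hi]; it settles in the nearest valley."""
    for _ in range(iters):
        x = min(hi, max(lo, x - step * derivative(x)))
    return x

def multistart(lo: float, hi: float, starts: int = 100, seed: int = 0) -> float:
    """Run local descent from random starts within the bounds; keep the best of the best."""
    rng = random.Random(seed)
    best_x = None
    for _ in range(starts):
        x = local_descent(rng.uniform(lo, hi), lo, hi)
        if best_x is None or objective(x) < objective(best_x):
            best_x = x
    return best_x

single = local_descent(2.0, -3.0, 3.0)  # one start: stuck in the shallow valley
best = multistart(-3.0, 3.0)
print(f"single start: x={single:.2f}, f={objective(single):.3f}")
print(f"multistart:   x={best:.2f}, f={objective(best):.3f}")
```

The bounds matter here too: they define the region from which starting points are drawn, which is one reason Solver asks you to require bounds on the variables when using Multistart.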

So let's look at our Solver window.

We want to minimize the sum of squared errors; as before, we could also use absolute errors.

We want to change the offense and defense ratings, the home edge, and the mean goals.

Here is where we need bounds on the changing cells.

The ratings have to be nonnegative: they're indices, and a negative index could predict negative goals.

So you need bounds on the changing cells; just pick reasonable ones.

You could set the upper bounds at 1,000, but then Solver would take longer.

So the mean goals and home edge have to be less than three; no soccer team averages more than three goals a game. If a value bumps up against its bound, you raise the bound.

The average of the offense ratings, and the average of the defense ratings, should each equal one.

That's not a bound, that's a constraint. Then everything else should be less than or equal to three.

I don't think any team is going to score triple, or give up triple, the goals of an average team, or they wouldn't survive in the Premier League.

I like their system: if you're really bad, you get knocked out of the Premier League and go down a level, and the good teams move up.

I think the 76ers should be penalized that way, given how they've been playing.
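One way to see why the average-equals-one constraint costs nothing in fit: if every offense rating is divided by its average and the mean goals is multiplied by that same average, every prediction stays the same (likewise for defense). This sketch assumes a hypothetical unconstrained fit for two made-up teams, and the helper names are illustrative.

```python
# Hypothetical unconstrained fit: the offense ratings average 2.0 instead of 1.
mean, home_edge = 0.75, 1.1
offense = {"A": 3.0, "B": 1.0}
defense = {"A": 0.8, "B": 1.2}

def predict(mean, home_edge, offense, defense, home, away):
    """(home goals, away goals) under the multiplicative model."""
    home_goals = mean * home_edge * offense[home] * defense[away]
    away_goals = mean / home_edge * offense[away] * defense[home]
    return home_goals, away_goals

def normalize(mean, offense, defense):
    """Rescale each set of ratings to average 1, folding both scale factors into the mean."""
    off_avg = sum(offense.values()) / len(offense)
    def_avg = sum(defense.values()) / len(defense)
    new_offense = {team: r / off_avg for team, r in offense.items()}
    new_defense = {team: r / def_avg for team, r in defense.items()}
    return mean * off_avg * def_avg, new_offense, new_defense

before = predict(mean, home_edge, offense, defense, "A", "B")
new_mean, new_off, new_def = normalize(mean, offense, defense)
after = predict(new_mean, home_edge, new_off, new_def, "A", "B")
print(before, after)  # equal up to floating-point rounding
```

So the constraint just picks one representative from a family of equally good solutions, which makes the ratings readable as "percent of average."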

So now we select GRG Nonlinear, but we go to Options.

There's a checkbox for automatic scaling; I think that makes it work better.

But here's the key: under GRG Nonlinear, check Use Multistart and Require Bounds on Variables. Make sure you check those and it should work.

So I'm going to change the numbers here.

Let's suppose we average one goal and have no home edge (a home edge of 1), and, I don't know, we can make the ratings all ones to start.

And I'll change a couple of these; I'll put Manchester City at two and 0.05, whatever.

So the point is, I have to have numbers to start, but I really don't need numbers that are correct.

So again, here's what Solver will do. How many changing cells do I have?

There are 20 teams, so I have 42 changing cells: an offense and a defense rating for each team, plus the home edge and the mean.

The limit is 200 changing cells, I believe, and 200 constraints; for nonlinear models it might be a hundred changing cells.

Solver will pick a starting value for each of these 42 changing cells, run from there, and find the best answer. Then it picks another set of starting values, runs from there, and finds the best answer.

If you let it go until it doesn't improve for a minute or so, you should be fine.

So now I let it run.

It's working well, and I'm pretty sure it's done at 967.

Once it hangs on the window that says Incumbent and Subproblems, it's pretty much done, but we'll let it run a little longer.

And then we'll make a prediction; let's say Manchester United, or Chelsea playing at Arsenal.

You can stop Solver with the Escape key, but usually you should let it run until it says it's done: "Solver converged in probability to a global solution."

But I'm going to hit the Escape key and stop it, and I think we got just about what we had before.

So let's predict.

Let's say Reading at Chelsea. Reading is not very good, and Chelsea is pretty good.

Actually, let's say Chelsea at Reading.

Chelsea has a 1.42 offense and a 0.75 defense, and Reading has 0.79 and 1.36.

The home edge is 1.12, and the mean goals is 1.38.

So let's predict how this game would go.

For Chelsea, you start with that 1.38 goals and multiply by one over the home edge because they're on the road.

They're going to score 42% more, and Reading is going to give up 36% more, so this will be a disaster.

Let's check that: I've got 2.38 goals for Chelsea.

Now let's do Reading.

They start at 1.38. Yay, they get a home edge: they'll score 12% more.

Their offense is terrible: they score 79% of average.

And Chelsea's defense knocks them down to three quarters.

So I've got Reading at 0.92, almost like the Germany game.

So I predict Chelsea 2.38, Reading 0.92.
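The Chelsea-at-Reading walk-through in code, using the fitted values read off the spreadsheet in the video (mean 1.38, home edge 1.12, Chelsea 1.42/0.75, Reading 0.79/1.36). Variable names are illustrative.

```python
# Chelsea (away) at Reading (home), using the fitted values shown in the lecture.
MEAN_GOALS = 1.38
HOME_EDGE = 1.12
chelsea_offense, chelsea_defense = 1.42, 0.75
reading_offense, reading_defense = 0.79, 1.36

# Chelsea is on the road: divide by the home edge.
chelsea_goals = MEAN_GOALS / HOME_EDGE * chelsea_offense * reading_defense
# Reading is at home: multiply by the home edge.
reading_goals = MEAN_GOALS * HOME_EDGE * reading_offense * chelsea_defense

print(f"Chelsea {chelsea_goals:.2f}, Reading {reading_goals:.2f}")
```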

Now, the question is how that translates into the probability of various outcomes.

We'll talk about the Poisson random variable when we get to prop bets.

You could assume the number of goals is Poisson, figure out the probability of each possible score of the game, and then figure out the probability that Chelsea wins, draws, or loses the game, and that total goals go over 2.5.

But I'm not sure: is that valid here?
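For readers who want to try the approach just described, here is a sketch. It assumes each team's goals are Poisson with the predicted mean and that the two counts are independent; both are assumptions, and the lecture itself questions whether Poisson is valid here. The function names are hypothetical, and scores above `max_goals` are ignored (their probability is negligible for these means).

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probabilities(lam_a: float, lam_b: float, max_goals: int = 15):
    """P(A wins), P(draw), P(B wins), P(total goals > 2.5), assuming
    independent Poisson goal counts with means lam_a and lam_b."""
    win = draw = loss = over = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_a) * poisson_pmf(b, lam_b)  # P(score is a-b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
            if a + b > 2.5:
                over += p
    return win, draw, loss, over

win, draw, loss, over = outcome_probabilities(2.38, 0.92)
print(f"Chelsea win {win:.3f}, draw {draw:.3f}, Reading win {loss:.3f}, over 2.5 {over:.3f}")
```

Under the independence assumption, total goals are Poisson with mean 2.38 + 0.92 = 3.3, which gives a quick check on the over/under figure.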

Okay, so in the next video we'll briefly talk about something called regularization.

We've figured out the best way to measure, after the fact, how well the teams played. But is that the best way to predict the future?

The answer is probably not, so I want to give you a brief introduction, in very simple terms, to the concept of regularization, which can be used to optimize the predictive value of rating systems.

We won't go into it very deeply, but you should be aware of it if you're doing any forecasting at all, in sports or not.