0:00

Folks, welcome back. This is Matt again, and I'm going to talk a little bit now about repeated games when players discount future payoffs. Let's talk a little bit more about that and what that means. So when we're

looking at discounted repeated games, the idea is that we're, we're looking at games

where there are players playing the same game over and over and over again. But

instead of looking at the limit of the means, some limit of the average of what the payoffs are going to be in the distant future, people are valuing today versus tomorrow differently. So, the idea of discounted repeated games is that the future's uncertain. You're often motivated

somewhat by what happens today, and you trade off today versus the future. So,

it's not the infinite future that you care about, but you say, I really care about

today. I care about it a little bit more than tomorrow. So maybe tomorrow's value

is say 80 or 90% of what today's value is. And that means that if I say today's worth 1, tomorrow's worth 0.9, then the next day is worth 0.81, the day after that 0.729, etc. So, things are decaying exponentially

in terms of discounting. And so the idea here is, if I misbehave today, now I

have to think about how people are going to react to that. So, if

we're trying to support cooperative behavior in a prisoner's dilemma, I can

behave today, or I can cheat and deviate, that is, defect. And if I do that, I'm going to get

a temporary gain, and then I'm going to possibly be punished in the future. So,

the kinds of questions that are going to be important here are: will people want to punish me in the future? Is it going to be in their interest? How much do I care? What's my discount? Do I care a lot about the future or just a little bit? So we're looking at a stage game. Again, a stage game: just take a

normal form game, we're going to play that repeatedly over time. And now each player

has a discount factor. So, Player 1 has a discount factor beta 1, and so forth. The discount factor beta i is going to be taken to be in [0, 1]. Generally, we'll take beta i to be strictly less than one so that it's of more interest. If it's equal to zero, then

it means that you don't care about the future at all. It's basically just a one

stage game. So generally, the interesting case is going to be when players care

somewhat about the future, but they care more about today than tomorrow and so

forth. Often in these games, people look at situations with a common discount factor, so everybody has the same discount factor, which will make things easier in some cases. And then, the idea of discounting: you take the path of payoffs that you get from a whole sequence of actions. So, a profile of actions: a^1 played in the first period, a^t in the t-th period, and so forth. What you do is sum up these payoffs, but now you weight them by an exponentially decreasing function, the discount factor raised to the power of t. So, if I care,

you know, if this payoff was 1 every day, I'd be getting 1 today, plus 0.9, plus 0.81, plus 0.729, etc., right? So that's the idea. Okay.

So when we look at these games, again, players can condition their play on past history. A finite history of some length t is just going to be a list of everything that's happened at every date. So here, a^1 is a profile of what every player did in period 1: the first time we played this game, what did everyone do? And generally, a^t is going to be what everybody did at time t, right? So we've got a^t = (a^t_1, ..., a^t_n). These things are vectors, and they tell

us what everybody did in the first period, what everybody did in the second period,

and so forth. And then, we can talk about all finite histories. So all possible

histories that I could be faced with when I am playing this game, all the kinds of

things I'm going to have to think about. What am I going to do if this happens?

What am I going to do if that happens? So in an infinitely repeated game, I've got all these histories, and I have to say what I'm going to do in each circumstance. So a strategy is a map from every possible history into a possibly mixed strategy over what I can do in the given period, facing the given history. So, if we're looking at a

prisoner's dilemma, people can either cooperate or defect in a given period. So, if we're thinking about a history of length 3, one

possibility would be the following. We both cooperated in the first period. Maybe

Player 2 defected in the second period. And then, both of them defected in the

third period. So that would be a possible history. And then, they could say, okay.

Now what are we going to do in the fourth period? Maybe we'll let bygones be

bygones and try and get back to cooperation. Maybe we'll just defect,

we're angry at each other, who knows. Okay.
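As an illustration of a strategy being a rule for every history, here is a small Python sketch (the encoding of histories as lists of action profiles is my own, purely illustrative choice) of one such rule: cooperate as long as everyone always has, and defect forever after any defection.

```python
# A strategy maps a history (a list of action profiles, one per past period)
# to this period's action. 'C' = cooperate, 'D' = defect.
def grim_trigger(history):
    # Defect forever once anyone has ever defected; otherwise cooperate.
    for profile in history:
        if 'D' in profile:
            return 'D'
    return 'C'

# The three-period history from the example: both cooperate, then Player 2
# defects, then both defect.
history = [('C', 'C'), ('C', 'D'), ('D', 'D')]
print(grim_trigger(history))  # 'D': a past defection means defect in period 4
print(grim_trigger([]))       # 'C': cooperate at the start of the game
```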

So, a strategy for the fourth period would be what you do after you've seen the different histories of the first 3 periods. Subgame perfection, again, is the same as usual: a profile of strategies that is Nash in every subgame. What's a subgame here?

5:28

Subgames just start at some period and look at what remains. So, it has to be a Nash equilibrium following every possible history. So, if you take some history and start at that point, it has to be Nash from then on, forever. So strategies now are

going to be specifications of what we would do in every situation. And then,

we've got Nash following every history. One thing to check, and it's important here: repeatedly playing a Nash equilibrium of the stage game is always subgame perfect. So just find a static Nash equilibrium of whatever game it is, for instance defect, defect in the prisoner's dilemma. Just play that forever, no matter what's happened in the past, and it's always going to be subgame perfect. So, for every possible history,

everybody's going to say that they're going to play the Nash equilibrium forever

on and going forward. You can check that that's a subgame perfect equilibrium, right? It's going to be Nash in every possible subgame. So check: if everyone else is doing that, I wouldn't want to deviate. Just think a little bit about the logic of that, because there are a lot of possible subgames to think about, but you can convince yourself that it's true. Okay.
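One piece of that logic, that defect, defect is the unique static Nash equilibrium, is easy to check by brute force. A sketch in Python (payoff numbers taken from the example discussed next; the dictionary encoding is mine):

```python
# Stage-game payoffs: (row player's payoff, column player's payoff).
payoffs = {
    ('C', 'C'): (3, 3),
    ('C', 'D'): (0, 5),
    ('D', 'C'): (5, 0),
    ('D', 'D'): (1, 1),
}
actions = ['C', 'D']

def is_nash(a1, a2):
    # Neither player should gain by unilaterally deviating.
    u1, u2 = payoffs[(a1, a2)]
    return (all(payoffs[(d, a2)][0] <= u1 for d in actions)
            and all(payoffs[(a1, d)][1] <= u2 for d in actions))

nash = [(a1, a2) for a1 in actions for a2 in actions if is_nash(a1, a2)]
print(nash)  # [('D', 'D')]: defect, defect is the only static equilibrium
```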

So, solving the repeated prisoner's dilemma: let's think about it a little bit in the context of discounting now. So, let's suppose that what we want to do is

we want players to stay in cooperation, right? So we've got our standard prisoner's dilemma. I put in payoffs here of 3, 3 for both cooperating; 5, 0 from you defecting and the other person cooperating; and 1, 1 if you both defect. So, the only Nash equilibrium of the static game is defect, defect, with payoff 1. We want to support 3, 3 if we can. So: cooperate as long as everyone has in the past, and defect

forever in the future if anyone deviates. So, when is this an equilibrium? Okay, so clearly, if we set beta i equal to 0 for both players, we can't make this work, right? Because I don't care about the future, nobody cares about the future, and then we'd end up with defect, defect in every period being the only subgame perfect equilibrium. Players only care about the present; they're always just going to myopically defect. They don't care about the future,

so nothing's going to work. So, the question here is, for which betas can we

sustain this kind of strategy, which is cooperate as long as everyone has? And if

cooperation ever breaks down, then we just say, forget it, we're going to defect forever after. Okay, let's have a peek. So if you cooperate and the other player is cooperating, and no one's failed to cooperate in the past, what do

we get? We get 3 in perpetuity, right? So, taking a common discount factor for now, we get 3, plus beta times 3, plus beta squared times 3, plus beta cubed times 3, and so forth. So, in perpetuity, if you remember your sums of series, the value of that is just 3 over (1 minus beta). Okay.

What happens if I defect, and people are playing this grim trigger strategy?

Well, everybody else is cooperating. The other person's cooperating in the first

period. So, I'm going to change from cooperate to defect, and I'm going to get a 5 in the first period. But then, they're going to see that, and the next period they react to it. They defect, and everybody's going to defect forever after. So then, in perpetuity we get a bunch of 1's, right? So, what do we get? We get 5, and then beta times 1, beta squared times 1, and so forth. And if you remember your sums of series, the tail here is just beta times (1 plus beta plus beta squared plus ...), which is beta times 1 over (1 minus beta). So, if I deviate, what

happens is, in the first period, I get a gain, but then I lose in the subsequent

periods. So, there's a trade off. And how big that trade off is depends on the size

of the discount factor. So we've got these two different payoffs. We can look at the

difference between these. If I stay cooperating instead of defecting, I'm giving up 2 today that I could gain by defecting. But then, I keep the benefits of cooperation in the future, I don't ruin things, and that means I'm getting a bunch of extra 2's in the future. And so, when you look at this, the value of the difference is beta times 2 over (1 minus beta), minus the 2 I'm foregoing today. And when do I want to keep cooperating? As long as this is non-negative, right? If it becomes negative, then I'm worse off by cooperating; I might as well just defect. So, the difference is non-negative if beta is greater than or equal to 1 minus beta, or basically beta needs to be greater than or equal to a half. So,

if you just go through the algebra of solving this inequality, you'll get beta

greater than or equal to a half. So, as long as people care about tomorrow, at

least half as much as today, they're going to be willing to cooperate in this, in the

repeated prisoner's dilemma, with these particular payoffs that we looked at

before. So, when we're looking at this payoff structure here, we've got a situation where, if each beta i is at least a half, then players can sustain cooperation in this infinitely repeated prisoner's dilemma. Okay. So, let's change the numbers a little bit

and see what happens. So now, let's try and make defection a little bit more

attractive, right? So, instead of 5, we'll make it worth 10 to defect. So now,

defection looks really attractive. What has to happen? Well, we can go

through the same exact calculations we just did, but we're just going to change

the numbers, right? So, we've got the same thing: cooperating in perpetuity is worth 3 over (1 minus beta). The only difference is, we're getting a higher number from deviating, and then we're still going back to defect. So there's a little bit more temptation today. And when you do the difference here, you get the same kind of thing, except now, instead of a minus 2, we've got a minus 7: you're foregoing 7 units by not defecting today. So, when you go through and solve for that, now beta has to be at least 7/9 before players are going to be willing to cooperate. So, you have to care about tomorrow at least 7/9 as much as

today, okay? And so, you can see the basic logic here, right? So there's tradeoffs of

punishments tomorrow versus a good payoff today. And whether or not something can hold together as an equilibrium, what's that going to be determined by? We have to know: how big is the future versus the present? How tempting is the defection versus what we're doing in the current period? How big is the threat, that is, how bad is whatever we're resorting to in the future in terms of the trade-off? All these things are going to matter in terms of holding together

cooperation in these kinds of settings. And that gets back to the discussion we

had a little earlier about say, OPEC, right? There's a temptation to pump more

oil today. How much do you care about the future? What's your beta? What's the

reaction going to be? If I start pumping more oil, how are they going to react to

that? Are they going to start pumping more oil and driving the price down? How much

is that going to hurt me? All of those things matter, and they determine whether

an equilibrium can hang together or not. Okay.
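The whole trade-off can be collapsed into one inequality. With a temptation payoff T for defecting, a reward R for cooperating, and a punishment payoff P (my labels, not the lecture's), grim trigger holds together when the one-period gain T - R is at most the discounted future loss beta * (R - P) / (1 - beta), which solves to beta >= (T - R) / (T - P). A sketch checking both examples from the lecture:

```python
# Smallest discount factor at which grim trigger sustains cooperation:
# gain today (temptation - reward) <= beta * (reward - punishment) / (1 - beta),
# which rearranges to beta >= (temptation - reward) / (temptation - punishment).
def cooperation_threshold(temptation, reward, punishment):
    return (temptation - reward) / (temptation - punishment)

# First payoff table: defecting against a cooperator pays 5.
print(cooperation_threshold(5, 3, 1))   # 0.5
# Second table: defecting pays 10, so the temptation is bigger.
print(cooperation_threshold(10, 3, 1))  # 0.777... = 7/9
```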

So, the basic logic: play something with relatively high payoffs. Even if it's not an equilibrium of the static game, you can sustain it, and you sustain it by having punishments. If anyone deviates, you resort to something that has lower payoffs, at least for that player. And the important thing is that it all has to be credible: it has to be an equilibrium in the subgame going forward in order to make that work. And the lower payoffs in the future have to be enough to deter people from deviating in the present.