
In this video, we're going to look at the equilibria of infinitely repeated games.

So in order to talk about equilibria, we need to begin by asking: what do we mean by a pure strategy in an infinitely repeated game? You may like to pause the video at this point and see if you can answer that question before I go on and tell you the answer. If you need a hint, remember that the rule of thumb we've used to define strategies as we've changed game representations has been to say that a pure strategy is whatever you would need to tell another person so that they could play the game on your behalf and end up doing exactly what you would have done. So a pure strategy has always been the policy you would follow, making all of the decisions in the same way as you would actually have made them: whatever you would need to communicate about that policy.

Well, in an infinitely repeated game, your strategy is a choice of action at every decision point you would have, which means an action you would take in every stage game. And bear in mind that when you take those actions, you get to reason about everything you've seen before in the game: you can remember all of your own previous actions, and you can also remember all of the actions taken in previous stage games by the other players. So really, your pure strategy is a mapping from every history to an action choice. Clearly, that involves an infinite number of action choices, one for each possible history. So, unlike the previous games we've looked at, extensive-form games and normal-form games, you're not even going to have a finite set of pure strategies in an infinitely repeated game.

Let me nevertheless give you some examples of famous pure strategies in infinitely repeated games, to give you the sense that we can say interesting things about pure strategies even though the set of possible pure strategies is infinite. So think about playing repeated prisoner's dilemma: I'm going to play the prisoner's dilemma game infinitely.

One famous strategy is called tit-for-tat. It turns out there were famous competitions where people submitted programs that played the prisoner's dilemma repeatedly, and then looked at how these programs did against each other. Tit-for-tat was a strategy that did famously well in those competitions. The way it works is that it starts out cooperating, and then if it observes that its opponent defected in the previous round, tit-for-tat defects in the next round; after that, it goes back to cooperation. So if it defected but its opponent cooperated, it will respond to that by cooperating again. If tit-for-tat plays itself, it will cooperate forever; but if it plays against somebody who defects, it's going to, intuitively, punish that defection by defecting itself in the next round. But it's very forgiving: it goes back to cooperation after one punishment.

In contrast, the trigger strategy is a mean-spirited version of tit-for-tat. It also starts out by cooperating, but if its opponent ever defects, that's it: it's just going to defect forever. It's going to pull a trigger and say, I'm never going to forgive you; you've wronged me once, and I'm going to wrong you back until the end of time. That's why we call it trigger.

So you can see that, in both of these cases, I've been able to describe to you, in algorithmic terms, how these strategies would respond to anything in the infinite history they might see, without actually writing out a strategy in a formal language. So you can see that it is possible to think coherently about interesting strategies in infinitely repeated games.

Now, of course, what we would really like to do as game theorists is describe the Nash equilibria of infinitely repeated games. The approach we've taken in the past has been to first show how to build the induced normal form of a game: we figure out what a strategy is in the game, we show how to make the induced normal form, and then we just appeal to Nash's theorem and say that, because we've got a normal-form game, we know there is a Nash equilibrium, and everything goes through the way it did before.
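As a concrete sketch (my own illustration, not part of the lecture), the two strategies just described can be written as functions from the opponent's action history to an action:

```python
# Sketch: tit-for-tat and grim trigger as mappings from history to action.
# "C" = cooperate, "D" = defect. Each strategy sees the list of the
# opponent's past actions and returns its next action.

def tit_for_tat(opponent_history):
    """Cooperate first; otherwise copy the opponent's last action."""
    if not opponent_history:
        return "C"
    return opponent_history[-1]

def grim_trigger(opponent_history):
    """Cooperate until the opponent ever defects, then defect forever."""
    return "D" if "D" in opponent_history else "C"

def play(strategy1, strategy2, rounds):
    """Run two strategies against each other; return both action sequences."""
    h1, h2 = [], []  # actions taken so far by player 1 and player 2
    for _ in range(rounds):
        a1, a2 = strategy1(h2), strategy2(h1)
        h1.append(a1)
        h2.append(a2)
    return h1, h2

h1, h2 = play(tit_for_tat, grim_trigger, 5)
# When these two strategies meet, nobody ever defects, so both cooperate forever.
```

Note how each function is exactly the kind of algorithmic description from the lecture: a rule that responds to any history without enumerating the infinite strategy space.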

Well, that worked well for us in the past because we always ended up with finite induced normal forms. Unfortunately, because we now have an infinite number of pure strategies, we're going to get an infinity-by-infinity matrix, even in the two-player case. And so we no longer have something to which Nash's theorem applies, because now we have an infinite-sized game, and Nash's theorem only works for finite games, meaning games whose matrices are finite and contain a finite number of numbers. That means that, based on what we know so far, we don't even have any reason to be sure that equilibria exist in these games.

On the other hand, because there are an infinite number of strategies, there might even be an infinite number of pure-strategy equilibria. We've seen in the past that it's possible to have an infinite number of mixed-strategy equilibria: for example, if in a normal-form game I have two pure strategies that I'm indifferent between, then any mixture of them can also be a best response for me, so in that sense I can have an infinite number of mixed-strategy equilibria. But here, I can potentially have an infinite number of qualitatively different pure-strategy equilibria. That seems like a problem: we're not necessarily going to be able to list off the equilibria of a repeated game.

Interestingly, there is still something we can coherently do to give a satisfying answer about the Nash equilibria of infinitely repeated games. In this video, I'm going to tell you what it is, and because it's so satisfying, I'm even going to prove the theorem to you, so you really will understand how it works.

Here is the idea of the theorem that we will eventually prove: we can characterize which payoffs can be achieved in equilibrium, and we can give an example of equilibrium strategies that result in those payoffs, without giving every equilibrium that results in those payoffs. So we're going to characterize which strategies are in equilibrium in infinitely repeated games via their payoffs: I'm going to be able to tell you precisely which payoff vectors are achievable in equilibrium.

To do that, I need to give you a little bit of notation so we can talk about these payoff vectors. This is our slide of notation; once we get through it, we'll have all the building blocks we need to prove our theorem. We start with some n-player game, our stage game, which is just a normal-form game. We also start with some payoff vector, and by this I mean the utilities that the players get in the game. Now, in this video I'm going to talk about the average-reward case, where each of these numbers encodes the utility I get on average by following my strategy in the infinitely repeated game. That says that I care in just the same way about payoffs I get now and payoffs I get a million iterations into the future. We can do something similar in the discounted-reward case, and that's the topic of a different video. For this video we're going to talk only about the average-reward case, and the reason is that the proof is a little bit easier to think about here, even though the discounted-reward case is maybe a bit more of a practical setting. We can get across the key ideas of both proofs with this one, and this one is a bit easier.

So let's do it. To do that, I need to remind you of the concept of the minmax value, which is something we introduced in the context of zero-sum games, but it turns out to be very important for repeated games, so I'm going to remind you of what it is here. The mathematical definition of player i's minmax value is v_i = min over s_-i of max over s_i of u_i(s_i, s_-i), but let me start by saying in words what it means, because the nested min and max operators are a little bit hard to think about. Essentially, i's minmax value is the amount of utility that player i is able to obtain for himself if all of the other players, whom we call -i, are completely unconcerned about their own utility and are instead just trying to hurt i as much as possible. They all play the strategy, from their joint mixed-strategy space, such that if i responds by doing the best thing he can for himself, then i's utility is as low as possible. So they're trying to minimize i's utility, given that i is trying to maximize in response to their minimization, and the number that comes out at the end is the amount of utility that i can get by doing the best he can against everybody trying to hurt him as much as possible. That's i's minmax value. Intuitively, if I want to punish you as much as possible in a game, and you know that I'm trying to punish you, then the amount of utility that you get is your minmax value.
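To make this concrete, here is a small sketch of my own (the payoff numbers are the conventional prisoner's dilemma values, an assumption rather than something from the lecture) that approximates player 1's minmax value in a two-action game by searching over the opponent's mixed strategies:

```python
# Sketch: approximate player 1's minmax value in a 2x2 game by grid search
# over the opponent's mixed strategy q = Pr(opponent plays column 0).
# U1 holds player 1's payoffs with rows (C, D) and columns (opponent C, D);
# these conventional prisoner's dilemma numbers are an assumed example.

U1 = [[3.0, 0.0],   # player 1 cooperates
      [5.0, 1.0]]   # player 1 defects

def best_response_value(U, q):
    """Player 1's best payoff when the opponent mixes (q, 1 - q)."""
    return max(q * row[0] + (1.0 - q) * row[1] for row in U)

def minmax_value(U, steps=10000):
    """min over the opponent's mixtures of player 1's best-response payoff."""
    return min(best_response_value(U, k / steps) for k in range(steps + 1))

v = minmax_value(U1)
# Here the opponent minimizes by always defecting (q = 0), and player 1's
# best response is to defect too, giving a minmax value of 1.
```

In the prisoner's dilemma the punishment is simple (always defect); in general games the minimizing mixture can be interior, which is why the search ranges over all of the opponent's mixed strategies.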

So I will say that a payoff profile (remember, a payoff profile is a payoff amount for everybody) is enforceable if everybody's payoff in that profile is at least their minmax value. If anybody is getting less than their minmax value in a given payoff profile, I will say that it's an unenforceable payoff profile. And I will say that a payoff profile is feasible (I'll use this technical term, feasible) if the following condition is true. Intuitively, what I want to say about feasibility is that it's possible to actually get that payoff profile by combining together payoffs in the actual game that's being played.
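To make that "combining together" idea concrete, here is a sketch of my own (the payoff numbers are an illustration, matching the 2x2 example used later in the lecture): feasibility asks whether a payoff profile can be written as a weighted average of the stage game's payoff cells.

```python
# Sketch: check whether a given set of weights (one per cell of the stage
# game, summing to 1) reproduces a target payoff profile. In this example
# game, player 1 gets 2 in the top-left cell, player 2 gets 2 in the
# bottom-right cell, and everyone gets 0 elsewhere.

# payoffs[(row, col)] = (u1, u2)
payoffs = {
    (0, 0): (2, 0), (0, 1): (0, 0),
    (1, 0): (0, 0), (1, 1): (0, 2),
}

def weighted_profile(payoffs, weights):
    """Blend the cells' payoffs according to the weights (the alphas)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    n = len(next(iter(payoffs.values())))  # number of players
    return tuple(
        sum(weights[cell] * payoffs[cell][i] for cell in payoffs)
        for i in range(n)
    )

# Weight 1/2 on each diagonal cell blends the payoffs into the profile (1, 1).
alphas = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
r = weighted_profile(payoffs, alphas)
```

The key detail, emphasized below in the lecture, is that a single set of weights must produce the right number for every player simultaneously.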

Notice that enforceability doesn't actually require that the payoff is possible. I could say that a payoff of a million for me and a million for you in the prisoner's dilemma is an enforceable payoff profile, because it's above both of our minmax values, but there's no way we can actually each get a million in the prisoner's dilemma, because the numbers don't go that high. Feasibility is going to talk about what it's actually possible to do.

The way we're going to say this is as follows (I'm going to restrict myself to rational numbers): there exist rational and non-negative values alpha_a such that, for every player i, we can express player i's element r_i of the payoff vector as a sum, over all action profiles a of the normal-form game, of alpha_a times i's utility for a, where these alphas sum to one across all of the action profiles.

Let me explain what this means. I have a normal-form game, and what I want to say is that I have some weight on each cell in the game: these are the alpha_a's, and they all sum to one. What I want to do is take my payoff from one cell weighted by the alpha for that cell, my payoff for the next cell weighted by the alpha for that cell, sum up all of those weighted utilities for me, and get r_i. So what I want to say is that there exist alphas that would blend together the payoffs in the actual game in such a way that I get r_i. Furthermore, that has to be simultaneously true for everybody else, and in particular it has to be true with the same alpha_a's for everybody else. I can't come up with some alphas that get me my r_i and then use different alphas to get r_j for player j; I have to put the same weights on all of the cells so that everyone gets the right number at once.

So let's think about the following game, which will give you an example of how feasibility works. Say I have a two-by-two game where I get a payoff of 2 in the top-left cell, my opponent gets 2 in the bottom-right cell, and everyone gets 0 everywhere else. In this game, I claim the payoff profile (1,1) is feasible, and the reason is that I can put a weight of 50% on the top-left cell, a weight of 50% on the bottom-right cell, and weights of zero on the other cells. You can see my payoff in this game is 50% times 2, plus 0 times 0, plus 0 times 0, plus 50% times 0, which is 1. The other player's payoff is 50% times 0, plus 0 times 0, plus 0 times 0, plus 50% times 2, which is also 1. So (1,1) is a feasible payoff in this game.

On the other hand, the payoff (2,2) is not feasible in this game, and the reason is that there's no way I can put weights on these cells that sum to one and give both of us 2. For me to get a payoff of 2, the weight on the top-left cell would have to be one: that's the only cell where I get a payoff at all. And for my opponent to get 2, the weight on the bottom-right cell would have to be one, and then the weights would sum to more than one. The condition that the alphas sum to one would be violated, so we just can't do it: that's not a feasible payoff. Okay, now we're ready to prove the folk theorem. So,

and it's kind of funny that it's called the folk theorem. It's like a folk song for mathematicians: a folk song is a song that everybody knows, but nobody really knows who wrote it first. And a folk theorem is a theorem that people all kind of knew and talked about before they got around to writing it down, and they're not really sure quite where it came from. So this is the folk theorem of game theory, and despite having uncertain origins, it's very important. The folk theorem basically tells us what the Nash equilibria of an infinitely repeated game are, in terms of these payoffs and in terms of the definitions I just gave you. There are different folk theorems for different settings; this is the folk theorem for infinitely repeated games with average rewards.

The folk theorem has two parts, which basically stems from the fact that I've restricted myself to rational alphas. It turns out we don't actually need that restriction, but the math gets more annoying if we have to deal with real values, so to keep things simple I'm doing this just for rational numbers.

So the folk theorem says the following: consider any n-player game, which we're going to repeat infinitely, and any payoff vector r_1 through r_n, just like we've been talking about. First of all, suppose r is the payoff in a Nash equilibrium of the infinitely repeated game with average rewards. What can we conclude about the payoff vector? We conclude that r must be enforceable: remember, enforceable means that every player's payoff is greater than or equal to that player's minmax value. That seems like kind of a weak thing to say.

The second part of the folk theorem is more surprising. It says, basically, that that's the only restriction we need: if r is enforceable, and furthermore if it's feasible, if it's possible to make it out of the payoffs of the game, then r is the payoff in some Nash equilibrium. Together, what this says is that feasibility and enforceability are basically the whole story of Nash equilibrium. Enforceability seems like a very small thing: it says you're getting no less than the payoff you'd get if everyone punished you as much as it's possible for them to punish you. And feasibility says it's possible to get these payoffs in the game at all. The folk theorem says that's basically it: as long as you meet those two conditions, you've got a Nash equilibrium. That's essentially all there is to say about Nash equilibrium here.

You'll notice there's a little bit of wiggle room between these two statements, these two parts of the theorem, which has to do with the fact that we require feasibility in the second part but not in the first. So we can't conclude that every Nash equilibrium payoff is feasible. You might wonder why that is; it's just because some of them aren't expressible with rational alphas. That's the part I'm leaving out of this proof, but it's a technical point. The broad thing you should remember about the folk theorem is that, basically, feasibility and enforceability together are equivalent to Nash equilibrium in repeated games. So if you wanted to stop here, you now know what the folk theorem is, but I encourage you to keep watching the rest of the video to see how we actually prove it.
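Before the proof, the theorem's two conditions can be summarized in a short sketch of my own (the example payoffs, weights, and minmax values below are assumptions for illustration):

```python
# Sketch: the two folk-theorem conditions for a payoff profile r.
# "enforceable": r_i >= player i's minmax value for every i.
# "feasible": r equals a convex combination (weights alpha_a summing to 1)
# of the stage game's payoff cells. Here we check feasibility against a
# given candidate set of weights rather than searching for one.

def enforceable(r, minmax):
    """Every player gets at least their minmax value."""
    return all(ri >= vi for ri, vi in zip(r, minmax))

def feasible_with(r, payoffs, alphas, tol=1e-9):
    """Do the candidate weights reproduce r for every player simultaneously?"""
    if abs(sum(alphas.values()) - 1.0) > tol:
        return False
    blended = [
        sum(alphas[a] * payoffs[a][i] for a in payoffs)
        for i in range(len(r))
    ]
    return all(abs(b - ri) <= tol for b, ri in zip(blended, r))

# The 2x2 example from the feasibility discussion; both minmax values are 0
# there (each player can be held to 0 by the other), an assumed computation.
payoffs = {(0, 0): (2, 0), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (0, 2)}
alphas = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
ok = enforceable((1, 1), (0, 0)) and feasible_with((1, 1), payoffs, alphas)
```

Per the theorem, a profile passing both checks is achievable in some Nash equilibrium of the infinitely repeated game with average rewards.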

Let's begin by proving the first part: why is it that if the payoff is achievable in a Nash equilibrium, we can conclude that it must be enforceable? We'll prove this by contradiction. Suppose the payoff vector is not enforceable. Then there must be some player i whose payoff r_i is strictly less than his minmax value.

Now consider a situation in which player i changes his strategy. Imagine that he instead deviates to playing b_i(s_-i(h)) at any history h of the repeated game; let me tell you what that means. s_-i is the strategy profile of the other players, and let's assume i gets to know what it is, because we're talking about a Nash equilibrium. Then b_i is just i's best response, every time the other players play their strategy profile s_-i given h, for every history h. So let's consider the case where i simply best responds to what the other players are doing.

Well, by the definition of the minmax value, you have to get a payoff of at least your minmax value in every stage game if you follow the strategy we just defined. Remember, the definition of the minmax value is the payoff you get when everyone is trying to hurt you as much as possible and you best respond to that. So, intuitively, if I'm getting less than my minmax value, I can't be best responding to everybody else: even if they were trying to hurt me as much as possible, which is the worst case for me, best responding to them would get me my minmax value. So I could change my strategy to best responding, and that would have to improve things for me; it would bring me up to at least my minmax value. And because that would improve my utility, it's a better response for me than what I was doing before. So we've constructed a profitable deviation for player i, and because a profitable deviation exists for him, the strategy profile we started with couldn't possibly have been a Nash equilibrium. That contradicts the assumption we made at the beginning that r was not enforceable. So you can see that if r is not enforceable, we can't have a Nash equilibrium, which proves what we wanted to prove: being a Nash equilibrium payoff implies enforceability.

That's part one, the easy and less interesting direction. Let's now do the second and more interesting part, which says that all I need to assume is that the payoff profile is both feasible and enforceable, and that means I've found a Nash equilibrium. What's interesting about this part is that we're going to prove it by construction: I'm going to show you how, given any feasible and enforceable payoff profile, you can build a set of strategies for the players which are in equilibrium with respect to each other, and which obtain exactly that average payoff for every player.

First, let's introduce some bookkeeping notation. Since r is feasible, and since we've assumed the alphas are rational, we can rewrite each r_i by making up new variables beta_a and gamma, replacing each alpha_a by beta_a divided by gamma. That's not hard to do: because the alphas are rational, each can be written as a fraction, we can make gamma their common denominator, and we can set the betas appropriately. So it's possible to come up with betas and a gamma that make the expression true; that just follows from feasibility and the fact that the alphas are rational.

Now I'm going to construct a strategy profile, and then on the next slide I'm going to argue that we can turn it into an equilibrium. Let's make a strategy profile that cycles through all of the outcomes, using cycles of length gamma, where each cycle repeats action profile a exactly beta_a times. Remember, the betas and gamma are all integers. So, for example, take our game matrix and say gamma is seven, and the betas, reading across the cells, are 2, 0, 0, 0, 1, 2, 0, 2, 0; the corresponding alphas are 2/7, 0/7, 0/7, and so on (I'll just write the betas from this point on). Give player 1's actions the names A, B, C and player 2's actions the names D, E, F. Then player 1's strategy would be to play A, A, B, B, B, C, C, then go back and keep doing that forever. Player 2's strategy would be to play D, D, E, F, F, E, E, and then go back. If the two players played those two strategies together in a coordinated way forever, they would cycle through this game, hitting every action profile exactly the number of times the betas say. Let's denote such a sequence of action profiles a^t.

Now let me construct the real strategies for the players that I'm going to claim are in equilibrium. Define the strategy s_i of player i to be a trigger version of this cycling strategy. If nobody deviates, then s_i tells player i to play just what a^t would tell them to play: the players begin by cycling through these outcomes in just the way I described. But if one player notices that another player didn't do what they were supposed to do according to a^t, then they hit the trigger, and from that point on they play the minmax strategy against the deviating player. So if there's ever a period in which somebody does the wrong thing, everybody is going to gang up on that person forever, playing the strategy profile that causes that person to get their minmax value.

Let me just say that one more time, to make sure we've all got it. a^t is a sequence which is constructed so as to hit every action profile exactly the number of times given by the beta integers, cycling through that forever. And the strategy we're interested in is a trigger version of that: everyone tries to follow the cycle, but if somebody does the wrong thing, everybody punishes them forever and holds them to their minmax value. Okay.

So, I claim that this is a Nash equilibrium of the infinitely repeated game with average rewards. First, notice that if everyone plays according to this strategy, then every player i will get an average payoff of r_i, just as we wanted. Now, you might be thrown off by the fact that sometimes they're not getting exactly r_i because they're only halfway through the sequence. But we're only interested in the limit. So if you look at averages over periods of length gamma (you look at what happens after gamma rounds, after two gamma, after three gamma), you'll be able to convince yourself that they really do get the right payoffs, because after every period of length gamma they get exactly the payoff r_i by construction. So indeed this does lead to the payoff they were trying to get.
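A quick numeric check of that claim (my own sketch; the stage payoffs attached to the cycle's action profiles below are assumed example numbers):

```python
# Sketch: over whole cycles, the average payoff equals the feasible blend
# r_i = sum_a (beta_a / gamma) * u_i(a). Payoffs u1[a] are assumptions.

u1 = {("A", "D"): 3, ("B", "E"): 0, ("B", "F"): 1, ("C", "E"): 2}
beta = {("A", "D"): 2, ("B", "E"): 1, ("B", "F"): 2, ("C", "E"): 2}
gamma = sum(beta.values())  # 7, as in the lecture's example

# One cycle hits each action profile beta_a times.
cycle = [a for a, count in beta.items() for _ in range(count)]

# Player 1's payoff stream over k full cycles, then its average.
k = 1000
stream = [u1[a] for _ in range(k) for a in cycle]
average = sum(stream) / len(stream)

r1 = sum(beta[a] * u1[a] for a in beta) / gamma  # the blended target payoff
# After any whole number of cycles, the running average equals r1 exactly,
# which is why the limit average converges to r1.
```

Between cycle boundaries the running average wobbles, but as the lecture notes, only the limit matters, and at every multiple of gamma the average is exactly r_1 by construction.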

What remains to show is that this is a Nash equilibrium, that nobody can gain by deviating. And indeed, I claim that it is. To show that, imagine that everybody else is playing according to this strategy and some player j deviates at some point. Well, if that happens, then for all time after that point, player j is going to get his minmax payoff, and that's going to render the deviation unprofitable. Because we've assumed that this payoff profile is enforceable, j was already getting at least his minmax value. And because the punishment is going to happen for all time, forever afterwards, it's going to end up dominating the average: everything that happened in the finite stretch of time beforehand gets washed out of the average, and j's average payoff turns out to be his minmax payoff. By enforceability, we know that this is less than or equal to r_j. And so there's no way he can gain by such a deviation. That suffices to prove the folk theorem.

So what we've seen now is that, essentially, feasibility and enforceability correspond to the payoff profiles that are achievable in Nash equilibrium of an infinitely repeated game with average rewards.
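As a final sketch of my own (all numbers are assumptions for illustration), the washing-out argument: a deviator who grabs a one-shot gain but is then held to his minmax value forever sees his long-run average fall back to that minmax value.

```python
# Sketch: long-run average payoff of a deviator under trigger punishment.
# Suppose the equilibrium path pays r_j = 2 per round, the deviation pays
# 5 once, and the punishment then holds j to a minmax value of 1 forever.

def average_after_deviation(t_dev, horizon, r_j=2.0, gain=5.0, minmax=1.0):
    """Average payoff over `horizon` rounds if j deviates at round t_dev."""
    stream = [r_j] * t_dev + [gain] + [minmax] * (horizon - t_dev - 1)
    return sum(stream) / horizon

# As the horizon grows, the finite pre-deviation stretch and the one-shot
# gain are washed out, and the average tends to the minmax value of 1,
# which is below the equilibrium payoff r_j = 2: the deviation doesn't pay.
long_run = average_after_deviation(t_dev=10, horizon=1_000_000)
```

This is exactly why the proof needs average rewards: under that criterion, any finite prefix contributes nothing to the limit.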
