Let's now speak briefly about stochastic games. This is a topic that lends itself
to a very long discussion, and it's quite a complicated one, but we'll touch
on the main points and position it in the landscape of topics we're discussing.
And so the starting point is repeated games. As we know, a repeated game is simply
a game, in normal form for example, that we repeat over and over again. For example,
we play the Prisoner's Dilemma once, twice, three times, maybe a finite number of
times, maybe an infinite number of times. Then we accumulate the rewards over time
into an overall reward.
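To make the bookkeeping concrete, here is a minimal sketch of repeated play with accumulated rewards; the payoff numbers and the fixed strategies are illustrative assumptions, not values from the lecture.

```python
# Repeated Prisoner's Dilemma: play the same stage game T times and sum the payoffs.
# The payoff matrix and both players' strategies below are illustrative only.
PD_PAYOFFS = {  # (row action, column action) -> (row payoff, column payoff)
    ("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),  ("D", "D"): (-3, -3),
}

def play_repeated_pd(row_strategy, col_strategy, T=3):
    """Accumulate each player's reward over T repetitions of the stage game."""
    totals = [0, 0]
    for t in range(T):
        a_row, a_col = row_strategy(t), col_strategy(t)
        r_row, r_col = PD_PAYOFFS[(a_row, a_col)]
        totals[0] += r_row
        totals[1] += r_col
    return totals

# Example: both players always defect for three rounds -> totals of (-9, -9).
print(play_repeated_pd(lambda t: "D", lambda t: "D", T=3))
```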
A stochastic game is really a generalization of this, where we play a game
repeatedly, but not necessarily the same game. So we play a game, say the
Prisoner's Dilemma, and we each get some payoff. But depending on how we play the
game, we also probabilistically transition to some other game, play that in turn,
and the process continues. A graphical way to look at it is here. If this here is
a repeated game, where you play the same game over and over again, then here you
play the game and, if you happen to play this, you transition to this game. If you
happen to play that, maybe you transition to the same game. If you play this, you
transition over here. If you play this, maybe you play this game again, and so on.
From each game, you transition probabilistically to other games. This is a
stochastic game.
Formally speaking, it is the following tuple. It's a lot of notation, but the
concept is exactly what we just saw. We have a finite set of states Q. We have a
set of players. We have sets of actions, where A_i is the set of actions available
to player i. And then we have two functions. We have the transition probability
function: depending on the state we're in and on the action profile that was taken,
we move to any of the other states, or to the very same state, with a certain
probability, as governed by this probability distribution. And similarly, we have
the reward function, which tells us that if, in a certain state, a certain action
profile was taken by the agents, then this is the reward to each of the agents;
r_i is the reward to agent i. That's the formal definition. Notice that we sort of
implicitly assume here that you have the same action spaces in every state, but you
could define it otherwise; it would simply involve more notation. There's nothing
inherently important about the action spaces being the same in the different games
within the stochastic game.
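Written out, the tuple looks roughly as follows; the symbols here are the standard notation for stochastic games, so treat the exact letters as an assumption rather than a transcription of the slide.

```latex
% Stochastic game (a sketch of the definition described above):
%   Q   : finite set of (stage-game) states
%   N   : set of n players
%   A_i : actions available to player i, with A = A_1 x ... x A_n
%   P   : transition probability function
%   r_i : reward function of player i
\[
  G = \bigl( Q,\; N,\; A_1, \dots, A_n,\; P,\; r_1, \dots, r_n \bigr),
  \qquad
  P : Q \times A \times Q \to [0,1],
  \qquad
  r_i : Q \times A \to \mathbb{R}.
\]
```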
So, just a few final comments on it. First of all, as we saw, this obviously
generalizes the notion of a repeated game. But it also generalizes the notion of
an MDP, or Markov Decision Process. If a repeated game is a stochastic game with
only one game, a Markov Decision Process, or MDP, is a stochastic game with only
one player. So there you have states where the agent takes an action, receives a
reward, and probabilistically moves to some other state. The only difference is
that the agent is the only actor in the setting.
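As a rough sketch of how these pieces fit together, a stochastic game can be simulated with the loop below; the dictionary-based representation and the function name are my own illustrative choices, not notation from the lecture.

```python
import random

def simulate(P, r, strategies, start_state, steps):
    """Simulate a stochastic game for a fixed number of steps.

    P[q][a] maps each possible next state q' to its probability P(q, a, q'),
    r[q][a] is the tuple of per-player rewards r_i(q, a), and a is the joint
    action profile (one action per player, chosen here by simple state-dependent
    strategy functions).
    """
    q = start_state
    totals = [0.0] * len(strategies)
    for _ in range(steps):
        a = tuple(strategy(q) for strategy in strategies)  # joint action profile
        for i, reward in enumerate(r[q][a]):               # collect r_i(q, a)
            totals[i] += reward
        next_states, probs = zip(*P[q][a].items())         # sample q' ~ P(q, a, .)
        q = random.choices(next_states, weights=probs)[0]
    return totals
```

In this representation, a single strategy in `strategies` gives exactly an MDP, and a state space with a single element to which every `P[q][a]` returns gives exactly a repeated game.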
I mention this because MDPs have been studied substantially in a variety of
disciplines, from optimization to Computer Science to pure Math and beyond, and
these two perspectives, as a generalization of repeated games and as a
generalization of MDPs, give you a sense of the theory and investigations into
stochastic games. From repeated games, we inherit the definitions of different
ways of aggregating rewards over time: you can have limit-average rewards or
future discounted rewards.
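For reference, the two aggregation criteria mentioned here are usually written as follows, for stage rewards r_i^{(1)}, r_i^{(2)}, ... and a discount factor between zero and one; this is the standard formulation, so take the exact form as an assumption rather than something spelled out in the lecture.

```latex
% Limit-average reward and future discounted reward for player i,
% given stage rewards r_i^{(1)}, r_i^{(2)}, ... and discount factor 0 <= beta < 1.
\[
  \text{limit average:}\quad
  \lim_{k \to \infty} \frac{1}{k} \sum_{t=1}^{k} r_i^{(t)},
  \qquad
  \text{future discounted:}\quad
  \sum_{t=1}^{\infty} \beta^{\,t-1}\, r_i^{(t)}.
\]
```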
From the literature on optimization and on MDPs, we get notions such as
stationarity and Markovian strategies. We also have a notion of reachability,
which has to do with the structure of the underlying transition probabilities.
And so again, these are issues that we won't get into further in this lecture,
but at least we have flagged their existence.