A Learning Process for Games
We report new results on a slightly generalized version of the classical learning process for games proposed by Brown (1951) and further analyzed by Robinson (1951), Miyasawa (1961) and Shapley (1964). The learning process rests on an exogenous prior distribution over strategies (a mixed strategy combination) and an initial weight, a number w ∈ [0,∞[, given to that prior. In step s of the learning process, a vector best reply to the weighted average of the strategy combinations of all previous steps is calculated. The average is then updated by this last best reply, and the learning process keeps track of the updated averages. The learning process can be given several interpretations: a preplay thought process for a one-shot game (fictitious play), or a sequence of actual plays of a repeated game. We show that if the average strategy combination of our process converges to a specific point, this point is a Nash equilibrium. If it does not converge, the pure strategies used with positive probability in the limit, i.e., in cluster points, are 'rationalizable', and in fact have the 'best response property' in the sense of Pearce (1984). We show that for all 2 × 2 bimatrix games, for any prior and any initial weight, the average strategy combination converges, and we characterize the solution for these games. It turns out that if the initial weight is large, there is a striking similarity between the convergence point of the learning process and the equilibrium traced by the tracing procedure of Harsanyi (1975). Finally, we apply the learning process to (the agent strategic form of) extensive form games, assuming a completely mixed prior and a strictly positive initial weight reflecting initial uncertainty. In games of perfect information the process always converges to a subgame perfect equilibrium. In games of imperfect information, if the process, as well as the beliefs derived from it by Bayes' rule, converges, then the convergence point is a sequential equilibrium.
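The update step described in the abstract can be illustrated for a two-player bimatrix game. The following is a minimal sketch, not the authors' implementation. It assumes that the average after step s is (w · prior + sum of the first s best replies) / (w + s), and that ties between best replies are broken toward the lowest strategy index; the abstract specifies neither detail. The names `best_reply` and `learning_process` are hypothetical.

```python
from fractions import Fraction


def best_reply(payoffs, opponent_mix):
    """Return the index of a pure best reply to opponent_mix.

    payoffs[k][m] is the payoff from own strategy k against opponent strategy m.
    Ties are broken toward the lowest index (an assumption of this sketch).
    """
    expected = [
        sum(row[m] * opponent_mix[m] for m in range(len(opponent_mix)))
        for row in payoffs
    ]
    return expected.index(max(expected))


def learning_process(A, B, prior1, prior2, w, steps):
    """Run `steps` rounds of the weighted-average best-reply process.

    A[i][j] and B[i][j] are the payoffs to players 1 and 2 when player 1
    plays i and player 2 plays j. prior1 and prior2 are the prior mixed
    strategies, and w >= 0 is the initial weight given to the prior.
    Returns the average strategy combination after the last step.
    """
    x = [Fraction(p) for p in prior1]  # running average for player 1
    y = [Fraction(p) for p in prior2]  # running average for player 2
    w = Fraction(w)
    # Re-index player 2's payoffs by own strategy first.
    B_own = [[B[i][j] for i in range(len(A))] for j in range(len(A[0]))]
    for s in range(1, steps + 1):
        # Both best replies are computed against the averages of previous steps.
        i = best_reply(A, y)
        j = best_reply(B_own, x)
        total = w + s  # weight of the prior plus number of replies so far
        x = [((total - 1) * xk + (1 if k == i else 0)) / total
             for k, xk in enumerate(x)]
        y = [((total - 1) * yk + (1 if k == j else 0)) / total
             for k, yk in enumerate(y)]
    return x, y


# A 2 x 2 coordination game with a uniform prior and initial weight 1.
A = [[2, 0], [0, 1]]
B = [[2, 0], [0, 1]]
half = Fraction(1, 2)
x, y = learning_process(A, B, [half, half], [half, half], w=1, steps=3)
print(x, y)
```

In this example both players best-reply with their first strategy at every step, so after three steps each average puts probability 7/8 on the first strategy. Exact rational arithmetic is used so that ties between best replies are not decided by floating-point rounding.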
Year of publication: | 1990-12 |
---|---|
Authors: | Hendon, Ebbe ; Jacobsen, Hans Jørgen ; Nielsen, Michael Teit ; Sloth, Birgitte |
Institutions: | Økonomisk Institut, Københavns Universitet |
Similar items by person
- Learning, Tracing, and Risk Dominance (Hendon, Ebbe)
- Expected Utility under Uncertainty (Hendon, Ebbe)
- Fictitious Play in Extensive Form Games (Hendon, Ebbe)