CMU-CS-23-121
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-23-121

Scalable Learning and Solving of Extensive-Form Games

Chun Kai Ling

Ph.D. Thesis

June 2023

CMU-CS-23-121.pdf


Keywords: Game Theory, Machine Learning, Multiagent Decision Making

Strategic interactions between agents can be formalized by game theory. In re- cent years modern learning techniques, coupled with the ubiquity of data and pro liferation of compute, have paved the way towards computationally solving large games, with broad applications ranging from poker to security. However, several challenges arise when considering the current state of the art in large-scale game solving: (i) game parameters in many realistic settings are often unspecified; and (ii) there is a notable lack of scalable solvers for large, general-sum extensive-form games (most existing approaches being limited to zero-sum games).

In this thesis, we tackle in these two challenges in two ways. First, we study the inverse problem in game theory, where game parameters are to be learnt given only samples drawn from its quantal response equilibrium. An end-to-end, differentiable game-solving module fully compatible with modern deep learning architectures is presented. This approach allows for the learning of game parameters in two-player normal-form and extensive-form zero-sum games in a scalable manner via gradient descent. Second, we extend state of the art online Nash solvers currently used in zero-sum games to Stackelberg and correlated equilibria in general-sum extensive-form games. In both cases, we present efficient online search algorithms which avoid computation in branches of the game tree not encountered in actual play. These algorithms allow approximate solving of large general-sum games which offline solvers struggle with, while still providing guarantees on performance. Third, we show how to solve Stackelberg equilibrium in large perfect information general-sum games by learning Enforceable Payoff Functions–functions which capture tradeoffs in utility between players–rather than the usual scalar values. We demonstrate how to learn these functions using a variant of fitted value iteration, and quantify the suboptimality incurred by errors in function approximation.

150 pages

Thesis Committee:
J. Zico Kolter (Co-chair)
Fei Fang (Co-chair)
Vincent Conitzer
Tuomas Sandholm
Michael Bowling (University of Alberta)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by [email protected]