CMU-CS-20-132
Computer Science Department
School of Computer Science, Carnegie Mellon University
Equilibrium Finding for Large Adversarial Imperfect-Information Games
Noam Brown
Ph.D. Thesis
September 2020
Imperfect-information games model strategic interactions involving multiple agents with private information. A typical goal in this setting is to approximate an equilibrium in which all agents' strategies are optimal. This thesis describes a number of advances in the computation of equilibria in large adversarial imperfect-information games. Together, these new techniques made it possible for the first time for an AI agent to defeat top human professionals in full-scale poker, which had for decades served as a grand challenge problem in the fields of AI and game theory.

We begin by introducing faster equilibrium-finding algorithms based on counterfactual regret minimization (CFR), an iterative algorithmic framework that converges to a Nash equilibrium in two-player zero-sum games. We describe new variants of CFR that use discounting to dramatically speed up convergence. These discounting variants are now the state-of-the-art equilibrium-finding algorithms for large adversarial imperfect-information games. We also describe theoretically sound pruning techniques that can speed up CFR by orders of magnitude in large games. Additionally, we introduce the first general technique for warm starting CFR.

Next, we describe new ways to scale CFR to extremely large games via automated abstraction and function approximation. In particular, we introduce the first provably locally optimal algorithm for discretizing continuous action spaces in imperfect-information games. We extend this into an algorithm that converges to an equilibrium even in games with continuous action spaces. We also introduce Deep CFR, a form of CFR that uses neural network function approximation rather than bucketing-based abstractions. Deep CFR was the first non-tabular form of CFR to scale to large games and enables CFR to be deployed in settings with little domain knowledge.
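The regret-minimization idea at the heart of CFR can be illustrated in miniature. The sketch below (not code from the thesis) runs regret matching in self-play on rock-paper-scissors, a one-shot two-player zero-sum game; the payoff matrix, function names, and iteration count are illustrative assumptions. In full CFR this same update is applied independently at every information set of a sequential game, but the key property is already visible here: the *average* strategy over iterations converges to a Nash equilibrium, even though the per-iteration strategies need not.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (illustrative example,
# not a game from the thesis). The game is zero-sum, so the column player's
# payoffs are the negation.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def regret_matching(regrets):
    """Map cumulative regrets to a strategy: play each action in proportion
    to its positive cumulative regret (uniform if no regret is positive)."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    n = len(regrets)
    return pos / total if total > 0 else np.full(n, 1.0 / n)

def self_play(iterations=20000):
    """Both players run regret matching against each other."""
    regrets = [np.zeros(3), np.zeros(3)]
    strategy_sums = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for p in range(2):
            payoff = PAYOFF if p == 0 else -PAYOFF.T
            # Expected payoff of each pure action vs. the opponent's strategy.
            action_values = payoff @ strategies[1 - p]
            expected = strategies[p] @ action_values
            # Accumulate regret for not having played each action.
            regrets[p] += action_values - expected
            strategy_sums[p] += strategies[p]
    # The average strategy (not the final one) approximates a Nash equilibrium.
    return [s / s.sum() for s in strategy_sums]
```

The discounting variants mentioned above modify how these cumulative sums are weighted across iterations (e.g., down-weighting regrets from early iterations), which is what accelerates convergence in practice. For rock-paper-scissors, `self_play()` returns average strategies close to the uniform Nash equilibrium (1/3, 1/3, 1/3) for both players.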
We also present new search techniques for imperfect-information games that ensure the agent's search strategy cannot be exploited by an adversary. These new forms of search outperform past approaches both in theory and in practice. We describe how these search techniques were used in the construction of Libratus to defeat top humans in two-player no-limit poker for the first time. Additionally, we introduce a method for depth-limited search that is orders of magnitude more efficient than prior approaches. We describe how this depth-limited search technique was incorporated into Pluribus, which defeated top human professionals in multiplayer poker for the first time. Finally, we present an algorithm that combines deep reinforcement learning with search at both training and test time, which takes a major step toward bridging the gap between research on perfect-information games and imperfect-information games.
234 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department