CMU-CS-22-144 Computer Science Department School of Computer Science, Carnegie Mellon University
Strategic Behavior Prediction Kevin G.A. Waugh Ph.D. Thesis August 2022
Predicting strategic goal-oriented multi-agent behavior from observations of play is a ubiquitous task that spans not only computer science, but economics, finance, decision theory, and psychology. Unlike in the single-agent setting, agents cannot myopically maximize their utility. Instead, they must reason about the intentions and goals of the others' to compete for common resources or to cooperate to achieve a common goal. This complicates matters dramatically, as the natural solution concepts are computationally intractable and do not fail gracefully when even a single player deviates from the equilibrium. Furthermore, the behavior we observe is typically non-smooth and often intentionally deceptive to improve robustness and incorporate aspects of information hiding, e.g., bluffing, misleading or slow-playing. In this thesis, we advocate estimating dual variables, i.e., utilities and regrets, instead of working with the primal behavior directly. We argue these dual variables are easier to estimate, in both theory and practice, than the primal behavior. In particular, these utilities are often relatively smooth, and exhibit nice problem-specific structure. Additionally, in some cases these dual estimates are all that are available–primal procedures are simply inapplicable. We consider behavior prediction in normal-form games and introduce the inverse correlated equilibria (ICE) polytope. This polytope contains the joint-strategies that have no more internal regret than the demonstrated behavior under a potentially unknown utility function. This is similar in spirit, and inspired by, single-agent imitation learning/inverse reinforcement learning methods like Abbeel and Ng [1] and Ziebart et al. [97], which guarantee the expected reward of the prediction matches the demonstrated behavior. Our method, MaxEnt ICE, predicts the maximum entropy joint-strategy from the ICE polytope. This results in a convex objective that we efficiently optimize with simple gradient-based algorithms as well as strong statistical guarantees on the quality of the resulting prediction. This approach provides robust game-theoretic behavior estimates even when observations are scarce. We experiment with our approach on human behavior from psychological studies, as well as real-world firm market entry behavior. To conclude, we tie our estimation approach to the large body of work on game abstraction and equilibrium computation in zero-sum extensive-form games. In par ticular, a crucial step of equilibrium computation is estimating the utilities induced by a strategy. The chosen abstraction serves as a model class providing computational tractability and improving generalization. Following this insight, we introduce regression counterfactual regret minimization (RCFR), a generalization of CFR, an algorithm for zero-sum equilibrium computation. In essence, RCFR drastically improves flexibility in abstraction design.
83 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department
| |
Return to:
SCS Technical Report Collection This page maintained by [email protected] |