CMU-CS-09-112 Computer Science Department School of Computer Science, Carnegie Mellon University
Thresholded-Rewards Decision Problems: Colin McMillen April 2009 Ph.D. Thesis
In timed, zero-sum games, winning against the opponent is more important than the final score. A team that is losing near the end of the game may choose to play aggressively to try to even the score before time runs out. In this thesis, we consider the problem of finding optimal policies in stochastic domains with limited time, some notion of score, and in complex environments, such as domains including opponents. This problem is relevant to many intelligent decision making tasks, not just games, as nearly every decision made in the real world depends on time. The work presented in this thesis has broad applications to domains possessing the key features of control under uncertainty, limited time, and some notion of score. We introduce the concept of thresholded-rewards problems as a means to effectively reason about acting in domains with limited time and with some notion of score, progress, or intermediate reward. In a thresholded-rewards problem, the amount of true reward received is determined at the end of the time horizon, by applying an arbitrary threshold function to the amount of intermediate reward (e.g., score) accumulated during execution. We utilize Markov decision processes (MDPs) and semi-Markov decision processes (SMDPs) to model domains with stochastic actions. We introduce algorithms for finding optimal policies in MDPs and SMDPs with thresholded rewards. We also introduce heuristics for finding approximately optimal policies for thresholded-rewards MDPs. We analyze how a team should change strategy in response to an opponent whose behavior is initially unknown but slowly reveals itself during execution. We also introduce a sampling-based control algorithm that allows for effective action in domains in which rewards are hidden from the agent. We perform controlled experiments to evaluate our algorithms in three timed domains. Robot soccer and Capture the Flag are timed, adversarial games in which two teams compete to be ahead in score at the end of the game. We further extend our approach to address the reCAPTCHA domain, in which we are given a set of words that need to be transcribed before some time deadline. The control problem consists of maximizing the probability that all the words have been transcribed before the deadline. Through our theoretical and experimental results, we show that the algorithms presented in this thesis enable effective action in stochastic domains with limited time and some notion of score. 198 pages
| |
Return to:
SCS Technical Report Collection This page maintained by [email protected] |