CMU-CS-20-136
Computer Science Department
School of Computer Science, Carnegie Mellon University

Striving for Safe and Efficient Deep Reinforcement Learning
Harshit Sushil Sikchi
M.S. Thesis, December 2020
Reinforcement Learning has seen tremendous progress in the past few years, solving games such as Dota and StarCraft, but little attention has been given to the safety of deployed agents. In this thesis, keeping safety in mind, we make progress along different dimensions of Reinforcement Learning: Planning, Inverse RL, and Safe Model-Free RL.
Towards the goal of safe and efficient Reinforcement Learning, we propose:
2) An Inverse Reinforcement Learning method, f-IRL, that allows specifying preferences using state marginals or observations only. We derive an analytic gradient for matching a general f-divergence between the agent's and the expert's state marginals (a sketch of the objective follows below). f-IRL achieves more stable convergence than adversarial imitation approaches that rely on min-max optimization, and it outperforms state-of-the-art IRL baselines in sample efficiency. Moreover, we show that the recovered reward function can be used in downstream tasks, and we empirically demonstrate its utility on hard-to-explore tasks and for behavior transfer across changes in dynamics.

3) A model-free Safe Reinforcement Learning method, Lyapunov Barrier Policy Optimization (LBPO), that uses a Lyapunov-based barrier function to restrict the policy update to a safe set at each training iteration (a barrier sketch also follows below). The method further allows the user to control the agent's conservativeness with respect to the constraints in the environment. LBPO significantly outperforms state-of-the-art baselines in the number of constraint violations incurred during training while remaining competitive in task performance.
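For concreteness, the state-marginal matching behind f-IRL can be written with the standard definition of an f-divergence. This is a minimal sketch, assuming rho_E and rho_theta denote the expert's and the agent's state marginals and f is a convex generator with f(1) = 0; the exact objective and the analytic gradient derived in the thesis may differ in detail:

$$ L_f(\theta) \;=\; D_f\!\left(\rho_E \,\|\, \rho_\theta\right) \;=\; \mathbb{E}_{s \sim \rho_\theta}\!\left[\, f\!\left(\frac{\rho_E(s)}{\rho_\theta(s)}\right) \right], \qquad \min_\theta \; L_f(\theta), $$

where the reward parameters theta induce rho_theta through the policy trained on the learned reward r_theta.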
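Similarly, the barrier idea in LBPO can be illustrated with a generic log-barrier applied to a per-iteration safety constraint. This is only an illustrative sketch, not the exact formulation in the thesis: writing J(theta) for the policy objective, g(theta) <= d for a (Lyapunov-style) safety constraint at the current iterate, and beta > 0 for a coefficient controlling conservativeness,

$$ \max_\theta \; J(\theta) \;+\; \beta \, \log\!\bigl(d - g(\theta)\bigr), \qquad \text{defined only for } g(\theta) < d. $$

The barrier term diverges as the update approaches the constraint boundary, so each update stays inside the safe set; a larger beta (or a tighter budget d) yields a more conservative agent.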
101 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department