CMU-CS-20-138
Computer Science Department
School of Computer Science, Carnegie Mellon University
Towards Broader and More Efficient Object Manipulation
Yufei Wang
M.S. Thesis
December 2020
Reinforcement learning (RL) has achieved great success in many robotic object manipulation tasks, such as pushing, grasping, tossing, and inserting. However, several challenges remain in applying RL to a broader range of object manipulation tasks in the real world. First, it is difficult to design a correct reward function, and harder still to obtain one directly from high-dimensional images in the real world. Second, although great progress has been made in rigid object manipulation, manipulating deformable objects remains challenging due to their high-dimensional state representations and complex dynamics. In this thesis, we aim to push forward the application of deep RL to object manipulation by proposing solutions to these two challenges.

For obtaining a reward function directly from images, current image-based RL algorithms typically operate on the whole image without performing object-level reasoning, which leads to ineffective reward functions. We improve upon previous visual self-supervised RL by incorporating object-level and occlusion reasoning: we use unknown object segmentation to ignore distractors in the scene for better reward computation and goal generation, and we enable occlusion reasoning through a novel auxiliary loss and training scheme. We demonstrate that the resulting algorithm, ROLL (Reinforcement learning with Object Level Learning), learns dramatically faster and achieves better final performance than previous methods on several simulated visual control tasks (a sketch of the reward computation appears below).

We further propose a new inverse reinforcement learning (IRL) method for learning a reward function that matches a given expert state density. Our main result is the analytic gradient of any f-divergence between the agent and expert state distributions with respect to the reward parameters. Based on this gradient, we present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent (a schematic of the training loop appears below). We show that f-IRL can learn behaviors from a hand-designed target state density or implicitly from expert observations. Our method outperforms adversarial imitation learning methods in sample efficiency and in the number of expert trajectories required on IRL benchmarks. Moreover, we show that the recovered reward can be used to quickly solve downstream tasks, and we empirically demonstrate its utility on hard-to-explore tasks and for transferring behavior across changes in dynamics.

Finally, to facilitate research on applying deep RL to deformable object manipulation, we present SoftGym, a set of open-source simulated benchmarks for manipulating deformable objects, with a standard OpenAI Gym API and a Python interface for creating new environments (a minimal interaction loop appears below). Our benchmark will enable reproducible research in this important area. We evaluate a variety of algorithms on these tasks and highlight challenges for RL algorithms, including dealing with a state representation that has high intrinsic dimensionality and is partially observable. The experiments and analysis indicate the strengths and limitations of existing methods in the context of deformable object manipulation and can help point the way forward for future method development.
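The object-level reward idea behind ROLL can be illustrated with a minimal sketch: compute the distance-based reward on a segmented image rather than the raw frame, so distractors do not corrupt the reward. The helpers `segment_unknown_objects`, `object_encoder`, and `goal_latent` are illustrative stand-ins, not ROLL's actual API.

```python
# Hypothetical sketch of object-level reward computation, assuming a
# segmentation module and an image encoder are available.
import numpy as np

def object_level_reward(frame, goal_latent, segment_unknown_objects, object_encoder):
    """Reward = negative latent distance between the segmented scene and the goal."""
    # Mask out everything except the (unknown) objects of interest.
    mask = segment_unknown_objects(frame)      # HxW boolean mask
    object_only = frame * mask[..., None]      # zero out distractor pixels
    # Encode the segmented image into a low-dimensional latent.
    z = object_encoder(object_only)
    # Smaller distance to the goal latent => higher (less negative) reward.
    return -np.linalg.norm(z - goal_latent)
```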
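The f-IRL training structure can be pictured as an alternating loop: fit a maximum-entropy RL policy to the current reward, estimate the expert/agent state density ratio, then take a gradient step on the reward parameters to shrink the chosen f-divergence. The sketch below is a schematic with a simplified surrogate gradient, not the thesis's exact analytic gradient; all helper names (`density_ratio`, `f_prime`) are assumptions.

```python
# Schematic reward update in the style of f-IRL (simplified surrogate,
# not the exact analytic gradient derived in the thesis).
import torch

def reward_update_step(reward_net, optimizer, rollout_states, density_ratio, f_prime):
    """One gradient step on the reward parameters.

    density_ratio(s): estimate of rho_E(s) / rho_theta(s), e.g. from a
        binary classifier trained on expert vs. agent states.
    f_prime: derivative of the convex f defining the divergence
        (for forward KL, f(u) = u * log(u), so f_prime(u) = log(u) + 1).
    """
    # Weight each visited state by f'(density ratio); states the expert
    # visits more often get their reward pushed up, and vice versa.
    weights = f_prime(density_ratio(rollout_states)).detach()
    loss = -(weights * reward_net(rollout_states)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Outer loop (not shown): retrain the MaxEnt RL policy on the
    # updated reward, collect fresh rollouts, and repeat.
```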
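Because SoftGym exposes the standard OpenAI Gym API, interacting with its environments follows the usual reset/step loop. The environment id `'ClothFold'` and the registration side effect of `import softgym` are assumed for illustration; consult the SoftGym repository for the actual entry points.

```python
# Minimal Gym-style interaction loop with a SoftGym environment
# (environment id assumed for illustration).
import gym
import softgym  # assumed to register the SoftGym environments with gym

env = gym.make('ClothFold')               # hypothetical cloth-folding task id
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()    # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    total_reward += reward
print('episode return:', total_reward)
env.close()
```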
70 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department