CMU-CS-20-138
Computer Science Department
School of Computer Science, Carnegie Mellon University




Towards Broader and More Efficient Object
Manipulation via Deep Reinforcement Learning

Yufei Wang

M.S. Thesis

December 2020

CMU-CS-20-138.pdf


Keywords: Reinforcement Learning, Robotic Manipulation, Image-based Reinforcement Learning, Inverse Reinforcement Learning, Self-supervised Learning, Reward Learning, State Marginal Matching, Deformable Object Manipulation, Benchmarking

Reinforcement learning ("RL") has achieved great success in many robotic object manipulation tasks, such as pushing, grasping, tossing, inserting, and more. However, there remain some challenges in applying RL to a broader range of object manipulation tasks in the real world. First, it is challenging to design the correct reward function, as well as to obtain it directly from high-dimensional images in the realworld. Second, although great progress has been made in the regime of rigid object manipulation, manipulating deformable objects remains challenging due to its high dimensional state representation, and complex dynamics. In this thesis, we aim to push forward the application of deep RL to object manipulation, by proposing the following solutions to address these two challenges.

Specifically, for obtaining a reward function directly from images, current image-based RL algorithms typically operate on the whole image without performing object-level reasoning. This leads to ineffective reward functions. In this thesis, we improve upon previous visual self-supervised RL by incorporating object-level reasoning and occlusion reasoning. We use unknown object segmentation to ignore distractors in the scene for better reward computation and goal generation; we further enable occlusion reasoning by employing a novel auxiliary loss and training scheme. We demonstrate that our proposed algorithm, ROLL (Reinforcement learning with Object Level Learning), learns dramatically faster and achieves better final performance than previous methods on several simulated visual control tasks.
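
As a concrete illustration of the object-level reward computation described above, the sketch below shows one way such a reward could be formed from segmented images; the names segment_object and encoder are hypothetical placeholders, not the exact interface used by ROLL.

    import numpy as np

    def object_level_reward(obs_img, goal_img, segment_object, encoder):
        # Keep only the object of interest, so that distractors in the
        # scene do not influence the reward (illustrative only).
        obs_object = segment_object(obs_img)
        goal_object = segment_object(goal_img)
        # Encode the segmented images into a low-dimensional latent space.
        z_obs = encoder(obs_object)
        z_goal = encoder(goal_object)
        # Negative latent distance between observation and goal acts as
        # a dense, object-centric reward.
        return -float(np.linalg.norm(z_obs - z_goal))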

We further propose a new inverse reinforcement learning method for learning a reward function that matches a given expert state density. Our main result is the analytic gradient of any f-divergence between the agent's and expert's state distributions with respect to the reward parameters. Based on the derived gradient, we present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent. We show that f-IRL can learn behaviors from a hand-designed target state density or implicitly through expert observations. Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories on IRL benchmarks. Moreover, we show that the recovered reward can be used to quickly solve downstream tasks, and we empirically demonstrate its utility on hard-to-explore tasks and for behavior transfer across changes in dynamics.
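
For concreteness, the state-marginal-matching objective referred to here can be written in the usual f-divergence form (a sketch under one common convention, not the thesis's exact derivation). Given the expert state density rho_E and the agent's state marginal rho_theta induced by a policy trained on the parameterized reward r_theta,

    L_f(\theta) \;=\; D_f\!\left(\rho_E \,\|\, \rho_\theta\right)
               \;=\; \mathbb{E}_{s \sim \rho_\theta}\!\left[\, f\!\left(\frac{\rho_E(s)}{\rho_\theta(s)}\right) \right],
    \qquad f \text{ convex}, \; f(1) = 0,

and f-IRL performs gradient descent on theta using the analytic gradient of L_f with respect to the reward parameters, which is the main result summarized above.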

To facilitate research on applying deep RL to the challenges of deformable object manipulation, in this thesis we present SoftGym, a set of open-source simulated benchmarks for manipulating deformable objects, with a standard OpenAI Gym API and a Python interface for creating new environments. Our benchmark will enable reproducible research in this important area. Further, we evaluate a variety of algorithms on these tasks and highlight challenges for reinforcement learning algorithms, including dealing with a state representation that has a high intrinsic dimensionality and is partially observable. The experiments and analysis indicate the strengths and limitations of existing methods in the context of deformable object manipulation, which can help point the way forward for future method development.
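
Since SoftGym follows the standard OpenAI Gym API, interaction reduces to the familiar reset/step loop. The sketch below uses a placeholder environment id ('ClothFlatten-v0'); the actual environment names and construction arguments are defined by the benchmark itself.

    import gym

    # Placeholder id for illustration; see the SoftGym documentation for
    # the real environment names and any extra construction arguments.
    env = gym.make('ClothFlatten-v0')

    obs = env.reset()
    done = False
    episode_return = 0.0
    while not done:
        action = env.action_space.sample()  # random policy, for illustration
        obs, reward, done, info = env.step(action)
        episode_return += reward
    print('episode return:', episode_return)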

70 pages

Thesis Committee:
David Held (Chair)
Katerina Fragkiadaki
Deepak Pathak

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

