CMU-CS-20-110 Computer Science Department School of Computer Science, Carnegie Mellon University
Unsupervised Domain Adaptation for Visual Navigation Shangda (Harry) Li M.S. Thesis May 2020
Recent advances in artificial intelligence, especially in the fields of computer vision and reinforcement learning, have made it possible to train visual navigation agents that perform well on a wide variety of navigation tasks. For example, in photo-realistic computer simulations of real-world apartments, a trained agent can reliably navigate to a specified coordinate, or to a room of a specified type such as a kitchen or bathroom. When asked to explore as much area as possible under a fixed time budget, the trained agent exhibits strong memory of where it has been and strategic planning. All these tasks require the agent to process raw first-person images to construct a meaningful understanding and representation of the room, such as where the walls and obstacles are located, and to conduct structural and semantic reasoning to determine its path, the room type, or the floor plan. However, for most learning-based navigation agents, training and testing are done in the same simulation environment. For these methods to be practical in the real world, they need to transfer to unseen and non-simulated environments. We propose an unsupervised domain adaptation method for visual navigation, which trains an image translation model that translates images of the evaluation environment, which the agent has never been trained on, into images of the training environment where the agent learned to perform the task, so that the agent can recognize the translated images and achieve good performance in the evaluation environment. The image translation model is trained given an already trained agent, so that it can take advantage of the task-relevant representations learned by the agent and ensure those representations are preserved during translation. We conduct both simulation-to-simulation and simulation-to-real-world experiments to demonstrate the effectiveness of our method in helping trained agents adapt to unseen environments.
In the simulation-to-simulation experiments, the proposed method outperforms several baselines, including direct transfer and popular generic image translation methods such as CycleGAN, across two different visual navigation tasks. In the simulation-to-real-world experiment, the agent enhanced by our method achieves significantly better performance than agents without the enhancement.
45 pages
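The core idea described above can be sketched numerically. This is a minimal illustration, not the thesis's implementation: the frozen agent encoder, the translator, and the discriminator are stand-in random linear maps (all names here are hypothetical), but the loss has the same two-part structure the abstract describes, an adversarial term pushing translated images toward the training domain plus a consistency term preserving the trained agent's task-relevant features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in the actual method these would be the trained
# navigation agent's visual encoder (kept frozen) and a learned image
# translator; here they are fixed random linear maps over 64-dim "images".
W_enc = rng.standard_normal((16, 64))   # frozen agent encoder (never updated)
W_gen = rng.standard_normal((64, 64))   # translator G: evaluation -> training domain

def encoder(x):
    """Frozen task encoder: maps an image vector to task-relevant features."""
    return np.tanh(x @ W_enc.T)

def translate(x):
    """Image translation model G applied to an evaluation-environment image."""
    return x @ W_gen.T

def discriminator_score(x):
    """Toy discriminator in (0, 1); higher means 'looks like training domain'."""
    return 1.0 / (1.0 + np.exp(-x.mean()))

def adaptation_loss(x_eval, lam=1.0):
    """Adversarial term + agent-representation consistency term.

    The weighting lam between the two terms is an illustrative choice.
    """
    x_trans = translate(x_eval)
    # GAN-style term: the translated image should fool the discriminator.
    adv = -np.log(discriminator_score(x_trans) + 1e-8)
    # Consistency term: the frozen agent's features should be preserved
    # under translation, so the agent still recognizes the scene.
    rep = np.mean((encoder(x_eval) - encoder(x_trans)) ** 2)
    return adv + lam * rep
```

In a real training loop, `W_gen` would be optimized to minimize this loss against a jointly trained discriminator, while `W_enc` stays frozen, which is what distinguishes the agent-guided objective from generic translation methods such as CycleGAN.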
Thesis Committee
Srinivasan Seshan, Head, Computer Science Department