CMU-CS-20-110
Computer Science Department
School of Computer Science, Carnegie Mellon University




Unsupervised Domain Adaptation for Visual Navigation

Shangda (Harry) Li

M.S. Thesis

May 2020

CMU-CS-20-110.pdf


Keywords: Machine Learning, Reinforcement Learning, Unsupervised Learning, Visual Navigation, Multimodal Adaptation, Computer Vision, Vision for Robotics, Generative Adversarial Network

Recent advances in artificial intelligence, especially in computer vision and reinforcement learning, have made it possible to train visual navigation agents that perform well across a wide variety of navigation tasks. For example, in photo-realistic computer simulations of real-world apartments, a trained agent can reliably navigate to a specified coordinate, or to a room of a specified type such as a kitchen or bathroom. When asked to explore as much area as possible under a fixed time budget, the trained agent exhibits strong memory of where it has been and strategic planning. All of these tasks require the agent to process raw first-person images to construct a meaningful understanding and representation of the room, such as where the walls and obstacles are located, and to conduct structural and semantic reasoning to determine its path, the room type, or the floor plan.

However, for most learning-based navigation agents, training and testing are done in the same simulation environment. For these methods to be practical in the real world, they must transfer to unseen environments, including non-simulated ones.

We propose an unsupervised domain adaptation method for visual navigation. The method trains an image translation model that maps images from the evaluation environment, which the agent has never been trained on, into images of the training environment where the agent learned to perform the task, so that the agent can recognize the translated images and perform well in the evaluation environment. The translation model is trained given an already-trained agent, allowing it to take advantage of the task-relevant representations the agent has learned and to ensure those representations are preserved during translation.
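
To make the mechanism concrete, here is a minimal sketch (not the thesis code) of one way such an agent-guided translation objective could be written in PyTorch: a CycleGAN-style generator is trained with an adversarial term plus a feature-consistency term computed by the frozen agent's visual encoder. All names here (ConvTranslator, adaptation_loss, agent_encoder) are hypothetical stand-ins, and a full method would also include cycle-consistency terms and alternating discriminator updates.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvTranslator(nn.Module):
        # Toy stand-in for a CycleGAN-style generator that maps evaluation-
        # environment images into the training environment's visual style.
        def __init__(self, channels=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
            )

        def forward(self, x):
            return self.net(x)

    def adaptation_loss(generator, discriminator, agent_encoder, eval_imgs, lam=1.0):
        # Generator-side objective: an adversarial term that makes translated
        # images look like the training environment, plus a consistency term
        # that keeps the frozen agent encoder's features unchanged, so the
        # task-relevant representations survive translation.
        translated = generator(eval_imgs)
        logits = discriminator(translated)
        gan_loss = F.binary_cross_entropy_with_logits(
            logits, torch.ones_like(logits))
        with torch.no_grad():
            feats_orig = agent_encoder(eval_imgs)   # frozen reference features
        feats_trans = agent_encoder(translated)     # encoder frozen; grads flow to generator
        consistency = F.mse_loss(feats_trans, feats_orig)
        return gan_loss + lam * consistency

    # Toy usage; in practice agent_encoder is the trained agent's CNN, frozen.
    G = ConvTranslator()
    D = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
    enc = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(4), nn.Flatten())
    for p in enc.parameters():
        p.requires_grad_(False)
    adaptation_loss(G, D, enc, torch.rand(2, 3, 64, 64)).backward()

In actual training the discriminator would be updated in alternation with the generator; the point of the sketch is only the agent-feature consistency term, which is what lets the frozen agent's representations guide the translation.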

We conduct both simulation-to-simulation and simulation-to-real-world experiments to demonstrate that our method helps trained agents adapt to unseen environments. In the simulation-to-simulation experiments, the proposed method outperforms several baselines, including direct transfer and popular generic image translation methods such as CycleGAN, across two different visual navigation tasks. In the simulation-to-real-world experiment, the agent enhanced by our method achieves significantly better performance than agents without the enhancement.

45 pages

Thesis Committee
Louis-Philippe Morency (Chair)
Matthew R. Gormley

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

