CMU-CS-07-124
Computer Science Department
School of Computer Science, Carnegie Mellon University



Error Awareness and Recovery
in Conversational Spoken Language Interfaces

Dan Bohus

May 2007

Ph.D. Thesis

CMU-CS-07-124.pdf


Keywords: Spoken dialog systems, conversational spoken language interfaces, error detection, error recovery strategies, error recovery policies, dialog management, RavenClaw, implicitly-supervised learning

One of the most important and persistent problems in the development of conversational spoken language interfaces is their lack of robustness when confronted with understanding-errors. Most of these errors stem from limitations in current speech recognition technology, and, as a result, appear across all domains and interaction types. There are two approaches towards increased robustness: prevent the errors from happening, or recover from them through conversation, by interacting with the users.

In this dissertation we have engaged in a research program centered on the second approach. We argue that three capabilities are needed in order to seamlessly and efficiently recover from errors: (1) systems must be able to detect the errors, preferably as soon as they happen, (2) systems must be equipped with a rich repertoire of error recovery strategies that can be used to set the conversation back on track, and (3) systems must know how to choose optimally between different recovery strategies at run-time, i.e. they must have good error recovery policies. This work makes a number of contributions in each of these areas.

First, to provide a real-world experimental platform for this error handling research program, we developed RavenClaw, a plan-based dialog management framework for task-oriented domains. The framework has a modular architecture that decouples the error handling mechanisms from the domain-specific dialog control logic; in the process, it lessens system authoring effort, promotes portability and reusability, and ensures consistency in error handling behaviors both within and across domains. To date, RavenClaw has been used to develop and successfully deploy a number of spoken dialog systems spanning different domains and interaction types. Together with these systems, RavenClaw provides the infrastructure for the error handling work described in this dissertation.

To detect errors, spoken language interfaces typically rely on confidence scores. In this work we investigated in depth current supervised learning techniques for building error detection models. In addition, we proposed a novel, implicitly-supervised approach for this task. No developer supervision is required in this case; rather, the system obtains the supervision signal online, from naturally-occurring patterns in the interaction. We believe this learning paradigm represents an important step towards constructing autonomously self-improving systems. Furthermore, we developed a scalable, data-driven approach that allows a system to continuously monitor and update beliefs throughout the conversation; the proposed approach leads to significant improvements in both the overall effectiveness and efficiency of the interaction.

We developed and empirically investigated a large set of recovery strategies, targeting two types of understanding-errors that commonly occur in these systems: misunderstandings and nonunderstandings. Our results add to an existing body of knowledge about the advantages and disadvantages of these strategies, and highlight the importance of good recovery policies.

In the last part of this work, we proposed and evaluated a novel online-learning based approach for developing recovery policies. The system constructs runtime estimates for the likelihood of success of each recovery strategy, together with confidence bounds for those estimates. These estimates are then used to construct a policy online, while balancing the system's exploration and exploitation goals. Experiments with a deployed spoken dialog system showed that the system was able to learn a more effective recovery policy in a relatively short time period.
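
As a rough illustration of the idea in the paragraph above, the following minimal sketch keeps a success-rate estimate and an upper confidence bound for each recovery strategy and always picks the strategy whose bound is highest. It is not the method from the dissertation: the class name, the strategy names, and the choice of a Wilson score interval are all assumptions made for this example, and the actual approach may condition its estimates on dialog features that are omitted here.

```python
import math


class RecoveryPolicy:
    """Hypothetical sketch: choose the strategy with the highest upper bound."""

    def __init__(self, strategies, z=1.96):
        self.z = z  # assumed interval width (roughly a 95% interval)
        # Per-strategy counts: [successes, trials]
        self.stats = {name: [0, 0] for name in strategies}

    def upper_bound(self, strategy):
        """Wilson score upper bound on the strategy's success probability."""
        successes, trials = self.stats[strategy]
        if trials == 0:
            return 1.0  # untried strategies are treated optimistically
        p = successes / trials
        z2 = self.z ** 2
        centre = p + z2 / (2 * trials)
        margin = self.z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials ** 2))
        return (centre + margin) / (1 + z2 / trials)

    def choose(self):
        """Uncertain strategies keep wide bounds, so they still get explored."""
        return max(self.stats, key=self.upper_bound)

    def update(self, strategy, success):
        """Record whether the chosen strategy recovered from the error."""
        self.stats[strategy][1] += 1
        if success:
            self.stats[strategy][0] += 1


# Usage with two made-up strategy names
policy = RecoveryPolicy(["ask_repeat", "explicit_confirm"])
for _ in range(5):
    policy.update("ask_repeat", False)  # five failed recoveries
print(policy.choose())
```

Selecting on the upper bound rather than on the point estimate is one simple way to balance exploration and exploitation: a strategy with few observations retains a wide interval and continues to be tried, while a strategy that fails repeatedly sees its bound shrink and is chosen less often.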

278 pages

