CMU-CS-97-173
Computer Science Department
School of Computer Science, Carnegie Mellon University




Lattice Based Language Models

Pierre Dupont*, Ronald Rosenfeld

September 1997



Keywords: Speech recognition, statistical language modeling, lattice based models, smoothing techniques


This paper introduces lattice based language models, a new language modeling paradigm. These models construct multi-dimensional hierarchies of partitions and select the most promising partitions to generate the estimated distributions. We discuss a specific two-dimensional lattice and propose two primary features to measure the usefulness of each node: the training-set history count and the smoothed entropy of its prediction. We review smoothing techniques and propose a generalization of the conventional backoff strategy to multiple dimensions. Preliminary experiments on the SWITCHBOARD corpus yield a 6.5% perplexity reduction over a word trigram model.
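The two node features named above can be illustrated with a small sketch. It assumes that a lattice node is represented by a function mapping each history to its equivalence class (for example, keeping only the last word gives a bigram-like node), and it uses add-k smoothing purely as a stand-in, since the abstract does not say which smoother the authors use. The names `node_features`, `partition` and `k` are hypothetical and do not come from the report.

```python
import math
from collections import Counter, defaultdict


def node_features(events, partition, vocab, k=1.0):
    """Return {history_class: (history_count, smoothed_entropy_bits)} for one node.

    events    -- iterable of (history, next_word) pairs from the training set
    partition -- hypothetical function mapping a history to its class at this node
    vocab     -- list of all words that can be predicted
    k         -- add-k smoothing constant (an assumption, not the paper's smoother)
    """
    # Count the predicted words observed after each history class.
    counts = defaultdict(Counter)
    for history, word in events:
        counts[partition(history)][word] += 1

    features = {}
    V = len(vocab)
    for cls, c in counts.items():
        n = sum(c.values())  # training-set history count for this class
        h = 0.0
        for w in vocab:
            p = (c[w] + k) / (n + k * V)  # add-k smoothed prediction
            if p > 0:  # skip zero-probability words (only possible when k == 0)
                h -= p * math.log2(p)
        features[cls] = (n, h)
    return features


# Toy usage: histories are tuples of preceding words; this node keeps the last word.
events = [(("a",), "b"), (("b",), "a"), (("a",), "b"), (("b",), "a"), (("a",), "c")]
feats = node_features(events, partition=lambda h: h[-1], vocab=["a", "b", "c"])
```

A large history count suggests the node's estimates are reliable, while a low smoothed entropy suggests it predicts sharply. How the two are combined to select partitions is left to the report itself.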

28 pages

*Department of Mathematics, University Jean Monnet, 23 rue P. Michelon, 42023 Saint-Etienne Cedex, France.

