CMU-CS-85-136 Computer Science Department School of Computer Science, Carnegie Mellon University
Learning to Recognize Speech Sounds: A Theory and Model
Gary L. Bradshaw
June 1985
Thesis

Theories of human speech perception have emphasized the role of innate feature detectors in speech comprehension. Empirical evidence suggests that theories based on specialized feature detectors are wrong, and that human listeners improve in their ability to identify the basic sounds of their language. A learning theory of speech perception is proposed to account for this evidence.

To test the theory, a computer simulation, NEXUS, was created. When provided with a simple vocabulary consisting of the names of the letters of the alphabet, NEXUS was able to create descriptions of all words, identify the similarities between words, and simplify the network by eliminating redundant information. The resulting word network was used to classify new instances of speech. NEXUS outperformed Cicada, a state-of-the-art speech recognition system, on both speakers tested.

NEXUS serves as a sufficiency proof of the learning theory, although the lack of detailed learning data precludes stronger comparisons with human performance. NEXUS also demonstrates that learning heuristics can be very useful in building computer systems that perform perceptual tasks, such as speech recognition or vision. These heuristics do not require statistical assumptions about the form of the distribution underlying the data.
90 pages