CMU-CS-19-125
Computer Science Department
School of Computer Science, Carnegie Mellon University

Representation Learning for Voice Profiling
Daanish Ali Khan
M.S. Thesis
August 2019
Voice profiling is the deduction of a speaker's characteristics from their voice, a problem with many applications in audio forensics, law enforcement, security, and health care. Speaker characteristics that can be determined include the speaker's gender, age, and ethnicity, along with other physical and demographic characteristics. Prior work on computational voice profiling modelled the production of voice as a physical system and defined multiple voice signal features that encode speaker characteristics. Recent advances in artificial neural networks have improved performance across voice profiling tasks, but such methods are often purely data-driven: the representation, and the relationships between voice and speaker characteristics, are learned from a large dataset without necessarily leveraging the knowledge-based voice features from prior work. We identify the key challenges of modern voice profiling as: 1) learning a representation that captures the complex relationship between voice and speaker parameters, 2) designing a representation that is resilient to real-world noise, and 3) learning a representation that generalizes across recording conditions and speaker characteristics. In this work, we combine domain-specific signal-processing features with state-of-the-art neural network techniques to learn a generalizable audio representation for voice profiling. The learned representation is evaluated on multiple voice profiling tasks, including prediction of speaker gender, native language, and geographical origin. We show experimentally that the proposed speech representation significantly improves the real-world performance of voice profiling.

34 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department