CMU-CS-20-109
Computer Science Department
School of Computer Science, Carnegie Mellon University




Monaural Source Separation in the Wild

Tianjun Ma

M.S. Thesis

May 2020

CMU-CS-20-109.pdf


Keywords: Machine learning, audio signal processing, monaural source separation, source separation dataset, deep neural network, multi-headed self-attention

Monaural source separation refers to the process of extracting individual components from a mixture, where the mixture is a single-channel audio recording of multiple sources emitting sounds simultaneously, and the individual components are the constituent sounds emitted by each source. In recent years, data-driven approaches using deep neural network-based models for monaural source separation have been shown to outperform their non-data-driven counterparts. However, these approaches are designed using specialized datasets in which the sources belong to a constrained set of categories and the mixtures are not very representative of audio mixtures in the real world. Consequently, whether existing models can generalize to more complex source separation settings remains an open question. In this work, we study and formalize the notion of monaural source separation in real-world scenarios and explore model designs that adapt to such complex settings. Specifically, we present the Wild-Mix Dataset, a synthetic dataset in which mixtures consist of sources belonging to a variety of sound categories and are synthesized in dynamic ways. We also present ASTNet, the first supervised learning model to utilize multi-headed attention to tackle monaural source separation. We show that the Wild-Mix Dataset is a challenging benchmark for evaluating model performance in complex real-world scenarios and that ASTNet achieves state-of-the-art performance on the Wild-Mix Dataset.

24 pages

Thesis Committee
Louis-Philippe Morency (Chair)
Bhiksha Raj

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

