EFFICIENT AUDIO SOURCE SEPARATION USING MEL-SPECTROGRAMS

Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Subjects

Audio
LSTM
Mel-Spectrogram
Separation
Source
Unmix

Abstract

Audio source separation deals with extracting a source of audio from a mixture, for example vocals from a musical recording. The release of the Open-Unmix GitHub project in September 2019 was a recent stride, giving new researchers a framework to hit the ground running with state-of-the-art techniques. The base architecture uses a 3-layer bidirectional LSTM to solve a pixel-wise regression problem, estimating a mask for each source's spectrogram. This thesis explores the idea of replacing the spectrograms in this process with mel-spectrograms, which have achieved marginally better results in other audio problems such as speech recognition. A novel inverse function to convert from mel-spectrogram to spectrogram is provided that runs exponentially faster than the best available function with similar accuracy. We found that the results documented by the Open-Unmix project were reproducible and that the mel-spectrogram model did not provide an improvement.
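
For readers unfamiliar with the mel-to-spectrogram conversion mentioned in the abstract, the following is a minimal sketch of one common baseline: project a magnitude spectrogram onto a triangular mel filterbank, then map it back with the Moore-Penrose pseudo-inverse of that filterbank. This is not the novel inverse function from the thesis, whose details are not given in this record. The function names, the HTK-style mel formula, and the parameter values (16 kHz sample rate, 512-point FFT, 40 mel bands) are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 band edges, evenly spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def to_mel(spec, fb):
    """Spectrogram (freq_bins, frames) -> mel-spectrogram (n_mels, frames)."""
    return fb @ spec

def from_mel(mel_spec, fb_pinv):
    """Approximate inverse: least-squares back-projection, clipped to be non-negative."""
    return np.maximum(0.0, fb_pinv @ mel_spec)

# Hypothetical usage with random data standing in for a real magnitude spectrogram
fb = mel_filterbank(sr=16000, n_fft=512, n_mels=40)
fb_pinv = np.linalg.pinv(fb)  # computed once, reused for every frame
spec = np.abs(np.random.default_rng(0).normal(size=(257, 10)))
approx = from_mel(to_mel(spec, fb), fb_pinv)
```

Because the filterbank maps many frequency bins onto fewer mel bands, the reconstruction is only an approximation of the original spectrogram. The pseudo-inverse is a single matrix product per spectrogram, which makes it cheap compared with iterative solvers, at some cost in accuracy.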
