EFFICIENT AUDIO SOURCE SEPARATION USING MEL-SPECTROGRAMS

Author/Creator ORCID

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

Audio source separation is the task of extracting a single audio source from a mixture, for example the vocals from a musical recording. The release of the Open-Unmix GitHub project in September 2019 was a recent stride for the field, giving new researchers a framework for getting started quickly with state-of-the-art techniques. Its base architecture uses a three-layer bidirectional LSTM to solve a pixel-wise regression problem, estimating a mask for each source's spectrogram. This thesis explores replacing the spectrograms in this process with mel-spectrograms, which have achieved marginally better results in other audio problems such as speech recognition. A novel inverse function for converting a mel-spectrogram back to a spectrogram is provided; it runs exponentially faster than the best available function with similar accuracy. We found that the results documented by the Open-Unmix project were reproducible and that the mel-spectrogram model did not provide an improvement.
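
As a rough illustration of the conversion the abstract refers to, the sketch below builds a triangular mel filterbank, projects a magnitude spectrogram onto it, and maps the result back with a clipped pseudo-inverse. This is a generic baseline and not the novel inverse function of the thesis, whose details are not given here. The function names, the HTK-style mel formula, and the parameter values (sample rate, FFT size, number of mel bands) are all assumptions chosen for illustration, and the input is random data standing in for a real spectrogram.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention, others exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def to_mel(spec, fb):
    # spec: (n_bins, n_frames) magnitude spectrogram
    return fb @ spec

def from_mel(mel_spec, fb):
    # Least-squares estimate via the pseudo-inverse, clipped to be non-negative.
    # The mel projection is lossy, so this is only an approximation.
    return np.maximum(0.0, np.linalg.pinv(fb) @ mel_spec)

# Hypothetical settings and random data, for illustration only
sample_rate, n_fft, n_mels, n_frames = 44100, 1024, 40, 8
rng = np.random.default_rng(0)
spec = rng.random((n_fft // 2 + 1, n_frames))

fb = mel_filterbank(n_mels, n_fft, sample_rate)
mel = to_mel(spec, fb)
recon = from_mel(mel, fb)
print(spec.shape, mel.shape, recon.shape)
```

Because the filterbank has far fewer rows than frequency bins, the forward projection discards detail that no inverse can fully recover, which is why the accuracy of the inverse matters when a mask estimated in the mel domain must be applied to the original spectrogram.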