EFFICIENT AUDIO SOURCE SEPARATION USING MEL-SPECTROGRAMS
Department
Computer Science and Electrical Engineering
Program
Computer Science
Rights
Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Abstract
Audio source separation deals with extracting a source of audio from a mixture, for example vocals from a musical recording. The release of the Open-Unmix GitHub project in September 2019 was a recent stride, giving new researchers a framework to hit the ground running with state-of-the-art techniques. The base architecture uses a three-layer bidirectional LSTM to solve a pixel-wise regression problem, estimating a mask for each source's spectrogram. This thesis explores replacing the spectrograms in this process with mel-spectrograms, which have achieved marginally better results in other audio problems such as speech recognition. A novel inverse function to convert from mel-spectrogram to spectrogram is provided that runs exponentially faster than the best available function with similar accuracy. We found that the results documented by the Open-Unmix project were reproducible and that the mel-spectrogram model did not provide an improvement.
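As a rough illustration of the mel-spectrogram round trip mentioned in the abstract, the sketch below builds a triangular mel filterbank with NumPy, projects a magnitude spectrogram onto it, and maps it back with a generic least-squares pseudo-inverse. This is not the thesis's novel inverse function, whose details are not given here. The function names, the 2595/700 mel formula, and the parameter values (40 mel bands, a 512-point FFT, 16 kHz sampling) are all assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 band edges, evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def to_mel(spec, fb):
    # spec has shape (n_bins, n_frames); the result has shape (n_mels, n_frames).
    return fb @ spec


def from_mel(mel_spec, fb):
    # Generic least-squares inverse (hypothetical baseline, not the thesis method).
    # Magnitudes cannot be negative, so negative estimates are clipped to zero.
    return np.maximum(0.0, np.linalg.pinv(fb) @ mel_spec)


# Usage on a random stand-in for a magnitude spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((257, 10))
fb = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
mel = to_mel(spec, fb)
approx = from_mel(mel, fb)
print(mel.shape, approx.shape)  # (40, 10) (257, 10)
```

Because the filterbank maps 257 frequency bins down to 40 mel bands, the projection discards information, so any inverse can only approximate the original spectrogram.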
