Year: 2019
Supervisor(s): Prof. Sharon Gannot, Yochai Yemini
Student(s): Bnaya Levy, Mordehay Moradi
Filtering out interfering sound signals, a procedure known as source separation, has been a longstanding problem in the speech processing community. The visual modality is strongly correlated with its accompanying audio; for example, lip reading and motion trajectories can be exploited to improve the separation quality.
In this project, a source separation technique for, e.g., musical instruments was implemented, using a weakly supervised approach. A pretrained ResNet visual branch first provides weak labels for the objects appearing in the video. On the audio side, a non-negative matrix factorization (NMF) decomposition of the mixture signal is computed, yielding a learnt dictionary matrix for the magnitude spectrum. The sources are then separated by assigning the basis vectors to the objects defined by the visual labels.
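A minimal sketch of the audio side of such a pipeline is given below, assuming a single-channel mixture file (the name mixture.wav, the STFT/NMF parameter choices, and the label list are illustrative). The learnt audio-visual association that assigns basis vectors to visually detected objects is stood in for by a placeholder random assignment; in the actual project this assignment is driven by the ResNet labels.

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF

# Load the mixture and compute its magnitude spectrogram.
mix, sr = librosa.load("mixture.wav", sr=16000)   # hypothetical input file
S = librosa.stft(mix, n_fft=1024, hop_length=256)
mag = np.abs(S)

# NMF: mag ~ W @ H, where W holds the learnt dictionary of basis
# spectra and H their activations over time.
n_basis = 20
nmf = NMF(n_components=n_basis, init="random", max_iter=500, random_state=0)
W = nmf.fit_transform(mag)   # shape: (freq_bins, n_basis)
H = nmf.components_          # shape: (n_basis, frames)

# Weak labels from the visual branch; the basis-to-object assignment
# below is a random placeholder for the learnt association.
labels = ["violin", "piano"]
assign = np.random.randint(len(labels), size=n_basis)

# Reconstruct each source with a soft (Wiener-style) mask built from
# its assigned basis vectors, reusing the mixture phase.
full = W @ H + 1e-8
for k, name in enumerate(labels):
    idx = assign == k
    part = W[:, idx] @ H[idx, :]
    mask = part / full
    src = librosa.istft(mask * S, hop_length=256)
    # soundfile.write(f"{name}.wav", src, sr)  # save each estimate if desired
```

The soft mask divides each source's partial reconstruction by the full NMF reconstruction, so the per-source estimates sum approximately back to the mixture spectrogram.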