Education
Topic A
Bianco, Michael J. and Gannot, Sharon and Fernandez-Grande, Efren and Gerstoft, Peter,
"Semi-supervised source localization in reverberant environments using deep generative modeling",
The Journal of the Acoustical Society of America
We present a method for acoustic source localization in reverberant environments based on semi-supervised machine learning (ML) with deep generative models. Source localization in the presence of reverberation remains a major challenge, which recent ML techniques have shown promise in addressing. Despite often large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. In semi-supervised learning, ML systems are trained using many examples with only a few labels, with the goal of exploiting the natural structure of the data. We use variational autoencoders (VAEs), which are generative neural networks (NNs) that rely on explicit probabilistic representations, to model the latent distribution of reverberant acoustic data. VAEs consist of an encoder NN, which maps complex input distributions to simpler parametric distributions (e.g., Gaussian), and a decoder NN, which approximates the training examples. The VAE is trained to generate the phase of relative transfer functions (RTFs) between two microphones in reverberant environments, in parallel with a DOA classifier, on both labeled and unlabeled RTF samples. The performance of this VAE-based approach is compared with conventional and ML-based localization in simulated and real-world scenarios.
submitted
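As a rough illustration of the encoder side described in the abstract above, the following sketch shows the two standard VAE building blocks for a diagonal Gaussian latent: the reparameterized sample and the KL divergence to a standard normal prior. It is the textbook formulation, not the authors' network; the function names and the log-variance parameterization are assumptions made for this example.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension.

    Hypothetical helper: mu and log_var stand in for the two outputs an
    encoder network would produce for a single input.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form.

    Per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -1.0], [0.0, -2.0]   # made-up encoder outputs
    print("latent sample:", reparameterize(mu, log_var, rng))
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

In a full VAE this KL term is added to a reconstruction loss from the decoder; a semi-supervised variant would also add a classifier loss on the labeled samples.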
Robust beamforming
Relative transfer function (RTF) estimation
Self-Localization and Mapping
Tutorial/Review Paper
Synchronization
Distributed acoustic sensor networks
Binaural
Bayesian methods
T. Dvorkind and S. Gannot,
"Speaker localization using the unscented Kalman filter",
in Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Rutgers University, Piscataway, New Jersey, USA, Mar. 2005.
Simplex analysis
Other
Theoretical study and performance analysis
Echo cancellation and echo-path estimation
Maximum Likelihood and Expectation-Maximization (batch and recursive)
A. Eisenberg, B. Schwartz, and S. Gannot,
"Online blind audio source separation using recursive expectation-maximization",
in Interspeech, Brno, The Czech Republic, 2021.
In this paper, we present a multiple-speaker direction of arrival (DOA) tracking algorithm with a microphone array that utilizes the recursive EM (REM) algorithm proposed by Cappé and Moulines. In our model, all sources can be located in one of a predefined set of candidate DOAs. Accordingly, the received signals from all microphones are modeled as Mixture of Gaussians (MoG) vectors in which each speaker is associated with a corresponding Gaussian. The localization task is then formulated as a maximum likelihood (ML) problem, where the MoG weights and the power spectral density (PSD) of the speakers are the unknown parameters. The REM algorithm is then utilized to estimate the ML parameters in an online manner, facilitating multiple source tracking. By using Fisher-Neyman factorization, the outputs of the minimum variance distortionless response (MVDR) beamformer (BF) are shown to be sufficient statistics for estimating the parameters of the problem at hand. With that, the terms for the E-step are significantly simplified to a scalar form. An experimental study demonstrates the benefits of using the proposed algorithm on both a simulated dataset and real recordings from the acoustic source localization and tracking (LOCATA) dataset.
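The recursive EM idea in the abstract above can be sketched on a toy problem: scalar observations drawn from a mixture whose components sit at a fixed grid of candidate angles, with only the mixture weights estimated online. This is a heavily simplified illustration, not the paper's algorithm. It assumes known component means and a shared standard deviation, and it omits the PSD estimation and the MVDR beamformer outputs; all names and the step-size exponent are choices made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a scalar Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def recursive_em_weights(samples, means, std, step_exponent=0.6):
    """Track mixture weights online; component means and std are held fixed.

    Each sample triggers one E-step (responsibilities) followed by a
    stochastic-approximation update of the running sufficient statistic,
    which for the mixture weights is the weight vector itself.
    """
    k = len(means)
    weights = [1.0 / k] * k  # uniform initial guess
    for t, x in enumerate(samples, start=1):
        # E-step: posterior probability of each candidate given this sample.
        joint = [w * gaussian_pdf(x, m, std) for w, m in zip(weights, means)]
        total = sum(joint)
        if total == 0.0:
            continue  # sample is numerically impossible under every component
        resp = [j / total for j in joint]
        # Recursive update with a decaying step size gamma_t = (t + 1)^(-a).
        gamma = (t + 1) ** (-step_exponent)
        weights = [(1.0 - gamma) * w + gamma * r
                   for w, r in zip(weights, resp)]
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [-60.0, 0.0, 60.0]  # hypothetical candidate angles in degrees
    # Synthetic stream: about 70% of samples near 0 degrees, 30% near 60 degrees.
    stream = [rng.gauss(0.0 if rng.random() < 0.7 else 60.0, 10.0)
              for _ in range(5000)]
    print(recursive_em_weights(stream, candidates, std=10.0))
```

Because each update is a convex combination of two probability vectors, the weights stay normalized without an explicit projection step. The step-size exponent is kept between 0.5 and 1, the usual range for this kind of stochastic-approximation scheme.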
Manifold Learning
Deep neural networks
Localization and Tracking
T. Dvorkind and S. Gannot,
"Speaker localization using the unscented Kalman filter",
in Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Rutgers University, Piscataway, New Jersey, USA, Mar. 2005.
Noise reduction
In Review
R. Opochinsky, G. Chechik, and S. Gannot,
"Deep ranking-based DOA tracking algorithm",
submitted to the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.
We propose a semi-supervised localization approach based on deep generative modeling with variational autoencoders (VAEs). Localization in reverberant environments remains a challenge, which machine learning (ML) has shown promise in addressing. Even with large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. We address this issue by performing semi-supervised learning (SSL) with convolutional VAEs. The VAE is trained to generate the phase of relative transfer functions (RTFs), in parallel with a DOA classifier, on both labeled and unlabeled RTF samples. The VAE-SSL approach is compared with SRP-PHAT and fully supervised CNNs. We find that VAE-SSL can outperform both SRP-PHAT and CNN in label-limited scenarios.
Single Microphone
Multi-microphone
A. Eisenberg, B. Schwartz, and S. Gannot,
"Online blind audio source separation using recursive expectation-maximization",
in Interspeech, Brno, The Czech Republic, 2021.
In this paper, we present a multiple-speaker direction of arrival (DOA) tracking algorithm with a microphone array that utilizes the recursive EM (REM) algorithm proposed by Cappé and Moulines. In our model, all sources can be located in one of a predefined set of candidate DOAs. Accordingly, the received signals from all microphones are modeled as Mixture of Gaussians (MoG) vectors in which each speaker is associated with a corresponding Gaussian. The localization task is then formulated as a maximum likelihood (ML) problem, where the MoG weights and the power spectral density (PSD) of the speakers are the unknown parameters. The REM algorithm is then utilized to estimate the ML parameters in an online manner, facilitating multiple source tracking. By using Fisher-Neyman factorization, the outputs of the minimum variance distortionless response (MVDR) beamformer (BF) are shown to be sufficient statistics for estimating the parameters of the problem at hand. With that, the terms for the E-step are significantly simplified to a scalar form. An experimental study demonstrates the benefits of using the proposed algorithm on both a simulated dataset and real recordings from the acoustic source localization and tracking (LOCATA) dataset.
Speaker Separation
A. Eisenberg, B. Schwartz, and S. Gannot,
"Online blind audio source separation using recursive expectation-maximization",
in Interspeech, Brno, The Czech Republic, 2021.
In this paper, we present a multiple-speaker direction of arrival (DOA) tracking algorithm with a microphone array that utilizes the recursive EM (REM) algorithm proposed by Cappé and Moulines. In our model, all sources can be located in one of a predefined set of candidate DOAs. Accordingly, the received signals from all microphones are modeled as Mixture of Gaussians (MoG) vectors in which each speaker is associated with a corresponding Gaussian. The localization task is then formulated as a maximum likelihood (ML) problem, where the MoG weights and the power spectral density (PSD) of the speakers are the unknown parameters. The REM algorithm is then utilized to estimate the ML parameters in an online manner, facilitating multiple source tracking. By using Fisher-Neyman factorization, the outputs of the minimum variance distortionless response (MVDR) beamformer (BF) are shown to be sufficient statistics for estimating the parameters of the problem at hand. With that, the terms for the E-step are significantly simplified to a scalar form. An experimental study demonstrates the benefits of using the proposed algorithm on both a simulated dataset and real recordings from the acoustic source localization and tracking (LOCATA) dataset.
Dereverberation
Copyright Notice
Downloading of any paper is permitted for personal use only.
Permission to reprint / republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the author(s) and the respective publisher.
Copyright and all other rights therein are retained by authors or by other copyright holders.
All persons downloading this information are expected to adhere to the terms and constraints invoked by each publisher and author’s copyright.
In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Sharon Gannot