Prof. Sharon Gannot

Publications
Conferences & Workshops

Education

Topic A

Bianco, Michael J. and Gannot, Sharon and Fernandez-Grande, Efren and Gerstoft, Peter, "Semi-supervised source localization in reverberant environments using deep generative modeling", The Journal of the Acoustical Society of America

BibTeX

@article{Bianco2020b,
author = {Bianco, Michael J. and Gannot, Sharon and Fernandez-Grande, Efren and Gerstoft, Peter},
title = {Semi-supervised source localization in reverberant environments using deep generative modeling},
journal = {The Journal of the Acoustical Society of America},
volume = {148},
number = {4_Supplement},
pages = {2662--2662},
year = {2020},
url = {https://doi.org/10.1121/1.5147419},
}


We present a method for acoustic source localization in reverberant environments based on semi-supervised machine learning (ML) with deep generative models. Source localization in the presence of reverberation remains a major challenge, which recent ML techniques have shown promise in addressing. Despite often large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. In semi-supervised learning, ML systems are trained using many examples with only few labels, with the goal of exploiting the natural structure of the data. We use variational autoencoders (VAEs), which are generative neural networks (NNs) that rely on explicit probabilistic representations, to model the latent distribution of reverberant acoustic data. VAEs consist of an encoder NN, which maps complex input distributions to simpler parametric distributions (e.g., Gaussian), and a decoder NN which approximates the training examples. The VAE is trained to generate the phase of relative transfer functions (RTFs) between two microphones in reverberant environments, in parallel with a DOA classifier, on both labeled and unlabeled RTF samples. The performance of this VAE-based approach is compared with conventional and ML-based localization in simulated and real-world scenarios.

Ayal Schwartz and Sharon Gannot and Shlomo E. Chazan, "Magnitude or phase? A two stage algorithm for dereverberation"

BibTeX

@misc {schwartz2022magnitude,
title= {Magnitude or Phase? A Two Stage Algorithm for Dereverberation},
author= {Ayal Schwartz and Sharon Gannot and Shlomo E. Chazan},
year= {2022},
eprint= {2211.00607},
archivePrefix={arXiv},
primaryClass={cs.SD}
}


submitted

Ayal Schwartz and Sharon Gannot and Shlomo E. Chazan, "Magnitude or phase? A two stage algorithm for dereverberation"

BibTeX

@misc {schwartz2022magnitude,
title= {Magnitude or Phase? A Two Stage Algorithm for Dereverberation},
author= {Ayal Schwartz and Sharon Gannot and Shlomo E. Chazan},
year= {2022},
eprint= {2211.00607},
archivePrefix={arXiv},
primaryClass={cs.SD}
}


Robust beamforming

A. Barnov, A. Gendelman, A. Schreibman, E. Tzirkel-Hancock, and S. Gannot, "A robust RLS implementation of the ANC block in GSC structures", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Barnov2021robust,
title={A Robust {RLS} Implementation of the {ANC} Block in {GSC} Structures},
author={Anna Barnov and Alex Gendelman and Amos Schreibman and Eli Tzirkel-Hancock and Sharon Gannot},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


A. Sofer, T. Kounovsky, J. Cmejla, Z. Koldovsky, and S. Gannot, "Robust relative transfer function identification on manifolds for speech enhancement", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Sofer2021robust,
title={Robust Relative Transfer Function Identification on Manifolds for Speech Enhancement},
author={Sofer, Amit and Kounovsk{\'y}, Tom{\'a}{\v{s}} and {\v{C}}mejla, Jaroslav and Koldovsk{\'y}, Zbyn{\v{e}}k and Gannot, Sharon},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


Relative transfer function (RTF) estimation

A. Sofer, T. Kounovsky, J. Cmejla, Z. Koldovsky, and S. Gannot, "Robust relative transfer function identification on manifolds for speech enhancement", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Sofer2021robust,
title={Robust Relative Transfer Function Identification on Manifolds for Speech Enhancement},
author={Sofer, Amit and Kounovsk{\'y}, Tom{\'a}{\v{s}} and {\v{C}}mejla, Jaroslav and Koldovsk{\'y}, Zbyn{\v{e}}k and Gannot, Sharon},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


Z. Koldovsky and S. Gannot, "Dictionary-based sparse reconstruction of incomplete relative transfer functions", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Koldovsky2021dictionary,
title={Dictionary-Based Sparse Reconstruction of Incomplete Relative Transfer Functions},
author={Koldovsk{\'y}, Zbyn{\v{e}}k and Gannot, Sharon},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


Self-Localization and Mapping

Tutorial/Review Paper

Synchronization

Distributed acoustic sensor networks

Binaural

Bayesian methods

T. Dvorkind and S. Gannot, "Speaker localization using the unscented Kalman filter", in Joint workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Rutgers University, Piscataway, New Jersey, USA, Mar. 2005.

BibTeX

@inproceedings{Dvorkind2005UKF,
title={Speaker Localization Using the Unscented {Kalman} Filter},
author={Dvorkind, T.G. and Gannot, S.},
booktitle={Joint workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA)},
year={2005},
month={Mar.},
address={Rutgers University, Piscataway, New Jersey, USA}
}


Simplex analysis

Other

Theoretical study and performance analysis

Echo cancellation and echo-path estimation

Maximum Likelihood and Expectation-Maximization (batch and recursive)

A. Eisenberg, B. Schwartz, and S. Gannot, "Online blind audio source separation using recursive expectation-maximization", in Interspeech, Brno, The Czech Republic, 2021.

BibTeX

@inproceedings{Eisenberg2021online,
title={Online Blind Audio Source Separation using Recursive Expectation-Maximization},
author={Aviad Eisenberg and Boaz Schwartz and Sharon Gannot},
booktitle={Interspeech},
ADDRESS={Brno, The Czech Republic},
year={2021}
}


In this paper, we present a multiple-speaker direction of arrival (DOA) tracking algorithm with a microphone array that utilizes the recursive EM (REM) algorithm proposed by Cappé and Moulines. In our model, all sources can be located in one of a predefined set of candidate DOAs. Accordingly, the received signals from all microphones are modeled as Mixture of Gaussians (MoG) vectors in which each speaker is associated with a corresponding Gaussian. The localization task is then formulated as a maximum likelihood (ML) problem, where the MoG weights and the power spectral density (PSD) of the speakers are the unknown parameters. The REM algorithm is then utilized to estimate the ML parameters in an online manner, facilitating multiple source tracking. By using Fisher-Neyman factorization, the outputs of the minimum variance distortionless response (MVDR)-beamformer (BF) are shown to be sufficient statistics for estimating the parameters of the problem at hand. With that, the terms for the E-step are significantly simplified to a scalar form. An experimental study demonstrates the benefits of using the proposed algorithm in both a simulated data-set and real recordings from the acoustic source localization and tracking (LOCATA) data-set.

Manifold Learning

A. Sofer, T. Kounovsky, J. Cmejla, Z. Koldovsky, and S. Gannot, "Robust relative transfer function identification on manifolds for speech enhancement", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Sofer2021robust,
title={Robust Relative Transfer Function Identification on Manifolds for Speech Enhancement},
author={Sofer, Amit and Kounovsk{\'y}, Tom{\'a}{\v{s}} and {\v{C}}mejla, Jaroslav and Koldovsk{\'y}, Zbyn{\v{e}}k and Gannot, Sharon},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


Deep neural networks

Y. Yemini, E. Fetaya, H. Maron, and S. Gannot, "Scene-agnostic multi-microphone speech dereverberation", in Interspeech, Brno, The Czech Republic, 2021.

BibTeX

@inproceedings{Yemini2021agnostic,
title={Scene-Agnostic Multi-Microphone Speech Dereverberation},
author={Yochai Yemini and Ethan Fetaya and Haggai Maron and Sharon Gannot},
booktitle={Interspeech},
ADDRESS={Brno, The Czech Republic},
year={2021}
}


Localization and Tracking

T. Dvorkind and S. Gannot, "Speaker localization using the unscented Kalman filter", in Joint workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Rutgers University, Piscataway, New Jersey, USA, Mar. 2005.

BibTeX

@inproceedings{Dvorkind2005UKF,
title={Speaker Localization Using the Unscented {Kalman} Filter},
author={Dvorkind, T.G. and Gannot, S.},
booktitle={Joint workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA)},
year={2005},
month={Mar.},
address={Rutgers University, Piscataway, New Jersey, USA}
}


Noise reduction

A. Barnov, A. Gendelman, A. Schreibman, E. Tzirkel-Hancock, and S. Gannot, "A robust RLS implementation of the ANC block in GSC structures", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Barnov2021robust,
title={A Robust {RLS} Implementation of the {ANC} Block in {GSC} Structures},
author={Anna Barnov and Alex Gendelman and Amos Schreibman and Eli Tzirkel-Hancock and Sharon Gannot},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


A. Sofer, T. Kounovsky, J. Cmejla, Z. Koldovsky, and S. Gannot, "Robust relative transfer function identification on manifolds for speech enhancement", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Sofer2021robust,
title={Robust Relative Transfer Function Identification on Manifolds for Speech Enhancement},
author={Sofer, Amit and Kounovsk{\'y}, Tom{\'a}{\v{s}} and {\v{C}}mejla, Jaroslav and Koldovsk{\'y}, Zbyn{\v{e}}k and Gannot, Sharon},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


In Review

R. Opochinsky, G. Chechik, and S. Gannot, "Deep ranking-based DOA tracking algorithm", submitted to 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Opochinsky2021rank,
title={Deep Ranking-Based {DOA} Tracking Algorithm},
author={Renana Opochinsky and Gal Chechik and Sharon Gannot},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


We propose a semi-supervised localization approach based on deep generative modeling with variational autoencoders (VAE). Localization in reverberant environments remains a challenge, which machine learning (ML) has shown promise in addressing. Even with large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. We address this issue by performing semi-supervised learning (SSL) with convolutional VAEs. The VAE is trained to generate the phase of relative transfer functions (RTFs), in parallel with a DOA classifier, on both labeled and unlabeled RTF samples. The VAE-SSL approach is compared with SRP-PHAT and fully-supervised CNNs. We find that VAE-SSL can outperform both SRP-PHAT and CNN in label-limited scenarios.

Single Microphone

Multi-microphone

Y. Yemini, E. Fetaya, H. Maron, and S. Gannot, "Scene-agnostic multi-microphone speech dereverberation", in Interspeech, Brno, The Czech Republic, 2021.

BibTeX

@inproceedings{Yemini2021agnostic,
title={Scene-Agnostic Multi-Microphone Speech Dereverberation},
author={Yochai Yemini and Ethan Fetaya and Haggai Maron and Sharon Gannot},
booktitle={Interspeech},
ADDRESS={Brno, The Czech Republic},
year={2021}
}


A. Eisenberg, B. Schwartz, and S. Gannot, "Online blind audio source separation using recursive expectation-maximization", in Interspeech, Brno, The Czech Republic, 2021.

BibTeX

@inproceedings{Eisenberg2021online,
title={Online Blind Audio Source Separation using Recursive Expectation-Maximization},
author={Aviad Eisenberg and Boaz Schwartz and Sharon Gannot},
booktitle={Interspeech},
ADDRESS={Brno, The Czech Republic},
year={2021}
}


In this paper, we present a multiple-speaker direction of arrival (DOA) tracking algorithm with a microphone array that utilizes the recursive EM (REM) algorithm proposed by Cappé and Moulines. In our model, all sources can be located in one of a predefined set of candidate DOAs. Accordingly, the received signals from all microphones are modeled as Mixture of Gaussians (MoG) vectors in which each speaker is associated with a corresponding Gaussian. The localization task is then formulated as a maximum likelihood (ML) problem, where the MoG weights and the power spectral density (PSD) of the speakers are the unknown parameters. The REM algorithm is then utilized to estimate the ML parameters in an online manner, facilitating multiple source tracking. By using Fisher-Neyman factorization, the outputs of the minimum variance distortionless response (MVDR)-beamformer (BF) are shown to be sufficient statistics for estimating the parameters of the problem at hand. With that, the terms for the E-step are significantly simplified to a scalar form. An experimental study demonstrates the benefits of using the proposed algorithm in both a simulated data-set and real recordings from the acoustic source localization and tracking (LOCATA) data-set.

A. Barnov, A. Gendelman, A. Schreibman, E. Tzirkel-Hancock, and S. Gannot, "A robust RLS implementation of the ANC block in GSC structures", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Barnov2021robust,
title={A Robust {RLS} Implementation of the {ANC} Block in {GSC} Structures},
author={Anna Barnov and Alex Gendelman and Amos Schreibman and Eli Tzirkel-Hancock and Sharon Gannot},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


A. Sofer, T. Kounovsky, J. Cmejla, Z. Koldovsky, and S. Gannot, "Robust relative transfer function identification on manifolds for speech enhancement", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Sofer2021robust,
title={Robust Relative Transfer Function Identification on Manifolds for Speech Enhancement},
author={Sofer, Amit and Kounovsk{\'y}, Tom{\'a}{\v{s}} and {\v{C}}mejla, Jaroslav and Koldovsk{\'y}, Zbyn{\v{e}}k and Gannot, Sharon},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


Z. Koldovsky and S. Gannot, "Dictionary-based sparse reconstruction of incomplete relative transfer functions", in the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.

BibTeX

@inproceedings{Koldovsky2021dictionary,
title={Dictionary-Based Sparse Reconstruction of Incomplete Relative Transfer Functions},
author={Koldovsk{\'y}, Zbyn{\v{e}}k and Gannot, Sharon},
booktitle={29th European Signal Processing Conference (EUSIPCO)},
ADDRESS={Dublin, Ireland},
year={2021}
}


Speaker Separation

A. Eisenberg, B. Schwartz, and S. Gannot, "Online blind audio source separation using recursive expectation-maximization", in Interspeech, Brno, The Czech Republic, 2021.

BibTeX

@inproceedings{Eisenberg2021online,
title={Online Blind Audio Source Separation using Recursive Expectation-Maximization},
author={Aviad Eisenberg and Boaz Schwartz and Sharon Gannot},
booktitle={Interspeech},
ADDRESS={Brno, The Czech Republic},
year={2021}
}


In this paper, we present a multiple-speaker direction of arrival (DOA) tracking algorithm with a microphone array that utilizes the recursive EM (REM) algorithm proposed by Cappé and Moulines. In our model, all sources can be located in one of a predefined set of candidate DOAs. Accordingly, the received signals from all microphones are modeled as Mixture of Gaussians (MoG) vectors in which each speaker is associated with a corresponding Gaussian. The localization task is then formulated as a maximum likelihood (ML) problem, where the MoG weights and the power spectral density (PSD) of the speakers are the unknown parameters. The REM algorithm is then utilized to estimate the ML parameters in an online manner, facilitating multiple source tracking. By using Fisher-Neyman factorization, the outputs of the minimum variance distortionless response (MVDR)-beamformer (BF) are shown to be sufficient statistics for estimating the parameters of the problem at hand. With that, the terms for the E-step are significantly simplified to a scalar form. An experimental study demonstrates the benefits of using the proposed algorithm in both a simulated data-set and real recordings from the acoustic source localization and tracking (LOCATA) data-set.

Dereverberation

Y. Yemini, E. Fetaya, H. Maron, and S. Gannot, "Scene-agnostic multi-microphone speech dereverberation", in Interspeech, Brno, The Czech Republic, 2021.

BibTeX

@inproceedings{Yemini2021agnostic,
title={Scene-Agnostic Multi-Microphone Speech Dereverberation},
author={Yochai Yemini and Ethan Fetaya and Haggai Maron and Sharon Gannot},
booktitle={Interspeech},
ADDRESS={Brno, The Czech Republic},
year={2021}
}


Copyright Notice

Downloading of any paper is permitted for personal use only.
Permission to reprint / republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the author(s) and the respective publisher.

Copyright and all other rights therein are retained by authors or by other copyright holders.

All persons downloading this information are expected to adhere to the terms and constraints invoked by each publisher and author’s copyright.

In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Sharon Gannot