2024
B. Rubenchik, E. Hadad, E. Tzirkel, E. Fetaya, and S. Gannot,
"Low-latency single-microphone speaker separation with temporal convolutional networks using speaker representations",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Aalborg, Denmark, Sep. 2024.
Y. Yemini, A. Shamsian, L. Bracha, S. Gannot, and E. Fetaya,
"LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading",
in The 12th International Conference on Learning Representations (ICLR), 2024. Lip-to-speech involves generating natural-sounding speech synchronized with a soundless video of a person talking. Despite recent advances, current methods still cannot produce high-quality speech with high levels of intelligibility for challenging and realistic datasets such as LRS3. In this work, we present LipVoicer, a novel method that generates high-quality speech, even for in-the-wild and rich datasets, by incorporating the text modality. Given a silent video, we first predict the spoken text using a pre-trained lip-reading network. We then condition a diffusion model on the video and use the extracted text through a classifier-guidance mechanism, where a pre-trained automatic speech recognition (ASR) model serves as the classifier. LipVoicer outperforms multiple lip-to-speech baselines on LRS2 and LRS3, which are in-the-wild datasets with hundreds of unique speakers in their test sets and an unrestricted vocabulary. Moreover, our experiments show that the inclusion of the text modality plays a major role in the intelligibility of the produced speech, readily perceptible while listening, and is empirically reflected in a substantial reduction of the word error rate (WER) metric. We demonstrate the effectiveness of LipVoicer through human evaluation, which shows that it produces more natural and synchronized speech signals compared to competing methods. Finally, we created a demo showcasing LipVoicer’s superiority in producing natural, synchronized, and intelligible speech, providing additional evidence of its effectiveness. Project page: https://lipvoicer.github.io
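For context on the classifier-guidance mechanism mentioned in this abstract: in the standard formulation (Dhariwal and Nichol, 2021), the diffusion model's noise prediction is shifted by the gradient of a classifier's log-likelihood. A sketch of the general rule, with y the lip-read text, p_phi(y|x_t) the ASR-based classifier, and s a guidance scale (the exact conditioning used in LipVoicer follows the paper itself):

```latex
\hat{\epsilon}(x_t, t) = \epsilon_\theta(x_t, t)
  - s\,\sqrt{1-\bar{\alpha}_t}\;\nabla_{x_t} \log p_\phi(y \mid x_t)
```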
2023
A. Schwartz, E. Hadad, S. Gannot, and S. E. Chazan,
"Array configuration mismatch in deep DOA estimation: Towards robust training",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2023. Deep direction of arrival (DOA) models commonly require a perfect match between the array configurations in the training and test stages and consequently cannot be applied to unfamiliar microphone array constellations. In this paper, we present a deep DOA estimation method that circumvents this requirement. In our approach, we first cast the DOA estimation as a classification problem in each time-frequency (TF) bin, thus facilitating the localization of multiple concurrent speakers. We utilize a high-resolution spatial image, based on a narrow-band variant of the steered response power phase transform (SRP-PHAT) processor, as an input feature. The model is trained with simulated data using a single microphone array configuration in various acoustic conditions. In the test stage, the algorithm is applied with unfamiliar microphone array constellations, namely with a different number of microphones and different inter-microphone distances. An elaborated experimental study with real-life room impulse response (RIR) recordings demonstrates the effectiveness of the proposed input feature and the training scheme. Our approach achieves comparable results in familiar microphone array constellations and, more importantly, can accurately estimate the DOA of multiple concurrent speakers even with unfamiliar microphone arrays.
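The narrow-band SRP-PHAT feature described above can be sketched compactly; the following is an illustrative stand-alone implementation under simplifying assumptions (one TF bin, candidate-direction TDOAs given), not the authors' code:

```python
import numpy as np

def narrowband_srp_phat(X, tdoas, freq):
    """Narrow-band SRP-PHAT score of a single TF bin for one candidate direction.

    X     : (M,) complex STFT coefficients of the M microphones at this bin
    tdoas : (M,) time delays of each microphone w.r.t. a reference,
            implied by the candidate direction [s]
    freq  : center frequency of the bin [Hz]
    """
    Xn = X / (np.abs(X) + 1e-12)               # PHAT: keep phase, drop magnitude
    steer = np.exp(2j * np.pi * freq * tdoas)  # align channels to the candidate
    return np.abs(np.sum(Xn * steer)) ** 2     # coherent power after alignment

# Evaluating this score over a grid of candidate directions for every TF bin
# yields a high-resolution spatial image of the kind used as the network input.
```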
A. Eisenberg, S. Gannot, and S. E. Chazan,
"A two-stage speaker extraction algorithm un-der adverse acoustic conditions using a single-microphone",
in 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, Sep. 2023. In this work, we present a two-stage method for speaker extraction under reverberant and noisy conditions. Given a reference signal of the desired speaker, the clean, but still reverberant, desired speaker is first extracted from the noisy mixed signal. In the second stage, the extracted signal is further enhanced by joint dereverberation and reduction of residual noise and interference. The proposed architecture comprises two sub-networks, one for the extraction task and the second for the dereverberation task. We present a training strategy for this architecture and show that the performance of the proposed method is on par with other state-of-the-art (SOTA) methods when applied to the WHAMR! dataset. Furthermore, we present a new dataset with more realistic adverse acoustic conditions and show that our method outperforms the competing methods when applied to this dataset as well.
D. Sherman, G. Hazan, and S. Gannot,
"Study of speech emotion recognition using BLSTM with attention",
in 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland,
Sep. 2023. We present a study of a neural network-based method for speech emotion recognition that uses audio-only features. In the studied scheme, acoustic features are extracted from the audio utterances and fed to a neural network that consists of convolutional neural network (CNN) layers, a bidirectional long short-term memory (BLSTM) layer combined with an attention mechanism, and a fully-connected layer. To illustrate and analyze the classification capabilities of the network, we used the t-distributed stochastic neighbor embedding (t-SNE) method. We evaluate our model on the Ryerson audio-visual database of emotional speech and song (RAVDESS) and the interactive emotional dyadic motion capture (IEMOCAP) datasets, achieving weighted accuracy (WA) of 80% and 66%, respectively.
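A minimal PyTorch sketch of the pipeline described in this abstract (CNN layers, a BLSTM with frame-level attention, and a fully-connected classifier); all layer sizes are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class SERNet(nn.Module):
    """CNN -> BLSTM -> attention -> fully-connected, as outlined above."""
    def __init__(self, n_feats=40, n_classes=8):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.blstm = nn.LSTM(32 * n_feats, 128, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(256, 1)                  # one score per frame
        self.fc = nn.Linear(256, n_classes)

    def forward(self, x):                              # x: (batch, time, n_feats)
        z = self.cnn(x.unsqueeze(1))                   # (batch, 32, time, n_feats)
        z = z.permute(0, 2, 1, 3).flatten(2)           # (batch, time, 32*n_feats)
        h, _ = self.blstm(z)                           # (batch, time, 256)
        w = torch.softmax(self.attn(h), dim=1)         # attention weights over frames
        ctx = (w * h).sum(dim=1)                       # weighted frame average
        return self.fc(ctx)                            # emotion logits
```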
H. Kafri, M. Olivieri, F. Antonacci, M. Moradi, A. Sarti, and S. Gannot,
"GRAD-CAM-inspired interpretation of nearfield acoustic holography using physics-informed explainable neural net-work",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
Rhodes Island, Greece, Jun. 2023. The interpretation and explanation of decision-making processes of neural networks are becoming a key factor in the deep learning field. Although several approaches have been presented for classification problems, the application to regression models needs to be further investigated. In this manuscript we propose a Grad-CAM-inspired approach for the visual explanation of neural network architecture for regression problems. We apply this methodology to a recent physics-informed approach for Nearfield Acoustic Holography, called Kirchhoff-Helmholtz-based Convolutional Neural Network (KHCNN) architecture. We focus on the interpretation of KHCNN using vibrating rectangular plates with different boundary conditions and violin top plates with complex shapes. Results highlight the more informative regions of the input that the network exploits to correctly predict the desired output. The devised approach has been validated in terms of NCC and NMSE using the original input and the filtered one coming from the algorithm.
Y. Hu, S. Gannot, and T. D. Abhayapala,
"Generalized relative harmonic coefficients",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, Jun. 2023. In the literature, sound source localization in the far- and near-field scenarios is mostly addressed as two independent tasks using different approaches. This entails the tedious task of detecting the type of sound field, whereas in practice there may not be a clear boundary between the far- and near-field regimes. In contrast, this paper proposes a multi-channel feature, denoted generalized relative harmonic coefficients (generalized RHC), in the spherical harmonics domain, which can equally localize both far- and near-field sound sources without requiring any adjustments. We derive the analytical expression of this feature and summarize its unique properties, which facilitate two single-source direction-of-arrival estimators: (i) using a full grid search over the directional space; and (ii) a closed-form solution without any grid search. An experimental study in realistic noisy and reverberant environments, under both near-field and far-field conditions, validates the efficacy of the proposed algorithm.
2022
O. Shmaryahu and S. Gannot,
"On the importance of acoustic reflections in beamforming",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2022. Acoustic reflections are known to limit the ability of traditional beamformers (BFs), which are based on zero-order steering vectors, to extract a desired source and to suppress interference signals from noisy measurements. To alleviate these performance limitations, echo-aware BFs, which take into account the acoustic reflections of the source and interfering signals, were introduced more than two decades ago. In this paper, we propose a systematic methodology to analyze the performance of these BFs, highlighting the importance of the acoustic reflections in the BF design. Under this methodology, we redefine beampatterns to consider the entire reflection pattern, while the directions of arrival (DOAs) of the sources are merely used as an indication of the positions of the sources that impinge on the array from a circle around it. We further define measures of the quality of the BFs, namely the beampattern shape, the width of the main beam, the directivity, the null depth, and the signal-to-interference ratio (SIR) improvement. Using this methodology, we are able to clearly demonstrate the advantages of echo-aware BFs over traditional BFs that only consider the direct arrival of the sources in their design.
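For reference, the conventional beampattern that the paper's reflection-aware redefinition generalizes scans zero-order (direct-path, free-field) steering vectors; a minimal NumPy sketch with an illustrative 4-microphone linear array:

```python
import numpy as np

# Beampattern of a beamformer with weights w over candidate DOAs, using
# zero-order (direct path only) steering vectors -- exactly the modeling
# assumption whose limitations the paper analyzes.
c, f = 343.0, 1000.0                                   # speed of sound [m/s], frequency [Hz]
mics = np.array([[i * 0.05, 0.0] for i in range(4)])   # 4-mic ULA, 5 cm spacing

def steering(theta):
    d = np.array([np.cos(theta), np.sin(theta)])       # plane-wave propagation direction
    delays = mics @ d / c                              # per-mic arrival delays
    return np.exp(-2j * np.pi * f * delays)

w = steering(np.pi / 2) / len(mics)                    # delay-and-sum BF aimed at broadside
thetas = np.linspace(0, np.pi, 181)
pattern_db = [20 * np.log10(np.abs(np.conj(w) @ steering(t)) + 1e-12) for t in thetas]
```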
E. Hadad, S. Doclo, S. Nordholm, and S. Gannot,
"Pareto optimal binaural MVDR beamformer with controllable interference suppression",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2022. The objective of binaural multi-microphone speech enhancement algorithms can be viewed as a multi-criteria design problem, as there are several requirements to be met. When applying distortionless beamforming, it is necessary to suppress interfering sources and ambient background noise, and to extract an undistorted replica of the target source. In the binaural versions, it is also important to preserve the binaural cues of the target and the interference sources. In this paper, we propose a unified Pareto optimization framework for binaural distortionless beamformers, which is achieved by defining a multi-objective problem (MOP) to control the amount of interference suppression and noise reduction simultaneously. The derivation is given for the multi-interference case by introducing separate mean squared error (MSE) cost functions for each of the respective interference sources and for the background noise. A Pareto optimal set of solutions is provided for any set of parameters. The performance of the proposed method in a noisy and reverberant environment is presented, demonstrating the impact of the trade-off parameters using real-signal recordings.
A. Eisenberg, S. Gannot, and S. E. Chazan,
"Single microphone speaker extraction using unified time-frequency Siamese-Unet",
in 30th European Signal Processing Conference (EUSIPCO),
Aug. 2022, pp. 762–766. In this paper, we present a unified time-frequency method for speaker extraction in clean and noisy conditions. Given a mixed signal, along with a reference signal, the common approaches for extracting the desired speaker are applied either in the time domain or in the frequency domain. In our approach, we propose a Siamese-Unet architecture that uses both representations. The Siamese encoders are applied in the frequency domain to infer the embeddings of the noisy and reference spectra, respectively. The concatenated representations are then fed into the decoder to estimate the real and imaginary components of the desired speaker, which are then inverse-transformed to the time domain. The model is trained with the scale-invariant signal-to-distortion ratio (SI-SDR) loss to exploit the time-domain information. The time-domain loss is also regularized with a frequency-domain loss to preserve the speech patterns. Experimental results demonstrate that the unified approach is not only very easy to train, but also provides superior results compared with state-of-the-art (SOTA) blind source separation (BSS) methods, as well as a commonly used speaker extraction approach.
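The SI-SDR objective mentioned in this abstract has a standard closed form (Le Roux et al., 2019); a minimal NumPy version (not the paper's code):

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((target @ target) / (noise @ noise + eps))

# Training minimizes -si_sdr(model_output, clean_reference).
```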
Y. Hu and S. Gannot,
"Comparison of learning-based DOA estimation between SH domain features",
in 30th European Signal Processing Conference (EUSIPCO), Aug. 2022, pp. 329–333. Accurate direction-of-arrival (DOA) estimation in noisy and reverberant environments is a long-standing challenge in the field of acoustic signal processing. One of the promising research directions utilizes the decomposition of the multi-microphone measurements into the spherical harmonics (SH) domain. This paper presents an evaluation and comparison of learning-based single-source DOA estimation using two recently introduced SH-domain features, denoted relative harmonic coefficients (RHC) and relative modal coherence (RMC), respectively. Both features were shown to be independent of the time-varying source signal, even in reverberant environments, thus facilitating training with a synthesized, continuously active noise signal rather than with speech signals. The inspected features are fed into a convolutional neural network, trained as a DOA classifier. Extensive validations confirm that the RHC-based method outperforms the RMC-based method, especially under unfavorable scenarios with severe noise and reverberation.
Y. Hu and S. Gannot,
"Closed-form single source direction-of-arrival estimator using first-order relative harmonic coefficients.",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022, pp. 726–730 The relative harmonic coefficients (RHC), recently introduced as a multi-microphone spatial feature, demonstrates promising performance when applied to direction-of-arrival (DOA) estimation. All existing RHC-based DOA estimators suffer from a resolution limitation due to the inherent grid-based search. In contrast, this paper utilizes the first-order RHC to propose a closed-form DOA estimator by deriving a direction vector, which points towards to the desired source direction. Two objective metrics, namely localization accuracy and algorithm complexity, are adopted for the evaluation and comparison with existing RHC-based and intensity based localization approaches, in both simulated and real-life environments.
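The closed-form estimator itself follows the paper's derivation; for orientation, the classical intensity-based baseline it is compared against also yields a direction vector in closed form from first-order (B-format) coefficients. A hedged NumPy sketch of that baseline:

```python
import numpy as np

def intensity_doa(W, X, Y, Z):
    """Closed-form DOA from first-order ambisonic STFT coefficients
    (pseudo-intensity method, shown here as the classical baseline).

    W is the omnidirectional channel; X, Y, Z are the dipole channels
    (arrays of the same shape). Returns a unit direction vector.
    """
    v = np.stack([
        np.real(np.conj(W) * X).sum(),   # active intensity, x component
        np.real(np.conj(W) * Y).sum(),   # y component
        np.real(np.conj(W) * Z).sum(),   # z component
    ])
    return v / (np.linalg.norm(v) + 1e-12)
```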
N. Raviv, O. Schwartz, and S. Gannot,
"Low resources online single-microphone speech en-hancement with harmonic emphasis",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022, pp. 8807–8811. In this paper, we propose a deep neural network (DNN)-based single-microphone speech enhancement algorithm characterized by short latency and low computational resources. Many speech enhancement algorithms suffer from low noise reduction capabilities between pitch harmonics, and in severe cases, the harmonic structure may even be lost. Recognizing this drawback, we propose a new weighted loss that emphasizes pitch-dominated frequency bands. For that, we propose a method, applied only at the training stage, to detect these frequency bands. The proposed method is applied to speech signals contaminated by several noise types, in particular typical domestic noise drawn from the ESC-50 and DEMAND databases, demonstrating its applicability to ‘stay-at-home’ scenarios.
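A hedged illustration of a loss that up-weights pitch-dominated bands; the simple energy-threshold detector below is only a stand-in, the paper's training-stage detection method is described therein:

```python
import numpy as np

def weighted_spectral_loss(est_mag, clean_mag, harmonic_weight=4.0, thresh_db=-20.0):
    """MSE over log-magnitudes with larger weight on pitch-dominated bins.

    est_mag, clean_mag : (freq, time) magnitude spectrograms
    Bins whose clean energy lies within thresh_db of the frame peak are
    treated as harmonic and up-weighted (a crude stand-in detector).
    """
    log_est = np.log10(est_mag + 1e-8)
    log_clean = np.log10(clean_mag + 1e-8)
    peak = clean_mag.max(axis=0, keepdims=True)            # per-frame peak energy
    rel_db = 20 * np.log10(clean_mag / (peak + 1e-12) + 1e-12)
    weights = np.where(rel_db > thresh_db, harmonic_weight, 1.0)
    return np.mean(weights * (log_est - log_clean) ** 2)
```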
2021
Y. Hu, P. Samarasinghe, S. Gannot, and T. Abhayapala,
"Evaluation and comparison of three source direction-of-arrival estimators using relative harmonic coecients",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada, Jun. 2021. A spherical harmonics domain source feature called relative harmonic coefficients (RHC) has recently been applied to the source direction-of-arrival (DOA) estimation problem. This paper presents a compact evaluation and comparison of two existing RHC-based DOA estimators: (i) a method using a full grid search over the two-dimensional (2-D) directional space; and (ii) a decoupled estimator which uses one-dimensional (1-D) searches to separately localize the source’s elevation and azimuth. We also propose a new estimator using a gradient descent search over the 2-D directional grid space. Extensive experiments in both simulated and real-life environments are conducted to examine and analyze the performance of all the underlying DOA estimators. Two objective metrics, namely localization accuracy and algorithm complexity, are adopted for the evaluation and comparison of all estimators.
G. F. Miller, A. Brendel, W. Kellermann, and S. Gannot,
"Misalignment recognition in acoustic sensor networks using a semi-supervised source estimation method and Markov random fields",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada, Jun. 2021. In this paper, we consider the problem of acoustic source localization by acoustic sensor networks (ASNs) using a promising, learning-based technique that adapts to the acoustic environment. In particular, we look at the scenario where a node in the ASN is displaced from its position during training. As the mismatch between the ASN used for learning the localization model and the one after a node displacement leads to erroneous position estimates, a displacement has to be detected and the displaced nodes need to be identified. We propose a method that considers the disparity in position estimates made by leave-one-node-out (LONO) sub-networks and uses a Markov random field (MRF) framework to infer the probability of each LONO position estimate being aligned, misaligned or unreliable, while accounting for the noise inherent to the estimator. This probabilistic approach is advantageous over naïve detection methods, as it outputs a normalized value that encapsulates conditional information provided by each LONO sub-network on whether the reading is in misalignment with the overall network. Experimental results confirm that the performance of the proposed method is consistent in identifying compromised nodes in various acoustic conditions.
A. Eisenberg, B. Schwartz, and S. Gannot,
"Online blind audio source separation using recursive expectation-maximization",
in Interspeech, Brno, Czech Republic, 2021.
S. E. Chazan, J. Goldberger, and S. Gannot,
"Speech enhancement with mixture of deep experts with clean clustering pre-training",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada, Jun. 2021. In this study, we present a mixture of deep experts (MoDE) neural-network architecture for single-microphone speech enhancement. Our architecture comprises a set of deep neural networks (DNNs), each of which is an ‘expert’ in a different speech spectral pattern, such as a phoneme. A gating DNN is responsible for the latent variables, which are the weights assigned to each expert’s output given a speech segment. The experts estimate a mask from the noisy input, and the final mask is then obtained as a weighted average of the experts’ estimates, with the weights determined by the gating DNN. A soft spectral attenuation, based on the estimated mask, is then applied to enhance the noisy speech signal. As a byproduct, we gain a reduction in complexity at test time. We show that the experts’ specialization allows better robustness to unfamiliar noise types.
R. Opochinsky, G. Chechik, and S. Gannot,
"Deep ranking-based DOA tracking algorithm",
submitted to 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021.
2020
A. Bross, B. Laufer-Goldshtein, and S. Gannot,
"Multiple speaker localization using mixture of Gaussian model with manifold-based centroids",
in 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 2020. A data-driven approach for multiple-speaker localization in reverberant enclosures is presented. The approach combines semi-supervised learning on multiple manifolds with unsupervised maximum likelihood estimation. The relative transfer functions (RTFs), which are known to be related to the source positions, are used in both stages of the proposed algorithm as feature vectors. The microphone positions are not known. In the training stage, a nonlinear, manifold-based mapping between RTFs and source locations is inferred using single-speaker utterances. The inference procedure utilizes two RTF datasets: a small set of RTFs with their associated position labels, and a large set of unlabelled RTFs. This mapping is used to generate a dense grid of localized sources that serve as the centroids of a Mixture of Gaussians (MoG) model, which is used in the test stage of the algorithm to cluster RTFs extracted from multiple-speaker utterances. Clustering is performed with the expectation-maximization (EM) procedure, relying on the sparsity and intermittency of the speech signals. A preliminary experimental study, with either two or three overlapping speakers at various reverberation levels, demonstrates that the proposed scheme achieves high localization accuracy compared to a baseline method using a simpler propagation model.
Y. Hu, T. Abhayapala, P. N. Samarasinghe, and S. Gannot,
"Decoupled direction-of-arrival estimations using relative harmonic coefficients,",
in 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 2020. Traditional source direction-of-arrival (DOA) estimation algorithms generally localize the elevation and azimuth simultaneously, requiring an exhaustive search over the two-dimensional (2-D) space. By contrast, this paper presents two decoupled source DOA estimation algorithms using a recently introduced source feature called the relative harmonic coefficients. They are capable to recover the source’s elevation and azimuth separately, since the elevation and azimuth components in the relative harmonic coefficients are decoupled. The proposed algorithms are highlighted by a large reduction of computational complexity, thus enable a direct application for sound source tracking. Simulation results, using both a static and moving sound source, confirm the proposed methods are computationally efficient while achieving competitive localization accuracy.
J. Cmejla, T. Kounovsky, S. Gannot, Z. Koldovsky, and P. Tandeitnik,
"MIRaGe: Multichannel database of room impulse responses measured on high-resolution cube-shaped grid",
in 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 2020. We introduce a database of multi-channel recordings performed in an acoustic lab with adjustable reverberation time. The recordings provide detailed information about the room acoustics for positions of a source within a confined area. In particular, the main positions correspond to 4104 vertices of a cube-shaped dense grid within a 46 × 36 × 32 cm volume. The database can serve for simulations of real-world situations and as a tool for detailed analyses of beampatterns of spatial processing methods. It can also be used for training and testing of mathematical models of the acoustic field.
Y. Laufer and S. Gannot,
"A Bayesian hierarchical model for blind audio source separation",
in 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 2020. This paper presents a fully Bayesian hierarchical model for blind audio source separation in a noisy environment. Our probabilistic approach is based on Gaussian priors for the speech signals, Gamma hyperpriors for the speech precisions and a Gamma prior for the noise precision. The time-varying acoustic channels are modelled with a linear-Gaussian state-space model. The inference is carried out using a variational Expectation-Maximization (VEM) algorithm, leading to a variant of the multi-speaker multichannel Wiener filter (MCWF) to separate and enhance the audio sources, and a Kalman smoother to infer the acoustic channels. The VEM speech estimator can be decomposed into two stages: A multi-speaker linearly constrained minimum variance (LCMV) beamformer followed by a variational multi-speaker postfilter. The proposed algorithm is evaluated in a static scenario using recorded room impulse responses (RIRs) with two reverberation levels, showing superior performance compared to competing methods.
M. J. Bianco, P. Gerstoft, and S. Gannot,
"Semi-supervised source localization with deep generative modeling",
in 30th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Aalto University, Espoo, Finland, Sep. 2020. We propose a semi-supervised localization approach based on deep generative modeling with variational autoencoders (VAEs). Localization in reverberant environments remains a challenge, which machine learning (ML) has shown promise in addressing. Even with large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. We address this issue by performing semi-supervised learning (SSL) with convolutional VAEs. The VAE is trained to generate the phase of relative transfer functions (RTFs), in parallel with a DOA classifier, on both labeled and unlabeled RTF samples. The VAE-SSL approach is compared with SRP-PHAT and fully-supervised CNNs. We find that VAE-SSL can outperform both SRP-PHAT and CNN in label-limited scenarios.
A. Eisenberg, B. Schwartz, and S. Gannot,
"Blind audio source separation using two expectation-maximization algorithms",
in 30th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Aalto University, Espoo, Finland, Sep. 2020. The problem of multi-microphone blind audio source separation in a noisy environment is addressed. The estimation of the acoustic signals and the associated parameters is carried out using the expectation-maximization (EM) algorithm. Two separation algorithms are developed, using either a deterministic representation or a stochastic Gaussian distribution for modelling the speech signals. Under the deterministic model, the speech sources are estimated in the M-step by applying multiple minimum variance distortionless response (MVDR) beamformers in parallel, while under the stochastic model, the speech signals are estimated in the E-step by applying multiple multichannel Wiener filters (MCWF) in parallel. In the simulation study, we generated a large dataset of microphone signals by convolving speech signals, with overlapping activity patterns, with measured acoustic impulse responses. It is shown that the proposed methods outperform a baseline method in terms of speech quality and intelligibility.
Y. Laufer and S. Gannot,
"A Bayesian hierarchical mixture of Gaussian model for multi-speaker DOA estimation and separation",
in 30th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Aalto University, Espoo, Finland, Sep. 2020. In this paper, we propose a fully Bayesian hierarchical model for multi-speaker direction of arrival (DOA) estimation and separation in noisy environments, utilizing the W-disjoint orthogonality property of the speech sources. Our probabilistic approach employs a mixture of Gaussians formulation, with centroids associated with a grid of candidate speakers’ DOAs. The hierarchical Bayesian model is established by attributing priors to the various parameters. We then derive a variational Expectation-Maximization algorithm that estimates the DOAs by selecting the most probable candidates, and separates the speakers using a variant of the multichannel Wiener filter that takes into account the responsibility of each candidate in describing the received data. The proposed algorithm is evaluated using real room impulse responses from a freely-available database, in terms of both DOA estimation accuracy and separation scores. It is shown that the proposed method outperforms competing methods.
E. Hadad and S. Gannot,
"Maximum likelihood multi-speaker direction of arrival estimation utilizing a weighted histogram",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020. In this contribution, a novel maximum likelihood (ML)-based direction of arrival (DOA) estimator for concurrent speakers in a noisy reverberant environment is presented. The DOA estimation task is formulated in the short-time Fourier transform (STFT) domain in two stages. In the first stage, a single local DOA per time-frequency (TF) bin is selected, using the W-disjoint orthogonality property of the speech signal in the STFT domain. The local DOA is obtained as the maximum of the narrow-band likelihood localization spectrum at each TF bin. In addition, for each local DOA, a confidence measure is calculated, determining the confidence in the local estimate. In the second stage, the wide-band localization spectrum is calculated using a weighted histogram of the local DOA estimates, with the confidence measures as weights. Finally, the wide-band DOA estimates are obtained by selecting the peaks in the wide-band localization spectrum. The results of our experimental study demonstrate the benefit of the proposed algorithm in a reverberant environment as compared with the classical steered response power phase transform (SRP-PHAT) algorithm.
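The second stage reduces to a confidence-weighted histogram over local estimates; a minimal NumPy illustration with synthetic local DOAs (all values hypothetical):

```python
import numpy as np

# local_doas: one DOA estimate per TF bin [deg]; conf: its confidence weight.
rng = np.random.default_rng(0)
local_doas = np.concatenate([rng.normal(40, 5, 500), rng.normal(110, 5, 500)])
conf = rng.uniform(0.0, 1.0, local_doas.shape)

# Wide-band localization spectrum: histogram of local DOAs, weighted by confidence.
spectrum, edges = np.histogram(local_doas, bins=np.arange(0, 181, 2), weights=conf)
centers = 0.5 * (edges[:-1] + edges[1:])

# Wide-band DOA estimates: peaks of the spectrum (here simply the two largest bins).
peaks = centers[np.argsort(spectrum)[-2:]]
```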
Y. Yemini, S. E. Chazan, J. Goldberger, and S. Gannot,
"A composite DNN architecture for speech enhancement",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020. In speech enhancement, the use of supervised algorithms in the form of deep neural networks (DNNs) has become tremendously popular in recent years. The target function of the DNN (and the associated estimators) is often either a masking function applied to the noisy spectrum, or the clean log-spectrum. In this work, we show that neither cost function alone is suitable for dealing with narrowband noise, and propose a new composite estimator in the log-spectrum domain. The new technique relies on a single DNN that outputs both a masking function and an estimated log-spectrum. Both outputs are used for the composite enhancement. The proposed estimator demonstrates superior performance for speech utterances contaminated by additive narrowband noise, while maintaining the enhancement quality of the baseline algorithms for wideband noise.
Y. Opochinsky, S. E. Chazan, S. Gannot, and J. Goldberger,
"K-autoencoders deep clustering",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020. In this study, we propose a deep clustering algorithm that extends the k-means algorithm. Each cluster is represented by an autoencoder instead of a single centroid vector. Each data point is associated with the autoencoder which yields the minimal reconstruction error. The optimal clustering is found by learning a set of autoencoders that minimize the global reconstruction mean-square error loss. The network architecture is a simplified version of a previous method that is based on mixture-of-experts. The proposed method is evaluated on standard image corpora and performs on par with state-of-the-art methods that are based on much more complicated network architectures.
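A minimal PyTorch sketch of the k-autoencoders idea described above (assign each point to its best-reconstructing autoencoder and back-propagate only that error); sizes and training details are illustrative, not the paper's setup:

```python
import torch
import torch.nn as nn

def make_autoencoder(dim=784, hidden=32):
    return nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

K = 10                                        # number of clusters / autoencoders
aes = nn.ModuleList([make_autoencoder() for _ in range(K)])
opt = torch.optim.Adam(aes.parameters(), lr=1e-3)

def train_step(x):                            # x: (batch, dim)
    # Reconstruction error of every autoencoder for every point: (batch, K)
    errs = torch.stack([((ae(x) - x) ** 2).mean(dim=1) for ae in aes], dim=1)
    # Hard assignment: each point belongs to its best-reconstructing AE ...
    assign = errs.argmin(dim=1)
    # ... and only that AE's error enters the global loss (k-means-like step).
    loss = errs.gather(1, assign.unsqueeze(1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```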
Y. Hu, P. Samarasinghe, T. Abhayapala, and S. Gannot,
"Unsupervised multiple source localization using relative harmonic coefficient",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020. This paper presents an unsupervised multi-source localization algorithm using a recently introduced feature called the relative harmonic coefficients. We derive a closed-form expression of the feature and briefly summarize its unique properties. We then exploit this feature to develop a single-source frame/bin detector which simplifies the challenging problem of multiple source localization into a single-source localization problem. We show that the underlying method is suitable for localization using overlapped, disjoint, as well as simultaneous multi-source recordings. Experimental results in both simulated and real-life reverberant environments confirm the improved localization accuracy of the proposed method in comparison with an existing state-of-the-art approach.
O. Schwartz, E. Habets, and S. Gannot,
"Low complexity NLMS for multiple loudspeaker acoustic echo canceller using relative loudspeaker transfer functions",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020. Speech signals captured by a microphone mounted to a smart soundbar or speaker are inherently contaminated by echoes. Modern smart devices are usually characterized by low computational capabilities and low memory resources; in these cases, a low-complexity acoustic echo canceller (AEC) may be preferred, even though a tolerable degradation in the cancellation occurs. In principle, devices with multiple loudspeakers need an individual AEC for each loudspeaker, because the transfer function (TF) from each loudspeaker to the microphone must be estimated. In this paper, we present a normalized least mean square (NLMS) algorithm for the multi-loudspeaker case using relative loudspeaker transfer functions (RLTFs). In each iteration, the RLTFs between each loudspeaker and the reference loudspeaker are estimated first, and then the primary TF between the reference loudspeaker and the microphone. Assuming loudspeakers that are close to each other, the RLTFs can be estimated using fewer coefficients than the primary TF, yielding a reduction of 3:4 in computational complexity and 1:2 in memory usage. The algorithm is evaluated using both simulated and real room impulse responses (RIRs) of two loudspeakers, with a reverberation time set to 0.3 s and several distances between the loudspeakers.
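For context, the core NLMS recursion underlying the described AEC is standard; a single-channel NumPy sketch (the RLTF decomposition itself follows the paper):

```python
import numpy as np

def nlms_step(h, x_buf, mic_sample, mu=0.5, eps=1e-8):
    """One NLMS iteration of an echo-path estimate.

    h          : (L,) current filter estimate (echo path)
    x_buf      : (L,) most recent loudspeaker samples (newest first)
    mic_sample : current microphone sample
    Returns the updated filter and the echo-cancelled error sample.
    """
    e = mic_sample - h @ x_buf                       # residual echo after cancellation
    h = h + mu * e * x_buf / (x_buf @ x_buf + eps)   # normalized gradient step
    return h, e
```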
M. J. Bianco, S. Gannot, E. Fernandez-Grande, and P. Gerstoft,
"Semi-supervised source localization in reverberant environments using deep generative modeling",
The Journal of the Acoustical Society of America. We present a method for acoustic source localization in reverberant environments based on semi-supervised machine learning (ML) with deep generative models. Source localization in the presence of reverberation remains a major challenge, which recent ML techniques have shown promise in addressing. Despite often large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. In semi-supervised learning, ML systems are trained using many examples with only a few labels, with the goal of exploiting the natural structure of the data. We use variational autoencoders (VAEs), which are generative neural networks (NNs) that rely on explicit probabilistic representations, to model the latent distribution of reverberant acoustic data. VAEs consist of an encoder NN, which maps complex input distributions to simpler parametric distributions (e.g., Gaussian), and a decoder NN, which approximates the training examples. The VAE is trained to generate the phase of relative transfer functions (RTFs) between two microphones in reverberant environments, in parallel with a DOA classifier, on both labeled and unlabeled RTF samples. The performance of this VAE-based approach is compared with conventional and ML-based localization in simulated and real-world scenarios.
2019
A. Brendel, B. Laufer-Goldshtein, S. Gannot, and W. Kellermann,
"Learning-based acoustic source localization using directional spectra",
in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Le Gosier, Guadeloupe, French West Indies, Dec. 2019. This paper proposes to use directional spectra as new features for manifold-learning-based acoustic source localization. We claim that directional spectra not only contain directional information, but rather are discriminative between different positions in a reverberant enclosure. We use the proposed features to build a manifold-learning-based localization algorithm, which is applied to single-array localization as well as to acoustic sensor network (ASN) localization. The performance of the proposed algorithm is benchmarked by comprehensive experiments carried out in a simulated environment, with comparisons to a blind approach based on triangulation and to Gaussian process regression (GPR)-based localization.
K. Weisberg and S. Gannot,
"Multiple speaker tracking using coupled HMM in the STFT domain",
in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Le Gosier, Guadeloupe, French West Indies, Dec. 2019. We present a multi-microphone multi-speaker direction of arrival (DOA) tracking algorithm. In the proposed algorithm, the DOA values are discretized to a set of candidate DOAs. Accordingly, and following the W-disjoint orthogonality (WDO) property of the speech signal, each time-frequency (TF) bin in the short-time Fourier transform (STFT) domain is associated with a single DOA candidate. The conditional probability of each TF observation, given its corresponding DOA association, is modeled as a multivariate complex-Gaussian distribution, with the power spectral density (PSD) of each source as an unknown parameter. By applying the Fisher-Neyman factorization, it can be shown that this conditional probability is proportional to the signal-to-noise ratio (SNR) at the outputs of minimum variance distortionless response (MVDR) beamformers (BFs) directed towards all candidate DOAs. We model these observations as either a frequency-wise parallel hidden Markov model (HMM) or as a coupled HMM with coupling between adjacent frequency bins. The posterior probability of these associations is inferred by applying an extended forward-backward (FB) algorithm, and the actual DOAs can be inferred from this posterior. An experimental study demonstrates the benefits of the proposed algorithm using both a simulated dataset and real recordings drawn from the acoustic source localization and tracking (LOCATA) dataset.
S. E. Chazan, S. Gannot, and J. Goldberger,
"Deep clustering based on a mixture of autoencoders",
in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Pittsburgh, PA, USA, Oct. 2019. In this paper, we propose a Deep Autoencoder Mixture Clustering (DAMIC) algorithm based on a mixture of deep autoencoders, where each cluster is represented by an autoencoder. A clustering network transforms the data into another space and then selects one of the clusters. Next, the autoencoder associated with this cluster is used to reconstruct the data point. The clustering algorithm jointly learns the nonlinear data representation and the set of autoencoders. The optimal clustering is found by minimizing the reconstruction loss of the mixture-of-autoencoders network. Unlike other deep clustering algorithms, no regularization term is needed to avoid data collapsing to a single point. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.
R. Opochinsky, B. Laufer, S. Gannot, and G. Chechik,
"Deep Ranking-Based sound source localization",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2019. Sound source localization is a cumbersome task in challenging reverberation conditions. Recently, there has been growing interest in developing learning-based localization methods. In this approach, acoustic features are extracted from the measured signals and then given as input to a model that maps them to the corresponding source positions. Typically, a massive dataset of labeled samples from known positions is required to train such models. Here, we present a novel weakly-supervised deep-learning localization method that exploits only a few labeled (anchor) samples with known positions, together with a larger set of unlabeled samples, for which we only know their relative physical ordering. We design an architecture that uses a stochastic combination of triplet-ranking loss for the unlabeled samples and physical loss for the anchor samples, to learn a nonlinear deep embedding that maps acoustic features to an azimuth angle of the source. The combined loss can be optimized effectively using a standard gradient-based approach. Evaluating the proposed approach on simulated data, we demonstrate its significant improvement over two previous learning-based approaches for various reverberation levels, while maintaining consistent performance with varying sizes of labeled data.
J. R. Jensen, U. Saqib, and S. Gannot,
"An EM method for multichannel TOA and DOA estimation of acoustic echoes",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2019. The time-of-arrivals (TOAs) of acoustic echoes are a prerequisite in, e.g., room geometry estimation and localization of acoustic reflectors, which can be an enabling technology for autonomous robots and drones. However, solving these problems using TOAs alone introduces the difficult problem of echo labeling. Moreover, it is typically suggested to estimate the TOAs by estimating the room impulse response and finding its peaks, but this approach is vulnerable to noise (e.g., ego noise). We therefore propose an expectation-maximization (EM) method for estimating both the TOAs and directions-of-arrival (DOAs) of acoustic echoes using a loudspeaker and a uniform circular array (UCA). Our results show that this approach is more robust to noise compared to the traditional peak-finding approach. Moreover, they show that the TOA and DOA information can be combined to estimate wall positions directly, without considering echo labeling.
Y. Soussana and S. Gannot,
"Variational inference for DOA estimation in reverberant conditions",
in 27th European Signal Processing Conference (EUSIPCO), A Coruña, Spain, Sep. 2019. A direction of arrival (DOA) estimator for concurrent speakers in a reverberant environment is presented. The reverberation phenomenon, if not properly addressed, is known to degrade the performance of DOA estimators. In this paper, we investigate a variational Bayesian (VB) inference framework for clustering time-frequency (TF) bins to candidate angles. The received microphone signals are modelled as a sum of anechoic speech and a reverberation component. Our model relies on a Gaussian prior for the speech signal and a Gamma prior for the speech precision. The noise covariance matrix is modelled by a time-invariant full-rank coherence matrix multiplied by a time-varying gain, with a Gamma prior as well. The benefits of the presented model are verified in a simulation study using measured room impulse responses.
N. Cohen, G. Hazan, B. Schwartz, and S. Gannot,
"An EM algorithm for joint Dual-Speaker sep-aration and dereverberation",
in 27th European Signal Processing Conference (EUSIPCO), A Coruña, Spain, Sep. 2019. The scenario of a mixture of two speakers captured by a microphone array in a noisy and reverberant environment is considered. If the problems of source separation and dereverberation are treated separately, performance degradation may result. It is well-known that the performance of blind source separation (BSS) algorithms degrades in the presence of reverberation, unless reverberation effects are properly addressed (leading to the so-called convolutive BSS algorithms). Similarly, the performance of common dereverberation algorithms will severely degrade if an interference signal is also captured by the same microphone array. The aim of the proposed method is to jointly separate and dereverberate the two speech sources, by extending the Kalman expectation-maximization for dereverberation (KEMD) algorithm, previously proposed by the authors. A statistical model is attributed to this scenario, using the convolutive transfer function (CTF) approximation, and the expectation-maximization (EM) scheme is applied to obtain a maximum likelihood (ML) estimate of the parameters. In the expectation step, the separated clean signals are extracted from the observed data by applying a Kalman filter, utilizing the parameters estimated in the previous iteration. The maximization step updates the parameter estimates according to the E-step output. Simulation results show that the proposed method improves both the separation of the signals and their overall quality.
S. E. Chazan, H. Hammer, G. Hazan, J. Goldberger, and S. Gannot,
"Multi-Microphone speaker separation based on deep DOA estimation",
in 27th European Signal Processing Conference (EUSIPCO), A Coruña, Spain, Sep. 2019. In this paper, we present a multi-microphone speech separation algorithm based on masking inferred from the speakers’ directions of arrival (DOAs). According to the W-disjoint orthogonality property of speech signals, each time-frequency (TF) bin is dominated by a single speaker. Each TF bin can therefore be associated with a single DOA. In our procedure, we apply a deep neural network (DNN) with a U-net architecture to infer the DOA of each TF bin from a concatenated set of the spectra of the microphone signals. Separation is obtained by multiplying the reference microphone signal by the masks associated with the different DOAs. Our proposed deep direction estimation for speech separation (DDESS) method is inspired by recent advances in deep clustering methods. Unlike already established methods that apply clustering in a latent embedded space, in our approach the embedding is closely associated with the spatial information, as manifested by the different speakers’ directions of arrival.
K. Weisberg, S. Gannot, and O. Schwartz,
"An online multiple-speaker DOA tracking using the Cappe-Moulines recursive expectation-maximization algorithm",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), 2019, pp. 656–660. In this paper, we present a multiple-speaker direction of arrival (DOA) tracking algorithm with a microphone array that utilizes the recursive EM (REM) algorithm proposed by Cappé and Moulines. In our model, all sources can be located in one of a predefined set of candidate DOAs. Accordingly, the received signals from all microphones are modeled as Mixture of Gaussians (MoG) vectors in which each speaker is associated with a corresponding Gaussian. The localization task is then formulated as a maximum likelihood (ML) problem, where the MoG weights and the power spectral density (PSD) of the speakers are the unknown parameters. The REM algorithm is then utilized to estimate the ML parameters in an online manner, facilitating multiple source tracking. By using Fisher-Neyman factorization, the outputs of the minimum variance distortionless response (MVDR)-beamformer (BF) are shown to be sufficient statistics for estimating the parameters of the problem at hand. With that, the terms for the E-step are significantly simplified to a scalar form. An experimental study demonstrates the benefits of the using proposed algorithm in both a simulated data-set and real recordings from the acoustic source localization and tracking (LOCATA) data-set.
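The Cappé-Moulines recursive EM replaces the batch E-step with a stochastic-approximation update of the sufficient statistics; a schematic NumPy sketch for the weights of a Gaussian mixture over candidate DOAs (scalar observations, fixed component parameters, all values illustrative rather than the paper's formulation):

```python
import numpy as np

def rem_update(stats, weights, y, mus, sigma, t):
    """One recursive-EM step for the weights of a Gaussian mixture.

    stats   : (K,) running sufficient statistics (responsibility averages)
    weights : (K,) current mixture weights over candidate DOAs
    y       : new scalar observation; mus, sigma: fixed component parameters
    """
    gamma = 1.0 / (t + 1) ** 0.6                    # decaying step size
    like = weights * np.exp(-0.5 * ((y - mus) / sigma) ** 2)
    resp = like / like.sum()                        # E-step for this frame only
    stats = (1 - gamma) * stats + gamma * resp      # recursive statistics update
    weights = stats / stats.sum()                   # M-step: weights from statistics
    return stats, weights
```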
A. Brendel, B. Laufer-Goldshtein, S. Gannot, R. Talmon, and W. Kellermann,
"Localization of an unknown number of speakers in adverse acoustic conditions using reliability information and diarization",
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 7898–7902. This paper investigates the localization of an arbitrary number of simultaneously active speakers in an acoustic enclosure. We propose an algorithm capable of estimating the number of speakers, using reliability information to obtain robust estimation results in adverse acoustic scenarios, and estimating individual probability distributions describing the position of each speaker using convex geometry tools. To this end, we start from an established EM-based algorithm for the localization of acoustic sources, in which the estimation of the number of sources as well as the handling of reverberation had not been addressed sufficiently. We show improvements in the localization of a higher number of sources and in the robustness in adverse conditions, including interference from competing speakers, reverberation and noise.
2018
Y. Laufer and S. Gannot,
"A Bayesian hierarchical model for speech dereverberation",
in International Conference on the Science of Electrical Engineering (ICSEE), Eilat, Israel, Dec. 2018. In this paper, the problem of speech dereverberation in a noiseless scenario is addressed in a hierarchical Bayesian framework. Our probabilistic approach relies on a Gaussian model for the early speech signal, combined with a multichannel Gaussian model for the relative early transfer function (RETF). The late reverberation is modelled as a Gaussian additive interference, and the speech and reverberation precisions are modelled with Gamma distributions. We derive a variational Expectation-Maximization (VEM) algorithm which uses a variant of the multichannel Wiener filter (MCWF) to infer the early speech component while suppressing the late reverberation. The proposed algorithm was evaluated using real room impulse responses (RIRs) recorded in our acoustic lab, with reverberation times of 0.36 s and 0.61 s. It is shown that a significant improvement is obtained with respect to the reverberant signal, and that the proposed algorithm outperforms a baseline algorithm. In terms of channel alignment, a superior channel estimate is demonstrated.
O. Schwartz, A. David, O. Shahen-Tov, and S. Gannot,
"Multi-microphone voice activity detector based on steered-response power output entropy",
in International Conference on the Science of Electrical Engineering (ICSEE), Eilat, Israel, Dec. 2018. Voice activity detection (VAD), namely determining whether a speech signal is active or inactive, and single-talk detection (STD), namely detecting that only one speaker is active, are important building blocks in many speech processing applications. A speaker-localization stage (such as the steered response power (SRP)) is often concurrently implemented on the same device. In this paper, the spatial properties of the SRP are utilized for improving the performance of both the voice activity detector (VAD) and the STD. We propose to measure the entropy at the SRP output and compare it with the typical entropy of noise-only frames. This feature utilizes spatial information and may therefore become advantageous in nonstationary noise environments. The STD can then be implemented by determining local minimum values of the entropy measure of the SRP. The proposed VAD was tested for a single speaker in two cases: directional background noise with changing level, and a background music source. The proposed STD was tested using real recordings of two concurrent speakers.
E. Hadad and S. Gannot,
"Multi-speaker direction of arrival estimation using SRP-PHAT algorithm with a weighted histogram",
in International Conference on the Science of Electrical Engineering (ICSEE), Eilat, Israel, Dec. 2018. A direction of arrival (DOA) estimator for concurrent speakers in a reverberant environment is presented. The DOA estimation task is formulated in the short-time Fourier transform (STFT) domain in two stages. In the first stage, a single narrow-band DOA per time-frequency (T-F) bin is selected, since the speech sources are assumed to exhibit disjoint activity in the STFT domain. The narrow-band DOA is obtained as the maximum of the narrow-band steered response power phase transform (SRP-PHAT) localization spectrum at that T-F bin. In addition, for each narrow-band DOA, a quality measure is calculated, which provides the confidence in the estimated decision. In the second stage, the wide-band localization spectrum is calculated using a weighted histogram of the narrow-band DOAs, with the quality measures as weights. Finally, the wide-band DOA estimates are obtained by selecting the peaks in the wide-band localization spectrum. The results of our experimental study demonstrate the benefit of the proposed algorithm as compared to the wide-band SRP-PHAT algorithm in a reverberant environment.
A. Adler, O. Schwartz, and S. Gannot,
"A weighted multichannel Wiener filter and its decomposition to LCMV beamformer and post-filter for source separation and noise reduction",
in International Conference on the Science of Electrical Engineering (ICSEE), Eilat, Israel, Dec. 2018 (best paper award). Speech enhancement and source separation are well-known challenges in the context of hands-free communication and automatic speech recognition. The multichannel Wiener filter (MCWF), which satisfies the minimum mean square error (MMSE) criterion, is a fundamental speech enhancement tool. However, it can suffer from speech distortion, especially when the noise level is high. The speech distortion weighted multichannel Wiener filter (SDW-MWF) was therefore proposed to control the tradeoff between noise reduction and speech distortion for the single-speaker case. In this paper, we generalize this estimator and propose a method for controlling this tradeoff in the multi-speaker case. The proposed estimator is decomposed into two successive stages: 1) a multi-speaker linearly constrained minimum variance (LCMV) beamformer, which is solely determined by the spatial characteristics of the speakers; and 2) a multi-speaker Wiener postfilter (PF), which is responsible for reducing the residual noise. The proposed PF consists of several controlling parameters that can almost independently control the tradeoff between the distortion of each speaker and the total noise reduction.
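For context, the single-speaker SDW-MWF that this paper generalizes trades speech distortion against noise reduction through a single parameter; in the standard notation of the literature (speech and noise covariance matrices Phi_s and Phi_n, reference-microphone selector e_1, trade-off mu):

```latex
\mathbf{w}_{\mathrm{SDW\text{-}MWF}}
  = \left(\boldsymbol{\Phi}_s + \mu\,\boldsymbol{\Phi}_n\right)^{-1}
    \boldsymbol{\Phi}_s\,\mathbf{e}_1 ,
```

with mu = 1 recovering the MMSE-optimal MCWF and larger mu favoring noise reduction over low distortion.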
A. Barnov, V. B. Bracha, S. Markovich-Golan, and S. Gannot,
"Spatially robust GSC beamforming with controlled white noise gain",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Japan, Sep. 2018. Adaptive beamforming is widely used for speech enhancement in telephony and speech recognition applications. We focus on scenarios with a single desired speaker in non-stationary environmental noise. Many modern beamformers are designed using the desired speaker’s transfer function (TF), or the respective relative transfer function (RTF). If the relative source position is fixed, tracking the RTF can be avoided. On top of reducing the computational complexity, this may also prevent the beamformer from enhancing competing sources. In this work, to target such applications, we propose a technique for obtaining a spatially robust generalized sidelobe canceler (GSC) beamformer with controlled white noise gain (WNG). The proposed implementation introduces robustness to mismatch between the assumed and actual RTFs, while maintaining a sufficiently large WNG. It allows for high flexibility in shaping the desired response, while maintaining low computational complexity.
S. E. Chazan, S. Gannot, and J. Goldberger,
"Attention-based neural network for joint diarization and speaker extraction",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Japan, Sep. 2018. Multi-microphone, DNN-based, speech enhancement and speaker separation/extraction algorithms have recently gained increasing popularity. The enhancement capabilities of a spatial processor can be very high, provided that all its building blocks are accurately estimated. Data-driven estimation approaches can be very attractive since they do not rely on accurate statistical models, which are usually unavailable. However, training a DNN with multi-microphone data is a challenging task, due to inevitable differences between the training and test phases. In this work, we present an estimation procedure for controlling a linearly-constrained minimum variance (LCMV) beamformer for speaker extraction and noise reduction. We propose an attention-based DNN for speaker diarization that is applicable to the task at hand. In the proposed scheme, each microphone signal propagates through a dedicated DNN, and an attention mechanism selects the most informative microphone. This approach has the potential of mitigating the mismatch between the training and test phases and can therefore lead to improved speaker extraction performance.
S. Markovich-Golan and S. Gannot,
"A probability distribution model for the relative transfer function in a reverberant environment",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Japan, Sep. 2018. The relative transfer function (RTF) is a generalization of the delay-based array manifold, which is applicable to reverberant environments with multiple reflections. Beamformers that utilize the RTF are known to outperform simpler beamforming techniques that use delay-based steering vectors. Adopting established models of the acoustic transfer functions, and utilizing recent contributions that derive the probability distribution of the ratio of independent complex-Gaussian random variables, we derive a probability distribution model for the RTF. The model is verified by comparison to the empirical distribution in multiple Monte-Carlo experiments.
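A toy Monte-Carlo experiment in the spirit of the paper's verification, assuming (for illustration only) that each acoustic transfer function is complex Gaussian with a deterministic direct-path mean; all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
# Illustrative model: TF at each mic = direct-path term + complex-Gaussian
# reverberant part (values chosen for demonstration, not from the paper).
mu1, mu2 = 1.0 + 0.5j, 0.8 - 0.3j
sigma = 0.3
noise = lambda: (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
h1 = mu1 + sigma * noise()
h2 = mu2 + sigma * noise()
rtf = h2 / h1                      # empirical samples of the RTF
# Summaries one would compare against the derived distribution:
print("empirical mean:", rtf.mean())
print("99.9% quantile of |RTF| (heavy-tail check):", np.quantile(np.abs(rtf), 0.999))
```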
B. Laufer, R. Talmon, and S. Gannot,
"Diarization and separation based on a Data-Driven simplex",
in The 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, Sep. 2018. Separation of underdetermined speech mixtures, where the number of speakers is greater than the number of microphones, is a challenging task. Due to the intermittent behaviour of human conversations, typically, the instantaneous number of active speakers does not exceed the number of microphones, namely the mixture is locally (over-)determined. This scenario is addressed in this paper using a dual stage approach: diarization followed by separation. The diarization stage is based on spectral decomposition of the correlation matrix between different time frames. Specifically, the spectral gap reveals the overall number of speakers, and the computed eigenvectors form a simplex of the activity of the speakers across time. In the separation stage, the diarization results are utilized for estimating the mixing acoustic channels, as well as for constructing an unmixing scheme for extracting the individual speakers. The performance is demonstrated in a challenging scenario with six speakers and only four microphones. The proposed method shows perfect recovery of the overall number of speakers, close to perfect diarization accuracy, and high separation capabilities in various reverberation conditions.
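A compact sketch of the diarization stage's eigen-analysis under simplifying assumptions (generic per-frame feature vectors, a capped speaker count); the paper's simplex-vertex analysis and the subsequent channel estimation are not reproduced here:

```python
import numpy as np

def eigengap_diarization(feats, max_spk=8):
    """Eigen-analysis of the frame-correlation matrix (toy version).
    feats: (T, D) per-frame feature vectors (e.g., normalized log-spectra).
    Returns the estimated number of speakers and the frame embedding."""
    F = feats - feats.mean(axis=0)
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    W = F @ F.T                                   # (T, T) correlation between frames
    evals, evecs = np.linalg.eigh(W)
    evals, evecs = evals[::-1], evecs[:, ::-1]    # sort descending
    gaps = evals[:max_spk] - evals[1:max_spk + 1]
    n_spk = int(np.argmax(gaps)) + 1              # spectral gap -> number of speakers
    return n_spk, evecs[:, :n_spk]                # rows lie (approx.) on a simplex
```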
O. Schwartz and S. Gannot,
"Recursive Expectation-Maximization algorithm for online Multi-Microphone noise reduction",
in The 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, Sep. 2018. Speech signals, captured by a microphone array mounted to a smart loudspeaker device, can be contaminated by ambient noise. In this paper, we present an online multichannel algorithm, based on the recursive EM (REM) procedure, to suppress ambient noise and enhance the speech signal. In the E-step of the proposed algorithm, a multichannel Wiener filter (MCWF) is applied to enhance the speech signal. The MCWF parameters, that is, the power spectral density (PSD) of the anechoic speech, the steering vector, and the PSD matrix of the noise, are estimated in the M-step. The proposed algorithm is specifically suitable for online applications since it uses only past and current observations and requires no iterations. To evaluate the proposed algorithm we used two sets of measurements. In the first set, static scenarios were generated by convolving speech utterances with real room impulse responses (RIRs) recorded in our acoustic lab with reverberation time set to 0.16 s and several signal to directional noise ratio (SDNR) levels. The second set was used to evaluate dynamic scenarios by using real recordings acquired by CEVA “smart and connected” development platform. Two practical use cases were evaluated: 1) estimating the steering vector with a known noise PSD matrix and 2) estimating the noise PSD matrix with a known steering vector. In both use cases, the proposed algorithm outperforms baseline multichannel denoising algorithms.
S. E. Chazan, J. Goldberger, and S. Gannot,
"LCMV beamformer with DNN-based multichannel concurrent speakers detector",
in The 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, Sep. 2018. Application of the linearly constrained minimum variance (LCMV) beamformer (BF) to speaker extraction tasks in real-life scenarios necessitates a sophisticated control mechanism to facilitate the estimation of the noise spatial cross-power spectral density (cPSD) matrix and the relative transfer function (RTF) of all sources of interest. We propose a deep neural network (DNN)-based multichannel concurrent speakers detector (MCCSD) that utilizes all available microphone signals to detect the activity patterns of all speakers. Time frames classified as no active speaker frames will be utilized to estimate the cPSD, while time frames with a single detected speaker will be utilized for estimating the associated RTF. No estimation will take place during concurrent speaker activity. Experimental results show that the multi-channel approach significantly improves its single-channel counterpart.
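The closed-form LCMV weights that such a control mechanism feeds can be written compactly; a generic sketch (names are illustrative) used implicitly by this and several neighboring entries:

```python
import numpy as np

def lcmv_weights(Rn, C, f=None):
    """Closed-form LCMV: w = Rn^{-1} C (C^H Rn^{-1} C)^{-1} f.
    Rn: (M, M) noise spatial cPSD matrix; C: (M, K) RTFs of constrained sources;
    f: (K,) desired responses (default: extract source 0, null the others)."""
    M, K = C.shape
    if f is None:
        f = np.zeros(K, dtype=complex)
        f[0] = 1.0
    Rn_inv_C = np.linalg.solve(Rn, C)            # Rn^{-1} C
    gram = C.conj().T @ Rn_inv_C                 # C^H Rn^{-1} C
    return Rn_inv_C @ np.linalg.solve(gram, f)   # (M,) beamformer weights
```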
O. Ernst, S. E. Chazan, S. Gannot, and J. Goldberger,
"Speech dereverberation using fully convolutional networks",
in The 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, Sep. 2018. Speech dereverberation using a single microphone is addressed in this paper. Motivated by the recent success of fully convolutional networks (FCNs) in many image processing applications, we investigate their applicability to enhance the speech signal represented by short-time Fourier transform (STFT) images. We present two variations: a “U-Net”, which is an encoder-decoder network with skip connections, and a generative adversarial network (GAN) with the U-Net as generator, which yields a more intuitive cost function for training. To evaluate our method we used the data from the REVERB challenge, and compared our results to other methods under the same conditions. We have found that our method outperforms the competing methods in most cases.
S. Markovich-Golan, S. Gannot, and W. Kellermann,
"Performance analysis of the Covariance-Whitening and the Covariance-Subtraction methods for estimating the relative transfer function",
in The 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, Sep. 2018. Estimation of the relative transfer function (RTF) vector of a desired speech source is a fundamental problem in the design of data-dependent spatial filters. We present two common estimation methods, namely the covariance-whitening (CW) and the covariance-subtraction (CS) methods. The CW method has been shown in prior work to outperform the CS method; however, thus far its performance has not been analyzed. In this paper, we analyze the performance of the CW and CS methods and show that in the cases of spatially white noise, and of desired-speech and coherent-interference powers that are uniform over all microphones, the CW method is superior. The derivations are validated by comparing them to their empirical counterparts in Monte-Carlo experiments. In fact, the CW method outperforms the CS method in all tested scenarios, although rare scenarios for which this does not hold may exist.
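Sketches of the two estimators being compared, for sample covariance matrices Ry (noisy) and Rv (noise-only), following the standard CS and CW definitions:

```python
import numpy as np

def rtf_cs(Ry, Rv, ref=0):
    """Covariance subtraction: column of (Ry - Rv), normalized at the reference mic."""
    Rx = Ry - Rv
    return Rx[:, ref] / Rx[ref, ref]

def rtf_cw(Ry, Rv, ref=0):
    """Covariance whitening: principal eigenvector in the noise-whitened domain."""
    L = np.linalg.cholesky(Rv)               # Rv = L L^H
    A = np.linalg.solve(L, Ry)               # L^{-1} Ry
    Ry_w = np.linalg.solve(L, A.conj().T)    # L^{-1} Ry L^{-H} (Ry is Hermitian)
    _, V = np.linalg.eigh(Ry_w)
    h = L @ V[:, -1]                         # de-whiten the top eigenvector
    return h / h[ref]
```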
A. Brendel, S. Gannot, and W. Kellermann,
"Localization of multiple simultaneously active speakers in an acoustic sensor network",
in IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM), Sheffield, United Kingdom (Great Britain), Jul. 2018. This paper addresses the localization of an unknown number of acoustic sources in an enclosure. We extend a well-established algorithm for the localization of acoustic sources, which is based on the expectation-maximization (EM) algorithm for clustering phase differences with a Gaussian mixture model. The von Mises distribution, a more appropriate probabilistic model for spherical data such as directions of arrival or phase differences, is used to derive a localization algorithm for multiple simultaneously active sources. Experiments with simulated room impulse responses confirm the superiority of the proposed algorithm over the existing method in terms of localization performance.
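A self-contained EM sketch for a von Mises mixture on circular data (e.g., wrapped phase differences), using the standard resultant-vector M-step and a common approximation for the concentration update; initialization and constants are illustrative:

```python
import numpy as np
from scipy.special import i0

def vm_pdf(x, mu, kappa):
    """von Mises density on the circle."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * i0(kappa))

def em_von_mises_mixture(x, K, n_iter=50, seed=0):
    """EM for a K-component von Mises mixture over angles x (radians)."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-np.pi, np.pi, K)
    kappa = np.full(K, 2.0)
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        p = np.stack([w[k] * vm_pdf(x, mu[k], kappa[k]) for k in range(K)])
        gamma = p / (p.sum(axis=0, keepdims=True) + 1e-300)      # (K, N)
        # M-step: mean directions from the weighted resultant vector
        C, S = gamma @ np.cos(x), gamma @ np.sin(x)
        Nk = gamma.sum(axis=1)
        mu = np.arctan2(S, C)
        Rbar = np.sqrt(C**2 + S**2) / (Nk + 1e-12)
        # Standard approximation to the inverse of A(kappa) = I1/I0
        kappa = np.clip(Rbar * (2 - Rbar**2) / (1 - Rbar**2 + 1e-12), 1e-3, 1e3)
        w = Nk / x.size
    return mu, kappa, w
```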
X. Li, B. Mourgue, L. Girin, S. Gannot, and R. P. Horaud,
"Online localization of multiple moving speakers in reverberant environments",
in IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM), Sheffield, United Kingdom (Great Britain), Jul. 2018. This paper addresses the problem of online localization of multiple moving speakers in reverberant environments. The direct-path relative transfer function (DP-RTF), defined as the ratio between the first taps of the convolutive transfer functions (CTFs) of two microphones, encodes the inter-channel direct-path information and is thus used as a localization feature that is robust against reverberation. The CTF estimation is based on the cross-relation method. In this work, the recursive least-squares method is proposed to solve the cross-relation problem, due to its relatively low computational cost and good convergence rate. The DP-RTF feature estimated at each time-frequency bin is assumed to correspond to a single speaker. A complex Gaussian mixture model is used to assign each observed feature to one among several speakers. The recursive expectation-maximization algorithm is adopted to update the model parameters online. The method is evaluated with a new dataset containing multiple moving speakers, where the ground-truth speaker trajectories are recorded with a motion capture system.
S. E. Chazan, S. Gannot, and J. Goldberger,
"Training strategies for deep latent models and applications to speech presence probability estimation",
in The 14th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Guildford, UK, Jul. 2018. In this study we address models with latent variables in the context of neural networks. We analyze a neural network architecture, the mixture of deep experts (MoDE), that models latent variables using the mixture-of-experts paradigm. Learning the parameters of latent variable models is usually done by the expectation-maximization (EM) algorithm. However, it is well known that back-propagation gradient-based algorithms are the preferred strategy for training neural networks. We show that in the case of neural networks with latent variables, the back-propagation algorithm is actually a recursive variant of the EM that is more suitable for training neural networks. To demonstrate the viability of the proposed MoDE network, it is applied to the task of speech presence probability estimation, which is widely applicable to many speech processing problems, e.g., speaker diarization and separation, speech enhancement, and noise reduction. Experimental results show the benefits of the proposed architecture over standard fully-connected networks with the same number of parameters.
S. E. Chazan, J. Goldberger, and S. Gannot,
"DNN-based concurrent speakers detector and its application to speaker extraction with LCMV beamforming",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Calgary, Alberta, Canada, Apr. 2018. In this paper, we present a new control mechanism for LCMV beamforming. Application of the LCMV beamformer to speaker separation tasks requires accurate estimates of its building blocks, e.g. the noise spatial cross-power spectral density (cPSD) matrix and the relative transfer function (RTF) of all sources of interest. An accurate classification of the input frames to various speaker activity patterns can facilitate such an estimation procedure. We propose a DNN-based concurrent speakers detector (CSD) to classify the noisy frames. The CSD, trained in a supervised manner using a DNN, classifies noisy frames into three classes: 1) all speakers are inactive – used for estimating the noise spatial cPSD matrix; 2) a single speaker is active – used for estimating the RTF of the active speaker; and 3) more than one speaker is active – discarded for estimation purposes. Finally, using the estimated blocks, the LCMV beamformer is constructed and applied for extracting the desired speaker from a noisy mixture of speakers.
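A toy control loop showing how the three CSD classes could route frames to the different estimators; the recursive averaging and the covariance-subtraction RTF step are illustrative simplifications, not the paper's exact procedure:

```python
import numpy as np

def route_frames(Y, labels, alpha=0.95):
    """Y: (T, M) STFT vectors of one frequency bin; labels: (T,) CSD class per
    frame, with 0 = all inactive, 1 = single speaker, 2 = concurrent speakers."""
    M = Y.shape[1]
    Rn = np.eye(M, dtype=complex)        # noise cPSD (recursively averaged)
    Rs = np.eye(M, dtype=complex)        # single-speaker-frame cPSD
    for y, c in zip(Y, labels):
        R = np.outer(y, y.conj())
        if c == 0:
            Rn = alpha * Rn + (1 - alpha) * R    # noise-only: update noise cPSD
        elif c == 1:
            Rs = alpha * Rs + (1 - alpha) * R    # single speaker: update for RTF
        # c == 2: concurrent activity -> discarded for estimation purposes
    Rx = Rs - Rn
    rtf = Rx[:, 0] / (Rx[0, 0] + 1e-12)          # simple CS-style RTF estimate
    return Rn, rtf
```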
B. Laufer-Goldshtein, R. Talmon, I. Cohen, and S. Gannot,
"Multi-view source localization based on power ratios",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Calgary, Alberta, Canada, Apr. 2018. Despite attracting significant research efforts, the problem of source localization in noisy and reverberant environments remains challenging. Novel learning-based methods attempt to solve the problem by modelling the acoustic environment from the observed data. Typically, appropriate feature vectors are defined, and then used for constructing a model, which maps the extracted features to the corresponding source positions. In this paper, we focus on localizing a source using a distributed network with several arrays of unidirectional microphones. We introduce new feature vectors, which utilize the special characteristic of unidirectional microphones, receiving different parts of the reverberated speech. The new features are computed locally for each array, using the power-ratios between its measured signals, and are used to construct a local model, representing the unique view point of each array. The models of the different arrays, conveying distinct and complementing structures, are merged by a Multi-View Gaussian Process (MVGP), mapping the new features to their corresponding source positions. Based on this unifying model, a Bayesian estimator is derived, exploiting the relations conveyed by the covariance terms of the MVGP. The resulting localizer is shown to be robust to noise and reverberation, utilizing a computationally efficient feature extraction.
X. Li, S. Gannot, L. Girin, and R. Horaud,
"Multisource MINT using convolutive transfer function",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Calgary, Alberta, Canada, Apr. 2018. The multichannel inverse filtering method, i.e., the multiple input/output inverse theorem (MINT), is widely used. However, it is usually performed in the time domain based on long room impulse responses; it therefore has a high computational complexity and a large number of near-common zeros. In this paper, we propose to perform MINT in the short-time Fourier transform (STFT) domain, in which the time-domain filter is approximated by the convolutive transfer function. The oversampled STFT is used to avoid frequency aliasing, which however leads to a common-zero region in the subband frequency response due to the frequency response of the STFT window. A new inverse filtering target function concerning the STFT window is proposed to overcome this problem. In addition, unlike most studies using MINT for single-source dereverberation, a multisource MINT is proposed for joint source separation and dereverberation.
Y. Laufer and S. Gannot,
"A Bayesian hierarchical model for speech enhancement",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Calgary, Alberta, Canada, Apr. 2018. This paper addresses the problem of blind adaptive beamforming using a hierarchical Bayesian model. Our probabilistic approach relies on a Gaussian prior for the speech signal and a Gamma hyperprior for the speech precision, combined with a multichannel linear-Gaussian state-space model for the possibly time-varying acoustic channel. Furthermore, we assume a Gamma prior for the ambient noise precision. We present a variational Expectation-Maximization (VEM) algorithm that employs a variant of multi-channel Wiener filter (MCWF) to estimate the sound source and a Kalman smoother to estimate the acoustic channel of the room. It is further shown that the VEM speech estimator can be decomposed into two stages: A multichannel minimum variance distortionless response (MVDR) beamformer and a subsequent single-channel variational postfilter. The proposed algorithm is evaluated in terms of speech quality, for a static scenario with recorded room impulse responses (RIRs). It is shown that a significant improvement is obtained with respect to the noisy signal, and that the proposed algorithm outperforms a baseline algorithm. In terms of channel alignment, a superior channel estimate is demonstrated compared to the causal Kalman filter.
2017
D. Kounades-Bastian, R. P. Horaud, L. Girin, X. Alameda-Pineda, and S. Gannot,
"Exploiting the intermittency of speech for joint separation and diarization of speech signals",
in 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2017. Natural conversations are spontaneous exchanges involving two or more people speaking in an intermittent manner. One therefore expects such conversations to have intervals where some of the speakers are silent. Yet, most (multichannel) audio source separation (MASS) methods consider the sound sources to be continuously emitting over the total duration of the processed mixture. In this paper we propose a probabilistic model for MASS in which the sources may have pauses. The activity of the sources is modeled as a hidden state, the diarization state, enabling us to activate/de-activate the sound sources at time-frame resolution. We plug the diarization model into the spatial covariance matrix model proposed for MASS in [1], and obtain an improvement in performance over the state of the art when separating mixtures with intermittent speakers.
S. E. Chazan, J. Goldberger, and S. Gannot,
"Deep recurrent mixture of experts for speech enhancement",
in 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2017. Deep neural networks (DNNs) have recently become a viable methodology for single-microphone speech enhancement. The most common approach is to feed the noisy speech features into a fully-connected DNN to either directly enhance the speech signal or to infer a mask which can be used for the speech enhancement. In this case, one network has to deal with the large variability of the speech signal. Most approaches also discard the speech continuity. In this paper, we propose a deep recurrent mixture of experts (DRMoE) architecture that addresses these two issues. In order to reduce the large speech variability, we split the network into a mixture of networks (denoted experts), each of which specializes in a specific and simpler task, and a gating network. The time-continuity of the speech signal is taken into account by implementing the experts and the gating network as recurrent neural networks (RNNs). An experimental study shows that the proposed algorithm produces higher objective measurement scores compared to both a single RNN and a deep mixture of experts (DMoE) architecture.
O. Shwartz, A. Plinge, E. Habets, and S. Gannot,
"Blind microphone geometry calibration using one reverberant speech event",
in 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2017. A novel approach to calibrate the geometry of microphones using a single sound event is proposed. A variant of the expectation-maximization algorithm is employed to estimate the spatial coherence matrix of the reverberant sound field directly from the microphone signals. By matching the spatial coherence to theoretical models, the pairwise microphone distances are estimated. From this, the overall geometry is computed. Simulations and lab recordings are used to show that the proposed method outperforms a related approach that assumes a perfectly diffuse sound field.
D. Y. Levin, S. Markovich-Golan, and S. Gannot,
"Distributed LCMV beamforming: Considerations of spatial topology and local preprocessing",
in 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2017. A linearly constrained minimum variance (LCMV) beamformer aims to completely remove interference and optimize the signal-to-noise ratio (SNR). We examine an array geometry consisting of multiple sub-arrays. Our analysis shows that the increased intersensor distance typical of such setups is beneficial for the task of signal separation. Another unique feature of distributed arrays is the necessity of sharing information from different locations, which may pose a burden in terms of power and bandwidth resources. We discuss a scheme with minimalistic transmission requirements involving a preprocessing operation at each sub-array node. Expressions for the penalties due to preprocessing with local parameters are derived and corroborated with computer simulations.
D. Cherkassky, S. E. Chazan, J. Goldberger, and S. Gannot,
"Successive relative transfer function identification using single microphone speech enhancement",
in The 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, Aug. 2017. Distortionless speech extraction in a reverberant environment can be achieved by applying a beamforming algorithm, provided that the relative transfer functions (RTFs) of the sources and the covariance matrix of the noise are known. In this contribution, we consider the RTF identification challenge in a multi-source scenario. We propose a successive RTF identification (SRI) method, based on the sole assumption that sources become successively active. The proposed algorithm identifies the RTF of the i-th speech source, assuming that the RTFs of all other sources in the environment and the power spectral density (PSD) matrix of the noise were previously estimated. The proposed RTF identification algorithm is based on the neural network Mix-Max (NN-MM) single-microphone speech enhancement algorithm, followed by a least-squares (LS) system identification method. The proposed RTF estimation algorithm is validated by simulation.
A. Malek, S. E. Chazan, I. Malka, V. Tourbabin, J. Goldberger, E. Tzirkel-Hancock, and S. Gannot,
"Speaker extraction using LCMV beamformer with DNN-based SPP and RTF identification scheme",
in The 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, Aug. 2017. The linearly constrained minimum variance (LCMV)-beamformer (BF) is a viable solution for desired source extraction from a mixture of speakers in a noisy environment. The performance in terms of speech distortion, interference cancellation and noise reduction depends on the estimation of a set of parameters. This paper presents a new mechanism to update the parameters of the LCMV-BF. A new speech presence probability (SPP)-based voice activity detector (VAD) controls the noise covariance matrix update, and a speaker position identifier (SPI) procedure controls the relative transfer functions (RTFs) update. A postfilter is then applied to the BF output to further attenuate the residual noise signal. A series of experiments using real-life recordings confirm the speech enhancement capabilities of the proposed algorithm.
O. Schwartz, Y. Dorfan, M. Taseska, E. A. Habets, and S. Gannot,
"DOA estimation in noisy environment with unknown noise power using the EM algorithm",
in The 5th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), San-Francisco, CA, USA, Mar. 2017. A direction of arrival (DOA) estimator for concurrent speakers in a noisy environment with unknown noise power is presented. Spatially colored noise, if not properly addressed, is known to degrade the performance of DOA estimators. In our contribution, the DOA estimation task is formulated as a maximum likelihood (ML) problem, which is solved using the expectation-maximization (EM) procedure. The received microphone signals are modelled as a sum of the speech and noise components. The noise power spectral density (PSD) matrix is modelled by a time-invariant full-rank coherence matrix multiplied by the noise power. The PSDs of the speech and noise components are estimated as part of the EM procedure. The benefit of the presented algorithm in a simulated noisy environment using measured room impulse responses is demonstrated.
C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor,
"Source tracking using moving microphone arrays for robot audition",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), New-Orleans, LA, USA, Mar. 2017. Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust both to diffuse noise due to late reverberation and to spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization methods based on expectation-maximization (EM) and on Bayesian inference. In this paper we combine the EM and Bayesian approaches in one framework for improved robustness against reverberation and noise.
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. Horaud,
"An EM algorithm for joint source separation and diarization of multichannel convolutive speech mixtures",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), New-Orleans, LA, USA, Mar. 2017. We present a probabilistic model for joint source separation and diarisation of multichannel convolutive speech mixtures. We build upon the framework of local Gaussian model (LGM) with non-negative matrix factorization (NMF). The diarisation is introduced as a temporal labeling of each source in the mix as active or inactive at the short-term frame level. We devise an EM algorithm in which the source separation process is aided by the diarisation state, since the latter indicates the sources actually present in the mixture. The diarisation state is tracked with a Hidden Markov Model (HMM) with emission probabilities calculated from the estimated source signals. The proposed EM has separation performance comparable with a state-of-the-art LGM NMF method, while outperforming a state-of-the-art speaker diarisation pipeline.
E. Hadad, D. Marquardt, W. Pu, S. Gannot, S. Doclo, Z.-Q. Luo, I. Merks, and T. Zhang,
"Comparison of two binaural beamforming approaches for hearing aids",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), New-Orleans, LA, USA, Mar. 2017. Beamforming algorithms in binaural hearing aids are crucial to improve speech understanding in background noise for hearing-impaired persons. In this study, we compare and evaluate the performance of two recently proposed minimum variance (MV) beamforming approaches for binaural hearing aids. The binaural linearly constrained MV (BLCMV) beamformer applies linear constraints to maintain the target source and mitigate the interfering sources, taking into account the reverberant nature of sound propagation. The inequality constrained MV (ICMV) beamformer applies inequality constraints to maintain the target source and mitigate the interfering sources, utilizing estimates of the directions of arrival (DOAs) of the target and interfering sources. The similarities and differences between these two approaches are discussed, and the performance of both algorithms is evaluated using simulated data and real-world recordings, particularly focusing on the robustness to estimation errors of the relative transfer functions (RTFs) and DOAs. The BLCMV achieves good performance if the RTFs are accurately estimated, while the ICMV shows good robustness to DOA estimation errors.
O. Schwartz, S. Braun, S. Gannot, and E. A. Habets,
"Source separation, dereverberation and noise reduction using LCMV beamformer and postfilter",
in The 13th International Conference on Latent Variable Analysis and Signal Separation (LVA-ICA), Grenoble, France, Feb. 2017. The problem of source separation, dereverberation and noise reduction using a microphone array is addressed in this paper. The observed speech is modeled by two components, namely the early speech (including the direct path and some early reflections) and the late reverberation. The minimum mean square error (MMSE) estimator of the early speech components of the various speakers is derived, which jointly suppresses the noise and the overall reverberation from all speakers. The overall time-varying level of the reverberation is estimated using two different estimators, an estimator based on a temporal model and an estimator based on a spatial model. The experimental study consists of measured acoustic transfer functions (ATFs) and directional noise with various signal-to-noise ratio levels. The separation, dereverberation and noise reduction performance is examined in terms of perceptual evaluation of speech quality (PESQ) and signal-to-interference plus noise ratio improvement.
B. Laufer-Goldshtein, R. Talmon, and S. Gannot,
"Speaker tracking on multiple-manifolds with distributed microphones",
in The 13th International Conference on Latent Variable Analysis and Signal Separation (LVA-ICA), Grenoble, France, Feb. 2017.
Speaker tracking in a reverberant enclosure with an ad hoc network of multiple distributed microphones is addressed in this paper. A set of prerecorded measurements in the enclosure of interest is used to construct a data-driven statistical model. The function mapping the measurement-based features to the corresponding source position represents complex unknown relations, hence it is modelled as a random Gaussian process. The process is defined by a covariance function which encapsulates the relations among the available measurements and the different views presented by the distributed microphones. This model is intertwined with a Kalman filter to capture both the smoothness of the source movement in the time-domain and the smoothness with respect to patterns identified in the set of available prerecorded measurements. Simulation results demonstrate the ability of the proposed method to localize a moving source in reverberant conditions.
2016
Y. Dorfan, O. Schwartz, B. Schwartz, E. A. Habets, and S. Gannot,
"Multiple DOA estimation and blind source separation using expectation-maximization algorithm",
in International conference on the science of electrical engineering (ICSEE), Eilat, Israel, Nov. 2016. A blind source separation technique in a noisy environment is proposed, based on spectral masking and the minimum variance distortionless response (MVDR) beamformer (BF). Formulating the maximum likelihood estimation of the directions of arrival (DOAs) and solving it using expectation-maximization enables the extraction of the masks and the associated MVDR BF as byproducts. The proposed DOA estimator uses an explicit model of the ambient noise, which results in more accurate DOA estimates and good blind source separation. The experimental study demonstrates both the DOA estimation results and the separation capabilities of the proposed method, using real room impulse responses in a diffuse noise field.
E. Hadad, D. Marquardt, S. Doclo, and S. Gannot,
"Comparison of binaural multichannel Wiener filters with binaural cue preservation of the interfering source",
in International conference on the science of electrical engineering (ICSEE), Eilat, Israel, Nov. 2016. An important objective of binaural speech enhancement algorithms is the preservation of the binaural cues of the sources, in addition to noise reduction. The binaural multichannel Wiener filter (MWF) preserves the binaural cues of the target but distorts the noise binaural cues. To optimally benefit from binaural unmasking and to preserve the spatial impression for the hearing aid user, two extensions of the binaural MWF have therefore been proposed, namely, the MWF with partial noise estimation (MWF-N) and MWF with interference reduction (MWF-IR). In this paper, the binaural cue preservation of these extensions is analyzed theoretically. Although both extensions are aimed at incorporating the binaural cue preservation of the interferer in the binaural MWF cost function, their properties are different. For the MWF-N, while the binaural cues of the target are preserved, there is a tradeoff between the noise reduction and the preservation of the binaural cues of the interferer component. For the MWF-IR, while the binaural cues of the interferer are preserved, those of the target may be slightly distorted. The theoretical results are validated by simulations using binaural hearing aids, demonstrating the capabilities of these beamformers in a reverberant environment.
D. Y. Levin and S. Gannot,
"A statistical model for room impulse responses encompassing early and late reflections",
in International conference on the science of electrical engineering (ICSEE), Eilat, Israel, Nov. 2016.
B. Laufer-Goldshtein, R. Talmon, and S. Gannot,
"A real life experimental study on semi-supervised source localization based on manifold regularization",
in International conference on the science of electrical engineering (ICSEE), Eilat, Israel, Nov. 2016. Recently, we have presented a semi-supervised approach for sound source localization based on manifold regularization. The idea is to estimate the function that maps each relative transfer function (RTF) to its corresponding position. The estimation is based on an optimization problem which takes into consideration the geometric structure of the RTF samples, empirically deduced from prerecorded training measurements. The solution is appropriately constrained to be smooth, meaning that similar RTFs are mapped to close positions. In this paper, we conduct a comprehensive experimental study with real-life recordings to examine the algorithm's performance in actual noisy and reverberant conditions. The influence of the amount of training data, as well as of changes in the environmental conditions, is also examined. We show that the algorithm attains accurate localization in such challenging conditions.
A. Barnov, A. Cohen, M. Agmon, V. B. Bracha, S. Markovich-Golan, and S. Gannot,
"A dynamic TF-GSC beamformer for distributed arrays with dual-resolution speech-presence-probability estimators",
in International conference on the science of electrical engineering (ICSEE), Eilat, Israel, Nov. 2016. The problem of speech enhancement using a distributed microphones array in a dynamic scenario where speaker, noise and microphone arrays are free to move is considered. The transfer function generalized sidelobe canceler (TF-GSC) spatial filter [1] which optimizes the minimum variance distortionless response (MVDR) criterion is used for enhancing the desired speech signal. A novel speech presence probability (SPP) estimator is proposed based on [2]. By using a dual-resolution SPP, the proposed estimator is able to detect noise dominant frequencies during speech, and thus improve noise tracking capability. We test the proposed algorithm in real dynamic scenarios, and demonstrate its consistent signal to noise ratio (SNR) improvement using a distributed microphone array consisting of 2 devices and 4 microphones.
S. E. Chazan, S. Gannot, and J. Goldberger,
"A phoneme-based pre-training approach for deep neural network with application to speech enhancement",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, China, Sep. 2016. In this study, we present a new phoneme-based deep neural network (DNN) framework for single microphone speech enhancement. While most speech enhancement algorithms overlook the phoneme structure of the speech signal, our proposed framework comprises a set of phoneme-specific DNNs (pDNNs), one for each phoneme, together with an additional phoneme-classification DNN (cDNN). The cDNN is responsible for determining the posterior probability that a specific phoneme was uttered. Concurrently, each of the pDNNs estimates a phoneme-specific speech presence probability (pSPP). The speech presence probability (SPP) is then calculated as a weighted average of the phoneme-specific pSPPs, with the weights determined by the posterior phoneme probabilities. A soft spectral attenuation, based on the SPP, is then applied to enhance the noisy speech signal. We further propose a compound training procedure, where each pDNN is first pre-trained using the phoneme labeling and the cDNN is trained to classify phonemes. Since these labels are unavailable in the test phase, the entire network is then trained using the noisy utterance, with the cDNN providing the phoneme classification. A series of experiments with different noise types verifies the applicability of the new algorithm to the task of speech enhancement. Moreover, the proposed scheme outperforms other schemes that either do not consider the phoneme structure or use a simpler training methodology.
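The weighted-average combination described above reduces to a mixture-of-experts sum over phonemes; a minimal sketch (array shapes and the gain floor are assumptions):

```python
import numpy as np

def combined_spp(phoneme_post, pspp):
    """SPP as a phoneme-posterior-weighted average of phoneme-specific SPPs.
    phoneme_post: (T, K) cDNN posteriors; pspp: (T, K, F) pDNN outputs."""
    return np.einsum('tk,tkf->tf', phoneme_post, pspp)     # (T, F) SPP

def soft_attenuation(noisy_mag, spp, g_min=0.1):
    """Soft spectral attenuation driven by the SPP (the floor g_min is illustrative)."""
    return np.maximum(spp, g_min) * noisy_mag
```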
O. Schwartz, Y. Dorfan, E. A. P. Habets, and S. Gannot,
"Multi-speaker doa estimation in reverberation conditions using expectation-maximization",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, China, Sep. 2016. A novel direction of arrival (DOA) estimator for concurrent speakers in a reverberant environment is presented. Reverberation, if not properly addressed, is known to degrade the performance of DOA estimators. In our contribution, the DOA estimation task is formulated as a maximum likelihood (ML) problem, which is solved using the expectation-maximization (EM) procedure. The received microphone signals are modelled as a sum of anechoic and reverberant components. The reverberant components are modelled by a time-invariant coherence matrix multiplied by a time-varying reverberation power spectral density (PSD). The PSDs of the anechoic speech and reverberant components are estimated as part of the EM procedure. It is shown that the DOA estimates obtained by the proposed algorithm are less affected by reverberation than those of competing algorithms that ignore the reverberation. An experimental study demonstrates the benefit of the presented algorithm in a reverberant environment using measured room impulse responses (RIRs).
S. Markovich-Golan, D. Y. Levin, and S. Gannot,
"Performance analysis of a dual microphone superdirective beamformer and approximate expressions for the near-field propagation regime",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, China, Sep. 2016. A linear array of sensors with small spacing (compared to the wavelength) can be processed with superdirective beamforming. Specifically, when applying minimum variance distortionless response (MVDR) weights designed for a diffuse noise field, high gains are attainable in theory. A classical result relating to the far-field regime states that the gain with respect to diffuse noise (i.e., the directivity factor) for a source in the endfire direction may approach the number of sensors squared (N^2). However, as the wavelength increases, the beamformer encounters increasingly severe robustness issues. Results pertaining to the near-field regime are less well known. In this paper we analyze MVDR beamforming in a generic dual-microphone array scenario. Our analysis is not restricted to the far-field regime. We derive precise expressions for the directivity factor and the white-noise gain, as well as simplified approximations for the near- and far-field regimes. We show that in the near-field regime the directivity factor approaches infinity as the wavelength increases, and that the white-noise gain depends only on the ratio of the distance from the source to the distance between the sensors. These properties of the beamformer (BF) behave differently than in the far-field regime.
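A numeric far-field sketch of the classical endfire result quoted above: as the spacing shrinks, the directivity factor of a two-microphone MVDR beamformer in diffuse noise approaches N^2 = 4 while the white-noise gain collapses. The paper's near-field expressions are not reproduced here:

```python
import numpy as np

def superdirective_metrics(f, d_mic, c=343.0, theta=0.0):
    """Directivity factor (DF) and white-noise gain (WNG) of a 2-mic MVDR
    beamformer in a diffuse noise field, far-field steering (theta = 0: endfire)."""
    tau = d_mic * np.cos(theta) / c                         # inter-mic delay
    d = np.array([1.0, np.exp(-2j * np.pi * f * tau)])      # steering vector
    coh = np.sinc(2 * f * d_mic / c)                        # diffuse-field coherence
    Gamma = np.array([[1.0, coh], [coh, 1.0]])
    Gi_d = np.linalg.solve(Gamma + 1e-9 * np.eye(2), d)     # regularized Gamma^{-1} d
    DF = float(np.real(d.conj() @ Gi_d))                    # DF = d^H Gamma^{-1} d
    w = Gi_d / (d.conj() @ Gi_d)                            # MVDR weights
    WNG = float(1.0 / np.real(w.conj() @ w))
    return DF, WNG

for f in (200.0, 1000.0, 4000.0):
    print(f, superdirective_metrics(f, d_mic=0.01))         # DF -> 4, tiny WNG at low f
```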
X. Li, R. Horaud, L. Girin, and S. Gannot,
"Voice activity detection based on statistical likelihood ratio with adaptive thresholding",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, China, Sep. 2016. The statistical likelihood ratio test is a widely used voice activity detection (VAD) method, in which the likelihood ratio of the current temporal frame is compared with a threshold. A fixed threshold is typically used, but no single value is suitable for all types of noise. In this paper, an adaptive threshold is proposed as a function of the local statistics of the likelihood ratio. This threshold represents an upper bound of the likelihood ratio for the non-speech frames, whereas it remains generally lower than the likelihood ratio for the speech frames. As a result, a high non-speech hit rate can be achieved, while keeping the speech hit rate as high as possible.
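One plausible reading of such an adaptive threshold, sketched on a precomputed per-frame likelihood-ratio sequence; the recursive statistics and the mean-plus-scaled-deviation bound are assumptions, not the paper's exact statistic:

```python
import numpy as np

def adaptive_threshold_vad(lr, alpha=0.99, beta=3.0, init=10):
    """lr: (T,) per-frame likelihood ratios. The threshold tracks mean +
    beta * std of the LR over recent low-LR frames, approximating an upper
    bound of the non-speech LR statistics."""
    mean, var = np.mean(lr[:init]), np.var(lr[:init])
    vad = np.zeros(len(lr), dtype=bool)
    for t, x in enumerate(lr):
        thr = mean + beta * np.sqrt(var)
        vad[t] = x > thr
        if not vad[t]:                  # update local statistics on non-speech only
            mean = alpha * mean + (1 - alpha) * x
            var = alpha * var + (1 - alpha) * (x - mean) ** 2
    return vad
```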
S. Braun, B. Schwartz, S. Gannot, and E. A.P. Habets,
"Late reverberation PSD estimation for single-channel dereverberation using relative convolutive transfer functions",
in International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, China, Sep. 2016. The estimation accuracy of the late reverberation power spectral density (PSD) is of paramount importance in single-channel frequency-domain dereverberation algorithms. In this domain, the reverberant signal can be modeled by the convolution of an early speech component and a relative convolutive transfer function (RCTF). In this work, the RCTF coefficients are modeled by a first-order Markov chain, which is well-suited to model time-varying scenarios. The RCTF coefficients are estimated online by a Kalman filter and are then used to compute the late reverberation PSD, which is used in a spectral enhancement filter to achieve dereverberation and noise reduction. It is shown that the proposed reverberation PSD estimator yields similar performance to other estimators, which impose a model on the reverberant tail and which depend on additional information like the reverberation time and the direct-to-reverberation ratio.
Y. Dorfan, C. Evers, S. Gannot, and P. A. Naylor,
"Speaker localization with moving microphone arrays",
in The 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, Aug. 2016. Speaker localization algorithms often assume a static location for all sensors. This assumption simplifies the models used, since all acoustic transfer functions are then linear time-invariant. In many applications this assumption is not valid. In this paper we address the localization challenge with moving microphone arrays. We propose two algorithms to find the speaker position. The first is a batch algorithm based on the maximum likelihood criterion, optimized via expectation-maximization iterations. The second is a particle filter for sequential Bayesian estimation. The performance of both approaches is evaluated and compared on simulated reverberant audio data from a microphone array with two sensors.
Y. Biderman, B. Rafaely, S. Gannot, and S. Doclo,
"Efficient relative transfer function estimation framework in the spherical harmonics domain",
in The 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, Aug. 2016. In acoustic conditions with reverberation and coherent sources, various spatial filtering techniques, such as the linearly constrained minimum variance (LCMV) beamformer, require accurate estimates of the relative transfer functions (RTFs) between the sensors with respect to the desired speech source. However, the time-domain support of these RTFs may affect the estimation accuracy in several ways. First, short RTFs justify the multiplicative transfer function (MTF) assumption when the length of the signal time frames is limited. Second, they require fewer parameters to be estimated, hence reducing the effect of noise and model errors. In this paper, a spherical microphone array based framework for RTF estimation is presented, where the signals are transformed to the spherical harmonics (SH)-domain. The RTF time-domain supports are studied under different acoustic conditions, showing that SH-domain RTFs are shorter compared to conventional space-domain RTFs.
O. Shwartz, S. Gannot, and E. Habets,
"Joint estimation of late reverberant and speech power spectral densities in noisy environments using frobenius norm",
in The 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, Aug. 2016. Various dereverberation and noise reduction algorithms require power spectral density estimates of the anechoic speech, reverberation, and noise. In this work, we derive a novel multichannel estimator for the power spectral densities (PSDs) of the reverberation and the speech that is also suitable for noisy environments. The speech and reverberation PSDs are estimated from all the entries of the received signals' PSD matrix. The Frobenius norm of a general error matrix is minimized to find the best-fitting PSDs. Experimental results show that the proposed estimator provides accurate estimates of the PSDs and outperforms competing estimators. Moreover, when used in a multi-microphone noise reduction and dereverberation algorithm, the estimated reverberation and speech PSDs are shown to provide improved performance measures as compared with the competing estimators.
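Minimizing the Frobenius norm over all entries of the observed PSD matrix can be posed as a small linear least-squares problem; a sketch assuming a known steering/RTF vector a, reverberation coherence matrix Gamma, and noise PSD matrix Rv:

```python
import numpy as np

def fit_psds_frobenius(Ry, a, Gamma, Rv):
    """Fit phi_s, phi_r minimizing || Ry - (phi_s a a^H + phi_r Gamma + Rv) ||_F.
    Ry, Gamma, Rv: (M, M) Hermitian matrices; a: (M,) steering/RTF vector."""
    A = np.stack([np.outer(a, a.conj()).ravel(), Gamma.ravel()], axis=1)  # (M^2, 2)
    b = (Ry - Rv).ravel()
    # Real parameters with complex data: stack real and imaginary parts
    Ar = np.concatenate([A.real, A.imag])
    br = np.concatenate([b.real, b.imag])
    phi, *_ = np.linalg.lstsq(Ar, br, rcond=None)
    return np.maximum(phi, 0.0)          # [phi_s, phi_r], clipped to valid PSDs
```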
E. Hadad, S. Doclo, and S. Gannot,
"A generalized binaural MVDR beamformer with interferer relative transfer function preservation",
in The 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, Aug. 2016. In addition to interference and noise reduction, an important objective of binaural speech enhancement algorithms is the preservation of the binaural cues of both the target and the undesired sound sources. For directional sources, this can be achieved by preserving the relative transfer function (RTF). The recently proposed binaural minimum variance distortionless response (BMVDR) beamformer preserves the RTF of the target, but typically distorts the RTF of the interfering sources. Recently, two extensions of the BMVDR beamformer were proposed that preserve the RTFs of both the target and the interferer, namely, the binaural linearly constrained minimum variance (BLCMV) and the BMVDR-RTF beamformers. In this paper, we generalize the BMVDR-RTF to trade off interference reduction and noise reduction. Three special cases of the proposed beamformer are examined, maximizing either the signal-to-interference-and-noise ratio (SINR), the signal-to-noise ratio (SNR), or the signal-to-interference ratio (SIR). Experiments in an office environment validate our theoretical results.
A. Plinge and S. Gannot,
"Multi-microphone speech enhancement informed by auditory scene analysis",
in IEEE 9th Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, Jul. 2016. A multitude of multi-microphone speech enhancement methods is available. In this paper, we focus our attention on the well-known minimum variance distortionless response (MVDR) beamformer, due to its ability to preserve a distortionless response towards the desired speaker while minimizing the output noise power. We explore two alternatives for constructing the steering vectors towards the desired speech source: one uses only the direct path of the speech propagation, in the form of delay-only filters, while the other uses the entire room impulse response (RIR). All beamforming methods require some control information to accomplish the task of enhancing a desired speech signal. In this paper, an acoustic event detection method using biologically-inspired features is employed. It can interpret the auditory scene by detecting the presence of different auditory objects, and is used to control the estimation procedures of the beamformer. The resulting system provides a blind method of speech enhancement that can improve intelligibility independently of any additional information. Experiments with real recordings show the practical applicability of the method, with a significant gain in frequency-weighted segmental SNR (fwSNRseg). Compared to using the direct path only, the use of the entire RIR proves beneficial.
O. Schwartz, S. Gannot, and E. A. Habets,
"Joint maximum likelihood estimation of late reverberant and speech power spectral density in noisy environments",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Shanghai, China, Mar. 2016. An estimate of the power spectral density (PSD) of the late reverberation is often required by dereverberation algorithms. In this work, we derive a novel multichannel maximum likelihood (ML) estimator for the PSD of the reverberation that can be applied in noisy environments. Since the anechoic speech PSD is usually unknown in advance, it is estimated as well. As a closed-form solution for the maximum likelihood estimator is unavailable, a Newton method for maximizing the ML criterion is derived. Experimental results show that the proposed estimator provides an accurate estimate of the PSD, and outperforms competing estimators. Moreover, when used in a multi-microphone dereverberation and noise reduction algorithm, the best performance in terms of the log-spectral distance is achieved when employing the proposed PSD estimator.
X. Li, L. Girin, R. Horaud, and S. Gannot,
"Noise power spectral density estimation based on regional statistics",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Shanghai, China, Mar. 2016.
E. Hadad, D. Marquardt, S. Doclo, and S. Gannot,
"Extensions of the binaural mwf with interference reduction preserving the binaural cues of the interfering source",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Shanghai, China, Mar. 2016. Recently, an extension of the binaural multichannel Wiener filter (BMWF), referred to as BMWF-IRo, was presented, in which an interference rejection constraint was added to the BMWF cost function. Although the BMWF-IRo aims to entirely suppress the interfering source, residual interfering sources (as well as unconstrained noise sources) are undesirably perceived as impinging on the array from the desired source direction. In this paper, we propose two extensions of the BMWF-IRo that address this issue by preserving the spatial impression of the interfering source. In the first extension, the binaural cues of the interfering source are preserved, while those of the desired source may be slightly distorted. In the second extension, the binaural cues of both the desired and interfering sources are preserved. Simulation results show that the noise reduction performance of both proposed extensions is comparable to that of the BMWF-IRo.
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. Horaud,
"An inverse-Gamma source variance prior with factorized parameterization for audio source separation",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Shanghai, China, Mar. 2016. In this paper we present a new statistical model for the power spectral density (PSD) of an audio signal and its application to multichannel audio source separation (MASS). The source signal is modeled with the local Gaussian model (LGM) and we propose to model its variance with an inverse-Gamma distribution, whose scale parameter is factorized as a rank-1 model. We discuss the interest of this approach and evaluate it in a MASS task with underdetermined convolutive mixtures. For this aim, we derive a variational EM algorithm for parameter estimation and source inference. The proposed model shows a benefit in source separation performance compared to a state-of-the-art LGM NMF-based technique.
D. Marquardt, E. Hadad, S. Gannot, and S. Doclo,
"Incorporating relative transfer function preservation into the binaural multi-channel wiener lter",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Shanghai, China, Mar. 2016. Besides noise reduction, an important objective of binaural speech enhancement algorithms is the preservation of the binaural cues of all sound sources. For the desired speech source and an interfering source, e.g., competing speaker, this can be achieved by preserving their relative transfer functions (RTFs). It has been shown that the binaural multi-channel Wiener filter (MWF) preserves the RTF of the desired speech source, but typically distorts the RTF of the interfering source. To this end, in this paper we propose an extension of the binaural MWF, i.e. the binaural MWF with RTF preservation (MWF-RTF) aiming to preserve the RTF of the interfering source. Analytical expressions for the performance of the binaural MWF and the MWF-RTF in terms of noise reduction and binaural cue preservation are derived, using which their performance is thoroughly compared. Simulation results using binaural behind-the-ear impulse responses measured in a reverberant environment validate the derived analytical expressions, showing that the MWF-RTF yields a better performance than the binaural MWF in terms of the signal-to-interference ratio and binaural cue preservation of the interfering source, while the overall noise reduction performance is slightly degraded.
B. Laufer-Goldshtein, R. Talmon, and S. Gannot,
"Manifold-based Bayesian inference for semi-supervised source localization",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Shanghai, China, Mar. 2016. Sound source localization is addressed by a novel Bayesian approach using a data-driven geometric model. The goal is to recover the target function that attaches each acoustic sample, formed by the measured signals, with its corresponding position. The estimation is derived by maximizing the posterior probability of the target function, computed on the basis of acoustic samples from known locations (labelled data) as well as acoustic samples from unknown locations (unlabelled data). To form the posterior probability we use a manifold-based prior, which relies on the geometric structure of the manifold from which the acoustic samples are drawn. The proposed method is shown to be analogous to a recently presented semi-supervised localization approach based on manifold regularization. Simulation results demonstrate the robustness of the method in noisy and reverberant environments.
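A bare-bones Gaussian-process regression sketch of the mapping from acoustic samples to positions; the Gaussian kernel and its hyperparameters are illustrative, and the paper's manifold-based prior over labelled and unlabelled data is not reproduced:

```python
import numpy as np

def gp_localize(feats_train, pos_train, feat_test, sigma_n=1e-2, ell=1.0):
    """feats_train: (N, D) acoustic features from known positions pos_train (N, 2);
    feat_test: (D,) feature of the sample to localize. Returns the posterior mean."""
    def kern(X, Y):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ell**2)
    K = kern(feats_train, feats_train) + sigma_n * np.eye(len(feats_train))
    k_star = kern(feat_test[None, :], feats_train)       # (1, N)
    return (k_star @ np.linalg.solve(K, pos_train))[0]   # posterior-mean position
```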
2015
W. S. Woods, E. Hadad, I. Merks, B. Xu, S. Gannot, and T. Zhang,
"A real-world recording database for ad hoc microphone arrays",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2015. We report on a recently-recorded database for use in processing of ad hoc microphone constellations. Twenty-four microphones were positioned in various locations at a central table in a large room, and their outputs were recorded while 4 target talkers at the table both read from a list of sentences in a constrained way and also maintained a natural conversation for several minutes. This was done in the quiet and in the presence of 8, 24, and 56 other simultaneous talkers surrounding the central table at various distances. We also recorded without the 4 target talkers active in each of these conditions, and used a loudspeaker to measure impulse responses to the microphones from various positions in the room. We provide details of the recording setup and demonstrate use of this database via an application of linearly constrained minimum variance beam-forming. The database will become available to researchers in the field.
B. Schwartz, S. Gannot, and E. A. Habets,
"An online dereverberation algorithm for hearing aids with binaural cues preservation",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2015. A dereverberation method for a single speaker in binaural hearing aids is proposed. Thanks to binaural cues, listeners are capable of localizing sound sources even in reverberant enclosures. Since dereverberation algorithms aim to reduce the sound reflections, they alter the binaural cues of the reverberant signal. A recently proposed algorithm estimates both the early speech component and the room impulse response (RIR) in an online fashion. In this paper, we develop a binaural extension of this algorithm which enables a tradeoff between the amount of dereverberation and the preservation of the binaural cues of the reverberant signal. The method is tested using a database of binaural RIRs at different reverberation levels and source-listener distances. It is shown that the proposed method enables a tradeoff between improvement in the frequency-weighted signal-to-noise ratio (WSNR) scores and the preservation of the cues.
O. Schwartz, S. Braun, S. Gannot, and E. A. Habets,
"Maximum likelihood estimation of the late reverberant power spectral density in noisy environments",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2015. An estimate of the power spectral density (PSD) of the late reverberation is often required by dereverberation algorithms. In this work, we derive a novel multichannel maximum likelihood (ML) estimator for the PSD of the reverberation that can be applied in noisy environments. The direct path is first blocked by a blocking matrix and the output is considered as the observed data. Then, the ML criterion for estimating the reverberation PSD is stated. As a closed-form solution for the maximum likelihood estimator (MLE) is unavailable, a Newton method for maximizing the ML criterion is derived. Experimental results show that the proposed estimator provides an accurate estimate of the PSD and outperforms competing estimators. Moreover, when used in a multi-microphone noise reduction and dereverberation algorithm, the estimated reverberation PSD is shown to provide improved performance measures as compared with the competing estimators.
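To make the Newton search concrete, here is a hedged numerical sketch. The blocked-output model R(phi) = phi * Phi + R_n, with Phi a known spatial coherence matrix of the reverberation and R_n the noise covariance, is an illustrative assumption rather than the paper's exact formulation:

```python
import numpy as np

def ml_reverb_psd(S_hat, Phi, R_n, phi0=1.0, iters=15):
    """Newton search for the reverberation PSD phi in one TF bin,
    maximizing the complex-Gaussian log-likelihood
        L(phi) = -log det R(phi) - tr(R(phi)^{-1} S_hat),
    where S_hat is the sample covariance of the blocked output and
    R(phi) = phi * Phi + R_n (assumed model). Sketch only: no
    safeguards against divergence far from the optimum."""
    phi = phi0
    for _ in range(iters):
        R_inv = np.linalg.inv(phi * Phi + R_n)
        A = R_inv @ Phi                              # recurring product
        g = (-np.trace(A) + np.trace(A @ R_inv @ S_hat)).real   # dL/dphi
        h = (np.trace(A @ A)
             - 2 * np.trace(A @ A @ R_inv @ S_hat)).real        # d2L/dphi2
        phi = max(phi - g / h, 1e-12)                # keep the PSD positive
    return phi
```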
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. P. Horaud,
"A variational EM algorithm for the separation of moving sound sources",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2015,
best student paper award. This paper addresses the problem of separation of moving sound sources. We propose a probabilistic framework based on the complex Gaussian model combined with non-negative matrix factorization. The properties associated with moving sources are modeled using time-varying mixing filters described by a stochastic temporal process. We present a variational expectation-maximization (VEM) algorithm that employs a Kalman smoother to estimate the mixing filters. The sound sources are separated by means of Wiener filters, built from the estimators provided by the proposed VEM algorithm. Preliminary experiments with simulated data show that, while for static sources we obtain results comparable with the baseline method [1], in the case of moving sources our method outperforms a piece-wise version of the baseline method.
Y. Dorfan, D. Cherkassky, and S. Gannot,
"Speaker localization and separation using distributed expectation-maximization",
in 23rd European Signal Processing Conference (EUSIPCO), Nice, France, Aug. 2015. A network of microphone pairs is utilized for the joint task of localizing and separating multiple concurrent speakers. The recently presented incremental distributed expectation-maximization (IDEM) algorithm addresses the first task, namely detection and localization. Here we extend this algorithm to address the second task, namely blindly separating the speech sources. We show that the proposed algorithm, denoted distributed algorithm for localization and separation (DALAS), is capable of separating speakers in a reverberant enclosure without a priori information on their number and locations. In the first stage of the proposed algorithm, the IDEM algorithm is applied to blindly detect the active sources and estimate their locations. In the second stage, the location estimates are utilized for selecting the most useful node of microphones for the subsequent separation stage. Separation is finally obtained by utilizing the hidden variables of the IDEM algorithm to construct masks for each source in the relevant node.
A. Deleforge, S. Gannot, and W. Kellermann,
"Towards a generalization of relative transfer functions to more than one source",
in 23rd European Signal Processing Conference (EUSIPCO), Nice, France, Aug. 2015. We propose a natural way to generalize relative transfer functions (RTFs) to more than one source. We first prove that such a generalization is not possible using a single multichannel spectro-temporal observation, regardless of the number of microphones. We then introduce a new transform for multichannel multi-frame spectrograms, i.e., containing several channels and time frames in each time-frequency bin. This transform allows a natural generalization which satisfies the three key properties of RTFs, namely, they can be directly estimated from observed signals, they capture spatial properties of the sources and they do not depend on emitted signals. Through simulated experiments, we show how this new method can localize multiple simultaneously active sound sources using short spectro-temporal windows, without relying on source separation.
X. Li, R. P. Horaud, L. Girin, and S. Gannot,
"Local relative transfer function for sound source localization",
in 23rd European Signal Processing Conference (EUSIPCO), Nice, France, Aug. 2015. The relative transfer function (RTF), i.e. the ratio of acoustic transfer functions between two sensors, can be used for sound source localization / beamforming based on a microphone array. The RTF is usually defined with respect to a unique reference sensor. Choosing the reference sensor may be a difficult task, especially for dynamic acoustic environments and setups. In this paper we propose to use a locally normalized RTF, in short local-RTF, as an acoustic feature to characterize the source direction. The local-RTF takes a neighboring sensor as the reference channel for a given sensor. The estimated local-RTF vector can thus avoid the adverse effects of a noisy unique reference and has a smaller estimation error than conventional RTF estimators. We propose two estimators for the local-RTF and concatenate the values across sensors and frequencies to form a high-dimensional vector which is utilized for source localization. Experiments with real-world signals show the merit of this approach.
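As an illustration only, a hedged sketch of how a local-RTF feature of this flavor could be computed; the paper proposes two specific estimators, and the simple Welch/cross-PSD ratio below is a stand-in, not the paper's method:

```python
import numpy as np
from scipy.signal import csd, welch

def local_rtf_features(y, fs=16000, nperseg=512):
    """y: (M, T) array of microphone signals. Each sensor m uses its
    neighbor m-1 as the reference channel, instead of one global
    reference. Returns the concatenated feature vector."""
    rtfs = []
    for m in range(1, y.shape[0]):
        # scipy's csd(x, z) averages conj(X)*Z, so csd(ref, mic) / welch(ref)
        # estimates the relative transfer function from sensor m-1 to m
        _, S_xm = csd(y[m - 1], y[m], fs=fs, nperseg=nperseg)
        _, S_xx = welch(y[m - 1], fs=fs, nperseg=nperseg)
        rtfs.append(S_xm / S_xx)
    # concatenate across sensors and frequencies (high-dimensional feature)
    return np.concatenate(rtfs)
```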
D. Cherkassky, S. Markovich-Golan, and S. Gannot,
"Performance analysis of MVDR beamformer in WASN with sampling rate osets and blind synchronization",
in 23rd European Signal Processing Conference (EUSIPCO), Nice, France, Aug. 2015. In wireless acoustic sensor networks (WASNs), sampling rate offsets (SROs) between nodes are inevitable, and are recognized as one of the challenges that have to be resolved for coherent array processing. A simplified free-space propagation model is considered, with a single desired source impinging on the WASN from the far-field, contaminated by diffuse noise. In this paper, we analyze the theoretical performance of a fixed superdirective beamformer (SDBF) in the presence of SROs. The SDBF performance loss due to SROs is manifested as a distortion of the nominal beampattern and an excess noise power at the output of the beamformer. We also propose an iterative algorithm for SRO estimation. The theoretical results are validated by simulation.
B. Laufer, R. Talmon, and S. Gannot,
"A study on manifolds of acoustic responses",
in Latent Variable Analysis and Independent Component Analysis (LVA ICA), Liberec, Czech Republic, Aug. 2015. The construction of a meaningful metric between acoustic responses, one which respects the source locations, is addressed. By comparing three alternative distance measures, we verify the existence of the acoustic manifold and give an insight into its nonlinear structure. From such a geometric viewpoint, we demonstrate the limitations of linear approaches to infer physical adjacencies. Instead, we introduce the diffusion framework, which combines local and global processing in order to find an intrinsic nonlinear embedding of the data on a low-dimensional manifold. We present the diffusion distance, which is related to the geodesic distance on the manifold. In particular, simulation results demonstrate the ability of the diffusion distance to organize the samples according to the source direction of arrival (DOA).
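As a rough sketch of the diffusion framework referred to here (the textbook construction, with kernel width eps and diffusion time t as free parameters; not necessarily the paper's exact pipeline):

```python
import numpy as np

def diffusion_embedding(X, eps, t=1, dim=2):
    """X: (N, D) matrix of acoustic-response features, one row per sample.
    Builds a Gaussian affinity, normalizes it into a Markov matrix, and
    embeds the samples so that Euclidean distances in the embedding
    approximate diffusion distances on the underlying manifold."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dist.
    W = np.exp(-d2 / eps)                                # affinity kernel
    P = W / W.sum(axis=1, keepdims=True)                 # row-stochastic
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # drop the trivial constant eigenvector (eigenvalue 1)
    return (vals[1:dim + 1] ** t) * vecs[:, 1:dim + 1]
```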
X. Li, L. Girin, R. Horaud, and S. Gannot,
"Estimation of relative transfer function in the presence of stationary noise based on segmental power spectral density matrix subtraction",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Brisbane, Australia, Apr. 2015. This paper addresses the problem of relative transfer function (RTF) estimation in the presence of stationary noise. We propose an RTF identification method based on segmental power spectral density (PSD) matrix subtraction. First, the multichannel microphone signals are divided into segments corresponding to speech-plus-noise activity and noise-only. Then, the subtraction of the two segmental PSD matrices leads to an almost noise-free PSD matrix, reducing the stationary noise component and preserving the non-stationary speech component. This noise-free PSD matrix is used for single-speaker RTF identification by eigenvalue decomposition. Experiments are performed in the context of sound source localization to evaluate the efficiency of the proposed method.
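A hedged sketch of the described pipeline, assuming the speech-plus-noise and noise-only segments are already given (STFT parameters and averaging are illustrative choices):

```python
import numpy as np
from scipy.signal import stft

def rtf_by_psd_subtraction(y_sn, y_n, fs, ref=0, nperseg=512):
    """y_sn: (M, T) speech-plus-noise segment, y_n: (M, T') noise-only
    segment. Subtracting the averaged spatial PSD matrices cancels the
    stationary noise; the dominant eigenvector of the (almost) noise-free
    matrix then yields the RTF, normalized at the reference microphone."""
    def psd_matrices(y):
        _, _, Y = stft(y, fs=fs, nperseg=nperseg)        # (M, F, L)
        return np.einsum('mfl,nfl->fmn', Y, Y.conj()) / Y.shape[-1]

    D = psd_matrices(y_sn) - psd_matrices(y_n)           # (F, M, M)
    rtf = np.empty(D.shape[:2], dtype=complex)
    for f in range(D.shape[0]):
        Df = 0.5 * (D[f] + D[f].conj().T)                # re-hermitize
        _, V = np.linalg.eigh(Df)
        a = V[:, -1]                                     # dominant eigvec
        rtf[f] = a / a[ref]
    return rtf                                           # (F, M)
```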
S. Markovich-Golan and S. Gannot,
"Performance analysis of the covariance subtraction method for relative transfer function estimation and comparison to the covariance whitening method",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Brisbane, Australia, Apr. 2015. Microphone array processing utilizes the spatial separation between the desired speaker and the interference signal for speech enhancement. The transfer functions (TFs) relating the speaker component at a reference microphone with all other microphones, denoted as the relative TFs (RTFs), play an important role in beamforming design criteria such as the minimum variance distortionless response (MVDR) and the speech distortion weighted multichannel Wiener filter (SDW-MWF). Two common methods for estimating the RTF are surveyed here, namely, the covariance subtraction (CS) and the covariance whitening (CW) methods. We analyze the performance of the CS method theoretically and empirically validate the results of the analysis through extensive simulations. Furthermore, an empirical comparison of the two methods in various scenarios shows that the CW method outperforms the CS method.
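For orientation, hedged one-bin sketches of the two surveyed estimators as they are commonly stated, given a noisy-input covariance R_y and a noise-only covariance R_v (regularization and estimation details omitted):

```python
import numpy as np

def rtf_cs(R_y, R_v, ref=0):
    """Covariance subtraction: dominant eigenvector of R_y - R_v,
    normalized at the reference microphone."""
    _, V = np.linalg.eigh(R_y - R_v)
    a = V[:, -1]
    return a / a[ref]

def rtf_cw(R_y, R_v, ref=0):
    """Covariance whitening: whiten R_y with a square root of R_v,
    take the dominant eigenvector, de-whiten, then normalize."""
    L = np.linalg.cholesky(R_v)                 # R_v = L L^H
    Li = np.linalg.inv(L)
    _, V = np.linalg.eigh(Li @ R_y @ Li.conj().T)
    a = L @ V[:, -1]                            # de-whitening
    return a / a[ref]
```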
O. Schwartz, S. Gannot, and E. A. P. Habets,
"Nested generalized sidelobe canceller for joint dereverberation and noise reduction",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Brisbane, Australia, Apr. 2015. Speech signals are often contaminated by both room reverberation and ambient noise. In this contribution, we propose a nested generalized sidelobe canceller (GSC) beamforming structure, comprising inner and outer GSC beamformers (BFs), that decouples the speech dereverberation and noise reduction operations. The BFs are implemented in the short-time Fourier transform (STFT) domain. Two alternative reverberation models are adopted. In the first, used in the inner GSC, reverberation is assumed to comprise a coherent early component and a late reverberant component. In the second, used in the outer GSC, the influence of the entire acoustic transfer function (ATF) is modeled as a convolution along the frame index in each frequency. Unlike other BF designs for this problem that must be updated in each time-frame, the proposed BF is time-invariant in static scenarios. Experiments in both simulated and recorded environments verify the effectiveness of the proposed structure.
E. Hadad, D. Marquardt, S. Doclo, and S. Gannot,
"Binaural multichannel Wiener filter with directional interference rejection",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Brisbane, Australia, Apr. 2015. In this paper we consider an acoustic scenario with a desired source and a directional interference picked up by hearing devices in a noisy and reverberant environment. We present an extension of the binaural multichannel Wiener filter (BMWF), obtained by adding an interference rejection constraint to its cost function, in order to combine the advantages of spatial and spectral filtering while mitigating directional interferences. We prove that this algorithm can be decomposed into the binaural linearly constrained minimum variance (BLCMV) algorithm followed by a single-channel Wiener post-filter. The proposed algorithm yields improved interference rejection capabilities compared with the BMWF. Moreover, by utilizing the spectral information on the sources, it demonstrates better SNR measures than the BLCMV.
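As a hedged sketch of the extended cost function described here (notation assumed: stacked signals y, desired components x_L and x_R at the reference microphones, interference RTF vector b; full rejection is shown for illustration, whereas the exact constraint value is the paper's design choice):

```latex
\min_{\mathbf{w}_L,\mathbf{w}_R}\;
  \mathbb{E}\{|x_L-\mathbf{w}_L^H\mathbf{y}|^2\}
 +\mathbb{E}\{|x_R-\mathbf{w}_R^H\mathbf{y}|^2\}
\qquad\text{s.t.}\qquad
  \mathbf{w}_L^H\mathbf{b}=0,\;\;\mathbf{w}_R^H\mathbf{b}=0.
% Per the abstract, the minimizer factors into a BLCMV beamformer
% followed by a single-channel Wiener post-filter.
```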
2014
E. Hadad, D. Fishman, and S. Gannot,
"A study of 3d audio rendering by headphones",
in The IEEE 28th Convention of IEEE Israel (IEEEI), Eilat, Israel, Dec. 2014. An efficient implementation of a three-dimensional audio rendering system (3D-ARS) over headphones is presented and its ability to render natural spatial sound is analyzed. In its most straightforward implementation, spatial rendering is achieved by convolving a monophonic signal with the head-related transfer function (HRTF). Several methods were proposed in the literature to improve the naturalness of the spatial sound and the ability of the headphones' wearer to localize sound sources. Among these methods, externalization by incorporation of room reflections, personalization to the anthropometric attributes of the user, and the introduction of head movements are known to yield improved performance. This work provides a unified and flexible platform incorporating the various optional components, together with software tools to statistically analyze their contribution. Preliminary statistical analysis suggests that the additional components indeed contribute to the overall localization ability of the user.
D. Cherkassky and S. Gannot,
"Blind synchronization in wireless sensor networks with application to speech enchantment",
in International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), Antibes - Juan les Pins, France, Sep. 2014. The sampling rate offset (SRO) phenomenon in wireless acoustic sensor networks (WASNs) is considered in this work. The use of a different clock source in each node results in a drift between the nodes' signals. The aim of this work is to estimate these SROs and to re-synchronize the network, enabling coherent multi-microphone processing. First, the link between the SRO and the Doppler effect is derived. Then, a wideband correlation processor for SRO estimation, which is equivalent to the continuous wavelet transform (CWT), is proposed. Finally, node synchronization is achieved by re-sampling the signals at each node. An experimental study using an actual WASN demonstrates the ability of the proposed algorithm to re-synchronize the network and to regain the performance loss due to SRO.
J. Cao, A. W. H. Khong, and S. Gannot,
"On the performance of widely linear quaternion based MVDR beamformer for an acoustic vector sensor",
in International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), Antibes - Juan les Pins, France, Sep. 2014. The widely linear model has recently been used in signal processing applications due to its ability to achieve better performance than conventional linear filtering for non-circular complex random variables (CRVs) and improper quaternion random variables (QRVs). In this paper, we study the time-domain widely linear quaternion model based minimum variance distortionless response beamformer (WL-QMVDR) for a single acoustic vector sensor (AVS) and analyze its performance through the use of beampatterns. We verify by simulation results that the estimated output of the WL-QMVDR is identical to that of the conventional linear model based MVDR beamformer when applied to an AVS in the non-reverberant and ideal sensor response scenario.
E. Hadad, F. Heese, P. Vary, and S. Gannot,
"Multichannel audio database in various acoustic environments",
in International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), Antibes - Juan les Pins, France, Sep. 2014. In this paper we describe a new multichannel room impulse response database. The impulse responses were measured in a room with a configurable reverberation level, resulting in three different acoustic scenarios with reverberation times (RT60) of 160 ms, 360 ms, and 610 ms. The measurements were carried out in recording sessions for several source positions on a spatial grid (angle range of -90° to 90° in 15° steps, at 1 m and 2 m distance from the microphone array). The signals in all sessions were captured by three microphone array configurations. The database is accompanied by software utilities to easily access and manipulate the data. Besides the description of the database, we demonstrate its use in a spatial source separation task.
B. Schwartz, S. Gannot, and E. Habets,
"LPC-based speech dereverberation using Kalman-EM algorithm",
in International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), Antibes - Juan les Pins, France, Sep. 2014. An algorithm for multichannel speech dereverberation is proposed that simultaneously estimates the clean signal, the linear prediction (LP) parameters of speech, and the acoustic parameters of the room. The received signals are processed in short segments to reduce the algorithm latency, and several expectation-maximization (EM) iterations are carried out on each segment to improve the signal estimation. In the expectation step, the fixed-lag Kalman smoother (FLKS) is applied to extract the clean signal from the data utilizing the estimated parameters. In the maximization step, the LP and room parameters are updated using the output of the FLKS. Experimental results show that multiple EM iterations and the application of the LP model improve the quality of the output signal.
M. Taseska, S. Markovich-Golan, E. Habets, and S. Gannot,
"Near-field source extraction using speech presence probabilities for ad hoc microphone arrays",
in International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), Antibes - Juan les Pins, France, Sep. 2014. Ad hoc wireless acoustic sensor networks (WASNs) hold great potential for improved performance in speech processing applications, thanks to better coverage and higher diversity of the received signals. We consider a multiple speaker scenario where each of the WASN nodes, an autonomous system comprising sensing, processing, and communication capabilities, is positioned in the near-field of one of the speakers. Each node aims at extracting its nearest speaker while suppressing the other speakers and noise. The ad hoc network is characterized by an arbitrary number of speakers/nodes with an uncontrolled microphone constellation. In this paper we propose a distributed algorithm which shares information between nodes. The algorithm requires each node to transmit a single audio channel in addition to a soft time-frequency (TF) activity mask for its nearest speaker. The TF activity masks are computed as a combination of estimates of a model-based speech presence probability (SPP), direct-to-reverberant ratio (DRR), and direction of arrival (DOA) per TF bin. The proposed algorithm, although sub-optimal compared to the centralized solution, is superior to the single-node solution.
D. Marquardt, E. Hadad, S. Gannot, and S. Doclo,
"Optimal binaural LCMV beamformers for combined noise reduction and binaural cue preservation",
in International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), Antibes - Juan les Pins, France, Sep. 2014. Besides noise reduction, an important objective of binaural speech enhancement algorithms is the preservation of the binaural cues of both desired and undesired sound sources. Recently, the binaural linearly constrained minimum variance (BLCMV) beamformer has been proposed, which aims to preserve the desired speech component and suppress the undesired directional interference component while preserving the binaural cues of both components. Since the performance of the BLCMV beamformer highly depends on the amount of interference rejection determined by the interference rejection parameter, in this paper we propose several performance criteria to optimize the interference rejection parameters for the left and the right hearing aid. Experimental results show how the performance of the BLCMV beamformer is affected by the different optimal parameter combinations.
Y. Dorfan, G. Hazan, and S. Gannot,
"Multiple acoustic sources localization using distributed Expectation-Maximization algorithm",
in The 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, France, May 2014,
best student paper award. The challenge of localizing a number of concurrent acoustic sources in reverberant enclosures is addressed in this paper. We formulate the localization task as a maximum likelihood (ML) parameter estimation problem, and develop a distributed expectation-maximization (DEM) procedure, based on the incremental EM (IEM) framework. The algorithm enables localization of the speakers without a central processing point. Unlike direction search, localization is a distributed task by nature, since the sensors must be spatially deployed. Taking advantage of the distributed constellation of the sensors, we propose a distributed algorithm that enables multiple processing nodes and considers the communication constraints between them. The proposed DEM has surprising advantages over conventional expectation-maximization (EM) schemes. Firstly, it is less sensitive to initial conditions. Secondly, it converges much faster than the conventional EM. The proposed algorithm is tested by an extensive simulation study.
J. Málek, D. Botka, Z. Koldovský, and S. Gannot,
"Methods to learn bank of filters steering nulls toward potential positions of a target source",
in The 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, France, May 2014. In signal enhancement applications, a reference signal which provides information about interferences and noise is desired. It can be obtained via a multichannel filter that performs a spatial null in the target position, a so-called target-cancelation filter. The filter must adapt to the target position, which is difficult when noise is active. When the target location is confined to a small area, a solution could be based on preparing a bank of target-cancelation filters for potential positions of the target. In this paper, we propose two methods to learn such banks from noise-free recordings. We show by experiments that learned banks have practical advantages compared to banks that were prepared manually by collecting filters for selected positions.
D. Cherkassky and S. Gannot,
"Multichannel Wiener filter performance analysis in presence of mis-modeling",
in IEEE International Conference on Audio and Acoustic Signal Processing (ICASSP), Florence, Italy, May 2014. A randomly positioned microphone array is considered in this work. In many applications, the locations of the array elements are known only up to a certain degree of random mismatch. We derive a novel statistical model for the performance analysis of the multi-channel Wiener filter (MWF) beamformer under random mismatch in the sensor locations. We consider the scenario of one desired source and one interfering source arriving from the far-field and impinging on a linear array. A theoretical model for predicting the MWF mean squared error (MSE) for a given variation in the sensor locations is developed and verified by simulations. It is postulated that the probability density function (p.d.f.) of the MSE of the MWF obeys a Γ distribution. This claim is verified empirically by simulations.
2013
K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, E.A.P. Habets, R. Haeb-Umbach, V. Leutnant, A. Sehr, W. Kellermann, R. Maas, S. Gannot, and B. Raj,
"The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2013. Recently, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques, and automatic speech recognition (ASR) techniques robust to reverberation. To evaluate state-of-the-art algorithms and obtain new insights regarding potential future research directions, we propose a common evaluation framework including datasets, tasks, and evaluation metrics for both speech enhancement and ASR techniques. The proposed framework will be used as a common basis for the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. This paper describes the rationale behind the challenge, and provides a detailed description of the evaluation framework and benchmark results.
B. Laufer, R. Talmon, and S. Gannot,
"Relative transfer function modeling for supervised source localization",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2013. Speaker localization is one of the most prevalent problems in speech processing. Despite significant efforts in the last decades, high reverberation levels still limit the performance of localization algorithms. Furthermore, using conventional localization methods, the information that can be extracted from dual microphone measurements is restricted to the time difference of arrival (TDOA). In the far-field regime, this is equivalent to estimating either the azimuth or the elevation angle. A full description of the speaker's coordinates necessitates several microphones. In this contribution we tackle these two limitations by taking a manifold learning perspective on system identification. We present a training-based algorithm, motivated by the concept of diffusion maps, that aims at recovering the fundamental controlling parameters driving the measurements. This approach turns out to be more robust to reverberation, and capable of recovering the speech source location using merely two microphone signals.
K. Reindl, S. Markovich-Golan, H. Barfuss, S. Gannot, and W. Kellermann,
"Geometrically constrained TRINICON-based relative transfer function estimation in underdetermined scenarios",
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2013. Speech extraction in a reverberant enclosure using a linearly-constrained minimum variance (LCMV) beamformer usually requires reliable estimates of the relative transfer functions (RTFs) of the desired source to all microphones. In this contribution, a geometrically constrained (GC)-TRINICON concept for RTF estimation is proposed. This approach is applicable in challenging multiple-speaker scenarios and in underdetermined situations, where the number of simultaneously active sources exceeds the number of available microphone signals. As a practically relevant and distinctive feature, this concept does not require any voice-activity-based control mechanism. It only requires coarse reference information on the target direction of arrival (DoA). The proposed GC-TRINICON method is compared to a recently proposed subspace method for RTF estimation relying on voice-activity control. Experimental results confirm the effectiveness of GC-TRINICON in realistic conditions.
J. Malek, Z. Koldovský, S. Gannot, and P. Tichavský,
"Informed generalized sidelobe canceler utilizing sparsity of speech signals",
in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Southampton, UK, Sept. 22-25 2013. This report proposes a novel variant of the generalized sidelobe canceler. It assumes that a set of prepared relative transfer functions (RTFs) is available for several potential positions of a target source within a confined area. The key problem here is to select the correct RTF at any time, even when the exact position of the target is unknown and interfering sources are present. We propose to select the RTF based on the ℓp-norm, p ≤ 1, measured at the blocking matrix output in the frequency domain. Subsequent experiments show that this approach significantly outperforms previously proposed selection methods when the target and interferer signals are speech signals.
R. Talmon, I. Cohen, S. Gannot, and R. Coifman,
"Graph-Based bayesian approach for transient interference suppression",
in 21st European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, Sep. 2013. In this paper, we present a method for transient interference suppression. The main idea is to learn the intrinsic geometric structure of the transients instead of relying on estimates of noise statistics. The transient interference structure is captured via a parametrization of a graph constructed from the measurements. This parametrization is viewed as an empirical model for transients and is used for building a filter that extracts transients from noisy speech. We present a model-based supervised algorithm, in which the graph-based empirical model is constructed in advance from training recordings, and then extended to new incoming measurements. This paper extends previous studies and presents a new Bayesian approach for empirical model extension that takes into account both the structure of the transients as well as the dynamics of speech signals.
B. Schwartz, S. Gannot, and E. Habets,
"Multi-Microphone speech dereverberation using Expectation-Maximization and Kalman smoothing",
in 21st European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, Sep. 2013. Speech signals recorded in a room are commonly degraded by reverberation. In most cases, both the speech signal and the acoustic system of the room are unknown. In this paper, a multi-microphone algorithm that simultaneously estimates the acoustic system and the clean signal is proposed. An expectation-maximization (EM) scheme is employed to iteratively obtain the maximum likelihood (ML) estimates of the acoustic parameters. In the expectation step, the Kalman smoother is applied to extract the clean signal from the data utilizing the estimated parameters. In the maximization step, the parameters are updated according to the output of the Kalman smoother. Experimental results show significant dereverberation capabilities of the proposed algorithm with only low speech distortion.
R. Talmon and S. Gannot,
"Relative transfer function identi cation on manifolds for supervised GSC beamformers",
in 21st European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, Sep. 2013. Identification of a relative transfer function (RTF) between two microphones is an important component of multichannel hands-free communication systems in reverberant and noisy environments. In this paper, we present an RTF identification method on manifolds for supervised generalized sidelobe canceler beamformers. We propose to learn the manifold of typical RTFs in a specific room using a novel extendable kernel method, which relies on common manifold learning approaches. Then, we exploit the extendable learned model and propose a supervised identification method that relies on both the a priori learned geometric structure and the measured signals. Experimental results show significant improvements over a competing method that relies merely on the measurements, especially in noisy conditions.
D. Levin, E. Habets, and S. Gannot,
"Robust beamforming using sensors with nonidentical directivity patterns",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, May 2013. The optimal weights of a maximum-directivity beamformer are often found to be severely lacking in terms of robustness. Although an ideal implementation of the beamformer with these weights provides high directivity, minor perturbations of the weights or of the sensor placement cause severe degradation. Therefore, a robustness constraint is often imposed during the beamformer's design stage. The classical method of diagonal loading is commonly used for this purpose. There are known results in this field which pertain to an array consisting of sensors with identical directivity patterns and orientations. We extend these results to account for sensors with nonidentical directivity patterns, and sensors which share placement errors. We show that in such cases, modifying the classical loading scheme to incorporate nonidentical diagonal elements and off-diagonal elements is beneficial.
2012
S. Markovich-Golan, S. Gannot, and I. Cohen,
"A weighted multichannel Wiener filter for multiple sources scenarios",
in The IEEE 27th Convention of IEEE Israel (IEEEI), Eilat, Israel, Nov. 2012,
best student paper award. The scenario of P speakers received by an M microphone array in a reverberant enclosure is considered. We extend the single source speech distortion weighted multichannel Wiener filter (SDW-MWF) to deal with multiple speakers. The mean squared error (MSE) is extended by introducing P weights, each controlling the distortion of one of the sources. The P weights enable further control in the design of the beamformer (BF). Two special cases of the proposed BF are the SDW-MWF and the linearly constrained minimum variance (LCMV)-BF. We provide a theoretical analysis for the performance of the proposed BF. Finally, we exemplify the ability of the proposed method to control the tradeoff between noise reduction (NR) and distortion levels of various speakers in an experimental study.
F. Heese, E. Hadad, M. Schäfer, S. Markovich-Golan, P. Vary, and S. Gannot,
"Comparison of supervised and semi-supervised beamformers using real audio recordings",
in The IEEE 27th Convention of IEEE Israel (IEEEI), Eilat, Israel, Nov. 2012. In this contribution two different disciplines for designing microphone array beamformers are explored. On the one hand, a fixed beamformer based on numerical near-field optimization is employed. On the other hand, an adaptive beamformer algorithm based on the linearly constrained minimum variance (LCMV) method is applied. For the evaluation, an audio database of microphone array impulse responses and audio recordings (speech and noise) was created. Different acoustic scenarios were constructed, consisting of various audio sources (desired speaker, interfering speaker, and directional noise) distributed around the microphone array at different angles and distances. The algorithms were compared based on both objective measures (signal-to-noise ratio, signal-to-interference ratio, and speech distortion) and subjective tests (assessment of sonograms and informal listening tests).
E. Hadad, S. Gannot, and S. Doclo,
"Binaural linearly constrained minimum variance beamformer for hearing aid applications",
in The International Workshop on Acoustic Signal Enhancement (IWAENC), Aachen, Germany, Sep. 2012. In many cases hearing impaired persons suffer from hearing loss in both ears, necessitating two hearing apparatuses. In such cases, the applied speech enhancement algorithms should be capable of preserving the so-called binaural cues. In this paper, a binaural extension of the linearly constrained minimum variance (LCMV) beamformer is proposed. The proposed algorithm, denoted binaural linearly constrained minimum variance (BLCMV) beamformer, is capable of extracting desired speakers while suppressing interfering speakers. The BLCMV maintains the binaural cues of both the desired and the interference sources in the constrained space. The ability to preserve the binaural cues makes the BLCMV beamformer particularly suitable for hearing aid applications. It is further proposed to obtain a reduced-complexity implementation by sharing common blocks between both sides of the hearing aid device. The performance of the proposed method, in terms of imposed distortion, interference cancellation and cue preservation, is verified by an extensive experimental study using signals recorded by a dummy head in an actual room.
S. Markovich-Golan, S. Gannot, and I. Cohen,
"Distributed GSC beamforming using the relative transfer function",
in The European Signal Processing Conference (EUSIPCO), Bucharest, Romania, Aug. 2012,
invited paper. A speech enhancement algorithm in a noisy and reverberant enclosure for a wireless acoustic sensor network (WASN) is derived. The proposed algorithm is structured as a two-stage beamformer (BF) scheme, where the outputs of the first stage are transmitted in the network. Designing the second-stage BF requires estimating the desired signal components in the transmitted signals. The contribution here is twofold. First, in spatially static scenarios, the first-stage BFs are designed to maintain a fixed response towards the desired signal, as opposed to competing algorithms, where the response changes and repeated estimation thereof is required. Second, the proposed algorithm is implemented in a generalized sidelobe canceler (GSC) form, separating the treatment of the desired speech and the interferences and enabling a simple time-recursive implementation of the algorithm. A comprehensive experimental study demonstrates the equivalent performance of the centralized GSC and of the proposed algorithm for both narrowband and speech signals.
S. Markovich-Golan, S. Gannot, and I. Cohen,
"A sparse blocking matrix for multiple constraints GSC beamformer",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, Apr. 2012, pp. 197–200. Modern high performance speech processing applications incorporate large microphone arrays. Complicated scenarios comprising multiple sources motivate the use of the linearly constrained minimum variance (LCMV) beamformer (BF) and specifically its efficient generalized sidelobe canceler (GSC) implementation. The complexity of applying the GSC is dominated by the blocking matrix (BM). A common approach for constructing the BM is to use a projection matrix to the null-subspace of the constraints. The latter BM is denoted as the eigen-space BM, and requires M^2 complex multiplications, where M is the number of microphones. In the current contribution, a novel systematic scheme for constructing a multiple-constraints sparse BM is presented. The sparsity of the proposed BM substantially reduces the complexity to K × (M – K) complex multiplications, where K is the number of constraints. A theoretical analysis of the signal leakage and of the blocking ability of the proposed sparse BM and of the eigen-space BM is derived. It is proven analytically, and tested for narrowband signals and for speech signals, that the blocking abilities of the sparse and of the eigen-space BMs are equivalent.
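To fix ideas, a hedged sketch of the eigen-space construction the abstract compares against; the projection below is the standard null-space form, and the complexity remark mirrors the abstract:

```python
import numpy as np

def eigenspace_blocking_matrix(C):
    """C: (M, K) constraint matrix whose columns are the constrained
    steering/RTF vectors. Returns the M x M projection onto their null
    space; applying it to a snapshot costs O(M^2) multiplications per
    bin, which the paper's sparse BM reduces to K*(M-K)."""
    M = C.shape[0]
    return np.eye(M) - C @ np.linalg.pinv(C)   # I - C (C^H C)^{-1} C^H

# usage: u = B @ y   # noise reference with the constrained sources blocked
```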
S. Gannot,
"On the importance of room acoustics in multi-microphone speech enhancement",
in The 163rd meeting of the Acoustical Society of America and Acoustics 2012, vol. 131, no. 4, Hong Kong, China, May 2012, pp. 3209–3209,
invited paper. Speech quality might significantly deteriorate in the presence of interference. Multi-microphone measurements can be utilized to enhance speech quality and intelligibility only if the room acoustics is taken into consideration. The vital role of the acoustic transfer function (ATF) between the sources and the microphones is demonstrated in two important cases: the minimum variance distortionless response (MVDR) and the linearly constrained minimum variance (LCMV) beamformers. The LCMV deals with the more general case of multiple desired speakers. It is argued that the MVDR beamformer exhibits a tradeoff between the amount of speech dereverberation and noise reduction. The level of noise reduction, sacrificed when complete dereverberation is required, is shown to depend on the direct-to-reverberation ratio. When the reverberation level is tolerable, practical beamformers can be designed by substituting the ATFs with their corresponding relative transfer functions (RTFs). As no dereverberation is performed by these beamformers, a higher level of noise reduction can be achieved. In comparison with the ATFs, the RTFs exhibit shorter impulse responses. Moreover, since non-blind procedures can be adopted, accurate RTF estimates might be obtained. Three such RTF estimation methods are discussed. Finally, a comprehensive experimental study in real acoustical environments demonstrates the benefits of using the proposed beamformers.
S. Markovich-Golan, S. Gannot, and I. Cohen,
"Blind sampling rate offset estimation and compensation in wireless acoustic sensor networks with application to beamforming",
in The International Workshop on Acoustic Signal Enhancement (IWAENC), Aachen, Germany, Sep. 2012,
final list for best student paper award. Beamforming methods for speech enhancement in wireless acoustic sensor networks (WASNs) have recently attracted the attention of the research community. One of the major obstacles to implementing speech processing algorithms in WASNs is the sampling rate offset between the nodes. As the nodes utilize individual clock sources, sampling rate offsets are inevitable and may cause severe performance degradation. In this paper, a blind procedure for estimating the sampling rate offsets is derived. The procedure is applicable to speech-absent time segments with slowly time-varying interference statistics. The proposed procedure is based on the phase drift of the coherence between two signals sampled at different sampling rates. Resampling the signals with the Lagrange polynomial interpolation method compensates for the sampling rate offsets. An extensive experimental study, utilizing the transfer function generalized sidelobe canceller (TF-GSC), exemplifies the problem and its solution.
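As a toy illustration of the phase-drift principle (not the paper's estimator): with a relative offset eps, the delay between the two signals grows linearly in time, so the cross-spectral phase at frequency f drifts at a rate proportional to f*eps; fitting that drift recovers eps. The frame size, the median fit, and the rational-resampling compensation below are illustrative choices, and resample_poly merely stands in for the paper's Lagrange-polynomial interpolation:

```python
import numpy as np
from fractions import Fraction
from scipy.signal import stft, resample_poly

def estimate_sro(x1, x2, fs, nperseg=4096):
    """Estimate a small sampling-rate offset between two recordings from
    the inter-frame phase increment of their cross-spectra (assumes the
    drift per hop stays below pi; sign convention is illustrative)."""
    f, t, X1 = stft(x1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
    cross = X1 * np.conj(X2)                     # per-frame cross-spectra
    hop = t[1] - t[0]                            # frame hop in seconds
    eps = []
    for k in range(1, len(f)):                   # skip DC
        inc = np.angle(np.mean(cross[k, 1:] * np.conj(cross[k, :-1])))
        eps.append(-inc / (2 * np.pi * f[k] * hop))
    return float(np.median(eps))                 # robust over frequency

def compensate_sro(x, eps):
    """Resample by a rational approximation of 1/(1+eps)."""
    r = Fraction(1.0 / (1.0 + eps)).limit_denominator(10**6)
    return resample_poly(x, r.numerator, r.denominator)
```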
2011
R. Talmon, I. Cohen, and S. Gannot,
"Supervised source localization using diffusion kernels",
in The IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, USA, Oct. 2011, pp. 245–248. Recently, we introduced a method to recover the controlling parameters of linear systems using diffusion kernels. In this paper, we apply our approach to the problem of source localization in a reverberant room using measurements from a single microphone. Prior recordings of signals from various known locations in the room are required for training and calibration. The proposed algorithm relies on a computation of a diffusion kernel with a specially-tailored distance measure. Experimental results in a real reverberant environment demonstrate accurate recovery of the source location.
D. Levin, S. Gannot, and E. Habets,
"Direction-of-arrival estimation using acoustic vector sensors in the presence of noise",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011, pp. 105–108. A vector-sensor consisting of a monopole sensor collocated with orthogonally oriented dipole sensors can be used for direction-of-arrival (DOA) estimation. A method is proposed to estimate the DOA based on the direction of maximum power. Algorithms mentioned in earlier works are shown to be special cases of the proposed method. An iterative algorithm based on the principle of gradient ascent is presented for the solution of the maximum power problem. The proposed maximum-power method is shown to approach the Cramer-Rao lower bound (CRLB) with a suitable choice of parameter.
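A hedged 2-D sketch of maximum-power DOA search by gradient ascent; this simplified variant steers only the two dipole (particle-velocity) channels, whereas the paper's method also involves the monopole channel and a tunable parameter:

```python
import numpy as np

def avs_doa_max_power(vx, vy, iters=200, lr=0.1):
    """vx, vy: collocated orthogonal particle-velocity channels.
    Climb the gradient of the steered-output power E[y(theta)^2]."""
    theta = 0.0
    for _ in range(iters):
        y = vx * np.cos(theta) + vy * np.sin(theta)    # steered dipole
        dy = -vx * np.sin(theta) + vy * np.cos(theta)  # d y / d theta
        theta += lr * 2 * np.mean(y * dy)              # gradient ascent
    return np.mod(theta, 2 * np.pi)                    # azimuth estimate
```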
R. Talmon, I. Cohen, and S. Gannot,
"Clustering and suppression of transient noise in speech signals using diffusion maps",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011, pp. 5084–5087. Recently we have presented a novel approach for transient noise reduction that relies on non-local (NL) filtering. In this paper, we modify and extend our approach to support clustering and suppression of a few transient noise types simultaneously, by introducing two novel concepts. We observe that voiced speech spectral components are slowly varying compared to transient noise. Thus, by applying an algorithm for noise power spectral density (PSD) estimation, configured to track faster variations than pseudo-stationary noise, the PSD of speech components may be estimated. In addition, we utilize diffusion maps to embed the measurements into a new domain. We obtain a new representation which enables clustering of different transient noise types. The new representation is incorporated into an NL filter as a better affinity metric for averaging over transient instances. Experimental results show that the proposed algorithm enables clustering and suppression of multiple transient interferences.
S. Markovich-Golan, S. Gannot, and I. Cohen,
"Performance analysis of a randomly spaced wireless microphone array",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011, pp. 121–124. A randomly distributed microphone array is considered in this work. In many applications exact design of the array is impractical. The performance of these arrays, characterized by a large number of microphones deployed in vast areas, cannot be analyzed by traditional deterministic methods. We therefore derive a novel statistical model for performance analysis of the MWF beamformer. We consider the scenario of one desired source and one interfering source arriving from the far-field and impinging on a uniformly distributed linear array. A theoretical model for the MMSE is developed and verified by simulations. The applicability of the proposed statistical model for speech signals is discussed.
2010
L. Ehrenberg, S. Gannot, A. Leshem, and E. Zehavi,
"Sensitivity analysis of MVDR and MPDR beamformers",
in The 26th Convention of IEEE Israel (IEEEI), Eilat, Israel, Nov. 2010, pp. 416–420,
best student paper award. A sensitivity analysis of two distortionless beamformers is presented in this paper. Specifically, two well-known variants, namely the minimum power distortionless response (MPDR) and minimum variance distortionless response (MVDR) beamformers, are considered. In our scenario, which is typical of many modern communications systems, waves emitted by multiple point sources are received by an antenna array. An analytical expression for the signal to interference and noise ratio (SINR) improvement obtained by both beamformers under steering errors is derived. These expressions are experimentally evaluated and compared with the robust Capon beamformer (RCB), a robust variant of the MPDR beamformer. We show that the MVDR beamformer, which uses the noise correlation matrix in its minimization criterion, is more robust to steering errors than its counterparts, which use the received signal correlation matrix. Furthermore, even if the noise correlation matrix is erroneously estimated due to steering errors in the interference direction, the MVDR advantage is still maintained for a reasonable range of steering errors. These conclusions conform with Cox's findings. Only the line-of-sight propagation regime is considered in the current contribution. Ongoing research extends this work to fading channels.
D. Levin, S. Gannot, and E. Habets,
"Impact of source signal coloration on intensity vector based DOA estimation",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Tel-Aviv, Israel, Nov. 2010. An acoustic vector sensor provides measurements of both the pressure and particle velocity of a sound field in which it is placed. These measurements are vectorial in nature and can be used for the purpose of source localization. A straightforward approach towards determining the direction of arrival (DOA) utilizes the acoustic intensity vector, which is the product of pressure and particle velocity. The accuracy of an intensity vector based DOA estimator in the presence of sensor noise or reverberation has been analyzed previously for the case of a white source signal. In this paper, the effects of reverberation upon the accuracy of such a DOA estimator in the presence of a colored source signal are examined. The analysis is done with the aid of an extension to Polack's statistical room impulse response model which accounts for particle velocity as well as acoustic pressure. It is shown that signal coloration brings about a degradation in performance.
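For reference, the straightforward intensity-based estimator mentioned in this abstract reduces, in a 2-D sketch, to time-averaging the pressure-velocity products and taking the direction of the resulting vector (an illustration, with estimator details omitted):

```python
import numpy as np

def intensity_doa(p, vx, vy):
    """p: pressure channel; vx, vy: orthogonal particle-velocity
    channels. The averaged acoustic intensity vector points (ideally)
    toward the source; its angle is the azimuth DOA estimate."""
    ix = np.mean(p * vx)
    iy = np.mean(p * vy)
    return np.arctan2(iy, ix)
```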
S. Markovich-Golan, S. Gannot, and I. Cohen,
"A reduced bandwidth binaural MVDR beamformer",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Tel-Aviv, Israel, Aug. 2010,
best student paper award. In this contribution a novel reduced-bandwidth iterative binaural MVDR beamformer is proposed. The proposed method reduces the bandwidth requirement between hearing aids to a single channel, regardless of the number of microphones. The algorithm is proven to converge to the optimal binaural MVDR in the case of a rank-1 desired source correlation matrix. Comprehensive simulations of narrow-band and speech signals demonstrate the convergence and the optimality of the algorithm.
Y. Yeminy, S. Gannot, and Y. Keller,
"Speech enhancement using a multidimensional Mixture- Maximum model",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Tel-Aviv, Israel, Aug. 2010. We present a single-microphone speech enhancement algorithm that models the log-spectrum of the noise-free speech signal by a multidimensional Gaussian mixture. The proposed estimator is based on an earlier study which uses the single-dimensional mixture-maximum (MIXMAX) model for the speech signal. The experimental study shows that there is only a marginal difference between the proposed extension and the original algorithm in terms of both objective and subjective performance measures.
B. Castro, S. Gannot, N.D. Gaubitch, E.A.P. Habets, P. A. Naylor, and S. Grant,
"Subband scale factor ambiguity correction using multiple filterbanks",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Tel-Aviv, Israel, Aug. 2010. One of the problems with blind system identification in subbands is that the subband systems can only be identified correctly up to an arbitrary scale factor. This scale factor ambiguity is the same across all channels but can differ between the subbands and therefore limits the usability of such estimates. In this contribution, a method is proposed that uses multiple filterbanks, utilizing the overlapping passband regions between these filterbanks to find scalar correction factors that make the scale factor ambiguity uniform across all subbands. Simulation results are provided, showing that the proposed method accurately identifies and corrects for these scale factors at the cost of an increased computational burden.
S. Markovich-Golan, S. Gannot, and I. Cohen,
"Subspace tracking of multiple sources and its application to speakers extraction",
in The IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Dallas, Texas, USA, Mar. 2010, pp. 201–204. In this paper we introduce a novel algorithm for extracting desired speech signals uttered by moving speakers contaminated by competing speakers and stationary noise in a reverberant environment. The proposed beamformer uses eigenvectors spanning the desired and interference signals subspaces. It relaxes the common requirement on the activity patterns of the various sources. A novel mechanism for tracking the desired and interferences subspaces is proposed, based on the projection approximation subspace tracking (deflation) (PASTd) procedure and on a union of subspaces procedure. This contribution extends previously proposed methods to deal with multiple speakers in dynamic scenarios.
R. Talmon, I. Cohen, and S. Gannot,
"Speech enhancement in transient noise environment using diffusion filtering",
in The IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Dallas, Texas, USA, Mar. 2010, pp. 4782–4785. Recently, we have presented a transient noise reduction algorithm for speech signals that relies on non-local diffusion filtering. By exploiting the repetitive nature of transient noises we proposed a simple and efficient algorithm, which enabled suppression of various noise types. In this paper, we incorporate a modified diffusion operator in order to obtain a more robust algorithm and further enhancement of the speech. We demonstrate the performance of the modified algorithm and compare it with a competing solution. We show that the proposed algorithm enables improved suppression of various transient interferences without any further computational burden.
2009
E. Habets, J. Benesty, S. Gannot, P. Naylor, and I. Cohen,
"On the application of the LCMV beamformer to speech enhancement",
in The IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, USA, Oct. 2009, pp. 141–144. In theory, the linearly constrained minimum variance (LCMV) beamformer can achieve perfect dereverberation and noise cancellation when the acoustic transfer functions (ATFs) between all sources (including interferences) and the microphones are known. However, blind estimation of the ATFs remains a difficult task. In this paper the noise reduction of the LCMV beamformer is analyzed and compared with the noise reduction of the minimum variance distortionless response (MVDR) beamformer. In addition, it is shown that the constraint of the LCMV can be modified such that only relative transfer functions, rather than ATFs, are required to achieve perfect cancellation of coherent interferences. Finally, we evaluate the noise reduction performance achieved by the LCMV and MVDR beamformers for two coherent sources: one desired and one undesired.
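For context, the standard per-bin closed forms usually quoted for these two beamformers (notation assumed: covariance matrix R, constraint matrix C with desired-response vector g, steering/transfer-function vector a):

```latex
\mathbf{w}_{\mathrm{LCMV}}
  = R^{-1} C \left( C^H R^{-1} C \right)^{-1} \mathbf{g},
\qquad
\mathbf{w}_{\mathrm{MVDR}}
  = \frac{R^{-1}\mathbf{a}}{\mathbf{a}^H R^{-1}\mathbf{a}},
% MVDR is the single-constraint special case C = a, g = 1; the paper's
% point is that C may hold relative transfer functions rather than ATFs.
```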
R. Talmon, I. Cohen, and S. Gannot,
"Multichannel speech enhancement using convolutive transfer function approximation in reverberant environments",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 3885–3888. Recently, we have presented a transfer-function generalized sidelobe canceler (TF-GSC) beamformer in the short-time Fourier transform domain, which relies on a convolutive transfer function approximation of relative transfer functions between distinct sensors. In this paper, we combine a delay-and-sum beamformer with the TF-GSC structure in order to suppress the speech signal reflections captured at the sensors in reverberant environments. We demonstrate the performance of the proposed beamformer and compare it with the TF-GSC. We show that the proposed algorithm enables suppression of reverberations and further noise reduction compared with the TF-GSC beamformer.
E. Habets, J. Benesty, I. Cohen, and S. Gannot,
"On a tradeoff between dereverberation and noise reduction using the MVDR beamformer",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 3741–3744, invited paper. The minimum variance distortionless response (MVDR) beamformer can be used for both speech dereverberation and noise reduction. In this paper we analyse the tradeoff between the amount of speech dereverberation and noise reduction achieved by the MVDR beamformer. We show that the amount of noise reduction that is sacrificed when desiring both speech dereverberation and noise reduction depends on the direct-to-reverberation ratio of the acoustic transfer function between the desired source and a reference microphone. The performance evaluation supports the theoretical analysis and demonstrates the tradeoff between speech dereverberation and noise reduction.
2008
E. Habets, S. Gannot, and I. Cohen,
"Speech dereverberation using backward estimation of the late reverberant spectral variance",
in The 25th Convention of IEEE Israel (IEEEI), Eilat, Israel, Dec. 2008, pp. 384–388. In speech communication systems the received microphone signals are degraded by room reverberation and ambient noise. This signal degradation can decrease the fidelity and intelligibility of the desired speaker. Reverberant speech can be separated into two components, viz. an early speech component and a late reverberant speech component. Reverberation suppression algorithms that are feasible in practice have been developed to suppress late reverberant speech or, in other words, to estimate the early speech component. The main challenge is to develop an estimator for the so-called late reverberant spectral variance (LRSV). In this contribution a generalized statistical reverberation model is proposed that can be used to estimate the LRSV. Novel and existing estimators can be derived from this model. One novel estimator is a so-called backward estimator that uses an estimate of the early speech component to obtain an estimate of the LRSV. Advantages and possible disadvantages of the estimators are discussed, and experimental results using simulated reverberant speech are presented.
S. Markovich, S. Gannot, and I. Cohen,
"A comparison between alternative beamforming strategies for interference cancelation in noisy and reverberant environment",
in The 25th Convention of IEEE Israel (IEEEI), Eilat, Israel, Dec. 2008, pp. 203–207. In speech communication systems the received microphone signals are often degraded by competing speakers, noise signals and room reverberation. Microphone arrays are commonly utilized to enhance the desired speech signal. In this paper two important design criteria, namely the minimum variance distortionless response (MVDR) and the linearly constrained minimum variance (LCMV) beamformers, are explored. These structures differ in their treatment of the interference sources. Experimental results in a simulated reverberant enclosure are used to compare the two strategies. It is shown that the LCMV beamformer outperforms the MVDR beamformer provided that the acoustic environment is time-invariant.
R. Talmon, I. Cohen, and S. Gannot,
"Identification of the relative transfer function between microphones in reverberant environments",
in The 25th Convention of IEEE Israel (IEEEI), Eilat, Israel, Dec. 2008, pp. 208–212. Recently, a relative transfer function (RTF) identification method based on the convolutive transfer function (CTF) approximation was developed. This method is adapted to speech sources in reverberant environments and exploits the non-stationarity and presence probability of the speech signal. In this paper, we present experimental results that demonstrate the advantages and robustness of the proposed method. Specifically, we show the robustness of this method to the environment and to a variety of recorded noise signals.
S. Gannot,
"Multi-microphone speech dereverberation based on eigen-decomposition: A study",
in The 42nd Asilomar Conference on Signals, Systems and Computers, Monterey, CA, USA, Oct. 2008, pp. 801–805,
invited paper. A family of approaches for multi-microphone speech dereverberation in colored noise environments, which uses the eigen-decomposition of the data correlation matrix, is studied in this paper. A recently proposed method shows that the Room Impulse Responses (RIRs), relating the speech source and the microphones, are embedded in the null subspace of the received signals. In cases where the channel order is overestimated, a closed-form algorithm for extracting the RIR is proposed. A variant, in which the subspace method is incorporated into a subband framework, is given as well. In the last stage of the proposed method, the desired signal is reconstructed, using the estimated RIRs, by applying either the Matched Filter Beamformer (MBF) or the Multi-channel Inverse filter Theorem (MINT) algorithms. The emphasis of the current work is a comprehensive experimental study of the eigen-decomposition based dereverberation methods and the required channel inversion algorithms. This study supports the potential of the presented method, and provides insight into its limitations.
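The null-subspace step of this family of methods can be sketched as a generalized eigen-decomposition; the subsequent RIR extraction from the Sylvester structure and the MBF/MINT reconstruction are omitted. A minimal sketch, assuming spatio-temporal correlation matrices of the data and of the noise are available:

```python
import numpy as np
from scipy.linalg import eigh

def rir_null_subspace(R_data, R_noise, n_null):
    """Eigenvectors of the smallest generalized eigenvalues of
    (R_data, R_noise) span the null subspace in which the RIR
    coefficients are embedded."""
    _, vecs = eigh(R_data, R_noise)   # generalized eigenpairs, ascending order
    return vecs[:, :n_null]
```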
E. Habets, S. Gannot, and I. Cohen,
"Robust early echo cancellation and late echo suppression in the STFT domain",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Seattle, Washington, USA, Sep. 2008, pp. 4565–4568. Acoustic echo arises due to acoustic coupling between the loudspeaker and the microphone of a communication device. Acoustic echo cancellation and suppression techniques are used to reduce the acoustic echo. In this work we propose to first cancel the early echo, which is related to the early part of the echo path, and subsequently suppress the late echo, which is related to the later part of the echo path. The identification of the echo path is carried out in the Short-Time Fourier Transform (STFT) domain, where a trade-off is facilitated between distortion of the near-end speech, residual echo, convergence rate, and robustness to echo path changes. Experimental results demonstrate that the system achieves high echo and noise reduction while maintaining low distortion of the near-end speech. In addition, it is shown that the proposed system is more robust to echo path changes compared to an acoustic canceller alone.
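A hedged time-domain sketch of the canceller half of such a system, using plain NLMS to identify the early echo path (the paper itself works in the STFT domain and follows the canceller with a late-echo suppressor):

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filt_len=256, mu=0.5, eps=1e-8):
    """Identify the early echo path with NLMS and return the
    echo-cancelled microphone signal and the path estimate."""
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    h = np.zeros(filt_len)                  # echo-path estimate
    out = mic.copy()
    for n in range(filt_len, len(mic)):
        x = far_end[n - filt_len:n][::-1]   # latest far-end samples, newest first
        e = mic[n] - h @ x                  # residual after echo removal
        h += mu * e * x / (x @ x + eps)     # normalized LMS update
        out[n] = e
    return out, h
```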
S. Gannot,
"A filter design and implementation experiment using simulink and Texas Instruments C6713DSK board",
in European DSP Education and Research Symposium (EDERS), Tel-Aviv, Israel, Jun. 2008.
A. Meiri, S. Melman, J. Fainguelernt, and S. Gannot,
"Real time implementation of convolutive blind source separation using TI-6713DSK board,",
in European DSP Education and Research Symposium (EDERS), Tel-Aviv, Israel, Jun. 2008.
A. Abramson, E. Habets, S. Gannot, and I. Cohen,
"Dual-microphone speech dereverberation using GARCH modeling",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, Nevada, USA, Apr. 2008, pp. 4565–4568. In this paper, we develop a dual-microphone speech dereverberation algorithm for noisy environments, which is aimed at suppressing late reverberation and background noise. The spectral variance of the late reverberation is obtained with adaptively-estimated direct path compensation. A Markov-switching generalized autoregressive conditional heteroscedasticity (GARCH) model is used to estimate the spectral variance of the desired signal, which includes the direct sound and early reverberation. Experimental results demonstrate the advantage of the proposed algorithm compared to a decision-directed-based algorithm.
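As a rough illustration of the variance modeling involved, a single-regime GARCH(1,1)-style recursion per STFT bin (the paper's Markov-switching model selects among regime-dependent parameter sets, which is not shown; all names are illustrative):

```python
import numpy as np

def garch_spectral_variance(stft, omega, alpha, beta):
    """Recursive spectral-variance estimate per time-frequency bin.

    stft : (frames, bins) complex STFT of the observed signal
    """
    var = np.empty(stft.shape)
    var[0] = np.abs(stft[0]) ** 2                    # initialization
    for t in range(1, stft.shape[0]):
        prev_sq = np.abs(stft[t - 1]) ** 2           # previous squared magnitude
        var[t] = omega + alpha * prev_sq + beta * var[t - 1]
    return var
```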
L. Ehrenberg, S. Gannot, A. Leshem, and E. Zehavi,
"Performance bounds for channel tracking algorithms for MIMO systems",
in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, Nevada, USA, Apr. 2008, pp. 3085–3088. In this paper we derive performance bounds for tracking a time-varying OFDM multiple-input multiple-output (MIMO) communication channel in the presence of additive white Gaussian noise (AWGN). We discuss two channel tracking schemes. The first tracks the filter coefficients directly in the time domain, while the second separately tracks each tone in the frequency domain. The Kalman filter, with known channel statistics, is utilized for evaluating the performance bounds. It is shown that the time-domain tracking scheme, which exploits the sparseness of the channel impulse response, outperforms the computationally more efficient frequency-domain tracking scheme, which does not exploit the smooth frequency response of the channel.
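A minimal sketch of the time-domain tracking idea for a single channel, with a random-walk state model and a Kalman update per pilot observation (the paper's MIMO-OFDM setting and its frequency-domain per-tone variant are not shown; all names are illustrative):

```python
import numpy as np

def kalman_channel_tracker(X, y, q, r):
    """Track time-domain channel taps from pilot observations.

    X : (T, n_taps) rows of delayed pilot symbols (regressors)
    y : (T,) received pilot samples
    q, r : process- and measurement-noise variances
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_taps = X.shape[1]
    h = np.zeros(n_taps)                  # channel estimate
    P = np.eye(n_taps)                    # error covariance
    for x, obs in zip(X, y):
        P = P + q * np.eye(n_taps)        # predict: random-walk channel
        k = P @ x / (x @ P @ x + r)       # Kalman gain
        h = h + k * (obs - x @ h)         # innovation update
        P = P - np.outer(k, x) @ P        # covariance update
    return h
```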
2007
A. Leshem and S. Gannot,
"Robust sequential interference cancellation for space division multiple access communications",
in The European Signal Processing Conference (EUSIPCO), Poznan, Poland, Sep. 2007. In this paper, we consider a multiuser detection scheme for space division multiple access communication systems. Sequential interference cancellation (SIC) procedures are subject to performance degradation when the antenna array is only partially calibrated. We propose to incorporate robust beamforming algorithms into the SIC procedure to compensate for the array misalignment. We show by a simulation study that the proposed combination outperforms conventional SIC procedures for various degrees of array misalignment, different SNR values, several array configurations, and two modulation constellations (namely, QPSK and 16-QAM).
E. Habets and S. Gannot,
"Dual-microphone speech dereverberation using a reference signal",
in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii, USA, Apr. 2007. Speech signals recorded with a distant microphone usually contain reverberation, which degrades the fidelity and intelligibility of speech, and the recognition performance of automatic speech recognition systems. In this paper we propose a speech dereverberation system which uses two microphones. A generalized sidelobe canceller (GSC) type of structure is used to enhance the desired speech signal. The GSC structure is used to create two signals. The first signal is the output of a standard delay and sum beamformer, and the second signal is a reference signal which is constructed such that the direct speech signal is blocked. We propose to utilize the reverberation which is present in the reference signal to enhance the output of the delay and sum beamformer. The power envelope of the reference signal and the power envelope of the output of the delay and sum beamformer are used to estimate the residual reverberation in the output of the delay and sum beamformer. The output of the delay and sum beamformer is then enhanced using a spectral enhancement technique. The proposed method only requires an estimate of the direction of arrival of the desired speech source. Experiments using simulated room impulse responses are presented and show significant reverberation reduction while keeping the speech distortion low.
2006
S. Gannot, A. Leshem, O. Shayevitz, and E. Zehavi,
"Tracking a MIMO channel singular value decomposition via projection approximation",
in The 24th Convention of IEEE Israel (IEEEI), Eilat, Israel, 2006, pp. 91–94. A bidirectional multiple-input multiple-output (MIMO) time varying channel is considered. The projection approximation subspace tracking (PAST) algorithm is used on both terminals in order to track the singular value decomposition of the channel matrix. Simulations using an autoregressive channel model and also a sampled MIMO indoor channel are performed, and the expected capacity degradation due to the estimation error is evaluated.
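For reference, one RLS-style iteration of the PAST recursion used for the subspace tracking (a hedged sketch; applying it at both terminals to follow the channel SVD, as in the paper, is not shown):

```python
import numpy as np

def past_update(W, P, x, beta=0.99):
    """One PAST iteration: track the dominant subspace from snapshot x.

    W : (M, r) current subspace estimate
    P : (r, r) inverse correlation matrix of the projected snapshots
    """
    y = W.conj().T @ x                       # project snapshot onto subspace
    h = P @ y
    g = h / (beta + y.conj() @ h)            # RLS-style gain vector
    P = (P - np.outer(g, h.conj())) / beta   # update inverse correlation
    e = x - W @ y                            # projection error
    W = W + np.outer(e, g.conj())            # subspace update
    return W, P
```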
E. Habets, I. Cohen, and S. Gannot,
"MMSE log-spectral amplitude estimator for multiple interferences",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Paris, France, Sep. 2006. In this paper we present an algorithm for robust speech enhancement based on an Optimal Modified Minimum Mean-Square Error Log-Spectral Amplitude (OM-LSA) estimator for multiple interferences. In the original OM-LSA one interference was taken into account. However, there are many situations where multiple interferences are present. Since the human ear is more sensitive to a small amount of residual non-stationary interference than to a stationary interference we would like to reduce the non-stationary interference signal down to the residual noise level of the stationary interference. Possible applications for the proposed algorithm are joint speech dereverberation and noise reduction, and joint residual echo suppression and noise reduction. Additionally, we present two possible methods to estimate the a priori Signal to Noise Ratio of each of the interferences.
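The core of any OM-LSA-type estimator is the log-spectral amplitude gain applied per time-frequency bin; a minimal sketch of that gain is below (the paper's contribution, estimating the a priori SNR per interference, is not reproduced here):

```python
import numpy as np
from scipy.special import exp1

def lsa_gain(xi, gamma):
    """Log-spectral amplitude gain from the a priori SNR xi and the
    a posteriori SNR gamma (arrays broadcast per TF bin)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))
```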
S. Gannot and V. Avrin,
"A Simulink© and texas instruments C6713® based digital signal processing laboratory",
in The European Signal Processing Conference (EUSIPCO), Florence, Italy, Sep. 2006. In this contribution, a digital signal processing educational lab, established at the School of Electrical and Computer Engineering at Bar-Ilan University, Israel, is presented. A unique educational approach is adopted. In this approach sophisticated algorithms can be implemented in an intuitive top-level design using Simulink©. Simultaneously, our approach gives the students the opportunity to conduct hands-on experiments with real signals and hardware, using Texas Instruments (TI) C6713 evaluation boards. By taking this combined approach, we tried to focus the efforts of the students on the DSP problems themselves rather than on the actual programming. A comprehensive ensemble of experiments, which exposes the students to a wide spectrum of DSP concepts, is introduced in this paper. The experiments were designed to enable the illustration and demonstration of theoretical aspects, already acquired in several DSP courses in the curriculum.
E. Habets, S. Gannot, and I. Cohen,
"Dual-microphone speech dereverberation in a noisy environment",
in The IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Vancouver, Canada, Aug. 2006, pp. 651–655. Speech signals recorded with a distant microphone usually contain reverberation and noise, which degrade the fidelity and intelligibility of speech, and the recognition performance of automatic speech recognition systems. In earlier work, Habets (2005) presented a multi-microphone speech dereverberation algorithm to suppress late reverberation in a noise-free environment. In this paper we show how an estimate of the late reverberant energy can be obtained from noisy observations. A more sophisticated speech enhancement technique based on the optimally-modified log spectral amplitude (OM-LSA) estimator is used to suppress the undesired late reverberant signal and noise. The speech presence probability used in the OM-LSA is extended to improve the decision between speech, late reverberation and noise. Experiments using simulated and real acoustic impulse responses are presented and show significant reverberation reduction with little speech distortion.
S. Tabiby, N. Tal, J. Fainguelernt, and S. Gannot,
"Real-time implementation of a subspace dere- verberation method",
in European DSP Education and Research Symposium (EDERS), Munich, Germany, Apr. 2006.
H. Bluemanfeld, Y. Rahamim, and S. Gannot,
"Real-time implementation of an energy-based voice activity detector",
in European DSP Education and Research Symposium (EDERS), Munich, Germany, Apr. 2006.
2005
G. Reuven, S. Gannot, and I. Cohen,
"Dual source TF-GSC and its application to echo cancellation",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Eindhoven, the Netherlands, Sep. 2005, pp. 89–92.
T. Dvorkind and S. Gannot,
"Speaker localization using the unscented Kalman filter",
in The Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Rutgers University, Piscataway, New Jersey, USA, Mar. 2005.
2004
G. Reuven, S. Gannot, and I. Cohen,
"Multichannel acoustic echo cancellation and noise reduction in reverberant environments using the transfer-function GSC",
in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii, USA, Apr. 2007.
G. Reuven, S. Gannot, and I. Cohen,
"Joint acoustic echo cancellation and transfer function GSC in the frequency domain",
in The 23rd Convention of IEEE Israel (IEEEI), Herzliya, Israel, Sep. 2004, pp. 412–415.
2003
S. Gannot and M. Moonen,
"Speech dereverberation via sub-band implementation of subspace methods",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Kyoto, Japan, Sep. 2003, pp. 95–98. A novel approach for sub-band based multi-microphone speech dereverberation is presented. In a recent contribution, a method was proposed that utilizes the null subspace of the spatial-temporal correlation matrix of the received signals, obtained by the generalized eigenvalue decomposition (GEVD) procedure. The desired acoustic transfer functions (ATFs) are shown to be embedded in these generalized eigenvectors. The special Sylvester structure of the filtering matrix related to this subspace was exploited for deriving a total least squares (TLS) estimate of the ATFs. The high sensitivity of the GEVD procedure to noise, especially when the involved ATFs are very long, and the wide dynamic range of the speech signal make the proposed method problematic in realistic scenarios. In this contribution we suggest incorporating the TLS subspace method into a sub-band structure. The novel method proves to be efficient, although some new problems arise and others remain open. A preliminary experimental study supports the potential of the proposed method.
T. Dvorkind and S. Gannot,
"Speaker localization exploiting spatial-temporal information",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Kyoto, Japan, Sep. 2003, pp. 295–298,
Distinguished paper. Determining the spatial position of a speaker is of growing interest in video conferencing scenarios where automated camera steering and tracking are required. Speaker localization can be achieved with a dual-step approach. In the preliminary stage, a microphone array is used to extract the time difference of arrival (TDOA) of the speech signal. These readings are then used by the second stage for the actual localization. Since the speaker trajectory must be smooth, estimates of nearby speaker positions can be used to improve the current position estimate. However, many methods, although exploiting the spatial information obtained by different microphone pairs, do not exploit this temporal information. In this contribution we present two localization schemes which exploit the temporal information. The first is the well-known extended Kalman filter (EKF). The second is a recursive form of a Gauss method, which we denote Recursive Gauss (RG). An experimental study supports the potential of the proposed methods.
S. Gannot and M. Moonen,
"On the application of the unscented kalman filter to speech processing",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Kyoto, Japan, Sep. 2003, pp. 8–11,
Distinguished paper. In a series of recent studies, a new approach for applying the Kalman filter to nonlinear systems, referred to as the unscented Kalman filter (UKF), was proposed. In this contribution we apply the UKF to several speech processing problems in which a model with unknown parameters is assumed for the measured signals. We show that the nonlinearity arises naturally in these problems. Preliminary simulation results for artificial signals manifest the potential of the method.
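The building block of the UKF is the unscented transform, which propagates a Gaussian through a nonlinearity via deterministically chosen sigma points; a minimal real-valued sketch (kappa and the symmetric weighting are one common choice, not necessarily the paper's):

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Mean and covariance of f(x) for x ~ N(mean, cov); f must
    return a 1-D array."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.size
    S = np.linalg.cholesky((n + kappa) * cov)          # matrix square root
    sigma = np.vstack([mean, mean + S.T, mean - S.T])  # 2n+1 sigma points
    w = np.full(2 * n + 1, 0.5 / (n + kappa))          # symmetric weights
    w[0] = kappa / (n + kappa)
    fx = np.array([np.atleast_1d(f(s)) for s in sigma])
    m = w @ fx                                         # transformed mean
    d = fx - m
    return m, (w[:, None] * d).T @ d                   # mean, covariance
```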
T. Dvorkind and S. Gannot,
"Approaches for time difference of arrival estimation in a noisy and reverberant environment",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Kyoto, Japan, Sep. 2003, pp. 215–218. Determining the spatial position of a speaker is of growing interest in video conferencing scenarios where automated camera steering and tracking are required. As a preliminary step for the localization, a microphone array can be used to extract the time difference of arrival (TDOA) of the speech signal. The direction of arrival of the speech signal is then determined by the relative time delay between each pair of spatially separated microphones. In this work we present novel frequency-domain approaches for TDOA calculation in a reverberant and noisy environment. Our methods are based on the quasi-stationarity property of speech, and on the fact that the speech and the noise are uncorrelated. The proposed methods are supported by an extensive experimental study.
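For orientation, the classical frequency-domain baseline against which such TDOA methods are usually judged is GCC-PHAT; a minimal sketch (the paper's own quasi-stationarity-based estimators are not reproduced):

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """TDOA estimate between two microphone signals via the
    generalized cross-correlation with phase transform."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2 if max_tau is None else min(int(max_tau * fs), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # delay in seconds
```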
I. Cohen, S. Gannot, and B. Berdugo,
"Real-time TF-GSC in nonstationary noise environments",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Kyoto, Japan, Sep. 2003, pp. 183–186. Adaptive beamforming techniques are inefficient for eliminating transient noise components that randomly arrive from unpredictable directions. In this paper, we present a real-time transfer function generalized sidelobe canceller (TF-GSC) for such nonstationary noise environments. Hypothesis testing in the spectral domain indicates either absence of transients, presence of an interfering transient, or presence of desired source components. The noise canceller branch of the TF-GSC is updated only during absence of transients, while the identification of the acoustical transfer function is carried out only when desired source components are present. Following the beamforming and the hypothesis testing, estimates for the signal presence probability, the noise power spectral density, and the desired speech log-spectral amplitude are derived. Experimental results demonstrate the usefulness of the proposed approach under nonstationary noise conditions.
S. Gannot and I. Cohen,
"Speech enhancement based on the general transfer function GSC and postfiltering",
in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, China, Apr. 2003. In speech enhancement applications, microphone array postfiltering allows additional reduction of noise components at a beamformer output. Among microphone array structures, the recently proposed general transfer function generalized sidelobe canceller (TF-GSC) has shown impressive noise reduction abilities in a directional noise field, while still maintaining low speech distortion. However, in a diffuse noise field, less significant noise reduction is obtainable. The performance is even further degraded when the noise signal is nonstationary. In this contribution we propose three postfiltering methods for improving the performance of microphone arrays. Two of these are based on single-channel speech enhancers, making use of recently proposed algorithms concatenated to the beamformer output. The third is a multichannel speech enhancer which exploits noise-only components constructed within the TF-GSC structure. This work concentrates on the assessment of the proposed postfiltering structures. An extensive experimental study, which consists of both objective and subjective evaluation in various noise fields, demonstrates the advantage of the multichannel postfiltering compared to the single-channel techniques.
2002
T. Dvorkind and S. Gannot,
"Speaker localization in a reverberant environment",
in The 22nd Convention of IEEE Israel (IEEEI), Tel-Aviv University, Israel, Dec. 2002, pp. 7–9. The problem of speaker localization is addressed in this work. We present a novel approach for estimating the time difference of arrival (TDOA) of the speech signal to a microphone array, in a reverberant and noisy environment. By estimating acoustical transfer function (ATF) ratios, the TDOA is extracted from a relatively short impulse response. Our approach shows superior performance, compared with the traditional generalized cross correlation (GCC) method.
2001
S. Gannot and M. Moonen,
"Subspace methods for multi-microphone speech dereverberation",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Darmstadt, Germany, Sep. 2001.
S. Gannot, D. Burshtein, and E. Weinstein,
"Theoretical analysis of the general transfer function GSC",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Darmstadt, Germany, Sep. 2001. In recent work we considered the use of a microphone array located in a reverberant room, where general acoustic transfer functions (ATFs) relate the source signal and the microphones, for enhancing a speech signal contaminated by interference. The resulting frequency-domain algorithm enables dealing with a complicated ATF in the same simple manner as the Griffiths & Jim GSC algorithm deals with delay-only arrays. In this contribution a general expression of the enhancer output is derived. This expression is used for evaluating two figures of merit, i.e., the noise reduction ability and the amount of distortion imposed. The performance is shown to depend on the ATFs involved, the noise field, and the quality of estimation of the ATF ratios. An analytical performance evaluation of the method is obtained. It is shown that the proposed method maintains its good performance even in the general ATF case.
1999
S. Gannot, D. Burshtein, and E. Weinstein,
"Beamforming methods for multi-channel speech enhancement",
in The International Workshop on Acoustic Echo and Noise Control (IWAENC), Pocono Manor, Pennsylvania, USA, Sep. 1999, pp. 96–99.
S. Gannot and D. Burshtein,
"Speech enhancement using a mixture-maximum model",
in EuroSpeech, Budapest, Hungary, Sep. 1999.
1997
S. Gannot, D. Burshtein, and E. Weinstein,
"Iterative-batch and sequential algorithms for single microphone speech enhancement",
in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, Munich, Germany, 1997, pp. 1215–1218. Speech quality and intelligibility might significantly deteriorate in the presence of background noise, especially when the speech signal is subject to subsequent processing. In this paper we present a class of Kalman-filter based speech enhancement algorithms with some extensions, modifications, and improvements. The first algorithm employs the estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and noise signals. The enhanced speech signal is obtained as a by-product of the parameter estimation algorithm. The second algorithm is a sequential, computationally efficient, gradient descent algorithm. We discuss various topics concerning the practical implementation of these algorithms. An experimental study using real speech and noise signals is provided to compare these algorithms with alternative speech enhancement algorithms, and to compare the performance of the iterative and sequential algorithms.
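A hedged sketch of the Kalman filtering core shared by both algorithms, assuming fixed AR parameters for the clean speech and white measurement noise (the paper's contribution, the EM-based iterative and the sequential gradient-descent parameter estimation, is not shown):

```python
import numpy as np

def kalman_enhance(noisy, ar_coeffs, q, r):
    """Enhance a noisy speech signal with a Kalman filter driven by
    an AR model of the clean speech, in companion (state-space) form.

    ar_coeffs : (p,) AR coefficients a_1..a_p of the clean speech
    q, r      : excitation and measurement noise variances
    """
    noisy = np.asarray(noisy, dtype=float)
    p = len(ar_coeffs)
    F = np.zeros((p, p))                 # state transition, companion form
    F[0, :] = ar_coeffs
    F[1:, :-1] = np.eye(p - 1)
    H = np.zeros(p); H[0] = 1.0          # observe the newest speech sample
    x, P = np.zeros(p), np.eye(p)
    out = np.empty_like(noisy)
    for t, y in enumerate(noisy):
        x = F @ x                        # predict next state
        P = F @ P @ F.T
        P[0, 0] += q                     # excitation enters the first state
        k = P @ H / (H @ P @ H + r)      # Kalman gain
        x = x + k * (y - H @ x)          # update with the noisy sample
        P = P - np.outer(k, H) @ P
        out[t] = x[0]                    # enhanced sample estimate
    return out
```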
1996
S. Gannot, D. Burshtein, and E. Weinstein,
"Algorithms for single microphone speech enhancement",
in the 19th Convention of IEEE Israel (IEEEI), Jerusalem, Israel, 1996, pp. 94–97.
Copyright Notice
Downloading of any paper is permitted for personal use only.
Permission to reprint / republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the author(s) and the respective publisher.
Copyright and all other rights therein are retained by authors or by other copyright holders.
All persons downloading this information are expected to adhere to the terms and constraints invoked by each publisher and author’s copyright.
In most cases, these works may not be reposted without the explicit permission of the copyright holder.