Scoring-Based ML Estimation and CRBs for Reverberation, Speech and Noise PSDs in a Spatially Homogeneous Noise-Field

Yaron Laufer and Sharon Gannot, “Scoring-Based ML Estimation and CRBs for Reverberation, Speech and Noise PSDs in a Spatially Homogeneous Noise-Field,” submitted to IEEE Transactions on Audio, Speech and Language Processing, July 2019.

In our experimental study, we consider two different real-life acoustic scenarios, recorded at the acoustic lab of the Engineering Faculty at Bar-Ilan University. Both scenarios use measured room impulse responses (RIRs) and recorded noise:

Acoustic Scenario 1: The room panels were adjusted to create a reverberation time of T₆₀ = 400 msec. RIRs were identified by playing a periodic chirp signal from a HATS mannequin. More details on the setup can be found in [1]. For the additive noise, air-conditioner noise was recorded under the same conditions.

Acoustic Scenario 2: RIRs were downloaded from the RIR database [2], recorded at the same acoustic lab. The reverberation time was set to T₆₀ = 610 msec. For the additive noise, babble noise signals were recorded separately.

In both cases, the noisy and reverberant signals were constructed by convolving clean speech utterances (from the TIMIT database) with the corresponding RIRs, and then adding the noise at several reverberant signal-to-noise ratio (RSNR) levels.
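The construction described above can be sketched for a single microphone as follows. This is a minimal illustration, not the authors' code: the function names (`convolve`, `power`, `mix_at_rsnr`) are hypothetical, the reverberant signal is truncated to the length of the clean utterance, and the RSNR is assumed to be the ratio of reverberant-speech power to noise power averaged over the whole utterance. The page does not state the exact RSNR definition used in the paper.

```python
import math

def convolve(signal, rir):
    """Full linear convolution of two 1-D sequences (direct form)."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out

def power(x):
    """Mean power of a sequence."""
    return sum(v * v for v in x) / len(x)

def mix_at_rsnr(clean, rir, noise, rsnr_db):
    """Convolve clean speech with an RIR and add noise at a target RSNR.

    Assumption: RSNR = 10*log10(P_reverberant / P_noise), measured over
    the whole utterance at this microphone.
    Returns (noisy, reverberant, scaled_noise).
    """
    # Reverberant speech, truncated to the clean utterance length (assumption).
    reverberant = convolve(clean, rir)[:len(clean)]
    if len(noise) < len(reverberant):
        raise ValueError("noise recording is shorter than the utterance")
    noise = noise[:len(reverberant)]
    # Scale the noise so that the power ratio matches the requested RSNR.
    target_ratio = 10.0 ** (rsnr_db / 10.0)
    gain = math.sqrt(power(reverberant) / (power(noise) * target_ratio))
    scaled_noise = [gain * n for n in noise]
    noisy = [r + n for r, n in zip(reverberant, scaled_noise)]
    return noisy, reverberant, scaled_noise
```

For real recordings an FFT-based convolution (for example `scipy.signal.fftconvolve`) would be far faster than the direct loop. In a multichannel setup one would presumably compute the gain at a reference microphone and apply it to the noise in all channels, which is again an assumption here.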

Based on the proposed non-blocking-based and blocking-based ML estimators of the various PSDs, a multichannel Wiener filter (MCWF) is constructed, aiming to enhance the reverberant and noisy speech. Two implementations of the MCWF are examined: (i) the direct implementation, denoted henceforth as Dir; and (ii) the decision-directed implementation, denoted henceforth as DD.
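To illustrate the difference between the two variants, the sketch below computes a single-channel Wiener gain for one frequency bin in both ways. It assumes the common decomposition of the MCWF into a spatial beamformer followed by a single-channel Wiener postfilter, and it uses the classic decision-directed a priori SNR rule. Neither is confirmed by this page as the exact form used in the paper. All names (`wiener_gains_direct`, `wiener_gains_dd`, `alpha`, `xi_min`) are hypothetical, and the interference PSD (reverberation plus noise at the beamformer output) is assumed strictly positive.

```python
def wiener_gains_direct(phi_s, phi_i):
    """Direct variant: plug the estimated PSDs into the Wiener gain.

    phi_s: per-frame speech PSD estimates for one frequency bin.
    phi_i: per-frame interference PSD estimates (assumed > 0).
    """
    return [s / (s + i) for s, i in zip(phi_s, phi_i)]

def wiener_gains_dd(y_pow, phi_i, alpha=0.98, xi_min=1e-3):
    """Decision-directed variant (classic rule, used here as an assumption).

    y_pow: per-frame squared magnitude of the beamformer output.
    phi_i: per-frame interference PSD estimates (assumed > 0).
    alpha: smoothing factor between the previous estimate and the
           instantaneous term; xi_min: lower bound on the a priori SNR.
    """
    gains = []
    prev_s_pow = None   # |estimated speech|^2 in the previous frame
    prev_phi_i = None   # interference PSD in the previous frame
    for yp, pi in zip(y_pow, phi_i):
        # Instantaneous term: a posteriori SNR minus one, floored at zero.
        inst = max(yp / pi - 1.0, 0.0)
        if prev_s_pow is None:
            xi = inst  # no history in the first frame
        else:
            xi = alpha * prev_s_pow / prev_phi_i + (1.0 - alpha) * inst
        xi = max(xi, xi_min)
        g = xi / (1.0 + xi)
        gains.append(g)
        prev_s_pow = g * g * yp
        prev_phi_i = pi
    return gains
```

The direct gain follows the frame-by-frame PSD estimates, whereas the decision-directed gain is smoothed through the previous frame's enhanced output. In general this smoothing trades some responsiveness for fewer musical-noise artifacts.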

Acoustic Scenario 1:

Reverberation time T₆₀ [ms]: 400

source: FAJW0_SX363, RSNR [dB]: 0

Audio samples (0:03 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

source: MARC0_SX378, RSNR [dB]: 5

Audio samples (0:03 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

source: MBJV0_SI124, RSNR [dB]: 5

Audio samples (0:03 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

source: FDXW0_SX161, RSNR [dB]: 10

Audio samples (0:03 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

Acoustic Scenario 2:

Reverberation time T₆₀ [ms]: 610

source: FEAC0_SX255, RSNR [dB]: 5

Audio samples (0:03 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

source: MDBP0_SX348, RSNR [dB]: 5

Audio samples (0:04 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

source: FAEM0_SA1, RSNR [dB]: 10

Audio samples (0:03 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

source: MCTM0_SX90, RSNR [dB]: 10

Audio samples (0:02 each): Clean, Noisy, Non-blocking ML Dir, Non-blocking ML DD, Blocking ML Dir, Blocking ML DD

References:

[1] Y. Laufer and S. Gannot, “A Bayesian Hierarchical Model for Speech Enhancement with Time-Varying Audio Channel,” IEEE Trans. Audio, Speech and Language Processing, vol. 27, no. 1, pp. 225–239, Jan. 2019.

[2] E. Hadad, F. Heese, P. Vary, and S. Gannot, “Multichannel audio database in various acoustic environments,” in International Workshop on Acoustic Signal Enhancement (IWAENC), 2014, pp. 313–317.