A Bayesian Hierarchical Model for Speech Enhancement with Time-Varying Audio Channel

Yaron Laufer and Sharon Gannot, “A Bayesian Hierarchical Model for Speech Enhancement with Time-Varying Audio Channel,” submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing, Jun. 2018, revised Sep. 2018.

In our experimental study, we consider both simulated and real room environments:

  1. Simulation: Clean speech signals (from the TIMIT database) were convolved with room impulse responses from the RIR database [1] recorded in the acoustic lab at Bar-Ilan University. The microphone signals were synthesized by adding an artificial noise signal to the reverberant speech signal.
  2. Real-life Experiment: Recordings of real speakers were carried out in the same acoustic lab, in both static and dynamic scenarios. Noise signals were recorded separately and then added to the measured speech signals.
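The simulated microphone signals above are generated by convolving clean speech with a measured room impulse response and adding noise scaled to a target SNR. A minimal single-channel sketch of this procedure is shown below; the function name `synthesize_mic_signal` and the use of NumPy are illustrative assumptions, not part of the paper's implementation, which operates on the multichannel TIMIT/RIR-database signals.

```python
import numpy as np

def synthesize_mic_signal(clean, rir, noise, snr_db):
    """Illustrative sketch: convolve clean speech with a room impulse
    response, then add noise scaled so that the reverberant-speech-to-noise
    ratio equals snr_db (in dB)."""
    # Reverberant speech, truncated to the clean-signal length
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Scale the noise to achieve the desired SNR
    speech_pow = np.mean(reverberant ** 2)
    noise_seg = noise[: len(reverberant)]
    noise_pow = np.mean(noise_seg ** 2)
    gain = np.sqrt(speech_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    return reverberant + gain * noise_seg
```

For a multichannel setup, the same operation is repeated per microphone with that microphone's RIR, and the noise may be either a recorded directional source or a synthesized diffuse field rather than the white noise used in this sketch.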

The performance of the proposed algorithm was compared with the following algorithms:

  1. The baseline method in [2].
  2. Our previous algorithm in [3], which assumes a spatially homogeneous, spherically diffuse noise sound field. This diffuse-based method, which estimates only the noise power, is referred to as Prop. (diff.).

The method proposed in this paper, which estimates the entire noise precision matrix, is denoted henceforth as Prop. (mat.).

Disclaimer:

The baseline algorithm [2] was implemented for comparison purposes by the authors of the proposed algorithm, on their own responsibility.

References:

  1. E. Hadad, F. Heese, P. Vary, and S. Gannot, “Multichannel audio database in various acoustic environments,” in International Workshop on Acoustic Signal Enhancement (IWAENC), 2014, pp. 313–317.
  2. S. Malik, J. Benesty, and J. Chen, “A Bayesian framework for blind adaptive beamforming,” IEEE Transactions on Signal Processing, vol. 62, no. 9, pp. 2370–2384, 2014.
  3. Y. Laufer and S. Gannot, “A Bayesian hierarchical model for speech enhancement,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, Apr. 2018.

Simulation

Source          Rev [ms]   Noise         SNR [dB]
MCTM0_SA1       160        Directional   5
MDBP0_SX348     360        Directional   5
FDXW0_SX161     360        Diffuse       10
FECD0_SI1418    160        Diffuse       5

Real-life experiment

a. Static scenario

Source          Rev [ms]   Noise         SNR [dB]
Female_s1       200        AirCond       5
Female_s2       400        Directional   0
Male            200        Directional   5

b. Dynamic scenario

Source          Rev [ms]   Noise         SNR [dB]
Moving_Female   200        Directional   5
Moving_Male     200        AirCond       5