Simultaneous Tracking and Separation of Multiple Sources Using Factor Graph Model

Koby Weisberg, Bracha Laufer-Goldshtein and Sharon Gannot

Simulation experiment:

For the simulated data, clean anechoic speech signals were drawn from the TIMIT database. The speakers were randomly selected from a subset of 26 speakers. Utterances of the same speaker were concatenated to obtain a 5 s long speech signal. The room dimensions were set to 6 x 4 x 3 m, with reverberation time T60 ~ 200 ms. The signals were captured by an eight-microphone linear array with inter-microphone distances of [3, 3, 3, 8, 3, 3, 3] cm. The measured signals were contaminated by additive diffuse babble noise at various SNR levels. Three moving speakers were simulated, with initial DOAs of 36, 90 and 144 degrees, respectively. Each speaker moved from its initial position along an arc of a circle with a radius of 1 m around the array center. The time-varying DOA of each speaker is sinusoidal, with a period randomly selected between 1 and 2.5 s and an amplitude randomly selected between 5 and 8 degrees. We present audio samples from 3 random mixtures, at SNRs of 5 dB and 25 dB.
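The sinusoidal DOA trajectories described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code. The zero initial phase, the 100 Hz trajectory sampling rate, the random seed, the placement of the array center at the origin, and the function names `sinusoidal_doa` and `doa_to_xy` are all assumptions made for illustration.

```python
import numpy as np


def sinusoidal_doa(initial_deg, period_s, amplitude_deg,
                   duration_s=5.0, rate_hz=100.0):
    """Return time stamps and a sinusoidal DOA track in degrees.

    Assumption: zero initial phase, so the track starts at initial_deg.
    """
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    doa = initial_deg + amplitude_deg * np.sin(2.0 * np.pi * t / period_s)
    return t, doa


def doa_to_xy(doa_deg, radius_m=1.0):
    """Map DOA (degrees) to 2-D positions on a circle around the array center.

    Assumption: the array center is the origin and the DOA is measured
    from the array axis.
    """
    theta = np.deg2rad(doa_deg)
    return np.stack([radius_m * np.cos(theta), radius_m * np.sin(theta)], axis=-1)


rng = np.random.default_rng(0)        # hypothetical seed, for repeatability only
initial_doas = [36.0, 90.0, 144.0]    # initial DOAs from the text, in degrees

trajectories = []
positions = []
for doa0 in initial_doas:
    period = rng.uniform(1.0, 2.5)     # period drawn between 1 and 2.5 s
    amplitude = rng.uniform(5.0, 8.0)  # amplitude drawn between 5 and 8 degrees
    t, doa = sinusoidal_doa(doa0, period, amplitude)
    trajectories.append(doa)
    positions.append(doa_to_xy(doa, radius_m=1.0))
```

Each entry of `positions` holds the speaker coordinates over the 5 s signal, which a room simulator could use to generate time-varying impulse responses.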

Lab experiment:

In our experimental study, we evaluated the performance on real-life recordings carried out at the BIU acoustic lab. We first defined two limited arcs on a circle with a radius of ~2 m: the first arc between 20 and 75 degrees and the second between 120 and 165 degrees. Seven speakers participated in the experiment, five males and two females. Each speaker moved back and forth while speaking, following a natural random trajectory on each of the defined arcs. The length of each recording was approximately 30 s. The signals were captured by an eight-microphone linear array with inter-microphone distances of [3, 3, 3, 6, 3, 3, 3] cm. The array was located at the center of the designated circle, at a distance of approximately 1.5 m from one of the walls. The reverberation time was set to T60 ~ 450 ms by adjusting the controllable room panels. Diffuse babble noise was recorded separately by the same array, using 4 loudspeakers facing the room corners. We present audio samples from 5 mixtures, which together include all the participants, at SNRs of 5 dB and 25 dB.
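Since the babble noise was recorded separately, a mixture at a given SNR can be formed by scaling the noise before adding it to the speech. The Python sketch below shows one common way to do this. The SNR definition (power averaged over all microphones and the whole signal) and the function name `mix_at_snr` are assumptions, as the text does not state how the SNR was measured.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals snr_db.

    speech, noise: arrays of shape (samples, microphones).
    Assumption: power is averaged over all samples and all microphones.
    Returns the noisy mixture and the gain applied to the noise.
    """
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = np.sqrt(speech_power / (noise_power * target_ratio))
    return speech + gain * noise, gain
```

For example, calling `mix_at_snr(speech, noise, 5.0)` and `mix_at_snr(speech, noise, 25.0)` on the same pair of recordings would give the two noise conditions used for the samples.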

Subject 3 and subject 1:

Subject 2 and subject 7:

Subject 2 and subject 6:

Subject 1 and subject 4:

Subject 1 and subject 5: