I am involved in four exciting research projects. I am looking for talented and committed Ph.D. students and/or Post-Doctoral trainees with a strong background in statistical signal processing and machine learning. Experience in audio signal processing and strong programming skills will be considered an advantage. If interested, please contact me.
Please see our sponsorship.
Develop a novel paradigm and concept of socially-aware robots, and conceive innovative methods and algorithms for computer vision, audio processing, sensor-based control, and spoken dialog systems, based on modern statistical and deep learning, to ground the required social robot skills.
Create and launch a brand-new generation of robots that are flexible enough to adapt to the needs of users, rather than the other way around.
Validate the technology through HRI experiments in a gerontology hospital, and assess its acceptability by patients and medical staff.
Modern-day environments are laden with rich stimuli, all competing for our attention, a reality that poses substantial challenges for the perceptual system. Focusing attention exclusively on one important speaker while avoiding distraction is a major feat, in particular for individuals with hearing or attentional impairments.
In this project, we propose a unique combination of methodologies from signal processing, machine learning, and brain research to jointly develop novel algorithms and evaluation procedures capable of extracting and enhancing a desired speaker in adverse acoustic scenarios. Specifically, we harness the power of deep neural networks (DNNs) for audio-processing tasks, developing approaches for training a DNN on multi-microphone speech recordings and for coping with the inherently dynamic nature of natural speech. Moreover, the speaker to be selectively enhanced will be determined automatically from the user’s momentary internal preferences via a real-time EEG-based neural interface. This novel neuro-engineering approach is critical for developing stable technological solutions that are meant for human use and must adhere to real-life behavioural and environmental constraints, and it can be applied more broadly in the design of new-generation “attentive” hearing devices, “hearables”, and teleconferencing systems.
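To make the kind of processing pipeline described above more concrete, below is a minimal sketch, not the project's actual architecture, of a mask-estimation network that combines multi-microphone spectral features with a hypothetical EEG-derived attention embedding to enhance the attended speaker. It assumes PyTorch, and all shapes and module sizes are purely illustrative.

```python
# Illustrative sketch only -- not the project's actual architecture.
# Multi-microphone magnitude spectra plus a hypothetical EEG-derived
# attention embedding -> time-frequency mask for the attended speaker.
import torch
import torch.nn as nn

class AttendedSpeakerMaskNet(nn.Module):
    def __init__(self, n_mics=4, n_freq=257, eeg_dim=64, hidden=256):
        super().__init__()
        # Per-frame input: magnitude spectra of all microphones, stacked.
        self.encoder = nn.LSTM(input_size=n_mics * n_freq, hidden_size=hidden,
                               num_layers=2, batch_first=True, bidirectional=True)
        # Project the EEG cue into the encoder's output space and use it as a bias.
        self.eeg_proj = nn.Linear(eeg_dim, 2 * hidden)
        self.mask_head = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, mic_mag, eeg_cue):
        # mic_mag: (batch, time, n_mics * n_freq); eeg_cue: (batch, eeg_dim)
        h, _ = self.encoder(mic_mag)
        h = h + self.eeg_proj(eeg_cue).unsqueeze(1)   # broadcast the cue over time
        return self.mask_head(h)                      # mask in [0, 1], (batch, time, n_freq)

# Toy forward pass with random tensors, just to show the expected shapes.
net = AttendedSpeakerMaskNet()
mask = net(torch.randn(2, 100, 4 * 257), torch.randn(2, 64))
print(mask.shape)  # torch.Size([2, 100, 257])
```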
This project deals with developing single- and multi-microphone algorithms for speech enhancement in adverse conditions. Special attention will be given to hardware constraints and to real-time requirements.
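As a point of reference for the kind of low-complexity baseline such algorithms are typically compared against, here is a minimal classical single-channel spectral-subtraction sketch (NumPy/SciPy; all parameters are arbitrary and purely illustrative).

```python
# Illustrative baseline only: classical single-channel spectral subtraction.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_frames=10, gain_floor=0.05):
    # STFT analysis; the first `noise_frames` frames are assumed noise-only.
    f, t, X = stft(noisy, fs=fs, nperseg=512, noverlap=384)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    # Power spectral-subtraction gain, floored to limit musical noise.
    gain = np.maximum(1.0 - noise_psd / (np.abs(X) ** 2 + 1e-12), gain_floor)
    _, enhanced = istft(gain * X, fs=fs, nperseg=512, noverlap=384)
    return enhanced

# Toy usage: a 1-second 440 Hz tone at 16 kHz, corrupted by white noise.
fs = 16000
tt = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 440 * tt) + 0.3 * np.random.randn(fs)
print(spectral_subtraction(noisy, fs).shape)
```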
As rich datasets for visual and auditory modalities accumulate, the field of machine perception progresses towards the challenge of understanding rich scenes that involve multiple co-interacting entities. Going beyond low-level perception, semantic understanding of rich scenes is viewed as the next frontier of machine perception.
To evaluate understanding of rich scenes, one often aims at high-level decisions and reasoning about the scene. Specifically, research on language grounding aims to connect high-level representations expressed in plain language to perceived objects and situations. As one concrete and important example, in the task of referring expressions, a model is trained to detect an object of interest based on its visual properties; for instance, a deep network can find a “boy wearing a red hat” in an image of several boys.
We propose to advance the state of the art in this domain by generating perception signals that are more natural in two ways. First, they refer to dynamic scenes (videos) where objects can move around. Second, objects are also referred to by the sounds they emit, forming a unified audio-visual reasoning dataset.
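To illustrate the referring-expression setup and the proposed audio-visual extension, here is a minimal sketch, not the project's model, of scoring candidate object regions, together with hypothetical per-region audio features, against a text query in a shared embedding space. It assumes PyTorch, and the encoders are placeholders with arbitrary dimensions.

```python
# Illustrative sketch only -- not the project's model. Candidate regions
# (with hypothetical audio features) are scored against a text query such as
# "boy wearing a red hat" via cosine similarity in a shared embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReferringExpressionScorer(nn.Module):
    def __init__(self, region_dim=2048, audio_dim=128, text_dim=300, joint_dim=256):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, joint_dim)
        self.audio_proj = nn.Linear(audio_dim, joint_dim)  # audio branch for the proposed extension
        self.text_proj = nn.Linear(text_dim, joint_dim)

    def forward(self, region_feats, audio_feats, text_feat):
        # region_feats: (n_regions, region_dim); audio_feats: (n_regions, audio_dim)
        # text_feat: (text_dim,), e.g. an embedding of the referring expression
        obj = F.normalize(self.region_proj(region_feats) + self.audio_proj(audio_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feat), dim=-1)
        scores = obj @ txt          # cosine similarity of each region to the query
        return scores.argmax(), scores

# Toy usage with random features for five candidate regions.
scorer = ReferringExpressionScorer()
best, scores = scorer(torch.randn(5, 2048), torch.randn(5, 128), torch.randn(300))
print(int(best), scores.shape)  # index of best-matching region, torch.Size([5])
```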
Two main challenges are addressed in this project: