AI Headphones That Focus on a ‘Single Speaker’ in a Crowd

Engineers Develop Groundbreaking AI Hearing System

Engineers have created an innovative artificial intelligence system, named ‘Target Speech Hearing’, that empowers headphone users to focus on a single speaker in a crowded and noisy environment. When the wearer looks at the person speaking for three to five seconds, the system ‘enrols’ that speaker’s voice. After enrolment, the system isolates and plays back only that speaker’s voice in real time, even as the listener and speaker move around.

Advancements in Noise-Canceling Technology

Traditional noise-canceling headphones are effective at creating a quiet listening experience by blocking out background noise. However, they often struggle to selectively allow important sounds through. For instance, the latest Apple AirPods Pro can adjust sound levels when the wearer is in a conversation but lack the precision to let the user choose exactly who to listen to and when.

The findings were presented on May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. The proof-of-concept device’s code is available for further development, though the system itself is not yet commercially available.

Innovative Use of AI in Hearing

“We tend to think of AI now as web-based chatbots that answer questions,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices, you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”

How It Works

Users wearing modified off-the-shelf headphones with integrated microphones can activate the system by pressing a button while facing the speaker. The microphones capture the sound waves, and the system processes these signals to identify and learn the speaker’s vocal patterns. The AI then isolates this voice and continually enhances its clarity, even as the speaker and listener move around.
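To make the enrol-then-isolate flow above a little more concrete, here is a minimal Python sketch. It is not the researchers’ code: the real system runs a trained neural network on the headphone microphone signals, whereas this stand-in simply builds a crude spectral ‘signature’ from a short enrolment clip and then softly gates incoming audio frames by how closely they match it. All function and parameter names here are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. The published Target Speech Hearing system uses a
# neural network; this crude spectral-template version just mirrors the
# enrol-then-isolate control flow described above. Names are hypothetical.
import numpy as np

FRAME = 512            # samples per analysis frame
HOP = 256              # hop between frames (50% overlap)
WINDOW = np.hanning(FRAME)

def frame_spectra(audio):
    """Magnitude spectra of overlapping, windowed frames."""
    n_frames = 1 + max(0, (len(audio) - FRAME) // HOP)
    frames = np.stack([audio[i * HOP:i * HOP + FRAME] * WINDOW
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def enrol(enrolment_clip):
    """Build a unit-norm spectral signature from a few seconds of the target speaker."""
    signature = frame_spectra(enrolment_clip).mean(axis=0)
    return signature / (np.linalg.norm(signature) + 1e-8)

def isolate(mixture, signature):
    """Attenuate frames whose spectrum is dissimilar to the enrolled signature."""
    spectra = frame_spectra(mixture)
    norms = np.linalg.norm(spectra, axis=1) + 1e-8
    similarity = spectra @ signature / norms        # cosine similarity per frame
    gains = np.clip(similarity, 0.0, 1.0)           # soft gate: 0 = mute, 1 = pass
    out = np.zeros(len(mixture))
    for i, g in enumerate(gains):                   # overlap-add the gated frames
        seg = slice(i * HOP, i * HOP + FRAME)
        out[seg] += mixture[seg] * WINDOW * g
    return out

if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    target = np.sin(2 * np.pi * 220 * np.arange(3 * sr) / sr)  # stand-in 'voice'
    noise = 0.5 * rng.standard_normal(3 * sr)                  # stand-in 'crowd'
    signature = enrol(target[:2 * sr])    # 'look at the speaker' for a few seconds
    cleaned = isolate(target + noise, signature)
    print("processed samples:", len(cleaned))
```

In the real device, the ‘signature’ is a learned speaker embedding and the gating step is a neural separation model running with low latency on the headphone hardware; the sketch only shows where enrolment ends and continuous isolation begins.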

User Testing and Future Developments

In tests with 21 subjects, the clarity of the enrolled speaker’s voice was rated nearly twice as high as unfiltered audio. The system can currently enrol only one speaker at a time and requires a relatively quiet background when enrolling a new speaker. If sound quality is unsatisfactory, users can re-enrol the speaker for improved clarity.

The team aims to expand this technology to earbuds and hearing aids, enhancing accessibility and convenience.

At Harley Street Hearing we keep ahead of everything new in hearing technology. If you would like to come in for a hearing consultation, contact us.

Enjoy this article? You might be interested in some of our others:

Discover the Power of Invisible Hearing Aids – Lyric

Breakthrough Gene Therapy for Inherited Deafness

Importance of Hearing Aids Fitted by a Professional

Research Support, Collaboration and Reference

This research involved contributions from Bandhav Veluri, Malek Itani, Tuochao Chen (UW doctoral students), and Takuya Yoshioka (director of research at AssemblyAI). It was funded by a Moore Inventor Fellow award, a Thomas J. Cable Endowed Professorship, and a UW CoMotion Innovation Gap Fund.

Veluri, B., Itani, M., Chen, T., Yoshioka, T., Gollakota, S. “Look Once to Hear: Target Speech Hearing with Noisy Examples.” ACM CHI Conference on Human Factors in Computing Systems, 2024. DOI: 10.1145/3613904.3642057