Accurate Online Speaker Diarization with Supervised Learning

Accurate Online Speaker Diarization with Supervised Learning

  • November 14, 2018
Table of Contents

Accurate Online Speaker Diarization with Supervised Learning

Speaker diarization, the process of partitioning an audio stream with multiple people into homogeneous segments associated with each individual, is an important part of speech recognition systems. By solving the problem of “who spoke when”, speaker diarization has applications in many important scenarios, such as understanding medical conversations, video captioning and more. However, training these systems with supervised learning methods is challenging — unlike standard supervised classification tasks, a robust diarization model requires the ability to associate new individuals with distinct speech segments that weren’t involved in training.

Importantly, this limits the quality of both online and offline diarization systems. Online systems usually suffer more, since they require diarization results in real time.

Source: googleblog.com

Tags :
Share :
comments powered by Disqus

Related Posts

Learning Concepts with Energy Functions

Learning Concepts with Energy Functions

We’ve developed an energy-based model that can quickly learn to identify and generate instances of concepts, such as near, above, between, closest, and furthest, expressed as sets of 2d points. Our model learns these concepts after only five demonstrations.

Read More
20 Best YouTube channels for AI and machine learning

20 Best YouTube channels for AI and machine learning

What are the most interesting and informative YouTube channels about artificial intelligence (AI) and machine learning? Subscribe to these 20 high-quality channels today to stay up to date with the latest AI and machine learning breakthroughs. Siraj Raval:

Read More