Looking to Listen: Audio-Visual Speech Separation

Looking to Listen: Audio-Visual Speech Separation

  • April 12, 2018
Table of Contents

Looking to Listen: Audio-Visual Speech Separation

People are remarkably good at focusing their attention on a particular person in a noisy environment, mentally “muting” all other voices and sounds. Known as the cocktail party effect, this capability comes natural to us humans. However, automatic speech separation — separating an audio signal into its individual speech sources — while a well-studied problem, remains a significant challenge for computers.

Source: googleblog.com

Tags :
Share :
comments powered by Disqus

Related Posts

Towards a Virtual Stuntman

Towards a Virtual Stuntman

Motion control problems have become standard benchmarks for reinforcement learning, and deep RL methods have been shown to be effective for a diverse suite of tasks ranging from manipulation to locomotion. However, characters trained with deep RL often exhibit unnatural behaviours, bearing artifacts such as jittering, asymmetric gaits, and excessive movement of limbs. Can we train our characters to produce more natural behaviours?

Read More
DeepMarks: A Digital Fingerprinting Framework for Deep Neural Networks

DeepMarks: A Digital Fingerprinting Framework for Deep Neural Networks

DeepMarks introduces the first fingerprinting methodology that enables the model owner to embed unique fingerprints within the parameters (weights) of her model and later identify undesired usages of her distributed models. The proposed framework embeds the fingerprints in the Probability Density Function (pdf) of trainable weights by leveraging the extra capacity available in contemporary DL models. DeepMarks is robust against fingerprints collusion as well as network transformation attacks, including model compression and model fine-tuning.

Read More