Expressive Speech Synthesis with Tacotron

March 28, 2018

At Google, we’re excited about the recent rapid progress of neural network-based text-to-speech (TTS) research. In particular, end-to-end architectures, such as the Tacotron systems we announced last year, can both simplify voice-building pipelines and produce natural-sounding speech. These advances will help us build better human-computer interfaces, such as conversational assistants, audiobook narration, news readers, and voice design software.

To deliver a truly human-like voice, however, a TTS system must learn to model prosody, the collection of expressive factors of speech such as intonation, stress, and rhythm. Most current end-to-end systems, including Tacotron, don’t explicitly model prosody, so there is no way to control exactly how the generated speech should sound. This can lead to monotonous-sounding speech, even when models are trained on very expressive datasets like audiobooks, which often contain character voices with significant variation.

Today, we are excited to share two new papers that address these problems.

Source: googleblog.com