A highly efficient, real-time text-to-speech system deployed on CPUs

  • May 16, 2020

Modern text-to-speech (TTS) systems have come a long way in using neural networks to mimic the nuances of the human voice. To generate humanlike audio, a TTS system may need to output as many as 24,000 samples for one second of speech, and sometimes more. The size and complexity of state-of-the-art models demand massive computation, which often has to run on GPUs or other specialized hardware.

At Facebook, our long-term goal is to deliver high-quality, efficient voices to the billions of people in our community. To achieve this, we’ve built and deployed a neural TTS system with state-of-the-art audio quality. Through strong engineering and extensive model optimization, we have attained a 160x speedup over our baseline while retaining that quality, which lets the whole service run in real time on regular CPUs, without any specialized hardware.
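To make the numbers above concrete, here is a minimal back-of-the-envelope sketch. It takes the 24,000 samples per second and the 160x speedup from the text. The function names and the baseline figure in the example are hypothetical, chosen only for illustration, and are not taken from the system described here. It shows how little time each sample can take if synthesis is to keep pace with playback, and how a speedup changes the real-time factor (compute time divided by audio duration).

```python
SAMPLE_RATE_HZ = 24_000  # samples per second of audio, as quoted in the text
SPEEDUP = 160            # speedup over the baseline, as quoted in the text


def per_sample_budget_us(sample_rate_hz: int) -> float:
    """Microseconds available per sample if synthesis must keep pace with playback."""
    return 1_000_000 / sample_rate_hz


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Compute time divided by audio duration; a value <= 1.0 means real time."""
    return synthesis_seconds / audio_seconds


def rtf_after_speedup(baseline_rtf: float, speedup: float) -> float:
    """Real-time factor after applying a uniform speedup to a baseline."""
    return baseline_rtf / speedup


print(f"{per_sample_budget_us(SAMPLE_RATE_HZ):.1f} microseconds per sample")

# Hypothetical baseline: 100 s of compute per 1 s of audio (an assumed number).
baseline = real_time_factor(100.0, 1.0)
print(rtf_after_speedup(baseline, SPEEDUP))  # below 1.0, so faster than real time
```

Under these assumptions, any baseline with a real-time factor of 160 or less would reach real time after a 160x speedup.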

The system is highly flexible and will play an important role in creating and scaling new voice applications that sound more human and expressive and are more enjoyable to use. It’s currently powering Portal, our video-calling device, and it’s available as a service for other applications, like reading assistance and virtual reality. Today, we’re sharing details on our approach and how we solved core efficiency challenges to deploy this at scale.

Source: facebook.com

