Teaching AI to learn speech the way children do
The Facebook AI Research (FAIR) group and Paris Sciences & Lettres University, with additional sponsorship from Microsoft Research, are collaborating to challenge other researchers to teach AI systems to learn speech in a way that more closely resembles how young children learn. The ZeroSpeech 2019 challenge (which builds on previous efforts in 2015 and 2017) asks participants to build a speech synthesizer using only audio input, without any text or phonetic labels. The challenge’s central task is to build an AI system that can discover, in an unknown language, the machine equivalent of text or phonetic labels and use them to re-synthesize a sentence in a given voice.
Essentially, the system must discover its own discrete “orthographic” notation, which may or may not correspond to linguistically defined subword units like consonants, vowels, and syllables. Participants are provided with raw audio, as well as a baseline system with one component that performs subword discovery and another for speech synthesis. Participants can either replace the baseline with a new end-to-end system or improve one of the baseline’s components in order to generate a higher-quality waveform.
Entries will be evaluated based on the bit rate of the discovered set of labels and the overall waveform quality. Submissions are due March 15. Teams with the top-scoring or most innovative papers will be selected for presentation at the Interspeech conference in September.
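To make the bit-rate criterion more concrete, here is a minimal Python sketch. The text above does not give the formula the organizers use, so this assumes one plausible definition: the empirical entropy of the discovered labels (in bits per label) multiplied by the number of labels emitted per second of audio. The function name `bit_rate` and its parameters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bit_rate(symbols, duration_seconds):
    """Estimate the bit rate (bits per second) of a discrete label sequence.

    Assumption: bit rate = (number of labels * entropy per label) / duration.
    This is an illustrative definition, not necessarily the challenge's own.
    """
    if not symbols:
        raise ValueError("symbols must be non-empty")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    total = len(symbols)
    counts = Counter(symbols)

    # Empirical Shannon entropy of the label distribution, in bits per label.
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )

    # Total information in the sequence, spread over the audio duration.
    return total * entropy / duration_seconds


if __name__ == "__main__":
    # Hypothetical labels discovered for a 2-second utterance.
    labels = ["a", "b", "a", "b"]
    print(bit_rate(labels, duration_seconds=2.0))
```

Under this assumed definition, a system that uses fewer distinct labels, or emits them less often, scores a lower bit rate. That is why the challenge pairs the measure with waveform quality: a very compact notation is only useful if the sentence can still be re-synthesized well from it.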