Phrase Break Prediction for Long-form Reading TTS: Exploiting Text Structure Information

Viacheslav Klimkov; Adam Nadolski; Alexis Moinet; Bartosz Putrycz; Roberto Barra-Chicote; Tom Merritt; Thomas Drugman

2018

Phrasing structure is one of the most important factors in increasing the naturalness of text-to-speech (TTS) systems, particularly for long-form reading. Most existing TTS systems are optimized for isolated short sentences and completely disregard the larger context or structure of the text.

This paper describes how we built phrasing models based on data extracted from audiobooks. We investigate how various types of textual features can improve phrase break prediction: part-of-speech (POS), guess POS (GPOS), dependency tree features, and word embeddings. These features are fed into either a bidirectional LSTM (BiLSTM) or a CART baseline. The resulting systems are compared using both objective and subjective evaluations. Using a BiLSTM and word embeddings proves beneficial.
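To make the task concrete, phrase break prediction can be viewed as labeling each word with whether a break follows it, and an objective evaluation can then compare predicted labels against reference labels. The sketch below is illustrative only: the abstract does not state the labeling scheme or the metric, so the binary labels, the F1 score, and the helper names `phrases_to_labels` and `break_f1` are all assumptions.

```python
def phrases_to_labels(phrases):
    """Flatten phrases (lists of words) into words plus per-word labels.

    Assumed scheme: label 1 means a phrase break follows the word, 0 means
    no break. The utterance-final word is labeled 0 because its boundary
    is trivially known and is not something a model needs to predict.
    """
    words, labels = [], []
    for phrase in phrases:
        for i, word in enumerate(phrase):
            words.append(word)
            labels.append(1 if i == len(phrase) - 1 else 0)
    if labels:
        labels[-1] = 0
    return words, labels


def break_f1(reference, predicted):
    """Return (precision, recall, F1) for the break class (label 1)."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: one reference break after "stopped".
words, reference = phrases_to_labels(
    [["when", "the", "rain", "stopped"], ["we", "walked", "home"]]
)
# A hypothetical system output with one correct and one spurious break.
predicted = [0, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = break_f1(reference, predicted)
```

Under this framing, any per-word classifier (a decision tree or a recurrent tagger, for instance) can be scored the same way, which is what makes systems with different feature sets directly comparable.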
