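On the Use of Descriptors from Computational Linguistics in HMM-Based Speech Synthesis (original title: De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM)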

Sébastien Le Maguer; Bernd Möbius; Ingmar Steiner; Damien Lolive

In: Proceedings of the Journées d'Études sur la Parole (JEP), July 4-8, 2016, Paris, France. CNRS, 2016.


Over the last decades, acoustic modeling for speech synthesis has improved significantly. In most systems, however, the descriptive feature set used to represent annotated text has remained unchanged for many years. In particular, the prosody models of most systems rely on low-level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set with a linguistic measure derived from the predictability of an event, such as the occurrence of a syllable or a word. We hypothesize that adding such descriptive features will improve prosody modeling. The enriched feature set is then used to train prosody models for speech synthesis. This paper focuses on an objective analysis of how these descriptive features influence synthesis in English and French.
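To make the idea of a "predictability" descriptor concrete, the sketch below computes one common such measure, surprisal (the negative log-probability of a word given its context), from a bigram model with add-one smoothing, and attaches it to per-word labels next to a part-of-speech tag. This is an illustrative assumption only: the abstract does not state which predictability measure, language model, or label format the paper uses, and the functions `train_bigram`, `surprisal` and `add_predictability_features`, the toy corpus, and the feature field names are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count history unigrams and bigrams over tokenised sentences ("<s>" marks sentence start)."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        vocab.update(sentence)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1          # how often `prev` is used as a history
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, vocab


def surprisal(sentence, unigrams, bigrams, vocab):
    """Per-word surprisal in bits, -log2 P(w_i | w_{i-1}), with add-one smoothing."""
    v = len(vocab) + 1                   # +1 reserves probability mass for unseen words
    tokens = ["<s>"] + sentence
    scores = []
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + v)
        scores.append(-math.log2(p))
    return scores


def add_predictability_features(words, pos_tags, scores):
    """Merge a conventional label (POS) with the surprisal descriptor (hypothetical field names)."""
    return [
        {"word": w, "pos": p, "surprisal": round(s, 3)}
        for w, p, s in zip(words, pos_tags, scores)
    ]


# Toy usage example on a hypothetical three-sentence corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
uni, bi, vocab = train_bigram(corpus)
sentence = ["the", "dog", "sat"]
features = add_predictability_features(
    sentence, ["DT", "NN", "VBD"], surprisal(sentence, uni, bi, vocab)
)
print(features)
```

In this toy corpus "dog" follows "the" less often than "cat" does, so it receives a higher surprisal; a feature of this kind is what a prosody model could use alongside stress or part-of-speech labels.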

