In this study, we propose a method for constructing a Lhasa Tibetan prosodic lexicon from a continuous speech database, which significantly improves speech synthesis performance for low-resource and complex languages. The experiment uses a 3.95-hour speech database of a single Lhasa Tibetan speaker and focuses on the prosodic feature of “tone sandhi” to investigate the phonological features and grammatical functions of Lhasa Tibetan. Drawing on the “Usage-Based Theory” of cognitive linguistics, we extract prefabs (prefabricated chunks) from 2,526 utterances. Based on the prosodic features and grammatical structure of these prefabs, we construct a Prefabs Lexicon of 175 thousand entries. In the comparative experiment, we employ a sequence-to-sequence speech synthesis approach and automatically segment the input sequence using either the Prefabs Lexicon or the conventional Tibetan lexicon. A 56-minute dataset from another professional Lhasa broadcaster serves as the test set. Compared with the conventional Tibetan lexicon, the Prefabs Lexicon achieves an improved F1 score of 0.92. Additionally, in the synthesis experiment for the toneless Amdo Tibetan, the Mean Opinion Score (MOS) increases to 4.17, suggesting that the Prefabs Lexicon is applicable across dialects.
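The lexicon-driven segmentation and its F1 evaluation can be illustrated with a minimal sketch. The abstract does not state which segmentation algorithm or F1 definition is used, so both are assumptions here: greedy forward maximum matching stands in for the segmenter, and F1 is computed over exactly matching chunk spans. The function names, the `max_len` parameter, and the placeholder syllables are hypothetical; the input is assumed to be already split into syllables (in Tibetan script, at the tsheg delimiter).

```python
def max_match(syllables, lexicon, max_len=4):
    """Greedy forward maximum matching (an assumed stand-in segmenter)."""
    chunks, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first; fall back to a single syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            cand = tuple(syllables[i:i + n])
            if n == 1 or cand in lexicon:
                chunks.append(cand)
                i += n
                break
    return chunks


def spans(chunks):
    """Convert a chunk sequence into a set of (start, end) syllable spans."""
    out, start = set(), 0
    for chunk in chunks:
        out.add((start, start + len(chunk)))
        start += len(chunk)
    return out


def segmentation_f1(predicted, reference):
    """F1 over chunk spans that match the reference exactly (assumed metric)."""
    p, r = spans(predicted), spans(reference)
    tp = len(p & r)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(r)
    return 2 * precision * recall / (precision + recall)


# Placeholder syllables and a toy lexicon, not real Tibetan data.
toy_lexicon = {("a", "b"), ("c", "d", "e")}
predicted = max_match(["a", "b", "c", "d", "e"], toy_lexicon)
reference = [("a", "b"), ("c",), ("d", "e")]
score = segmentation_f1(predicted, reference)
```

Under these assumptions, comparing two lexicons amounts to running `max_match` with each one over the same test utterances and comparing the resulting `segmentation_f1` values against a hand-segmented reference.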