
HMM-Based Text-to-Speech Synthesis & Stressed Speech Processing

Pradnya Prakash Wagh, Swati Warungase, Neha Ashok Aringale, Nadeem Bhaimiya Shaikh


This paper reports that the new system produces synthetic speech of noticeably higher quality than speaker-dependent
systems when real speech data sets are used, and that it remains competitive with speaker-dependent approaches even
when substantial speech data sets are available. The excitation signal of speech, the glottal source, has naturally
attracted interest in speech synthesis research, and a variety of techniques have been developed to mimic the glottal
source of natural speech. Artificial models of the glottal source have improved synthesis quality. However, current
models oversimplify the glottal source, which limits the quality that can be achieved. Using glottal inverse filtering
to recover glottal flow pulses from natural speech has been proposed as a solution to the problems that arise from
simplistic glottal source models. Previous work with glottal flow pulses extracted from real speech has nevertheless
been limited to certain applications, such as isolated vowels, and the benefits of combining automatic glottal inverse
filtering with an HMM-based speech synthesizer have not been explored. Furthermore, a comparative analysis against
several speech synthesis methods demonstrates the robustness of the new approach: it can build voices from
less-than-ideal speech data and synthesize high-quality speech even for out-of-domain sentences.








Copyright (c) 2024 Recent Trends in Electronics and Communication Systems