Current methods for monitoring laryngeal muscle function include intramuscular electromyography, external laryngeal palpation, and laryngeal endoscopy. Although these methods have provided much information about muscle activation and function during voice production, they are invasive, uncomfortable, and subjective. The objective of this work is to explore high-density surface electromyography (HDsEMG) as a non-invasive alternative that can potentially provide objective information on the activity of the laryngeal muscles during speech, with a focus on the cricothyroid (CT) muscle. From the resulting set of signals, the electromyography signal can be decomposed, providing indirect information on the spatial recruitment and firing rates of motor units (MUs) within the muscle. It is hypothesized that MU firing rate and recruitment will allow a better estimation of muscle activation than traditional methods. A wireless HDsEMG system (Sessantaquattro, OT Bioelettronica) is used with a 64-channel electrode grid centered on the CT muscle. Preliminary results from a case study show that it was possible to obtain the discharge trains and firing rates of 4 motor units. Future work will explore how HDsEMG applied to the larynx could improve the diagnosis and therapeutic follow-up of pathologies of laryngeal function.
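As an illustration of the firing-rate measure mentioned above, the following minimal sketch computes the mean firing rate and the inter-spike-interval variability of a single motor unit from its discharge times. It assumes the decomposition has already produced a list of discharge instants in seconds; the function name `firing_rate_stats` and the example spike train are hypothetical and are not taken from the study.

```python
import numpy as np

def firing_rate_stats(discharge_times_s):
    """Return (mean firing rate in Hz, coefficient of variation of the ISIs)
    for one motor unit, given its discharge times in seconds."""
    t = np.sort(np.asarray(discharge_times_s, dtype=float))
    if t.size < 2:
        raise ValueError("at least two discharges are needed")
    isi = np.diff(t)                 # inter-spike intervals (s)
    mean_rate = 1.0 / isi.mean()     # mean firing rate (Hz)
    cov = isi.std() / isi.mean()     # ISI variability (dimensionless)
    return mean_rate, cov

# Hypothetical, perfectly regular discharge train at 10 Hz (illustration only).
example_times = np.arange(0.0, 1.0, 0.1)
rate, cov = firing_rate_stats(example_times)
```

In practice, each decomposed motor unit would contribute one such discharge train, and the statistics would be tracked over time rather than over a whole recording.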
Phonetically balanced texts are used to study different voice and speech characteristics. In the context of clinical work and research, these texts provide a standard for quantifying perceptual, acoustic, or aerodynamic assessments. Recent modeling efforts have been devoted to describing long-term speech behaviors based on a collection of sustained phonemes. However, comprehensive descriptions of phoneme distributions representative of connected speech are not readily available. Thus, the present study introduces a method to estimate phoneme distributions using text data mining, as an alternative to existing power law methods. The procedure used to decompose texts into phonemes, the estimation of the phonetic distributions, and the comparisons between different texts, conversational speech, and standard reading passages are discussed. The results are presented using histograms and R-squared coefficients of determination for the English language, although the approach can be easily applied to other languages. A discussion of the proposed method, results, and limitations is presented.
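The steps described above (text to phonemes, phoneme histogram, comparison between texts) can be sketched as follows. This is only an illustration under stated assumptions: the tiny lexicon `TOY_LEXICON` is hypothetical and stands in for a full pronouncing dictionary, words missing from it are skipped, and R-squared is taken here as the squared Pearson correlation between two relative-frequency vectors, which may differ from the definition used in the study.

```python
from collections import Counter
import numpy as np

# Hypothetical toy lexicon with ARPAbet-like symbols (illustration only).
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "a": ["AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}

def phoneme_distribution(text, lexicon=TOY_LEXICON):
    """Relative frequency of each phoneme in a whitespace-separated text."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(lexicon.get(word, []))   # unknown words are skipped
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no word of the text is in the lexicon")
    return {p: c / total for p, c in counts.items()}

def r_squared(dist_a, dist_b):
    """Squared Pearson correlation between two phoneme distributions,
    evaluated over the union of their phoneme inventories."""
    inventory = sorted(set(dist_a) | set(dist_b))
    a = np.array([dist_a.get(p, 0.0) for p in inventory])
    b = np.array([dist_b.get(p, 0.0) for p in inventory])
    return np.corrcoef(a, b)[0, 1] ** 2

dist_1 = phoneme_distribution("the cat sat")
dist_2 = phoneme_distribution("a cat sat")
similarity = r_squared(dist_1, dist_2)
```

The dictionaries returned by `phoneme_distribution` can be plotted directly as histograms, and `r_squared` gives a single number for how closely two texts agree in phoneme content.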
Bayesian estimation based on an extended Kalman filter and a muscle-controlled biomechanical model of the vocal folds has proven to be a viable, appealing method for obtaining physiologically relevant measures of glottal function (e.g., subglottal pressure, activation of laryngeal muscles, and vocal fold collision pressure) from synchronized recordings of calibrated high-speed videoendoscopy and oral airflow. In practice, the simultaneous recording of these signals is a cumbersome, uncomfortable procedure for patients. To simplify the experimental procedure while maintaining sufficient accuracy, we introduce a constrained Bayesian scheme that incorporates physiological information about the subglottal pressure and uses the glottal area waveform as the sole observation. The proposed constrained extended Kalman filter produces reliable glottal airflow estimates compared to waveforms obtained in the laboratory, yielding root-mean-square errors lower than 63 mL/s, a performance similar to that of previous studies. These initial results represent an advance for the clinical assessment of vocal function.
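To make the idea of a constrained extended Kalman filter concrete, the toy sketch below performs a scalar predict/update step and then projects the estimate onto a physiological range by clipping. Everything in it is an assumption made for illustration: the random-walk state model, the observation model h(x) = sqrt(x), the bounds, and the noise values are hypothetical, and clipping is a simple stand-in that is not claimed to be the constraint scheme or the vocal fold model used in the study.

```python
import math

def constrained_ekf_step(x, P, z, Q, R, lower, upper):
    """One scalar EKF step followed by projection onto [lower, upper].

    x, P : previous state estimate and its variance
    z    : new observation
    Q, R : process and observation noise variances
    """
    # Prediction with a random-walk state model (assumption).
    x_pred = x
    P_pred = P + Q
    # Toy nonlinear observation h(x) = sqrt(x) and its Jacobian (assumption).
    h = math.sqrt(x_pred)
    H = 0.5 / h
    # Standard EKF update.
    S = H * P_pred * H + R
    K = P_pred * H / S
    x_upd = x_pred + K * (z - h)
    P_upd = (1.0 - K * H) * P_pred
    # Constraint: keep the estimate inside the allowed range by clipping.
    x_con = min(max(x_upd, lower), upper)
    return x_con, P_upd

# Hypothetical example: recover a constant "pressure" of 800 from noise-free data.
true_x = 800.0
x_est, P_est = 500.0, 1.0e4
for _ in range(50):
    x_est, P_est = constrained_ekf_step(
        x_est, P_est, math.sqrt(true_x), Q=1.0, R=0.01, lower=300.0, upper=1500.0
    )
```

The positive lower bound also keeps the square root in the toy observation model well defined, which is one practical reason for bounding a state such as pressure.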
Voice inverse filtering analysis comprises different methods for the non-invasive estimation of glottal airflow from a speech signal, thus providing relevant information about vocal function and acoustic excitation during voiced phonation. Most inverse filtering strategies consider a parametric source-filter model of phonation and variants of linear prediction to adjust the model coefficients. However, classical linear prediction is susceptible to impulse-like acoustic excitations produced by abrupt glottal closures. Robust alternatives have been proposed that apply a time-domain weighting function to de-emphasize the detrimental contribution of the impulse-like glottal events. The present study introduces maximum correntropy criterion-based linear prediction for voice inverse filtering. This method takes advantage of correntropy (a non-linear, localized similarity measure that is inherently insensitive to outliers) to implement a robust weighted linear prediction, where the weighting function is adjusted iteratively through a speech-data-guided optimization scheme. Simulations show that the proposed method naturally overweights samples in the glottal closed phase, where the phonation model is more accurate, without requiring any prior information about the closure instants. It is further shown that maximum correntropy criterion-based linear prediction improves inverse filtering analysis in terms of the smoothness of the estimated glottal waveforms and the spectral relevance of the vocal tract filter.
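The weighting mechanism described above can be illustrated with a generic fixed-point (iteratively reweighted least squares) solution of a correntropy-based linear prediction problem with a Gaussian kernel. This is a sketch under assumptions and not the optimization scheme of the study: the kernel width `sigma` is fixed by hand rather than adjusted from the speech data, and the test signal is a synthetic second-order autoregressive process driven by sparse impulses, not speech.

```python
import numpy as np

def mcc_linear_prediction(x, order, sigma, n_iter=20):
    """Estimate predictor coefficients a[k] in x[n] ~ sum_k a[k] * x[n-k]
    by iteratively reweighted least squares with Gaussian-kernel weights."""
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    # Column k-1 holds x[n-k] for n = order .. n_samples-1.
    X = np.column_stack(
        [x[order - k : n_samples - k] for k in range(1, order + 1)]
    )
    y = x[order:]
    w = np.ones_like(y)                        # first pass is ordinary LS
    for _ in range(n_iter):
        Xw = X * w[:, None]
        # Weighted normal equations with a tiny ridge term for stability.
        a = np.linalg.solve(X.T @ Xw + 1e-9 * np.eye(order), Xw.T @ y)
        e = y - X @ a                          # prediction residual
        w = np.exp(-e**2 / (2.0 * sigma**2))   # large residuals get weight ~0
    return a, w

# Synthetic AR(2) signal with impulse-like excitation (illustration only).
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.8])
excitation = 0.01 * rng.standard_normal(500)
excitation[10::50] += 1.0                      # sparse impulses
signal = np.zeros(500)
for n in range(500):
    past = sum(true_a[k] * signal[n - 1 - k] for k in range(2) if n - 1 - k >= 0)
    signal[n] = past + excitation[n]

a_hat, weights = mcc_linear_prediction(signal, order=2, sigma=0.1)
```

After convergence, the samples hit by an impulse receive weights close to zero while the remaining samples keep weights close to one, which mirrors the behavior described in the abstract of emphasizing the samples where the prediction model fits well without any prior knowledge of the impulse locations.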