Variable frame rate analysis for automatic speech recognition
10 September 2007
Proceedings Volume 6777, Multimedia Systems and Applications X; 67770G (2007) https://doi.org/10.1117/12.734890
Event: Optics East, 2007, Boston, MA, United States
Abstract
In this paper we investigate the use of variable frame rate (VFR) analysis in automatic speech recognition (ASR). First, we review the VFR technique and analyze its behavior. Experiments show that VFR improves ASR performance for signals with low signal-to-noise ratios: it yields improved acoustic models and substantially reduces insertion and substitution errors, although it may increase deletion errors. We also underline that the match between the average frame rate and the number of hidden Markov model states is critical in implementing VFR. Second, we analyze an effective VFR method that uses a cumulative, weighted cepstral-distance criterion for frame selection and present a revision of it. Lastly, the revised VFR method is combined with spectral- and cepstral-domain enhancement methods, including minimum statistics noise estimation (MSNE) based spectral subtraction and the cepstral mean subtraction, variance normalization and ARMA filtering (MVA) process. Experiments on the Aurora 2 database show that VFR is highly complementary to the enhancement methods: enhancing the speech both facilitates frame selection in VFR and provides de-noised speech for recognition.
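To make the frame-selection idea concrete, here is a minimal sketch of a cumulative, weighted cepstral-distance rule. It is an illustration only, not the paper's method or its revision: the abstract does not specify the weighting, the distance measure, or the threshold, so the function name `select_frames`, the per-frame `weights` argument, the Euclidean distance, and the reset-to-zero behaviour are all assumptions.

```python
import math


def select_frames(cepstra, weights, threshold):
    """Return indices of the frames kept by a cumulative weighted-distance rule.

    cepstra   -- list of per-frame cepstral vectors (lists of floats)
    weights   -- list of per-frame weights (hypothetical; the abstract does
                 not say how the weighting is defined)
    threshold -- accumulated distance required before a new frame is kept
                 (hypothetical tuning parameter)
    """
    kept = []
    acc = 0.0
    for t, frame in enumerate(cepstra):
        if t == 0:
            kept.append(0)  # always keep the first frame
            continue
        # Accumulate the weighted Euclidean distance to the previous frame.
        acc += weights[t] * math.dist(frame, cepstra[t - 1])
        if acc >= threshold:
            kept.append(t)  # enough spectral change has accumulated
            acc = 0.0       # restart the accumulation (assumed behaviour)
    return kept


# Toy one-dimensional "cepstra": no change, a step, a plateau, a larger step.
frames = [[0.0], [0.0], [1.0], [1.0], [1.0], [3.0]]
print(select_frames(frames, [1.0] * 6, threshold=1.0))
```

In a sketch like this, the threshold sets the average output frame rate: a larger threshold keeps fewer frames, which is the quantity the abstract says must match the number of hidden Markov model states.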
© (2007) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zheng-Hua Tan, "Variable frame rate analysis for automatic speech recognition," Proc. SPIE 6777, Multimedia Systems and Applications X, 67770G (10 September 2007); https://doi.org/10.1117/12.734890
KEYWORDS
Signal to noise ratio

Speech recognition

Databases

Acoustics

Feature extraction

Statistical analysis

Distance measurement
