Automatic detection of hypernasal speech of children with cleft lip and palate from Spanish vowels and words using classical measures and nonlinear analysis
Keywords: Automatic hypernasality detection, cleft lip and palate, perturbation measures, noise measures, nonlinear dynamics
This paper presents a system for the automatic detection of hypernasal speech that combines two characterization approaches applied to the five Spanish vowels and two selected words. The first approach uses classical features: pitch period perturbations, noise measures, and Mel-Frequency Cepstral Coefficients (MFCC). The second is based on Non-Linear Dynamics (NLD) analysis. The most relevant features are selected and sorted using two techniques: Principal Component Analysis (PCA) and Sequential Forward Floating Selection (SFFS). A Soft-Margin Support Vector Machine (SM-SVM) decides whether a voice recording is hypernasal or healthy. Experiments on recordings of the five Spanish vowels and the two words consider three different sets of features: (1) the classical approach, (2) the NLD analysis, and (3) the combination of the classical and NLD measures. In general, accuracies are higher and more stable when the classical and NLD features are combined, indicating that the NLD analysis is complementary to the classical approach.
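As a rough illustration of one classical feature family named above, pitch period perturbation, the sketch below computes a relative jitter value: the mean absolute difference between consecutive pitch periods, divided by the mean period. The function name and the sample periods are illustrative assumptions and are not taken from the paper, which may use a different jitter variant.

```python
from statistics import mean


def relative_jitter(periods):
    """Mean absolute difference of consecutive pitch periods,
    normalized by the mean period (dimensionless)."""
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    # Absolute cycle-to-cycle differences between neighbouring periods.
    diffs = [abs(curr - prev) for prev, curr in zip(periods[:-1], periods[1:])]
    return mean(diffs) / mean(periods)


# Hypothetical pitch periods in milliseconds, for demonstration only.
example_periods_ms = [5.0, 5.2, 4.9, 5.1]
example_jitter = relative_jitter(example_periods_ms)
```

A perfectly periodic signal gives a jitter of zero, and larger values indicate less regular vocal fold vibration. In a full system this value would be only one entry of the classical feature vector.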