Automatic detection of hypernasal speech of children with cleft lip and palate from Spanish vowels and words using classical measures and nonlinear analysis
DOI:
https://doi.org/10.17533/udea.redin.n80a12
Keywords:
automatic hypernasality detection, cleft lip and palate, perturbation measures, noise measures, nonlinear dynamics
Abstract
This paper presents a system for the automatic detection of hypernasal speech signals based on the combination of two different characterization schemes applied to the five Spanish vowels and two selected words. The first scheme is based on classical features such as perturbations of the fundamental period, noise measures, and Mel-frequency cepstral coefficients. The second is based on nonlinear dynamics measures. The most relevant features are selected using two techniques: principal component analysis and sequential floating forward selection. The decision on whether a voice recording is hypernasal or healthy is made with a soft-margin support vector machine. The experiments use recordings of the five Spanish vowels and the words /coco/ and /gato/, and consider three feature sets: (1) the classical approach, (2) the nonlinear dynamics analysis, and (3) the combination of both schemes. In general, accuracies are higher and more stable when the classical and nonlinear features are combined, indicating that nonlinear dynamics analysis complements the classical scheme.
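As an illustration of the classical perturbation measures mentioned above, the following minimal sketch computes a relative cycle-to-cycle perturbation. The abstract does not give the exact formulas used by the authors, so this assumes the common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value. Applied to fundamental periods it yields a jitter-like value, and applied to peak amplitudes a shimmer-like value. The function name `local_perturbation` and the sample values are hypothetical.

```python
def local_perturbation(values):
    """Relative cycle-to-cycle perturbation, in percent.

    Assumed definition (not taken from the paper): mean absolute
    difference between consecutive cycles divided by the mean value.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    # Absolute change from each cycle to the next one.
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value


if __name__ == "__main__":
    # Hypothetical per-cycle measurements from a sustained vowel.
    periods_s = [0.0040, 0.0041, 0.0039, 0.0040]   # fundamental periods (s)
    amplitudes = [0.50, 0.52, 0.49, 0.51]          # peak amplitudes
    print(f"jitter-like value:  {local_perturbation(periods_s):.2f} %")
    print(f"shimmer-like value: {local_perturbation(amplitudes):.2f} %")
```

In a system like the one described, values of this kind would be one part of the classical feature vector, alongside noise measures and Mel-frequency cepstral coefficients, before feature selection and classification.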
License
Copyright 2016 Revista Facultad de Ingeniería Universidad de Antioquia
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
Articles available in Revista Facultad de Ingeniería, Universidad de Antioquia are under the Creative Commons Attribution BY-NC-SA 4.0 license.
You are free to:
Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.
Under the following terms:
Attribution: you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: you may not use the material for commercial purposes.
ShareAlike: if you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
Material published by the journal may be distributed, copied, and exhibited by third parties at no cost, provided that the journal is credited. No commercial benefit may be obtained, and derivative works must be released under the same license terms as the original work.