An insight to the automatic categorization of speakers according to sex and its application to the detection of voice pathologies: A comparative study

Jorge Andrés Gómez-García; Laureano Moro-Velázquez; Juan Ignacio Godino-Llorente; César Germán Castellanos-Domínguez

doi:10.17533/udea.redin.n79a06

Authors

Jorge Andrés Gómez-García Universidad Politécnica de Madrid https://orcid.org/0000-0002-6060-387X
Laureano Moro-Velázquez Universidad Politécnica de Madrid https://orcid.org/0000-0002-3033-7005
Juan Ignacio Godino-Llorente Universidad Politécnica de Madrid https://orcid.org/0000-0001-7348-3291
César Germán Castellanos-Domínguez Universidad Nacional de Colombia https://orcid.org/0000-0002-0138-5489

Keywords:

inverse filtering, GMM, UBM, voice pathology detection

Abstract

Automatically categorizing speakers according to their sex improves the performance of an automatic voice pathology detector. This rests on findings that demonstrate perceptual, acoustic and anatomical differences between male and female voices. Specifically, this work pursues two goals: 1) to design a system that automatically discriminates the sex of speakers from both normophonic and pathological speech, and 2) to study how this sex detector influences the accuracy of a subsequent voice pathology detector. The automatic sex detector is parameterized with two feature sets: mel-frequency cepstral coefficients (MFCC) computed from the speech signals, and MFCC computed from glottal waveforms together with parameters that model the vocal tract. The glottal waveforms are extracted from the voice by iterative lattice inverse filtering. The pathology detector uses an MFCC parameterization of the speech signals. Classification in both the sex and the pathology detectors is carried out with state-of-the-art techniques based on universal background models (UBM). Experiments are performed on the Saarbrücken voice database using sustained phonations of the vowel /a/. Results indicate that the speaker's sex can be discriminated automatically from both normophonic and pathological speech, with an accuracy of up to 95%. Furthermore, including a priori information about the speaker's sex yields an absolute improvement of about 2% in equal error rate (EER) on pathology detection tasks.
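The classification scheme summarized above (feature frames scored with Gaussian mixture models derived from a universal background model, with results reported as accuracy and EER) can be illustrated with the generic GMM-UBM recipe. The Python sketch below is not the authors' code. It assumes diagonal covariances, mean-only MAP adaptation with a relevance factor of 16, and toy one-dimensional frames in place of real MFCC vectors. The names `DiagGMM`, `map_adapt_means` and `equal_error_rate` are hypothetical. A class model (for instance "female") is obtained by adapting the UBM means toward that class's training frames. An utterance is then scored by its average log-likelihood ratio against the UBM. The EER is the operating point where the false acceptance and false rejection rates meet.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: diagonal covariances, mean-only MAP adaptation and a
# relevance factor of 16 are illustrative assumptions, not the paper's settings.
Vector = List[float]


class DiagGMM:
    """Gaussian mixture model with diagonal covariance matrices."""

    def __init__(self, weights: Vector, means: List[Vector], variances: List[Vector]):
        self.weights = weights
        self.means = means
        self.variances = variances

    def component_logpdfs(self, x: Sequence[float]) -> Vector:
        # log(w_k * N(x | mu_k, diag(var_k))) for each component k
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            out.append(ll)
        return out

    def frame_loglik(self, x: Sequence[float]) -> float:
        # log p(x | model), computed with log-sum-exp for numerical stability
        logs = self.component_logpdfs(x)
        top = max(logs)
        return top + math.log(sum(math.exp(v - top) for v in logs))

    def avg_loglik(self, frames: Sequence[Sequence[float]]) -> float:
        # Mean per-frame log-likelihood of an utterance
        return sum(self.frame_loglik(x) for x in frames) / len(frames)


def map_adapt_means(ubm: DiagGMM, frames, relevance: float = 16.0) -> DiagGMM:
    """Mean-only MAP adaptation of a UBM to one class (e.g. all female frames)."""
    k_count, dim = len(ubm.weights), len(ubm.means[0])
    n = [0.0] * k_count                            # soft counts per component
    first = [[0.0] * dim for _ in range(k_count)]  # posterior-weighted sums
    for x in frames:
        logs = ubm.component_logpdfs(x)
        total = ubm.frame_loglik(x)
        for k in range(k_count):
            gamma = math.exp(logs[k] - total)      # posterior P(k | x) under the UBM
            n[k] += gamma
            for d in range(dim):
                first[k][d] += gamma * x[d]
    new_means = []
    for k in range(k_count):
        alpha = n[k] / (n[k] + relevance)          # data-dependent adaptation weight
        e_k = [f / n[k] for f in first[k]] if n[k] > 0 else ubm.means[k]
        new_means.append([alpha * e + (1.0 - alpha) * m for e, m in zip(e_k, ubm.means[k])])
    # Weights and variances are copied unchanged from the UBM
    return DiagGMM(list(ubm.weights), new_means, [list(v) for v in ubm.variances])


def equal_error_rate(target_scores: Vector, nontarget_scores: Vector) -> float:
    """Sweep candidate thresholds; return the error where FAR and FRR are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Toy 1-D frames stand in for MFCC vectors
    ubm = DiagGMM([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
    female = map_adapt_means(ubm, [[1.8], [2.0], [2.2]])
    male = map_adapt_means(ubm, [[-1.8], [-2.0], [-2.2]])
    utterance = [[1.9], [2.1]]
    llr_f = female.avg_loglik(utterance) - ubm.avg_loglik(utterance)
    llr_m = male.avg_loglik(utterance) - ubm.avg_loglik(utterance)
    print("female" if llr_f > llr_m else "male")  # prints: female
    print(equal_error_rate([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]))  # prints: 0.25
```

For sex detection, the class with the larger log-likelihood ratio wins. In a two-stage setup, that decision could select sex-specific normal/pathological models for the subsequent pathology detector. This routing is an assumption about how such a priori information might be used.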

Author biographies

Jorge Andrés Gómez-García, Universidad Politécnica de Madrid

Center for Biomedical Technology (Centro de Tecnología Biomédica, CTB).

Laureano Moro-Velázquez, Universidad Politécnica de Madrid

Center for Biomedical Technology (Centro de Tecnología Biomédica, CTB).

Juan Ignacio Godino-Llorente, Universidad Politécnica de Madrid

Center for Biomedical Technology (Centro de Tecnología Biomédica, CTB).

César Germán Castellanos-Domínguez, Universidad Nacional de Colombia

Department of Electronic, Electrical and Computer Engineering, Manizales campus.

