An insight to the automatic categorization of speakers according to sex and its application to the detection of voice pathologies: A comparative study

Jorge Andrés Gómez-García; Laureano Moro-Velázquez; Juan Ignacio Godino-Llorente; César Germán Castellanos-Domínguez

doi:10.17533/udea.redin.n79a06

Authors

Jorge Andrés Gómez-García Polytechnic University of Madrid https://orcid.org/0000-0002-6060-387X
Laureano Moro-Velázquez Polytechnic University of Madrid https://orcid.org/0000-0002-3033-7005
Juan Ignacio Godino-Llorente Polytechnic University of Madrid https://orcid.org/0000-0001-7348-3291
César Germán Castellanos-Domínguez National University of Colombia https://orcid.org/0000-0002-0138-5489

Keywords:

inverse filtering, GMM, UBM, voice pathology detection

Abstract

An automatic categorization of speakers according to their sex improves the performance of an automatic voice pathology detector. This is grounded in findings that demonstrate perceptual, acoustic and anatomical differences between male and female voices. In particular, this paper pursues two objectives: 1) to design a system that automatically discriminates the sex of a speaker from both normophonic and pathological speech; and 2) to study the influence of this sex detector on the accuracy of a subsequent voice pathology detector. The automatic sex detector is parameterized in two ways: mel-frequency cepstral coefficients (MFCCs) computed from the speech signal, and MFCCs computed from glottal waveforms together with parameters modeling the vocal tract. The glottal waveforms are extracted from speech via iterative lattice inverse filtering. The pathology detector uses an MFCC parameterization of the speech signals. Classification in both the sex and the pathology detectors is carried out using state-of-the-art techniques based on universal background models (UBMs). Experiments are performed on the Saarbrücken voice database, using sustained phonations of the vowel /a/. Results indicate that the sex of the speaker can be discriminated automatically from both normophonic and pathological speech, with accuracies of up to 95%. Moreover, including a priori information about the sex of the speaker yields an absolute improvement of about 2% in equal error rate (EER) in pathology detection tasks.
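The abstract names the building blocks (MFCC frames, UBM-based classification, EER) but not their implementation. As a rough illustration only, the sketch below assumes the classic GMM-UBM recipe: a diagonal-covariance UBM that is given rather than trained, mean-only MAP adaptation with a relevance factor of 16 (an assumed value, not taken from the paper), scoring by the average log-likelihood ratio against the UBM, and EER estimated with a simple threshold sweep. The paper may use a different UBM-based back end; all function names are hypothetical, and the 2-D toy data stand in for MFCC vectors. This is not the authors' implementation.

```python
import numpy as np


def _log_component_terms(X, weights, means, variances):
    """log(w_k) + log N(x_t | mu_k, diag(var_k)) for every frame t and component k."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                   # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                # (K,)
    return np.log(weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)


def _logsumexp_rows(A):
    """Numerically stable log(sum(exp(A), axis=1))."""
    m = A.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=1, keepdims=True)))[:, 0]


def frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of feature vectors X (T x D) under a diagonal GMM."""
    return _logsumexp_rows(_log_component_terms(X, weights, means, variances))


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM towards the frames X of one class (assumed r=16)."""
    terms = _log_component_terms(X, weights, means, variances)
    post = np.exp(terms - _logsumexp_rows(terms)[:, None])     # responsibilities (T, K)
    n_k = post.sum(axis=0)                                     # soft frame counts per component
    ex_k = (post.T @ X) / np.maximum(n_k, 1e-10)[:, None]      # posterior-weighted data means
    alpha = n_k / (n_k + relevance)                            # data-dependent adaptation weight
    return alpha[:, None] * ex_k + (1.0 - alpha)[:, None] * means


def llr_score(X, weights, ubm_means, variances, adapted_means):
    """Average per-frame log-likelihood ratio: adapted class model versus UBM."""
    return float(np.mean(frame_loglik(X, weights, adapted_means, variances)
                         - frame_loglik(X, weights, ubm_means, variances)))


def equal_error_rate(target_scores, nontarget_scores):
    """EER by threshold sweep: the point where miss and false-alarm rates are closest."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate([target, nontarget])):
        miss = np.mean(target < t)              # targets rejected at threshold t
        false_alarm = np.mean(nontarget >= t)   # non-targets accepted at threshold t
        if abs(miss - false_alarm) < best_gap:
            best_gap, eer = abs(miss - false_alarm), (miss + false_alarm) / 2.0
    return float(eer)


# Toy usage with synthetic 2-D "features" standing in for MFCC frames.
rng = np.random.default_rng(0)
w = np.array([0.5, 0.5])
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
var = np.ones((2, 2))
X_class = rng.normal(loc=[2.5, 0.5], scale=1.0, size=(200, 2))   # frames of the modeled class
X_other = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(200, 2))  # frames of another class
mu_adapted = map_adapt_means(X_class, w, mu, var)
print(llr_score(X_class, w, mu, var, mu_adapted), llr_score(X_other, w, mu, var, mu_adapted))
```

In a sex or pathology detector built along these lines, one model would be adapted per class (for example male/female, or normophonic/pathological) and each recording assigned to the class with the higher score; `equal_error_rate` would then summarize how well target and non-target scores separate.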

Author Biographies

Jorge Andrés Gómez-García, Polytechnic University of Madrid

Center for Biomedical Technology (CTB).

Laureano Moro-Velázquez, Polytechnic University of Madrid

Center for Biomedical Technology (CTB).

Juan Ignacio Godino-Llorente, Polytechnic University of Madrid

Center for Biomedical Technology (CTB).

César Germán Castellanos-Domínguez, National University of Colombia

Department of Electronic, Electrical and Computer Engineering, Manizales campus.
