An insight to the automatic categorization of speakers according to sex and its application to the detection of voice pathologies: A comparative study
Automatically categorizing speakers by sex improves the performance of an automatic voice pathology detector. This is grounded in findings demonstrating perceptual, acoustical, and anatomical differences between male and female voices. In particular, this paper pursues two objectives: 1) to design a system that automatically discriminates the sex of a speaker from normophonic and pathological speech; 2) to study the influence that this sex detector has on the accuracy of a subsequent voice pathology detector. The parameterization of the automatic sex detector relies on MFCC computed from speech, and on MFCC computed from glottal waveforms combined with parameters modeling the vocal tract. The glottal waveforms are extracted from speech via iterative lattice inverse filters. For the pathology detector, an MFCC parameterization is applied to the speech signals. Classification, in both the sex and pathology detectors, is carried out using state-of-the-art techniques based on universal background models. Experiments are performed on the Saarbrücken database, employing sustained phonations of the vowel /a/. Results indicate that the sex of the speaker can be discriminated automatically from both normophonic and pathological speech, with an accuracy of up to 95%. Moreover, including a-priori information about the speaker's sex yields an absolute improvement in EER of about 2% on pathology detection tasks.
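The abstract reports pathology-detection performance as an equal error rate (EER), the operating point at which the false-acceptance and false-rejection rates coincide. As a minimal illustration, the sketch below estimates the EER from detector scores; the score distributions are synthetic placeholders, not data from the paper.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER: the threshold at which the false-acceptance
    rate (FAR) and false-rejection rate (FRR) are closest, returning
    their average at that point."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    eer, best_gap = 1.0, np.inf
    for t in thresholds:
        far = np.mean(nontarget_scores >= t)  # non-targets wrongly accepted
        frr = np.mean(target_scores < t)      # targets wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Hypothetical scores: pathological voices (targets) score higher on
# average than normophonic voices (non-targets).
rng = np.random.default_rng(0)
target = rng.normal(1.0, 1.0, 500)
nontarget = rng.normal(-1.0, 1.0, 500)
print(f"EER = {equal_error_rate(target, nontarget):.3f}")
```

With these synthetic Gaussian scores the EER lands near 16%, the theoretical overlap of the two distributions; an "absolute improvement in EER of about 2%" means this number drops by two percentage points.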
Copyright (c) 2016 Revista Facultad de Ingeniería Universidad de Antioquia
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.