An insight to the automatic categorization of speakers according to sex and its application to the detection of voice pathologies: A comparative study
Keywords: inverse filtering, GMM, UBM, voice pathology detection
Automatically categorizing speakers according to their sex improves the performance of an automatic voice pathology detector. This claim is grounded in findings that demonstrate perceptual, acoustic and anatomical differences between male and female voices. This paper pursues two objectives: 1) to design a system that automatically discriminates the sex of a speaker from normophonic and pathological speech, and 2) to study the influence of this sex detector on the accuracy of a subsequent voice pathology detector. The automatic sex detector is parameterized with MFCC computed on the speech signal, and with MFCC computed on glottal waveforms together with parameters modeling the vocal tract. The glottal waveforms are extracted from speech by means of iterative lattice inverse filters. The pathology detector uses an MFCC parameterization of the speech signal. In both the sex and pathology detectors, classification is carried out using state-of-the-art techniques based on universal background models. Experiments are performed on the Saarbrücken database, using sustained phonations of the vowel /a/. Results indicate that the sex of the speaker can be discriminated automatically from normophonic and pathological speech, with an accuracy of up to 95%. Moreover, including a priori information about the sex of the speaker yields an absolute improvement in EER of about 2% on pathology detection tasks.
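The abstract reports its pathology detection gain as an absolute change in EER (equal error rate), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER could be estimated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "pathological", and the synthetic scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER and its threshold from two sets of detector scores.

    Assumption: higher scores indicate the target class (e.g. pathological).
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scored at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


if __name__ == "__main__":
    # Synthetic, hypothetical scores; not data from the Saarbrücken database.
    rng = np.random.default_rng(0)
    pathological = rng.normal(loc=1.0, scale=1.0, size=500)
    normophonic = rng.normal(loc=-1.0, scale=1.0, size=500)
    eer, thr = equal_error_rate(pathological, normophonic)
    print(f"EER = {100 * eer:.1f}% at threshold {thr:.2f}")
```

Under this convention, an "absolute improvement of about 2%" means that the EER value itself drops by roughly two percentage points when the sex information is included.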
Copyright (c) 2016 Revista Facultad de Ingeniería Universidad de Antioquia
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Revista Facultad de Ingeniería, Universidad de Antioquia is licensed under the Creative Commons Attribution BY-NC-SA 4.0 license. https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en