Subject-independent acoustic-to-articulatory mapping of fricative sounds using vocal tract length normalization
This paper presents an acoustic-to-articulatory (AtoA) mapping method for tracking the movement of critical articulators during fricative utterances. The proposed approach first applies a vocal tract length normalization process; the acoustic time-frequency features most strongly related to articulator movement, selected from a statistical perspective, are then used for AtoA mapping. We test this method on the MOCHA-TIMIT database, which contains signals recorded with an electromagnetic articulograph system. The proposed features were evaluated in an AtoA mapping system based on Gaussian mixture models, with the Pearson correlation coefficient used to measure the goodness of the estimates. The correlation between the estimates and the reference signals shows that subject-independent AtoA mapping with the proposed approach yields results comparable to subject-dependent AtoA mapping.
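As a rough illustration of the pipeline described above (not the paper's actual implementation), the sketch below fits a single joint Gaussian over stacked acoustic and articulatory frames, which is the single-component (K = 1) case of the GMM-based mapping, estimates articulator trajectories as the conditional mean, and scores them with the Pearson correlation coefficient. The data here are synthetic, and all function names are invented for the example.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D trajectories."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fit_joint_gaussian(X, Y):
    """Fit one joint Gaussian over stacked acoustic (X) and articulatory
    (Y) frames -- the K = 1 special case of a GMM-based mapping."""
    Z = np.hstack([X, Y])
    return Z.mean(axis=0), np.cov(Z, rowvar=False)

def map_acoustic_to_articulatory(X, mu, cov, dx):
    """Minimum-MSE estimate E[y | x] under the fitted joint Gaussian."""
    mu_x, mu_y = mu[:dx], mu[dx:]
    S_xx, S_yx = cov[:dx, :dx], cov[dx:, :dx]
    gain = np.linalg.solve(S_xx, S_yx.T)  # shape (dx, dy)
    return mu_y + (X - mu_x) @ gain

# Synthetic data: 2-D "acoustic" frames linearly driving a 1-D "EMA" trace.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
Y = X @ np.array([[0.8], [-0.5]]) + 0.05 * rng.standard_normal((500, 1))

mu, cov = fit_joint_gaussian(X, Y)
Y_hat = map_acoustic_to_articulatory(X, mu, cov, dx=2)
r = pearson(Y_hat[:, 0], Y[:, 0])  # close to 1 on this easy toy set
```

A full GMM mapping, as in Toda et al., mixes several such conditional means with posterior weights per component; the single-Gaussian case keeps the linear-algebra core visible.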
Copyright (c) 2015 Revista Facultad de Ingeniería Universidad de Antioquia