Development of Feature Set, Classification Implementation and Applications for Vowel Migration/Modification in Sung Filipino (Tagalog) Texts and Perceived Intelligibility
Abstract
With the emergence of research on real-time visual feedback as a supplement to vocal pedagogy, technology is increasingly used in music to accelerate skill learning and enhance cognitive development. This project aims to further analyze vowel intelligibility and to develop software applications intended not only for professional singers but also for individuals who wish to improve their singing.
Sung vowels and song pieces were recorded from 46 singers. A Listening Test was then conducted on these samples to obtain the ground truth for vowel classification based on human perception. Human auditory perception of sung Filipino vowels was simulated using formant frequencies and Mel-frequency cepstral coefficients (MFCCs) as feature vector inputs to a two-stage Discriminant Analysis classifier. This setup yielded an overall Training Set accuracy of 89.4% and an overall Test Set accuracy of 90.9%. Measured as the agreement between the classifier's vowel labels and the Listening Test results, classifier accuracy reached 92.3%. Using information obtained from the classifier, offline and online (real-time) software applications were developed.
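As a rough sketch of how such a feature-extraction and two-stage classification pipeline could be assembled (the paper does not publish code; the LPC-based formant estimation, the 13 MFCCs, the vowel-to-group mapping, and the use of librosa and scikit-learn below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def formants(y, sr, n_formants=3, lpc_order=12):
    """Rough formant estimate: frequencies of LPC poles above the real axis."""
    a = librosa.lpc(y, order=lpc_order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    return np.array(freqs[:n_formants])

def mfccs(y, sr, n_mfcc=13):
    """Mean MFCC vector over the sung-vowel segment."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def extract(y, sr):
    """Concatenate formants and MFCCs into one feature vector."""
    return np.concatenate([formants(y, sr), mfccs(y, sr)])

class TwoStageDA:
    """Stage 1 assigns a broad vowel group; stage 2 resolves the
    vowel within the predicted group (hypothetical grouping)."""

    def __init__(self, vowel_to_group):
        self.vowel_to_group = vowel_to_group  # e.g. {"i": "front", "a": "open", ...}
        self.stage1 = LinearDiscriminantAnalysis()
        self.stage2 = {}

    def fit(self, X, vowels):
        vowels = np.asarray(vowels)
        groups = np.array([self.vowel_to_group[v] for v in vowels])
        self.stage1.fit(X, groups)
        for g in np.unique(groups):
            labels = vowels[groups == g]
            if len(set(labels)) == 1:        # single-vowel group: no model needed
                self.stage2[g] = labels[0]
            else:
                self.stage2[g] = LinearDiscriminantAnalysis().fit(
                    X[groups == g], labels)
        return self

    def predict(self, X):
        preds = []
        for g, x in zip(self.stage1.predict(X), X):
            m = self.stage2[g]
            preds.append(m if isinstance(m, str) else m.predict(x[None, :])[0])
        return np.array(preds)
```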
The main application features include display of the spectral envelope and spectrogram, pitch and vibrato analysis, and direct feedback on the classification of the sung vowel. These features, recommended by the singers surveyed, were incorporated into the applications to help singers adjust formant locations, directly determine a listener's perception of sung vowels, perform modeling effectively, and carry out vowel migration.
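The pitch and vibrato analysis could, for instance, be sketched as follows; the pYIN tracker, the 3 to 9 Hz vibrato search band, and the spectral-peak extent estimate are assumptions for illustration, not the paper's published method:

```python
import numpy as np
import librosa

def pitch_and_vibrato(y, sr):
    """Track f0 with pYIN, then estimate vibrato rate (Hz) and
    peak-to-peak extent (cents) from the f0 contour's spectrum.
    Assumes one mostly voiced, sustained tone."""
    f0, voiced, _ = librosa.pyin(y, fmin=80, fmax=1000, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]
    cents = 1200 * np.log2(f0 / f0.mean())       # deviation from mean, in cents
    frame_rate = sr / 512                        # pyin's default hop length
    spec = np.abs(np.fft.rfft(cents - cents.mean()))
    freqs = np.fft.rfftfreq(len(cents), d=1 / frame_rate)
    band = (freqs >= 3) & (freqs <= 9)           # typical singing-vibrato range
    rate = freqs[band][np.argmax(spec[band])]
    amp = 2 * spec[band].max() / len(cents)      # sinusoid amplitude estimate
    return f0.mean(), rate, 2 * amp              # mean pitch, rate, extent
```

A real-time variant would run the same analysis over a sliding audio buffer and refresh the display once per hop.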