Identification of the Major Language Families of India and Evaluation of their Mutual Influence


Affiliations
1 Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur 721 302, India
 

A language family is a group of languages that have descended from a common ancestral language. Because they share an ancestor, these languages are expected to be similar in some respects, and that similarity should be measurable in experiments. In language identification, language-specific features are extracted from speech and used to build a model that represents the language. This work extends the language identification framework to capture features common to a language family and to build models that represent whole families efficiently. Mel frequency cepstral coefficients (MFCC) and speech signal-based frequency cepstral coefficients (SFCC) are used as the primary features. Each of these is combined with shifted delta coefficients (SDC) to give the final feature set. Gaussian mixture models (GMM) and support vector machines (SVM) are used as modelling tools. Combining these feature extraction and modelling techniques gives four systems: MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM. Experiments with these systems show that the language families can be identified with reasonable accuracy. The work also tests the influence of one language family on another and finds that, in most cases, languages spoken in areas on the boundary between two families are more strongly influenced by the neighbouring family. A deviation from this pattern may reflect geopolitical isolation of two neighbouring regions, and can thus offer new insights or corroborate the investigations of historians.

Keywords

Feature Extraction, Language Family, Modeling Techniques, Mutual Influence.
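
The feature and modelling chain named in the abstract (cepstral features, then SDC, then a GMM per family) can be illustrated with a minimal sketch. This is not the authors' implementation: the SDC parameters (d = 1, P = 3, k = 7), the diagonal-covariance GMM, the function names and the family labels in the usage note are all assumptions made for illustration, and the cepstral frames are taken as already computed by some MFCC or SFCC front end.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
# A diagonal-covariance GMM as (weights, means, variances); layout is an assumption.
Gmm = Tuple[Sequence[float], Sequence[Frame], Sequence[Frame]]


def shifted_delta(cepstra: Sequence[Frame], d: int = 1, p: int = 3, k: int = 7) -> List[List[float]]:
    """Stack k delta blocks per frame: block i is c[t + i*p + d] - c[t + i*p - d]."""
    out: List[List[float]] = []
    first = d                                   # need c[t - d] to exist
    last = len(cepstra) - ((k - 1) * p + d)     # exclusive; need c[t + (k-1)*p + d]
    for t in range(first, last):
        vec: List[float] = []
        for i in range(k):
            ahead = cepstra[t + i * p + d]
            behind = cepstra[t + i * p - d]
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


def gmm_log_likelihood(x: Frame, model: Gmm) -> float:
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    weights, means, variances = model
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comp_logs.append(log_p)
    peak = max(comp_logs)                       # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in comp_logs))


def identify_family(features: Sequence[Frame], family_models: Dict[str, Gmm]) -> str:
    """Pick the family whose GMM gives the highest mean per-frame log-likelihood."""
    def mean_score(name: str) -> float:
        model = family_models[name]
        return sum(gmm_log_likelihood(x, model) for x in features) / len(features)
    return max(family_models, key=mean_score)
```

In use, one would pass the cepstral frames of an utterance through `shifted_delta`, then call `identify_family` with one trained model per family, for example a dictionary keyed by hypothetical labels such as `"family_a"` and `"family_b"`. Training the mixtures and the SVM-based variants are outside the scope of this sketch.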
Authors

Debapriya Sengupta
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur 721 302, India
Goutam Saha
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur 721 302, India

DOI: https://doi.org/10.18520/cs%2Fv110%2Fi4%2F667-681