
Assamese Connected Digit Recognition System


Affiliations
1 Department of Electronics and Communication Engineering, Gauhati University Institute of Science and Technology, Gauhati University, Guwahati, Assam, India
     



Abstract

In this work, we present the development of a connected digit recognition system for the Assamese language. Assamese is an under-resourced language of North-East India that is widely spoken in the state of Assam. The text corpus used in this work consists of sequences of 7 digits spoken in a continuous manner. To capture the variations in phonetic context, the digit sequences were arranged so that each digit occurs in all 7 positions. The speech corpus was collected from 11 native Assamese speakers, of whom 5 were female and 6 were male. Mel Frequency Cepstral Coefficient (MFCC) features have been used as front-end features. We have explored the Subspace Gaussian Mixture Model (SGMM) based acoustic modeling approach in addition to the Gaussian Mixture Model (GMM) within the Hidden Markov Model (HMM) framework. Accuracies of 95.7% and 95.9% are achieved with the GMM-HMM and SGMM-HMM systems, respectively.
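The abstract does not say how the digit sequences were arranged, only that each digit occurs in all 7 positions. One simple way to get that property is sketched below. It is an illustration only, assuming the ten digits 0 to 9 and a cyclic-shift layout; the function names `build_digit_sequences` and `covers_all_positions` are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: the paper's actual prompt list is not given in the abstract.
# Assumption: ten digits (0-9), each prompt is a window of 7 consecutive digits
# taken from a cyclic ordering, with one prompt per starting digit.

def build_digit_sequences(num_digits=10, seq_len=7):
    """Return num_digits sequences, each a cyclic window of length seq_len."""
    return [
        [(start + pos) % num_digits for pos in range(seq_len)]
        for start in range(num_digits)
    ]


def covers_all_positions(sequences, num_digits=10, seq_len=7):
    """Check that every digit occurs at every position in at least one sequence."""
    seen = {(pos, digit) for seq in sequences for pos, digit in enumerate(seq)}
    return len(seen) == num_digits * seq_len


sequences = build_digit_sequences()
for seq in sequences:
    print(" ".join(str(d) for d in seq))
print("every digit in every position:", covers_all_positions(sequences))
```

With this layout, ten prompts are enough for every digit to occur once in each of the 7 positions, so the digit's left and right neighbours change from prompt to prompt.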

Keywords

Assamese Language, Digit Recognition, SGMM-HMM.



Authors

Barsha Deka
Department of Electronics and Communication Engineering, Gauhati University Institute of Science and Technology, Gauhati University, Guwahati, Assam, India
Abhishek Dey
Department of Electronics and Communication Engineering, Gauhati University Institute of Science and Technology, Gauhati University, Guwahati, Assam, India
S. R. Nirmala
Department of Electronics and Communication Engineering, Gauhati University Institute of Science and Technology, Gauhati University, Guwahati, Assam, India
