
Spoken English Digit Classification Using Supervised Learning


Affiliations
1 Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India
     

Abstract

Multiclass classification is a fundamental problem in many speech recognition systems, and spoken digit recognition is a multiclass problem with 10 classes. The present paper applies a Support Vector Machine (SVM), K-Nearest-Neighbour (KNN), and an ensemble method, Random Forest (RF), to English digit classification. The experiments use the Caffe speech dataset of 2400 input instances (15 speakers × 16 repetitions × 10 digits). Mel Frequency Cepstral Coefficient (MFCC) features are computed for all input instances. The dataset is divided into training and testing sets, with 10%, 30%, and 50% of the data used as the testing set. Confusion matrices are generated for all test cases and all classification methods. The ensemble method outperforms SVM and KNN across different numbers of frames. The highest accuracy, 97.50%, is achieved by RF with 10% of the data used for testing.
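The evaluation protocol summarised above (one feature vector per utterance, hold-out test fractions of 10%, 30% and 50%, a confusion matrix per run, accuracy read off its diagonal) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code. It shows only the KNN classifier and rests on several assumptions that do not come from the paper: synthetic Gaussian clusters stand in for MFCC features, each utterance is reduced to a single hypothetical 13-dimensional vector (for example, MFCCs averaged over frames), distance is Euclidean, and k = 5. SVM and Random Forest would slot into the same loop in place of `knn_predict`.

```python
"""Minimal sketch: hold-out evaluation of a k-nearest-neighbour digit
classifier with a confusion matrix, using only the standard library.

Assumption: the feature vectors are SYNTHETIC placeholders standing in for
per-utterance MFCC vectors; they are not the Caffe speech dataset.
"""
import heapq
import math
import random
from collections import Counter

N_CLASSES = 10      # digits 0-9
N_SPEAKERS = 15     # as stated in the abstract
N_REPETITIONS = 16  # as stated in the abstract
N_FEATURES = 13     # hypothetical: 13 MFCCs per utterance (assumption)


def make_synthetic_dataset(seed=0):
    """Return (features, labels) with one Gaussian cluster per digit class."""
    rng = random.Random(seed)
    centres = [[rng.uniform(-5, 5) for _ in range(N_FEATURES)]
               for _ in range(N_CLASSES)]
    X, y = [], []
    for digit in range(N_CLASSES):
        for _ in range(N_SPEAKERS * N_REPETITIONS):
            X.append([c + rng.gauss(0, 1.0) for c in centres[digit]])
            y.append(digit)
    return X, y


def train_test_split(X, y, test_fraction, seed=0):
    """Shuffle indices and hold out `test_fraction` of the data for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    nearest = heapq.nsmallest(k, range(len(train_X)),
                              key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(true, pred, n_classes=N_CLASSES):
    """cm[t][p] counts test samples of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Fraction of test samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


if __name__ == "__main__":
    X, y = make_synthetic_dataset()
    # 15 speakers x 16 repetitions x 10 digits = 2400 instances
    assert len(X) == 2400
    for frac in (0.10, 0.30, 0.50):
        tr_X, tr_y, te_X, te_y = train_test_split(X, y, frac)
        preds = [knn_predict(tr_X, tr_y, x) for x in te_X]
        cm = confusion_matrix(te_y, preds)
        print(f"test={frac:.0%}  n_test={len(te_y)}  "
              f"accuracy={accuracy(cm):.4f}")
```

Because the data are synthetic, the printed accuracies say nothing about the 97.50% reported for RF; the sketch only makes the split, vote and confusion-matrix bookkeeping concrete.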

Keywords

Caffe, Ensemble Methods, KNN, MFCC, Random Forest (RF), Spoken English Digit, SVM.


Authors

Maddimsetti Srinivas
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India
Kasiprasad Mannepalli
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India
G. L. P. Ashok
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India
