
Spoken English Digit Classification Using Supervised Learning


Affiliations
1 Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India
     



Abstract

Multiclass classification is a fundamental problem in many speech recognition systems. Spoken digit recognition is a multiclass problem with 10 classes. This paper applies a Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and an ensemble method, Random Forest (RF), to spoken English digit classification. The experiments use the Caffe speech dataset of 2400 input instances (15 speakers × 16 repetitions × 10 digits). Mel Frequency Cepstral Coefficient (MFCC) features are extracted for all input instances. The dataset is divided into training and testing sets, with 10%, 30%, and 50% of the dataset held out for testing. Confusion matrices are generated for every test case and every classification method. The ensemble method performs better than SVM and KNN across different numbers of frames. RF achieves the highest accuracy, 97.50%, when 10% of the data is used for testing.
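The evaluation protocol described above can be sketched in standard-library Python. It holds out a fraction of the 2400 utterances, classifies them, and then summarises the results with a confusion matrix and accuracy.

This is a minimal illustration under stated assumptions, not the authors' implementation:

- The feature vectors are synthetic stand-ins for MFCC vectors, and the 13-dimensional size is an assumption.
- The per-digit (stratified) split and k = 3 are also assumptions.
- Only KNN is shown, because SVM and RF would normally come from a library such as scikit-learn.
- All helper names (`make_synthetic_features`, `stratified_split`, `knn_predict`, `evaluate_knn`) are hypothetical.

```python
"""Sketch of a hold-out evaluation: stratified split, KNN, confusion matrix.
Synthetic vectors stand in for MFCC features; NOT the paper's data or code."""
import heapq
import math
import random
from collections import Counter

N_SPEAKERS, N_REPETITIONS, N_DIGITS = 15, 16, 10
N_PER_DIGIT = N_SPEAKERS * N_REPETITIONS   # 240 utterances per digit
N_TOTAL = N_PER_DIGIT * N_DIGITS           # 2400 utterances in total


def make_synthetic_features(n_per_class=N_PER_DIGIT, dim=13, seed=0):
    """Hypothetical stand-in for one MFCC vector per utterance (dim must be >= 10).

    Digit c is centred on 10.0 in coordinate c, with unit Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for label in range(N_DIGITS):
        for _ in range(n_per_class):
            vec = [rng.gauss(10.0 if d == label else 0.0, 1.0) for d in range(dim)]
            data.append((vec, label))
    return data


def stratified_split(data, test_fraction, seed=0):
    """Hold out `test_fraction` of each digit's utterances as test data."""
    rng = random.Random(seed)
    by_class = {}
    for vec, label in data:
        by_class.setdefault(label, []).append((vec, label))
    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_test = round(test_fraction * len(items))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def knn_predict(train_x, train_y, x, k=3):
    """Brute-force KNN: majority vote among the k nearest (Euclidean) neighbours."""
    nearest = heapq.nsmallest(k, zip((math.dist(x, t) for t in train_x), train_y))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(true_labels, pred_labels, n_classes=N_DIGITS):
    """Rows are true digits, columns are predicted digits."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of test utterances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def evaluate_knn(data, test_fraction, k=3):
    """Split, classify every test vector, and return the confusion matrix."""
    train, test = stratified_split(data, test_fraction)
    train_x = [v for v, _ in train]
    train_y = [y for _, y in train]
    preds = [knn_predict(train_x, train_y, v, k) for v, _ in test]
    return confusion_matrix([y for _, y in test], preds)


if __name__ == "__main__":
    data = make_synthetic_features()
    for frac in (0.1, 0.3, 0.5):   # the three test fractions from the abstract
        cm = evaluate_knn(data, frac)
        print(f"test={frac:.0%}  n_test={sum(map(sum, cm))}  accuracy={accuracy(cm):.4f}")
```

With 240 utterances per digit, the 10%, 30%, and 50% splits hold out 240, 720, and 1200 utterances respectively.

On real recordings, the synthetic generator would be replaced by per-utterance MFCC vectors. The abstract compares results across different numbers of frames, so how frame-level coefficients are pooled or truncated into one vector per utterance is a design choice this sketch deliberately leaves open.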

Keywords

Caffe, Ensemble Methods, KNN, MFCC, Random Forest (RF), Spoken English Digit, SVM.




Authors

Maddimsetti Srinivas
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India
Kasiprasad Mannepalli
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India
G. L. P. Ashok
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur(Dt), Andhra Pradesh, India

