
Methods for Audio Classification & Segmentation


Affiliations
1 COMP Engineering, PDEA, COEH College of Engineering, Pune, India
2 E&TC Engineering, Sinhgad College of Engineering, Pune, India
     



This paper describes the development of an audio segmentation and classification system. Audio segmentation is an essential preprocessing step in several audio processing applications and has a significant impact on, for example, speech recognition performance. Many existing works on audio classification deal with the problem of classifying known, homogeneous audio segments. In this work, audio recordings are divided into acoustically similar regions and classified into basic audio types such as speech, music or silence. The audio features used include real cepstral coefficients, linear predictive cepstral coefficients, Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR) and Short Term Energy (STE), combined with the aim of obtaining a 100% result. These features were extracted from audio files stored in WAV format. The possible use of features extracted directly from MPEG audio files is also considered. Statistical methods are used to segment and classify audio signals using these features. The classification methods used include the Gaussian Mixture Model (GMM) and the k-means algorithm. The implemented system is shown to achieve an accuracy of more than 95% for discrete audio classification.
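As an illustration of two of the simpler features named above, the following minimal sketch computes Short Term Energy and Zero Crossing Rate per frame with NumPy. It is not the authors' implementation: the function names, the 8 kHz sample rate, the 200-sample frame length, the 100-sample hop and the synthetic tone-plus-silence signal are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (the ragged tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_term_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive so silence gives 0
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Hypothetical test signal: one second of a 440 Hz tone, then one second of silence.
sr = 8000  # assumed sample rate in Hz
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate([tone, np.zeros(sr)])

frames = frame_signal(signal, frame_len=200, hop=100)
ste = short_term_energy(frames)
zcr = zero_crossing_rate(frames)
```

In this sketch the tone frames have clearly higher energy and a non-zero crossing rate, while the silent frames give zero for both, which is the kind of contrast a statistical classifier such as k-means or a GMM could then exploit.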

Keywords

Audio Content Analysis, Segmentation, Classification, GMM, k-Means, MFCC, ZCR, STE, MPEG.

Authors

Madhuri P. Borawake
COMP Engineering, PDEA, COEH College of Engineering, Pune, India
Rameshwar Kawitkar
E&TC Engineering, Sinhgad College of Engineering, Pune, India
