A Comprehensive Review on Audio Based Musical Instrument Recognition : Human-Machine Interaction towards Industry 4.0


Affiliations
1 Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi 835 215, Jharkhand, India
2 Department of Mathematics, Birla Institute of Technology, Mesra, Ranchi 835 215, Jharkhand, India
 

Over the last two decades, the application of machine technology has shifted from industrial to residential use. Advances in hardware and software have further carried machine technology to its ultimate application, human-machine interaction, which is a form of multimodal communication. Multimodal communication refers to the integration of several modalities of information, such as speech, image, music, gesture, and facial expression. Music is a non-verbal form of communication that humans often use to express themselves. Music Information Retrieval (MIR) has therefore become a fast-growing field of research and has attracted considerable interest from the academic community, the music industry, and the wide base of multimedia users. The central problem in MIR is accessing and retrieving a specific type of music on demand from large music collections, and the most fundamental problem within it is music classification. The essential MIR tasks are artist identification, genre classification, mood classification, music annotation, and instrument recognition. Among these, instrument recognition is a vital sub-task for several reasons, including retrieval of music information, sound source separation, and automatic music transcription. In recent years, many researchers have reported machine learning techniques for musical instrument recognition, and some of these have proved effective. This article provides a systematic, comprehensive review of the advanced machine learning techniques used for musical instrument recognition. We focus on the different audio feature descriptors and the common choices of classifier learning used for musical instrument recognition. The review emphasizes recent developments in music classification techniques and discusses a few associated open research problems.

Keywords

Classifier Learning, Feature Descriptors, Instrument Recognition, Multimodal Communication, Music Information Retrieval.


Authors

Sukanta Kumar Dash
Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi 835 215, Jharkhand, India
S S Solanki
Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi 835 215, Jharkhand, India
Soubhik Chakraborty
Department of Mathematics, Birla Institute of Technology, Mesra, Ranchi 835 215, Jharkhand, India

