
Analysis of Performance of Classifier Algorithms for Different Text Representations


Affiliations
1 Program in Computer Science, DAU, Indore, India
2 Devi Ahilya Vishwa Vidyalaya, Indore, India
3 EKlat-Research, Pune, India
     



Text representation has a strong impact on the performance of a text classification system. A representation with a large number of redundant, noisy, or irrelevant features often increases the training and classification time of the system and reduces its accuracy. An appropriate text representation with properly extracted or selected features may lead to high accuracy.
This paper provides a brief overview of popular text representation techniques and analyzes the performance of three major text classifiers against three popular text representations (the vector space model, the graph-based model, and the NMF-based model) in the multi-label setting. We also propose mltcNMF, a feature extraction algorithm based on non-negative matrix factorization (NMF) for high-dimensional data spaces. We conducted a set of experiments to comprehensively evaluate the effects of these text representation approaches on multi-label datasets, and we also measured the classification performance of our new algorithm. Our empirical study shows that using an appropriate feature selection strategy in text representation can significantly improve the performance of a text classification system.
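
To make the idea of NMF-based feature extraction concrete, the following is a minimal sketch and not the mltcNMF algorithm itself, whose details are not given in this abstract. It assumes a plain bag-of-words vector space representation and the standard multiplicative update rules for the squared-error objective. The function names (`term_document_matrix`, `nmf`), the toy documents, the rank, and the iteration count are all hypothetical choices made for illustration.

```python
import random


def term_document_matrix(docs):
    """Vector space model: one row per document, one column per term (raw counts)."""
    vocab = sorted({term for doc in docs for term in doc.lower().split()})
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for term in doc.lower().split():
            row[index[term]] += 1.0
        rows.append(row)
    return vocab, rows


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) by W (docs x rank) times H (rank x terms).

    Uses multiplicative updates, which keep every entry non-negative.
    The rows of W serve as the reduced feature vectors of the documents.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_docs)]
    return w, h


# Toy corpus (hypothetical): two loose topics, sport and finance.
docs = [
    "team wins football match",
    "football team loses match",
    "bank raises interest rates",
    "interest rates hit bank profits",
]
vocab, v = term_document_matrix(docs)
w, h = nmf(v, rank=2)
for doc, features in zip(docs, w):
    print(doc, [round(x, 3) for x in features])
```

In a multi-label setting, one simple way to use such features would be to train one binary classifier per label on the rows of `W`; that pairing is an assumption here and is not taken from the paper.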

Keywords

Text Classification, Vector Space Model, NMF, Multi Label Text Classification.

Authors

Shweta Dharmadhikari
Program in Computer Science, DAU, Indore, India
Maya Ingale
Devi Ahilya Vishwa Vidyalaya, Indore, India
Parag Kulkarni
EKlat-Research, Pune, India
