Text representation has a strong impact on the performance of a text classification system. A representation with a large number of redundant, noisy, or irrelevant features often increases the training and classification time of the system and also reduces its accuracy. An appropriate text representation with properly extracted or selected features may lead to high accuracy.
Our paper provides a brief overview of popular text representation techniques and analyzes the performance of three major text classifiers on three popular text representations (the vector space model, the graph-based model, and the NMF-based model) in the multi-label setting. We also propose mltcNMF, a feature extraction algorithm based on non-negative matrix factorization for high-dimensional data spaces. We conducted a set of experiments on multi-label datasets to comprehensively evaluate the effects of these text representation approaches and to measure the classification performance of our new algorithm. Our empirical study shows that using an appropriate feature selection strategy in text representation can significantly improve the performance of a text classification system.
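The abstract does not describe the internals of mltcNMF, so the following is only a minimal sketch of the generic idea behind an NMF-based text representation, assuming plain Python and the standard Lee and Seung multiplicative-update rules. A term-document matrix V built under the vector space model is factored as V ≈ WH with non-negative W and H, and each column of H then serves as a k-dimensional feature vector for one document. The function names (`term_document_matrix`, `nmf`, `frobenius_error`), the raw term-count weighting, and the toy documents are illustrative assumptions, not the paper's algorithm.

```python
import math
import random


def term_document_matrix(docs):
    """Vector space model (assumed raw term counts): rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    V = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            V[index[w]][j] += 1.0
    return vocab, V


def matmul(A, B):
    """Multiply an m x p matrix A by a p x n matrix B (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Generic NMF via Lee-Seung multiplicative updates for ||V - WH||_F^2, W, H >= 0.

    This is plain NMF for illustration only, not the paper's mltcNMF.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random initialization keeps the updates well defined.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - WH."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)))


# Hypothetical toy corpus: two travel documents and two sports documents.
docs = [
    "cheap flights hotel deals",
    "hotel booking cheap rooms",
    "football match score",
    "match highlights football goals",
]
vocab, V = term_document_matrix(docs)
W, H = nmf(V, k=2)
# Column j of H is the 2-dimensional NMF feature vector for document j.
doc_features = transpose(H)
print(len(vocab), "terms reduced to", len(H), "features per document")
```

In practice a TF-IDF weighting and a library implementation such as scikit-learn's `NMF` would normally replace these hand-written loops; the pure-Python version only makes the update rules explicit.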

Keywords

Text Classification, Vector Space Model, NMF, Multi-Label Text Classification.
