
High Dimensional Unbalanced Data Classification Vs SVM Feature Selection


Authors

S. Chinna Gopi
Vijaya Institute of Technology for Women (VITW), Enikepadu, Vijayawada - 521108, Andhra Pradesh, India
B. Suvarna
VFSTR University, Vadlamudi, Guntur - 522213, Andhra Pradesh, India
T. Maruthi Padmaja
VFSTR University, Vadlamudi, Guntur - 522213, Andhra Pradesh, India

Abstract


Background/Objectives: It is well known that the performance of classification models is prone to the class imbalance problem, which occurs when one class of data severely outnumbers the other classes. Classification models learned with Support Vector Machines (SVM) are prominent for exhibiting good generalization ability even in the context of class imbalance. However, a high imbalance ratio has been shown to hinder SVM learning performance. With this concern, this paper presents an empirical study on the viability of SVM for feature selection from moderately and highly unbalanced datasets. Methods/Statistical Analysis: Support Vector Machine-Recursive Feature Elimination (SVM-RFE) wrapper feature selection is analyzed in this study, and its performance on one document-analysis and two biomedical unbalanced datasets is compared with two prominent feature selection methods, the Chi-Square (CHI) test and Information Gain (IG), using Decision Tree and Naive Bayes classification models. Findings: Two major findings are reported from this empirical study: 1. For the considered scenarios, classification models learned on IG- and CHI-selected features performed better than those learned on SVM-RFE-selected features under a high class imbalance setting. 2. SVM-RFE on rebalanced data yielded better performance than SVM-RFE on the original data. Application/Improvements: All of the considered feature selection methods, including SVM-RFE, yielded better performance on oversampled data than SVM-RFE on the original data. Overall, this study reports that models learned with the Decision Tree classifier exhibited better performance than those learned with the Naive Bayes classifier.
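
The comparison pipeline described in the abstract can be sketched as follows. This is only an illustrative reconstruction using scikit-learn and imbalanced-learn on synthetic data, not the authors' implementation: the datasets, the oversampling method, the SVM parameters, the number of retained features, and the evaluation metric are all assumptions.

```python
# Illustrative sketch: SVM-RFE vs. Chi-Square (CHI) and Information Gain (IG)
# feature selection on an imbalanced dataset, evaluated with Decision Tree
# and Naive Bayes classifiers, with optional oversampling before selection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2, mutual_info_classif
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from imblearn.over_sampling import RandomOverSampler  # assumed rebalancing step

# Synthetic high-dimensional, highly imbalanced data (stand-in for the
# document-analysis and biomedical datasets used in the study).
X, y = make_classification(n_samples=2000, n_features=200, n_informative=20,
                           weights=[0.95, 0.05], random_state=0)
X = np.abs(X)  # chi2 requires non-negative feature values
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Optional rebalancing of the training split before feature selection
# (the paper's exact oversampling technique may differ).
X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)

k = 30  # number of features to retain (assumed)
selectors = {
    "SVM-RFE": RFE(LinearSVC(C=1.0, dual=False), n_features_to_select=k),
    "CHI":     SelectKBest(chi2, k=k),
    "IG":      SelectKBest(mutual_info_classif, k=k),
}
classifiers = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "NaiveBayes":   GaussianNB(),
}

for train_name, (Xt, yt) in {"original": (X_tr, y_tr),
                             "oversampled": (X_bal, y_bal)}.items():
    for s_name, sel in selectors.items():
        sel.fit(Xt, yt)                       # rank/score features
        Xt_s, Xte_s = sel.transform(Xt), sel.transform(X_te)
        for c_name, clf in classifiers.items():
            clf.fit(Xt_s, yt)
            f1 = f1_score(y_te, clf.predict(Xte_s))  # minority-class F1
            print(f"{train_name:11s} | {s_name:7s} + {c_name:12s} F1 = {f1:.3f}")
```

The sketch evaluates every selector/classifier pair on both the original and the oversampled training data, which mirrors the comparison reported in the findings; the minority-class F1 score is used here only as a convenient imbalance-aware metric.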

Keywords


Class Imbalance Problem, Chi-Square, Information Gain, Support Vector Machine, SVM-RFE.



DOI: https://doi.org/10.17485/ijst/2016/v9i30/129685