
Optimized Feature Selection and Classification in Microarray Gene Expression Cancer Data


Affiliations
1 Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamil Nadu, India
     



Abstract

Microarray gene expression data can be used for cancer classification, but such data comprise thousands of genes measured on only a small number of samples. Gene expression data provide an efficient means of identifying which genes are implicated in human cancer. In this work, we formulate a hybrid model that combines a filter approach, a wrapper approach, and a partial least squares (PLS) method to select optimized features from the high-dimensional dataset. The filter approach uses mutual information, the wrapper approach uses a genetic algorithm, and the PLS method uses t-score estimation as its feature selection mechanism. Classification is then performed on the reduced dataset to label each sample as normal or abnormal. Both feature selection and dimension reduction are performed to improve classification accuracy, with feature selection retaining the genes most likely to be cancer-related from the large volume of microarray gene expression data. The trained classifier is tested on two benchmark cancer datasets: a colon cancer dataset of 62 samples (40 tumor, 22 normal) with 2,000 genes, and a prostate cancer dataset of 136 samples (59 normal, 75 tumor) with 12,600 genes. The proposed model achieves an accuracy of 92.7% with the wrapper approach using the optimal features, and the wrapper approach also outperforms the other two approaches in both accuracy and time complexity.

Keywords

Partial Least Squares, Feature Selection, Mutual Information, Cancer Classification, T-Score, Genetic Algorithm, Support Vector Machine.
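The filter approach scores each gene by its mutual information with the class label and keeps the top-ranked genes. The abstract does not say how continuous expression values are handled, so the minimal, dependency-free sketch below assumes equal-width binning into `n_bins` bins; `discretize`, `mutual_information` and `rank_genes_by_mi` are hypothetical names. Samples are rows, genes are columns, and labels are 0 (normal) or 1 (tumor).

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning of one gene's expression values (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant gene falls into a single bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def rank_genes_by_mi(samples, labels, n_bins=3):
    """Return (gene indices sorted by decreasing MI, per-gene MI scores).

    samples: list of rows, one row per sample, one column per gene.
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in samples]
        scores.append(mutual_information(discretize(column, n_bins), labels))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores
```

On a toy matrix where gene 0 separates the two classes and genes 1 and 2 do not, gene 0 receives 1 bit of mutual information and is ranked first. The bin count is a design choice: more bins capture finer structure but become unreliable with only a few dozen samples, as in the colon and prostate datasets.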
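For the PLS branch, the abstract names t-score estimation as the selection mechanism. One common reading, assumed here rather than taken from the paper, is to rank genes by a two-sample t-statistic between tumor and normal samples and pass only the top k genes on to PLS. The sketch uses Welch's form, t = (mean_tumor - mean_normal) / sqrt(var_tumor / n_tumor + var_normal / n_normal); the function names and the handling of zero-variance genes are hypothetical, and the PLS step itself is not shown.

```python
import math
import statistics


def t_score(values, labels):
    """Welch two-sample t-statistic of one gene: tumor (1) vs normal (0).

    Each class needs at least two samples for statistics.variance.
    """
    tumor = [v for v, y in zip(values, labels) if y == 1]
    normal = [v for v, y in zip(values, labels) if y == 0]
    diff = statistics.mean(tumor) - statistics.mean(normal)
    se = math.sqrt(statistics.variance(tumor) / len(tumor)
                   + statistics.variance(normal) / len(normal))
    if se == 0:
        # hypothetical convention: no within-class spread -> 0 or +/-infinity
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_k_by_t_score(samples, labels, k):
    """Indices of the k genes with the largest |t| (e.g. as input to PLS)."""
    n_genes = len(samples[0])
    scores = [t_score([row[g] for row in samples], labels) for g in range(n_genes)]
    order = sorted(range(n_genes), key=lambda g: abs(scores[g]), reverse=True)
    return order[:k], scores
```

Ranking by |t| keeps genes that are strongly up- or down-regulated in tumors; the sign of each score records the direction.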
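The wrapper approach searches over gene subsets with a genetic algorithm, scoring each candidate subset by the accuracy of a classifier evaluated on it. The sketch below is a generic GA wrapper under several assumptions not stated in the abstract: binary gene masks, tournament selection, one-point crossover, bit-flip mutation, elitism, and a fitness equal to leave-one-out accuracy minus a small per-gene penalty. To stay dependency-free it swaps in a nearest-centroid classifier for the SVM listed in the keywords; all names and parameter values are hypothetical. It expects at least two genes and a population of at least three.

```python
import random


def loo_nearest_centroid_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    Stand-in (an assumption) for the SVM named in the keywords.
    """
    if not genes:
        return 0.0
    correct = 0
    for i, held_out in enumerate(samples):
        centroids = {}
        for cls in sorted(set(labels)):
            rows = [samples[j] for j in range(len(samples))
                    if j != i and labels[j] == cls]
            if rows:  # a class vanishes if its only sample is held out
                centroids[cls] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        # predict the class whose centroid is closest (squared Euclidean distance)
        pred = min(centroids, key=lambda c: sum(
            (held_out[g] - m) ** 2 for g, m in zip(genes, centroids[c])))
        correct += pred == labels[i]
    return correct / len(samples)


def fitness(mask, samples, labels, penalty=0.01):
    """Accuracy minus a small per-gene penalty that favours compact subsets."""
    genes = [g for g, bit in enumerate(mask) if bit]
    return loo_nearest_centroid_accuracy(samples, labels, genes) - penalty * len(genes)


def ga_select(samples, labels, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve binary gene masks and return the genes in the fittest mask."""
    rng = random.Random(seed)
    n_genes = len(samples[0])
    pop = []
    for _ in range(pop_size):
        mask = [rng.randint(0, 1) for _ in range(n_genes)]
        if not any(mask):
            mask[rng.randrange(n_genes)] = 1  # start from non-empty subsets
        pop.append(mask)
    for _ in range(generations):
        scored = sorted(((fitness(m, samples, labels), m) for m in pop),
                        key=lambda t: t[0], reverse=True)
        next_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            # tournament selection (size 3) for each parent
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda m: fitness(m, samples, labels))
    return [g for g, bit in enumerate(best) if bit]
```

The penalty term is what steers the search toward compact subsets: without it, the GA is indifferent between subsets of equal accuracy, however many genes they contain. On data the size of the 12,600-gene prostate set, fitness evaluations dominate run time, so caching scores per mask would be a natural first optimization.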

Authors

B. Lakshmanan
Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamil Nadu, India
T. Jenitha
Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamil Nadu, India


DOI: https://doi.org/10.37506/v11%2Fi1%2F2020%2Fijphrd%2F193842