Analysis of Microarray Data Using Data Mining Techniques


Affiliations
1 Computer Science Department, Saveetha Engineering College, Chennai-602105, India
     



Gene expression data are essential for understanding the cellular activities of organisms, identifying diseases, and discovering drugs. Such data often contain missing values caused by experimental errors during laboratory processes, inappropriate thresholds in preprocessing, insufficient microarray resolution, image corruption, or dust and scratches on the slide. Imputing missing values is generally recommended over removing data, because it increases the effectiveness of analysis algorithms. There is also a need for a better clustering algorithm to identify differentially expressed genes; however, choosing a suitable clustering method for an experimental dataset is still not straightforward. In this paper we propose an AVG imputation method for pre-processing and a hybrid clustering algorithm for post-processing. The hybrid clustering algorithm is tested on the AVG-imputed data as well as on the original data. The results show that the pre-processed data produce higher-quality clusters and a more appropriate number of clusters than the original data in terms of BIC value, log likelihood, and sum of squared error criteria.
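As a rough illustration of the pre-processing idea and of one of the evaluation criteria, the sketch below fills each missing value with the average of the observed values for the same gene and computes the sum of squared error for a given cluster assignment. The abstract does not define AVG imputation in detail, so the per-gene (row) average, the fallback for fully missing rows, and the function names `avg_impute` and `sum_squared_error` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def avg_impute(expr):
    """Replace each NaN with the mean of the observed values in the same row (gene).

    Assumption: rows are genes and columns are samples/conditions.
    """
    expr = np.asarray(expr, dtype=float)
    filled = expr.copy()
    missing = np.isnan(expr)
    n_observed = (~missing).sum(axis=1)
    row_sum = np.where(missing, 0.0, expr).sum(axis=1)
    # Rows with no observed values fall back to 0.0 (an arbitrary choice for this sketch).
    row_mean = np.divide(row_sum, n_observed,
                         out=np.zeros_like(row_sum), where=n_observed > 0)
    filled[missing] = np.broadcast_to(row_mean[:, None], expr.shape)[missing]
    return filled

def sum_squared_error(data, labels):
    """Sum of squared distances from each row to the centroid of its cluster."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    sse = 0.0
    for k in np.unique(labels):
        members = data[labels == k]
        sse += ((members - members.mean(axis=0)) ** 2).sum()
    return float(sse)

# Toy expression matrix: 3 genes x 3 samples, with two missing entries.
expr = np.array([[1.0, np.nan, 3.0],
                 [2.0, 2.0, np.nan],
                 [8.0, 9.0, 10.0]])
filled = avg_impute(expr)
sse = sum_squared_error(filled, labels=[0, 0, 1])
```

The cluster labels here are supplied by hand; in practice they would come from whichever clustering algorithm is being evaluated, and a lower sum of squared error for the same number of clusters indicates tighter clusters.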

Keywords

AVG-Imputation, Data Mining, Gene Expression Data, Hybrid Clustering Algorithm, K-Means Clustering Algorithm, Missing Value Analysis, Model Based Clustering Algorithm.

Authors

J. Jasmine Gabrie
Computer Science Department, Saveetha Engineering College, Chennai-602105, India
P. Valarmathie
Computer Science Department, Saveetha Engineering College, Chennai-602105, India
