
Leukemia Classification using Cloud based Map-Reduce with K-Nearest Neighbor Classifier Framework


Affiliations
1 Department of Information Technology at Kennesaw State University, Kennesaw, GA 30144, Georgia
     



Microarray-based gene expression profiling has emerged as an efficient technique for the classification, prognosis, diagnosis, and treatment of cancer, and cancer diagnosis is one of its fastest-emerging clinical applications. Frequent changes in the behavior of the disease generate an enormous volume of data. Because it keeps changing with time, microarray data exhibits both the accuracy and velocity characteristics of big data. Microarray datasets therefore contain a large amount of expression data, but only a fraction of it corresponds to genes that are significantly expressed. Exact identification of the genes responsible for causing cancer is essential in microarray data analysis. Most existing schemes are two-phase processes: feature selection or extraction, followed by classification. Various statistical methods based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-Nearest Neighbor (MRKNN) classifier is employed to classify the microarray data. The algorithms are successfully implemented in a Hadoop framework.
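
As a rough illustration of how a K-Nearest Neighbor classifier can be split into map and reduce steps, the following is a minimal single-process sketch in Python. It is not the authors' Hadoop implementation. The function names, the Euclidean distance metric, the majority vote, the two-feature toy samples, and the "ALL"/"AML" labels are all assumptions made for the example. Each map call finds the k nearest training samples within one data partition, and the reduce call merges those local candidates into the global k nearest before voting.

```python
import heapq
import math
from collections import Counter

# Hypothetical toy data: two partitions of (features, label) pairs.
# Real microarray samples would have many selected gene features.
TOY_PARTITIONS = [
    [((1.0, 1.1), "ALL"), ((0.9, 1.0), "ALL"), ((5.0, 5.2), "AML")],
    [((5.1, 4.9), "AML"), ((1.2, 0.8), "ALL"), ((4.8, 5.0), "AML")],
]


def map_knn(partition, query, k):
    """Map step: k nearest (distance, label) pairs within one partition."""
    scored = [(math.dist(features, query), label) for features, label in partition]
    return heapq.nsmallest(k, scored)


def reduce_knn(partial_results, k):
    """Reduce step: merge local candidates, keep the global k nearest, vote."""
    merged = heapq.nsmallest(k, (pair for part in partial_results for pair in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def mr_knn_classify(partitions, query, k=3):
    """Run the map step on every partition, then a single reduce step."""
    return reduce_knn([map_knn(p, query, k) for p in partitions], k)


if __name__ == "__main__":
    print(mr_knn_classify(TOY_PARTITIONS, (1.0, 1.0), k=3))
    print(mr_knn_classify(TOY_PARTITIONS, (5.0, 5.0), k=3))
```

Emitting only k candidates per partition keeps the data passed to the reducer small. In a real Hadoop job the map calls would run in parallel on separate data splits.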


Keywords

Microarray Gene Expression, Leukemia Classification, Feature Selection, MapReduce based on a K-Nearest Neighbor.


Authors

A. Kernytskyy
Department of Information Technology at Kennesaw State University, Kennesaw, GA 30144, Georgia
Krzysztof J Cios
Department of Information Technology at Kennesaw State University, Kennesaw, GA 30144, Georgia
