
Basic Gene Discretization-Model using Correlation Clustering for Distributed DNA Databases


Affiliations
1 School of Science & Information Technology (SSIT), Skyline University, Nigeria
2 Department of Software Engineering, Jigjiga University, Ethiopia
3 Department of Information Technology, Jigjiga University, Ethiopia
4 Hindusthan College of Arts and Science, Coimbatore, India
 

Abstract

A gene is a basic component of DNA, located in the nucleus of the human cell. Data mining techniques currently have a large impact on human genetic science and gene sequence data analysis. Gene sequence analysis subjects a DNA sequence to systematic methods in order to determine a gene's character, configuration, nature and characteristics. CBC and MNBC, applied to gene sequence data analysis, aim to segregate diseased diabetic genes from a vast stream of DNA gene sequence elements present in a large body of statistical data. These techniques attempt to validate and determine methods and tools for analyzing diseased gene sequences. They also help in classifying and interpreting results accurately and meaningfully. This study combines supervised and unsupervised machine learning techniques for data analysis: clustering is done by CBC, whereas classification is done by MNBC. The approach recognizes gene expressions by framing association rules in accordance with support and confidence measures on the input data set. It extracts and filters the required data into clusters using the CBC technique, thereby drafting association rules. These rules are then applied to the testing dataset to filter the required (diseased) gene sequences. Finally, the MLRC algorithm is applied as a classification algorithm to identify the class labels of test gene sequences in a large dataset. In medical diagnosis, gene data mining techniques based on gene discretization models help to identify various associations between DNA-gene-based progressions and inconsistencies in disease infection transformations. Above all, the approach overcomes a limitation of existing Support Vector Machine classification technology, which incurs high computational cost and an increased number of iterations.

Keywords

Data mining, Data Analysis, DNA Gene, Gene Sequence, Vector Machine Classification.
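
The abstract relies on support and confidence measures for framing association rules but does not define them. The following is a minimal sketch of the standard definitions, not the authors' implementation. The item names (`geneA=high`, `class=diseased`, and so on) and the four-row dataset are hypothetical stand-ins for discretized gene features; the CBC, MNBC and MLRC steps are not reproduced here.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical discretized records: each record is a set of "feature=value" items.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"geneA=high", "geneB=low", "class=diseased"}),
    frozenset({"geneA=high", "geneB=low", "class=diseased"}),
    frozenset({"geneA=high", "geneB=high", "class=healthy"}),
    frozenset({"geneA=low", "geneB=low", "class=healthy"}),
]


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for record in transactions if items <= record)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    lhs_support = support(transactions, lhs)
    if lhs_support == 0.0:
        return 0.0  # the rule never fires, so its confidence is taken as 0
    return support(transactions, both) / lhs_support


if __name__ == "__main__":
    rule_lhs = {"geneA=high", "geneB=low"}
    rule_rhs = {"class=diseased"}
    print("support:", support(TRANSACTIONS, rule_lhs | rule_rhs))
    print("confidence:", confidence(TRANSACTIONS, rule_lhs, rule_rhs))
```

A rule would typically be kept only if both values exceed chosen thresholds; the abstract does not state which thresholds were used.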

Authors

J. Vijay Arputharaj
School of Science & Information Technology (SSIT), Skyline University, Nigeria
Pushpa Rega Ganesan
Department of Software Engineering, Jigjiga University, Ethiopia
Ponsuresh Manoharan
Department of Information Technology, Jigjiga University, Ethiopia
P. Supraja
Hindusthan College of Arts and Science, Coimbatore, India
