
Stacked Framework of Machine Learning Classifiers for Protein Family Prediction Using Protein Characteristics


Affiliations
1 Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli 627 012, India
2 School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632 014, India
 

A protein's family must be identified before the protein can be modified and controlled for use in applications such as identifying drug-target interactions and predicting structure. Protein families are identified from the similarity between protein sequences. Alignment-free approaches use machine learning (ML) techniques for protein family prediction. In this study, two novel ML-based models for protein family prediction have been developed to improve the identification of protein families: a stacked framework of random forest, and a stacked framework of random forest, decision tree and naive Bayes. The two models outperform state-of-the-art methods, with accuracies of 98.21% and 98.49% respectively. The proposed models also give better results on twilight-zone protein datasets.
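As a rough illustration of what a stacked framework of random forest, decision tree and naive Bayes can look like, the following minimal sketch uses scikit-learn's `StackingClassifier`. It is not the authors' implementation. The synthetic feature matrix, the logistic-regression meta-learner, the Gaussian variant of naive Bayes, and all hyperparameters are assumptions made only for this example; the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for alignment-free protein features.
# 300 hypothetical "proteins", 20 numeric features, 3 hypothetical "families".
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=10,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Level-0 (base) learners named in the abstract; settings are illustrative.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
]

# ASSUMPTION: logistic regression as the level-1 (meta) learner.
# The base learners' cross-validated predictions (cv=5) become its inputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the figures reported in the abstract. With real data, `X` would hold one feature vector per protein sequence and `y` the family labels.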

Keywords

Alignment Free Method, Machine Learning, Protein Family Prediction, Stacked Framework, Twilight-Zone Proteins.

Authors

T. Idhaya
Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli 627 012, India
A. Suruliandi
Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli 627 012, India
S. P. Raja
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632 014, India



DOI: https://doi.org/10.18520/cs%2Fv125%2Fi5%2F508-517