
Performance Analysis of Regression and Classification Models in the Prediction of Breast Cancer


Affiliations
1 School of Electronics Engineering (SENSE), VIT University, Vellore − 632014, Tamil Nadu, India
2 School of Social Science and Languages (SSL), VIT University, Vellore − 632014, Tamil Nadu, India
 

Objective: To propose an automated diagnostic system for the early detection of breast cancer. Methods: Machine learning algorithms are used to classify a tumor accurately as either malignant or benign from the minimum number of image features. A comparative study of classification approaches such as Decision Tree, Support Vector Machine, K-Nearest Neighbor and Random Forest has also been conducted, with a focus on cross validation, to identify the best-performing model. Findings: The study shows that the Random Forest classifier gives the maximum accuracy. It also highlights that cross validation and fine tuning are necessary to prevent overfitting. Improvements: The selection of parameters plays a very important role in correct classification, as multicollinearity among attributes can render classifier models ineffective.
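
The cross validation step mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the two-feature synthetic data, the helper names (`knn_predict`, `k_fold_accuracy`, `make_toy_data`) and the choices of 5 folds and k = 3 are all assumptions made for illustration, and only a simple K-Nearest Neighbor classifier is shown.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(data, n_folds=5, k=3, seed=0):
    """Mean held-out accuracy of k-NN across `n_folds` folds."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into n_folds disjoint folds.
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / n_folds


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-feature data (hypothetical stand-in, not the study's dataset)."""
    rng = random.Random(seed)
    benign = [((rng.gauss(0, 1), rng.gauss(0, 1)), "benign")
              for _ in range(n_per_class)]
    malignant = [((rng.gauss(6, 1), rng.gauss(6, 1)), "malignant")
                 for _ in range(n_per_class)]
    return benign + malignant


data = make_toy_data()
cv_accuracy = k_fold_accuracy(data, n_folds=5, k=3)
print(f"5-fold CV accuracy: {cv_accuracy:.3f}")
```

Because each sample is scored only by a model that never saw it during training, the averaged accuracy is a less optimistic estimate than accuracy measured on the training data, which is why it is useful for comparing classifiers and spotting overfitting.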

Keywords

Breast Cancer, Classification, Cross Validation, Decision Tree, K-Nearest Neighbor, Logistic Regression, Random Forest, Support Vector Machine

Authors

Aritra Basu
School of Electronics Engineering (SENSE), VIT University, Vellore − 632014, Tamil Nadu, India
Rohit Roy
School of Electronics Engineering (SENSE), VIT University, Vellore − 632014, Tamil Nadu, India
N. Savitha
School of Social Science and Languages (SSL), VIT University, Vellore − 632014, Tamil Nadu, India




DOI: https://doi.org/10.17485/ijst%2F2018%2Fv11i3%2F169557