Identifying the Most Influential Variables in Breast Cancer Using Logistic Regression
Breast cancer has recently become the most common cancer and a major cause of death among women worldwide, especially in developing countries such as Iraq. This study aims to identify the most important features for deciding whether a breast tumor is benign or malignant.
A predictive model, expected to help oncologists diagnose the type of breast cancer, was developed using binary logistic regression. The data set was downloaded from the UCI Machine Learning Repository and consists of 9 attributes and 683 valid instances.
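A binary logistic regression model estimates the probability of malignancy as p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))) and labels a case malignant when p reaches a cut-off. The sketch that follows illustrates that idea under stated assumptions and is not the authors' procedure. Coefficients are fitted by batch gradient ascent on z-scored features, whereas statistical packages typically use Newton-type maximum likelihood. The 0.5 cut-off, the coding 0 = benign / 1 = malignant, and every function name are hypothetical. The toy cases are invented and do not come from the UCI data set.

```python
import math
from statistics import mean, pstdev


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps a linear score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a binary logistic regression by batch gradient ascent on the mean log-likelihood.

    Features are z-scored first (an assumption, for stable optimisation); the returned
    model stores the scaling so new cases are transformed the same way.
    """
    k = len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(k)]
    sd = [pstdev([row[j] for row in X]) or 1.0 for j in range(k)]  # guard constant columns
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(k)] for row in X]

    b0, b = 0.0, [0.0] * k
    n = len(Z)
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for zi, yi in zip(Z, y):
            # (observed - predicted) is the log-likelihood gradient w.r.t. the linear score
            err = yi - sigmoid(b0 + sum(bj * zj for bj, zj in zip(b, zi)))
            g0 += err
            for j in range(k):
                g[j] += err * zi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return {"b0": b0, "b": b, "mu": mu, "sd": sd}


def predict_proba(model, x):
    """Estimated P(malignant | x) under the fitted model."""
    z = [(xj - m) / s for xj, m, s in zip(x, model["mu"], model["sd"])]
    return sigmoid(model["b0"] + sum(bj * zj for bj, zj in zip(model["b"], z)))


def classify(model, x, cutoff=0.5):
    """1 = malignant, 0 = benign, using an assumed probability cut-off."""
    return 1 if predict_proba(model, x) >= cutoff else 0


# Hypothetical toy cases with two attributes on a 1-10 scale; NOT the study's data.
X = [[1, 1], [2, 1], [1, 3], [3, 2], [8, 9], [9, 7], [10, 10], [4, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_logistic(X, y)
print(classify(model, [1, 2]), classify(model, [9, 9]))
```

On these well-separated toy cases the last line prints `0 1`. Fitting a reduced model would amount to passing only the selected columns of `X`. The abstract does not say which 5 variables the study retained.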
The data were first preprocessed to cleanse them. Two models were then built using two different logistic regression (LR) methods, to determine which one gives the most suitable model and the highest classification rate. The first was the full model with all predictor variables, while the second, called the reduced model, used only 5 predictor variables. Each model was validated on a data set different from the one used to develop it. The results on both the training and validation sets were evaluated with several performance metrics, including ROC curves, AUC, sensitivity and specificity. The analysis showed that the reduced model is the better classifier, since it gives the higher classification rate.
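The evaluation metrics named above can be computed from a model's predicted probabilities on a validation set. Sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and the classification rate is (TP + TN) / N. The AUC equals the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one. This is a minimal sketch, assuming labels coded 1 = malignant and 0 = benign and a 0.5 cut-off. The function names and the labels and probabilities in the usage lines are hypothetical, not results from the study.

```python
def confusion(y_true, y_pred):
    """Counts of true/false positives and negatives (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, probs, cutoff=0.5):
    """Sensitivity, specificity and classification rate at a fixed probability cut-off."""
    y_pred = [1 if p >= cutoff else 0 for p in probs]
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # malignant cases correctly flagged
        "specificity": tn / (tn + fp),  # benign cases correctly cleared
        "classification_rate": (tp + tn) / len(y_true),
    }


def roc_points(y_true, probs):
    """(false-positive rate, true-positive rate) for every distinct cut-off, high to low."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for c in sorted(set(probs), reverse=True):
        tp = sum(1 for t, p in zip(y_true, probs) if t == 1 and p >= c)
        fp = sum(1 for t, p in zip(y_true, probs) if t == 0 and p >= c)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(y_true, probs):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and predicted probabilities, for illustration only.
y_val = [0, 0, 0, 0, 1, 1, 1, 1]
p_val = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.7]
print(threshold_metrics(y_val, p_val))
print(auc_trapezoid(roc_points(y_val, p_val)), auc_rank(y_val, p_val))
```

On these toy values, sensitivity, specificity and classification rate all equal 0.75, and both AUC computations give 0.9375. Comparing a full and a reduced model would mean calling the same functions on each model's validation-set probabilities.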