Cancer Prognosis Prediction Model Using Data Mining Techniques


Affiliations
1 Dept. of Comp. Sci., Christ University, Bangalore, India
2 Department of CSE, University Visvesvaraya College of Engineering, Bangalore, India
3 University Visvesvaraya College of Engineering, Bangalore, India
4 Indian Institute of Science, Bangalore, India

Abstract

Cancer prognosis prediction improves the quality of treatment and increases patient survivability. Disease prognosis is identified at the treatment stage and at the recurrence stage. Conventional cancer prediction methods deal only with the survival or mortality of patients, not with other labels such as the severity of the disease (as indicated by metastasis or multiple primaries), stage, grade, etc. The SEER Public Use cancer database contains additional prominent variables that support a better prediction approach. The objective of this paper is twofold: first, to build a prediction model that identifies the prominent variables using standard classifiers, and second, to improve prediction accuracy through various sampling techniques. The proposed prediction model consists of three phases, namely basic-level pre-processing, problem-specific processing and classifier modeling. The problem-specific processing phase deals with feature extraction, sampling and response variable selection. Well-known classification algorithms (Decision Tree, Naive Bayes and KNN) have been used to model the classifiers for prediction analysis. Apart from the available SEER incidence data (breast, colorectal and respiratory cancer data), a new mixed data set combining the three in equal proportion has been generated for the experimentation. Feature selection through correlation and information gain reduced the number of attributes from the raw size of 118 to 37. Patient survival, age at diagnosis, stage and multiple primaries, in that order, have been identified as the prominent response variables, whereas grade performed poorly in the experimentation. The performance of various sampling techniques has been studied with data set sizes ranging from 500 to 30,000 samples for the four prominent labels identified in the previous step. The results show that the balanced stratified sampling technique always maintains consistent performance.
In addition, the classifier model based on the decision tree algorithm outperforms the other algorithms. All the results of the models are tabulated in this paper.
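Two of the steps named in the abstract, information-gain feature ranking and balanced stratified sampling, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes categorical attributes and uses only the Python standard library. The helper functions (`entropy`, `information_gain`, `rank_features`, `balanced_stratified_sample`), the field names (`size_group`, `side`, `survived`) and the toy records are hypothetical and are not taken from the paper or from the SEER schema. The correlation-based filtering step is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after splitting on a feature."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(records, feature_names, label_name):
    """Return (feature, gain) pairs, most informative feature first."""
    labels = [r[label_name] for r in records]
    gains = {f: information_gain([r[f] for r in records], labels) for f in feature_names}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


def balanced_stratified_sample(records, label_name, per_class, seed=0):
    """Draw up to `per_class` records from every class, without replacement."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in records:
        by_class[r[label_name]].append(r)
    sample = []
    for label in sorted(by_class):  # fixed class order keeps the draw reproducible
        members = by_class[label]
        sample.extend(rng.sample(members, min(per_class, len(members))))
    rng.shuffle(sample)  # mix classes so they are not grouped in the output
    return sample


# Hypothetical toy records (NOT SEER data): two categorical features, one binary label.
TOY_RECORDS = [
    {"size_group": "small", "side": "left", "survived": "yes"},
    {"size_group": "small", "side": "right", "survived": "yes"},
    {"size_group": "medium", "side": "left", "survived": "yes"},
    {"size_group": "medium", "side": "right", "survived": "no"},
    {"size_group": "large", "side": "left", "survived": "no"},
    {"size_group": "large", "side": "right", "survived": "no"},
]

if __name__ == "__main__":
    for name, gain in rank_features(TOY_RECORDS, ["size_group", "side"], "survived"):
        print(f"{name}: {gain:.3f}")
    sample = balanced_stratified_sample(TOY_RECORDS, "survived", per_class=2, seed=1)
    print(Counter(r["survived"] for r in sample))
```

On these toy records, `size_group` separates the labels far better than `side` (the script prints `size_group: 0.667` and then `side: 0.082`), so a gain threshold would keep the former and drop the latter. Drawing an equal number of records per class is one simple way to balance a skewed response variable; the paper's exact sampling procedure may differ.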

Keywords

Classifier, Pre-Processing, Prognosis Prediction, SEER.

Authors

J. S. Saleema
Dept. of Comp. Sci., Christ University, Bangalore, India
P. Deepa Shenoy
Department of CSE, University Visvesvaraya College of Engineering, Bangalore, India
K. R. Venugopal
University Visvesvaraya College of Engineering, Bangalore, India
L. M. Patnaik
Indian Institute of Science, Bangalore, India