
Automatic Selection of Decision Tree Algorithm Based on Training Set Size


Affiliations
1 Bharathiar University, School of BSMED, Coimbatore, India
2 SNS Rajalakshmi College of Science, Coimbatore, India
3 Dr. G.R. Damodaran College of Science, Coimbatore, India
     



In data mining applications, very large training data sets with several million records are common. Decision trees are a powerful and popular technique for both classification and prediction. Many decision tree construction algorithms have been proposed to handle large or small training data sets; some are best suited to large data sets and others to small ones. The decision tree algorithm C4.5 classifies categorical and continuous attributes very well, but it handles only smaller data sets efficiently. SLIQ (Supervised Learning In Quest) and SPRINT (Scalable Parallelizable Induction of Decision Trees) handle very large data sets. This paper deals with the automatic selection of a decision tree algorithm based on training set size. The proposed system first computes the training set size using a mathematical measure. The resulting training set size is then checked against the available memory. If memory is sufficient, tree construction continues with one of the algorithms C4.5, SLIQ, or SPRINT. After the data set has been classified, the accuracy of the classifier is estimated. The major advantages of the proposed approach are that the system takes less time and avoids memory problems.
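
The selection step described above can be illustrated with a minimal Python sketch. The abstract does not state the size measure or the exact decision rule, so everything here is an assumption made for illustration: the names `TrainingSetProfile`, `estimated_bytes`, `class_list_bytes` and `select_algorithm` are hypothetical, the size measure is a simple records × attributes × bytes product, and the rule follows the commonly described memory behaviour of the three algorithms (C4.5 works on memory-resident data, SLIQ keeps a per-record class list in memory, and SPRINT removes that restriction).

```python
from dataclasses import dataclass


@dataclass
class TrainingSetProfile:
    """Hypothetical description of a training set (not from the paper)."""
    n_records: int
    n_attributes: int
    bytes_per_value: int = 8        # assumed storage cost of one attribute value
    bytes_per_class_entry: int = 8  # assumed cost of one class-list entry


def estimated_bytes(profile: TrainingSetProfile) -> int:
    """Assumed size measure: every record stores its attributes plus a class label."""
    return profile.n_records * (profile.n_attributes + 1) * profile.bytes_per_value


def class_list_bytes(profile: TrainingSetProfile) -> int:
    """Assumed size of a memory-resident class list with one entry per record."""
    return profile.n_records * profile.bytes_per_class_entry


def select_algorithm(profile: TrainingSetProfile, available_bytes: int) -> str:
    """Pick an algorithm name from the estimated size and the available memory.

    Assumed rule, for illustration only:
      - whole training set fits in memory -> C4.5
      - only the class list fits          -> SLIQ
      - otherwise                         -> SPRINT
    """
    if estimated_bytes(profile) <= available_bytes:
        return "C4.5"
    if class_list_bytes(profile) <= available_bytes:
        return "SLIQ"
    return "SPRINT"
```

For example, under these assumptions, `select_algorithm(TrainingSetProfile(10_000, 10), 1 << 30)` selects C4.5, because the estimated 880,000 bytes fit easily in 1 GiB. The thresholds and byte costs would need to be replaced by the measure the paper actually uses.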


Keywords

Data Mining, Decision Trees, Classification, Machine Learning, Training Data.





Authors

K. Vivekanandan
Bharathiar University, School of BSMED, Coimbatore, India
T. Sathyabama
SNS Rajalakshmi College of Science, Coimbatore, India
M. Prabhavathi
Dr. G.R. Damodaran College of Science, Coimbatore, India
