Automatic Selection of Decision Tree Algorithm Based on Training Set Size
In data mining applications, very large training data sets with several million records are common. Decision trees are a powerful and popular technique for both classification and prediction. Many decision tree construction algorithms have been proposed to handle large or small training data sets; some are best suited to large data sets and others to small ones. The decision tree algorithm C4.5 classifies categorical and continuous attributes very well, but it handles only smaller data sets efficiently. SLIQ (Supervised Learning In Quest) and SPRINT (Scalable Parallelizable Induction of Decision Trees) handle very large data sets. This paper deals with the automatic selection of a decision tree algorithm based on training set size. The proposed system first calculates the size of the training data set using a mathematical measure. The resulting training set size is then checked against the available memory. If memory is sufficient, tree construction continues with one of the algorithms C4.5, SLIQ or SPRINT. After the data set is classified, the accuracy of the classifier is estimated. The major advantages of the proposed approach are that the system takes less time and avoids memory problems.
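The abstract does not give the size measure or the exact selection rule, so the following is only a minimal sketch of how a size-then-memory check could drive the choice, under stated assumptions. The size measure (records × (attributes + class label) × a fixed byte cost), the byte constants, and the three-way rule are all hypothetical. The rule leans on one known difference between the large-data algorithms: SLIQ keeps a memory-resident class list with one entry per training record, while SPRINT was designed to avoid that structure. The names `TrainingSetInfo`, `estimate_size_bytes`, `select_algorithm` and `accuracy` are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical byte costs; the paper's actual "mathematical measure" is not
# stated in the abstract.
BYTES_PER_VALUE = 8               # assumed cost of one attribute value
BYTES_PER_CLASS_LIST_ENTRY = 8    # assumed cost of one SLIQ class-list entry


@dataclass
class TrainingSetInfo:
    n_records: int
    n_attributes: int


def estimate_size_bytes(info: TrainingSetInfo) -> int:
    """Assumed size measure: records x (attributes + class label) x bytes per value."""
    return info.n_records * (info.n_attributes + 1) * BYTES_PER_VALUE


def select_algorithm(info: TrainingSetInfo, available_memory: int) -> str:
    """Pick a decision tree algorithm from the estimated training set size.

    Assumed policy (illustrative, not the paper's):
    - the whole training set fits in memory     -> C4.5
    - only SLIQ's per-record class list fits     -> SLIQ
    - otherwise                                  -> SPRINT, which avoids
      SLIQ's memory-resident class list
    """
    if estimate_size_bytes(info) <= available_memory:
        return "C4.5"
    if info.n_records * BYTES_PER_CLASS_LIST_ENTRY <= available_memory:
        return "SLIQ"
    return "SPRINT"


def accuracy(predicted: list, actual: list) -> float:
    """Fraction of test records whose predicted class matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

With 1 GiB (`1 << 30` bytes) available and 20 attributes, these assumed costs give C4.5 for 10,000 records, SLIQ for 10 million records and SPRINT for 200 million records. Passing the available memory in as a parameter keeps the sketch portable; a real system would query the operating system instead.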