Classification using Generalization Based Decision Tree Induction along with Relevance Analysis Based on Relational Database


Affiliations
1 Charotar Institute of Technology, Changa, Gujarat, India
2 Charotar Institute of Technology, Changa, Gujarat, India
     



Classification is the process of determining unknown values of certain attributes of interest from the values of other attributes, and it is a major challenge in data mining. A commonly used method is the decision tree. The efficiency of decision tree algorithms is well established for relatively small data sets. However, the method has difficulty handling larger data sets and data with continuous numerical values, and it tends to favor attributes with many distinct values when selecting the final determining attribute. Large training sets are common in data mining applications, so decision tree algorithms face scalability limitations. In addition, in most data mining applications users have little knowledge of which signature attribute should be selected for effective mining, and so depend heavily on the capability of the algorithm. In this paper, we address two problems: selecting the right signature attribute and handling large data sets. We propose a new data classification method that integrates a sequence of preprocessing steps, namely data cleaning, attribute-oriented induction (identifying the signature attribute), and relevance analysis, followed by decision tree induction. This stepwise approach lets us extract simple rules at multiple levels of abstraction and handles large data sets and continuous numerical values in a scalable way.

Keywords

Data Mining, Classification, Data Cleaning, Decision Tree Induction, Relevance Analysis.
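
The preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the concept hierarchy in `generalize_age`, the toy records, the attribute names, and the 0.1 threshold are all hypothetical, and plain information gain is assumed as the relevance measure because the abstract does not state which measure the paper uses. The sketch generalizes a continuous attribute to a higher-level concept (attribute-oriented induction) and then discards attributes whose information gain with respect to the class falls below the threshold (relevance analysis). The surviving attributes would then be passed to a decision tree learner.

```python
from collections import Counter
from math import log2


def generalize_age(age):
    """Hypothetical concept hierarchy: map a raw age to a coarser concept."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def relevant_attributes(rows, attributes, target, threshold):
    """Keep attributes whose gain reaches the (assumed) relevance threshold."""
    gains = {a: information_gain(rows, a, target) for a in attributes}
    kept = [a for a in attributes if gains[a] >= threshold]
    return kept, gains


# Toy records invented purely for illustration.
raw = [
    {"age": 22, "city": "A", "buys": "no"},
    {"age": 25, "city": "B", "buys": "no"},
    {"age": 41, "city": "A", "buys": "yes"},
    {"age": 45, "city": "B", "buys": "yes"},
    {"age": 67, "city": "A", "buys": "yes"},
    {"age": 70, "city": "B", "buys": "no"},
]

# Attribute-oriented induction: replace the continuous value by its concept.
generalized = [{**r, "age": generalize_age(r["age"])} for r in raw]

# Relevance analysis: rank the attributes and drop the weakly relevant ones.
kept, gains = relevant_attributes(generalized, ["age", "city"], "buys", threshold=0.1)
print(kept, gains)
```

On this toy data the generalized `age` attribute is retained and `city` is dropped. Generalizing before ranking also reduces the number of distinct values per attribute, which is one way to limit the bias toward many-valued attributes that the abstract mentions.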

Authors

Amit Thakkar
Charotar Institute of Technology, Changa, Gujarat, India
Yogeshwar P. Kosta
Charotar Institute of Technology, Changa, Gujarat, India
Amit Ganatra
Charotar Institute of Technology, Changa, Gujarat, India
