
An Emerging Classification Method for Huge Dataset in Clustering


Affiliations
1 School of Computer Studies (PG), RVS College of Arts and Science, Coimbatore, India
2 Department of Computer Science, SNS Raja Lakshmi College of Arts and Science, Coimbatore, India

Abstract

This paper applies clustering analysis to the classification of large datasets. The Canberra distance is generalized so that it can also process data with categorical attributes. Based on this generalized Canberra distance, an instance of constraint-based clustering is introduced. Meanwhile, the nearest neighbor classification method is improved. Class-labeled clusters serve as the classification models used to classify data. The proposed method can also detect data that differ greatly from every instance in the training data, which may indicate a new data type. In short, the Canberra distance is generalized from data with continuous numerical attributes to data with mixed attributes, and clustering analysis is used to condense ("squash") the existing training instances, improving the classical nearest neighbor classification method.
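The abstract does not give the exact definition of the generalized Canberra distance or of the clustering constraints, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that numeric attributes use the standard Canberra term |a - b| / (|a| + |b|), that categorical attributes contribute 0 on a match and 1 otherwise, that clusters are built in a single leader-style pass constrained to one class label and a hypothetical distance `radius`, and that a query farther than `radius` from every cluster leader is flagged as a possible new data type. The names `generalized_canberra`, `build_clusters`, `classify`, `Cluster`, and `radius` are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union

# An attribute value is either numeric or categorical (string).
Value = Union[float, str]


def generalized_canberra(x: Sequence[Value], y: Sequence[Value]) -> float:
    """Canberra distance extended to mixed attributes (assumed per-attribute rule).

    Numeric pair: |a - b| / (|a| + |b|), defined as 0 when both values are 0.
    Categorical pair: 0 if the categories match, 1 otherwise.
    Every term lies in [0, 1], so both attribute kinds share one scale.
    """
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, str) or isinstance(b, str):
            total += 0.0 if a == b else 1.0
        else:
            denom = abs(a) + abs(b)
            total += abs(a - b) / denom if denom else 0.0
    return total


@dataclass
class Cluster:
    leader: Sequence[Value]  # representative instance: the cluster's first member
    label: str               # class label shared by every member (purity constraint)
    size: int = 1            # number of training instances this cluster replaces


def build_clusters(
    data: Sequence[Sequence[Value]], labels: Sequence[str], radius: float
) -> List[Cluster]:
    """Single-pass leader clustering under two hypothetical constraints:
    a cluster holds one class only, and members lie within `radius` of the leader."""
    clusters: List[Cluster] = []
    for x, y in zip(data, labels):
        same_class = [c for c in clusters if c.label == y]
        best = min(same_class, key=lambda c: generalized_canberra(x, c.leader), default=None)
        if best is not None and generalized_canberra(x, best.leader) <= radius:
            best.size += 1  # absorb ("squash") the instance into an existing cluster
        else:
            clusters.append(Cluster(leader=x, label=y))
    return clusters


def classify(
    x: Sequence[Value], clusters: Sequence[Cluster], radius: float
) -> Optional[str]:
    """Nearest-cluster classification; returns None when x is farther than
    `radius` from every leader, i.e. it may belong to a new data type."""
    if not clusters:
        return None
    nearest = min(clusters, key=lambda c: generalized_canberra(x, c.leader))
    if generalized_canberra(x, nearest.leader) > radius:
        return None
    return nearest.label


if __name__ == "__main__":
    train = [[1.0, 2.0, "red"], [1.1, 2.1, "red"], [5.0, 9.0, "blue"], [5.2, 8.8, "blue"]]
    labels = ["A", "A", "B", "B"]
    clusters = build_clusters(train, labels, radius=0.5)
    print(len(clusters))                                     # 2
    print(classify([1.05, 2.0, "red"], clusters, 0.5))       # A
    print(classify([100.0, -50.0, "green"], clusters, 0.5))  # None
```

Replacing the training set by cluster leaders is what shrinks the nearest neighbor search in this sketch: a query is compared against one representative per cluster instead of against every training instance. Using the first member as the leader, rather than a mean, is an assumption made here because a mean is undefined for categorical attributes.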

Keywords

ID3, C4.5, Canberra Distance, Clustering, Improved Nearest Neighbour.

Authors

B. Rosiline Jeetha
School of Computer Studies (PG), RVS College of Arts and Science, Coimbatore, India
M. Punithavalli
Department of Computer Science, SNS Raja Lakshmi College of Arts and Science, Coimbatore, India