Exploring Capabilities of Canopy Clustering Algorithm
Clustering is a widely used data mining technique for extracting business intelligence that can help enterprises make expert decisions. Clustering is an unsupervised learning technique that identifies natural groups among given objects. Many types of clustering algorithms have emerged, including hierarchical, partitioning, density-based, model-based, grid-based and soft-computing methods. The quality of the clusters and the computational overhead are two important concerns when using clustering techniques. Canopy clustering is a technique that is best used as a pre-processing step for main clustering algorithms such as K-Means. Using canopies, it becomes feasible to cluster large datasets that would otherwise be impractical to cluster. Because canopy clustering uses a cheap distance metric, it can reduce clustering overhead without losing cluster accuracy. However, some in the industry doubt whether canopy clustering will be needed in the future, since streaming K-means can serve the same purpose. In this paper we explore the canopy clustering algorithm and provide insights into its capabilities. We built a prototype that demonstrates the usefulness of canopy clustering. The empirical results revealed that canopy clustering removes much of the computational overhead compared with clustering algorithms that do not use the canopy approach.
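To make the idea of canopies as a pre-processing step concrete, here is a minimal sketch in Python. It is not the authors' prototype. It assumes the classic two-threshold scheme (a loose threshold `t1` and a tight threshold `t2`, with `t1 > t2 > 0`), uses Manhattan distance as the "cheap" metric and Euclidean distance as the "expensive" one, and restricts K-means so that a point is compared only with centroids that share a canopy with it. The function names, the choice of metrics, and seeding one centroid per canopy center are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def cheap_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Inexpensive metric used only to build canopies (Manhattan distance, an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canopy_clustering(points: List[Point], t1: float, t2: float,
                      seed: int = 0) -> List[Tuple[int, List[int]]]:
    """Return canopies as (center_index, member_indices); canopies may overlap."""
    if not t1 > t2 > 0:
        raise ValueError("canopy clustering requires t1 > t2 > 0")
    rng = random.Random(seed)
    remaining = list(range(len(points)))  # points that may still become centers
    canopies = []
    while remaining:
        center = rng.choice(remaining)
        # Every point within the loose threshold t1 joins this canopy.
        members = [i for i, p in enumerate(points)
                   if cheap_distance(points[center], p) < t1]
        canopies.append((center, members))
        # Points within the tight threshold t2 (including the center) stop being candidates.
        remaining = [i for i in remaining
                     if cheap_distance(points[center], points[i]) >= t2]
    return canopies


def canopy_kmeans(points: List[Point], canopies: List[Tuple[int, List[int]]],
                  t1: float, iterations: int = 10):
    """K-means seeded at canopy centers; returns (centroids, assignment, expensive_count)."""
    point_canopies = [set() for _ in points]
    for c_id, (_, members) in enumerate(canopies):
        for i in members:
            point_canopies[i].add(c_id)
    centroids = [points[center] for center, _ in canopies]
    assignment = [0] * len(points)
    expensive = 0  # number of Euclidean (expensive) distance evaluations
    for _ in range(iterations):
        # A centroid belongs to every canopy whose center is within t1 of it (cheap metric).
        centroid_canopies = [
            {c_id for c_id, (center, _) in enumerate(canopies)
             if cheap_distance(points[center], cen) < t1}
            for cen in centroids
        ]
        for i, p in enumerate(points):
            candidates = [j for j, cc in enumerate(centroid_canopies)
                          if cc & point_canopies[i]]
            if not candidates:
                candidates = list(range(len(centroids)))  # fall back to a full comparison
            expensive += len(candidates)
            assignment[i] = min(candidates, key=lambda j: math.dist(p, centroids[j]))
        # Standard K-means update: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [points[i] for i, a in enumerate(assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment, expensive


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
canopies = canopy_clustering(pts, t1=3.0, t2=1.5)
centroids, assignment, n_expensive = canopy_kmeans(pts, canopies, t1=3.0)
```

With two well-separated groups like the ones above, each point is compared only with centroids inside its own group, so `n_expensive` stays below the `iterations * len(points) * k` Euclidean evaluations that plain K-means with the same `k` would perform. How much is saved in practice depends on how well the cheap metric and the thresholds separate the data.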
Keywords
Clustering, Pre-Clustering, Canopy Clustering, Distance Metric.