
Exploring Capabilities of Canopy Clustering Algorithm


Affiliations
1 Department of Computer Science, Acharya Nagarjuna University, Andhra Pradesh, India
2 Department of Computer Science, Jawaharlal Nehru Technological University-JNTU, Hyderabad, India
     



Clustering is a widely used data mining technique for extracting business intelligence that helps enterprises make expert decisions. It is an unsupervised learning approach that identifies natural groups among given objects. Many families of clustering algorithms exist, including hierarchical, partitioning, density-based, model-based, grid-based, and soft computing approaches. Clustering quality and computational overhead are the two main concerns when applying these techniques. Canopy clustering is best used as a pre-processing step for a main clustering algorithm such as K-Means. With canopies, clustering problems that would otherwise be too large to be practical become feasible. Because canopy clustering uses a cheap distance metric, it can reduce clustering overhead without losing cluster accuracy. However, there is doubt in industry about whether canopy clustering will still be needed in the future, since streaming K-Means can serve the same purpose. In this paper we explore the canopy clustering algorithm and offer insights into its capabilities. We built a prototype that demonstrates the usefulness of canopy clustering. The empirical results show that canopy clustering substantially reduces computational overhead compared with clustering without the canopy approach.

Keywords

Clustering, Pre-Clustering, Canopy Clustering, and Distance Metric.
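
The canopy pre-clustering step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype; it follows the commonly described two-threshold formulation, and the function names, the choice of Manhattan distance as the cheap metric, and the threshold values `t1` and `t2` are all assumptions made for illustration.

```python
def cheap_distance(a, b):
    """Manhattan distance, assumed here as the inexpensive metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canopy_clusters(points, t1, t2, distance=cheap_distance):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (removal from the candidate list). Returns a list of
    (center_index, member_indices) pairs.
    """
    if not 0 <= t2 <= t1:
        raise ValueError("thresholds must satisfy 0 <= t2 <= t1")

    remaining = list(range(len(points)))  # indices still eligible as centers
    canopies = []
    while remaining:
        center = remaining[0]  # hypothetical choice: take the first candidate
        # Every point within the loose threshold joins this canopy.
        members = [i for i in range(len(points))
                   if distance(points[center], points[i]) <= t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer start a canopy.
        remaining = [i for i in remaining
                     if distance(points[center], points[i]) > t2]
    return canopies


# Illustrative data: two well-separated groups of 2-D points.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
canopies = canopy_clusters(points, t1=3, t2=1.5)
print(canopies)  # [(0, [0, 1, 2]), (3, [3, 4])]
```

A main algorithm such as K-Means could then restrict its expensive distance computations to points and centers that share a canopy, which is where the reduction in overhead would come from.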

Authors

Srinivas Sivarathri
Department of Computer Science, Acharya Nagarjuna University, Andhra Pradesh, India
A. Govardhan
Department of Computer Science, Jawaharlal Nehru Technological University-JNTU, Hyderabad, India
