
Partitional Distance-Based Projected Clustering Algorithm


Affiliations
1 Department of Information Technology, Aurora's Technological and Research Institute, Hyderabad, India
     



Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. To address this problem, a number of projected clustering algorithms have been proposed. However, most of them encounter difficulties when clusters hide in subspaces with very low dimensionality. These challenges motivate a robust partitional distance-based projected clustering algorithm. The algorithm consists of three phases. The first phase performs attribute relevance analysis by detecting dense and sparse regions and their locations in each attribute. Starting from the results of the first phase, the second phase eliminates outliers, and the third phase discovers clusters in different subspaces. The clustering process is based on the K-means algorithm, with the distance computation restricted to the subsets of attributes where object values are dense. Our algorithm is capable of detecting projected clusters of low dimensionality embedded in a high-dimensional space, and it avoids computing distances in the full-dimensional space.
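
The third phase can be illustrated with a minimal sketch of K-means in which each cluster measures distance only over its own relevant attributes. This is not the authors' implementation. The function names, the toy data, and the division by the number of relevant attributes (used so that subspaces of different sizes stay comparable) are all assumptions made for illustration. The relevant attributes are supplied by hand here, whereas the abstract derives them in the first phase.

```python
from math import sqrt

def restricted_distance(point, center, dims):
    # Euclidean distance over the attribute indices in `dims` only.
    # Dividing by len(dims) is an assumed normalization, not from the source.
    return sqrt(sum((point[d] - center[d]) ** 2 for d in dims) / len(dims))

def assign(points, centers, relevant_dims):
    # Each point goes to the cluster whose center is nearest
    # when measured in that cluster's own subspace.
    labels = []
    for p in points:
        dists = [restricted_distance(p, c, dims)
                 for c, dims in zip(centers, relevant_dims)]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, centers):
    # Recompute each center as the mean of its members;
    # an empty cluster keeps its previous center.
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers

def projected_kmeans(points, centers, relevant_dims, max_iter=20):
    # Alternate assignment and update until the labels stop changing.
    labels = assign(points, centers, relevant_dims)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers, relevant_dims)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

# Hypothetical toy data: the first three points are dense in attribute 0,
# the last three are dense in attribute 1; other attributes are noise.
points = [
    [1.0, 9.0, 0.2],
    [1.1, 2.0, 7.5],
    [0.9, 6.5, 3.3],
    [8.0, 5.0, 0.1],
    [3.0, 5.1, 9.0],
    [6.0, 4.9, 4.4],
]
initial_centers = [points[0], points[3]]
relevant_dims = [[0], [1]]  # assumed output of the attribute relevance phase

labels, centers = projected_kmeans(points, initial_centers, relevant_dims)
print(labels)
```

On this toy data the two groups are separated even though their values in the irrelevant attributes are widely scattered, because those attributes never enter the distance.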

Keywords

Agglomerative Approach, Attribute Relevance Classification, Analysis Eliminating Outliers, Clustering, Clique.


Authors

P. Srilakshmi
Department of Information Technology, Aurora's Technological and Research Institute, Hyderabad, India
T. Deepthi
Department of Information Technology, Aurora's Technological and Research Institute, Hyderabad, India
