
Identifying Outliers in Datasets Using Outlier Removal Clustering (ORC) Algorithm


Affiliations
Department of Computer Science, Sree Saraswathi Thyagaraja College, Thippampatti, Pollachi, India
     



Abstract

Building on the objective function of general K-Means, this work associates a weight vector with each cluster to indicate which dimensions are relevant to that cluster. To prevent the value of the objective function from decreasing merely because dimensions are eliminated, virtual dimensions are added to the objective function. The values of data points on these virtual dimensions are set artificially so that the objective function is minimized when the real subspace clusters, or the clusters in the original space, are found. In some cases the outlier detection problem is similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, where the outliers are often regarded as noise that should be removed to obtain a more reliable clustering. This research work presents an algorithm that performs outlier detection and data clustering simultaneously. The algorithm improves the estimation of the centroids of the generative distribution during the process of clustering and outlier discovery.
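The abstract does not spell out the ORC procedure, so the following is only a minimal sketch of the idea as it is commonly described: run K-Means, score each point by its outlyingness o_i = d(x_i, c_i) / d_max (distance to its own centroid divided by the largest such distance), remove points whose score exceeds a threshold, and repeat for a fixed number of rounds. It uses plain Euclidean distance and omits the per-cluster dimension weights and virtual dimensions described above. The helper names (`init_farthest_first`, `assign`, `kmeans`, `orc`), the deterministic farthest-first seeding, and the default threshold of 0.9 are illustrative assumptions, not the authors' implementation.

```python
import math


def init_farthest_first(points, k):
    """Deterministic maximin seeding (assumed for reproducibility): start at the
    first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-Means; returns (centroids, labels)."""
    centroids = init_farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Mean of the members; keep the old centroid if the cluster is empty.
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)


def orc(points, k, threshold=0.9, rounds=3):
    """Hypothetical ORC sketch: cluster, score o_i = d(x_i, own centroid) / d_max,
    drop points with o_i > threshold, and repeat for up to `rounds` rounds."""
    kept, outliers = list(points), []
    for _ in range(rounds):
        centroids, labels = kmeans(kept, k)
        dists = [math.dist(p, centroids[lab]) for p, lab in zip(kept, labels)]
        d_max = max(dists)
        if d_max == 0:
            break  # every point sits on its centroid; nothing to score
        survivors = [p for p, d in zip(kept, dists) if d / d_max <= threshold]
        removed = [p for p, d in zip(kept, dists) if d / d_max > threshold]
        # Stop if nothing is removed or too few points would remain for k clusters.
        if not removed or len(survivors) < k:
            break
        outliers.extend(removed)
        kept = survivors
    # Final clustering on the cleaned data gives the refined centroid estimates.
    centroids, labels = kmeans(kept, k)
    return centroids, labels, kept, outliers
```

For example, with two tight groups of four points near (0.5, 0.5) and (10.5, 0.5) plus the point (0.5, 4), `orc(points, k=2)` flags (0.5, 4) as the only outlier and returns centroids at (0.5, 0.5) and (10.5, 0.5). Note that any threshold below 1 always gives the farthest point a score of exactly 1, so each round removes at least one point; `rounds` and `threshold` together bound how much data is discarded, and the guard stops before fewer than `k` points would remain.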

Keywords

Data Mining, Clustering, K-Means, High Dimensions, Outlier Removal Clustering (ORC) Algorithm.



Authors

N. Nirmaladevi
Department of Computer Science, Sree Saraswathi Thyagaraja College, Thippampatti, Pollachi, India
R. Suresh Kumar
Department of Computer Science, Sree Saraswathi Thyagaraja College, Thippampatti, Pollachi, India
