
On the Consequence of Variation Measure in K-Modes Clustering Algorithm


Affiliations
1 Computer Science Department, Shaqra University, Dawadmi Community College, Dawadmi 11911 P.O. Box 18, Saudi Arabia
 

Abstract

Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. The problem of clustering categorical data has recently attracted much attention from the data mining research community. The original k-means algorithm, also known as Lloyd's algorithm, is designed to work primarily on numeric data sets. This prevents the algorithm from being applied to categorical data clustering, which is an integral part of data mining. This paper describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching variation (dissimilarity) measure for categorical objects, a heuristic approach was previously developed that allows the k-modes paradigm to obtain clusters with strong intra-similarity and to cluster large categorical data sets efficiently. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new variation measure, and to establish the convergence of the algorithm under the optimization framework.
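For orientation, the basic k-modes iteration that the abstract builds on can be sketched as follows. This is a minimal illustration of the classical scheme with the unmodified simple matching measure (the count of attributes on which two objects differ); it is not the paper's modified measure or its derived updating formula. The function names, the random initialisation, and the stopping rule are assumptions made for this sketch.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Simple matching measure: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute (column) of the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, max_iter=100, seed=0):
    """Alternate between assigning objects to the nearest mode and updating modes."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # assumed initialisation: k randomly chosen objects
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to the mode with the fewest mismatches.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:  # assumed stopping rule: assignments unchanged
            break
        labels = new_labels
        # Update step: each mode becomes the attribute-wise most frequent category.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    return labels, modes


if __name__ == "__main__":
    toy = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
           ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r")]
    print(k_modes(toy, k=2))
```

The mode plays the role that the mean plays in k-means: for the simple matching measure, the attribute-wise most frequent category minimises the total number of mismatches within a cluster, which is why the update step uses column frequencies.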

Keywords

Data Mining, Clustering, K-Means Algorithm, Categorical Data.


Authors

Abedalhakeem T. Issa
Computer Science Department, Shaqra University, Dawadmi Community College, Dawadmi 11911 P.O. Box 18, Saudi Arabia
