Implementation of K-Modes Algorithm to Cluster Very Large Categorical Data Sets in Data Mining
This paper concerns clustering in Data Mining. Partitioning a large set of objects into homogeneous groups, known as clustering, is a fundamental operation in Data Mining. The K-Means algorithm is commonly used to cluster large data sets, but it works efficiently only on numerical objects. This limits its use in Data Mining, where data sets often contain categorical values. In this paper we present the K-Modes algorithm, which extends the K-Means paradigm to categorical domains. We introduce new dissimilarity measures to deal with categorical objects, replace the means of clusters with modes, and use a frequency-based method to update the modes during the clustering process. The K-Modes algorithm is implemented using the WEKA tool.
Keywords
Categorical Data, Clustering, Data Mining, Dissimilarity Measures, K-Means, K-Modes, Weka Tool.
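
The three ingredients named in the abstract (a dissimilarity measure for categorical objects, modes in place of means, and a frequency-based mode update) can be illustrated with a minimal sketch. This is not the paper's WEKA implementation. It assumes the simple matching dissimilarity (the number of attributes on which two records differ), a batch update of the modes after each full assignment pass, and random distinct records as initial modes. The function names `matching_dissimilarity`, `column_modes`, and `k_modes` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Assumed measure: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Frequency-based update: the most frequent category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, max_iter=100, seed=0):
    """Batch K-Modes sketch; records are tuples of categorical values."""
    rng = random.Random(seed)
    # Assumption: start from k distinct records picked at random.
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(max_iter):
        # Assign every record to its nearest mode (ties go to the lowest index).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Recompute each non-empty cluster's mode from attribute frequencies.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("red", "medium", "round"),
        ("blue", "large", "square"),
        ("blue", "large", "flat"),
        ("blue", "huge", "square"),
    ]
    labels, modes = k_modes(data, k=2)
    print(labels)
    print(modes)
```

Because the initial modes are drawn at random, the resulting partition depends on the seed. Running the sketch with several seeds and keeping the partition with the lowest total dissimilarity is a common precaution.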