
Analysis of Different Similarity Functions with Fuzzy C-Means Clustering Approach Using Meeting Transcripts


Affiliations
1 Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry-605014, India
     



Clustering is a technique for automatically grouping similar data into clusters. A wide variety of similarity measures and distance functions, such as Euclidean distance, Jaccard distance, Pearson correlation distance, cosine similarity and Kullback-Leibler divergence, have been implemented for clustering. The Fuzzy C-Means algorithm is used to assign each word point a membership in every cluster; the membership for each cluster center is calculated from the distance between that cluster center and the word point. The proposed framework evaluates the effectiveness of these five similarity measure functions when used with the Fuzzy C-Means clustering algorithm. Validity measures such as purity and entropy are used to estimate the optimal number of clusters. Finally, the results of the five similarity measure functions with the Fuzzy C-Means clustering algorithm are compared. The Euclidean distance function provides better and more accurate results than the other distance functions.
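
As an illustration of the ideas in the abstract, the following is a minimal sketch of the five distance functions and of the standard Fuzzy C-Means membership update. It is not the authors' implementation. The function names, the extended (Tanimoto) form of the Jaccard distance, the smoothing constant `eps` in the Kullback-Leibler divergence, and the fuzzifier `m = 2` are all assumptions made for this example. Word points are assumed to be non-constant, non-zero numeric vectors.

```python
import math

def euclidean(p, q):
    # Straight-line distance between two vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_distance(p, q):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def jaccard_distance(p, q):
    # Assumption: extended (Tanimoto) Jaccard for real-valued vectors.
    dot = sum(a * b for a, b in zip(p, q))
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    return 1.0 - dot / (pp + qq - dot)

def pearson_distance(p, q):
    # 1 - Pearson correlation, computed as cosine distance of centered vectors.
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    return cosine_distance([a - mp for a in p], [b - mq for b in q])

def kl_divergence(p, q, eps=1e-12):
    # Assumption: inputs are non-negative weights, normalized here to
    # distributions and smoothed by eps. Note that KL is not symmetric.
    sp, sq = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        a, b = a / sp + eps, b / sq + eps
        total += a * math.log(a / b)
    return total

def fcm_memberships(point, centers, dist, m=2.0):
    # Standard FCM update: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)).
    d = [dist(point, c) for c in centers]
    zeros = [j for j, dj in enumerate(d) if dj == 0.0]
    if zeros:
        # The point coincides with one or more centers: share membership there.
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]
```

Any of the five functions can be passed as `dist`, for example `fcm_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]], euclidean)`. The memberships of a point always sum to 1, and closer centers receive larger memberships.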

Keywords

Clustering, Euclidean Distance, Fuzzy C Means Algorithm, Similarity Measure.
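
The purity and entropy validity measures mentioned in the abstract can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's code. It assumes that each word point has already been assigned to a single cluster (for fuzzy memberships, for instance the cluster with the highest membership) and that reference labels are available. The function name `purity_and_entropy` and the use of base-2 logarithms are choices made for this sketch.

```python
import math
from collections import Counter, defaultdict

def purity_and_entropy(cluster_ids, labels):
    # Group the reference labels by the cluster each point was assigned to.
    groups = defaultdict(list)
    for cid, lab in zip(cluster_ids, labels):
        groups[cid].append(lab)

    n = len(labels)
    purity = 0.0
    entropy = 0.0
    for members in groups.values():
        counts = Counter(members)
        size = len(members)
        # Purity: fraction of all points that belong to the majority label
        # of their cluster.
        purity += max(counts.values()) / n
        # Entropy: label entropy within the cluster, weighted by cluster size.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        entropy += (size / n) * h
    return purity, entropy
```

Higher purity and lower entropy indicate clusters that agree more closely with the reference labels. Evaluating both for several candidate numbers of clusters is one way to compare them.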

Authors

J. I. Sheeba
Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry-605014, India
K. Vivekanandan
Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry-605014, India
