
Development of an Efficient Hierarchical Clustering Analysis using an Agglomerative Clustering Algorithm


Affiliations
1 Department of Computer Science, Lahore College for Women University, Lahore 54000, Pakistan
2 Department of Information Technology, Government College University Faisalabad 38000, Pakistan
3 Department of Computer Science, National Textile University, Faisalabad 37610, Pakistan
 

Clustering algorithms group elements that share similar characteristics. Among the different families of clustering algorithms, agglomerative algorithms are widely used for document clustering. This study examined the effectiveness of the agglomerative clustering algorithm in document clustering by improving its efficiency and evaluating the result through implementation. The resulting values (precision = 0.8571, recall = 0.8571 and F-measure = 0.857076) indicate higher accuracy and efficiency than the existing algorithm.
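The abstract does not state which F-measure variant was used. As a minimal sketch, assuming the balanced F-measure (the harmonic mean of precision and recall), the reported precision and recall can be combined as follows. The function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
score = f_measure(0.8571, 0.8571)
print(round(score, 4))
```

Under this assumption the result agrees with the reported F-measure of 0.857076 to four decimal places.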

Keywords

Cosine Similarity Measure, Document Clustering, F-Measure, Hierarchical Agglomerative Clustering, Preprocessing, TF-IDF.
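The keywords name the main building blocks: TF-IDF weighting, cosine similarity and hierarchical agglomerative clustering. The sketch below shows how these pieces typically fit together. It is not the authors' implementation: the abstract does not describe the linkage criterion, the preprocessing steps or the proposed efficiency improvements, so average linkage, whitespace tokenization, the toy documents and all function names here are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors, k):
    """Merge the most similar pair of clusters (average linkage, assumed) until k remain."""
    clusters = [[i] for i in range(len(vectors))]
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return clusters


# Toy corpus (hypothetical), already lowercased and split on whitespace.
docs = [
    "cats purr and cats sleep".split(),
    "kittens purr and sleep".split(),
    "stocks rise and markets rally".split(),
    "markets fall and stocks drop".split(),
]
clusters = agglomerate(tf_idf(docs), k=2)
print(clusters)
```

This naive version recomputes average similarities on every merge, which is acceptable for a handful of documents but scales poorly. Avoiding that cost is the kind of efficiency concern the study addresses, although the abstract does not say how.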


Authors

Arshia Naeem
Department of Computer Science, Lahore College for Women University, Lahore 54000, Pakistan
Mariam Rehman
Department of Information Technology, Government College University Faisalabad 38000, Pakistan
Maria Anjum
Department of Computer Science, Lahore College for Women University, Lahore 54000, Pakistan
Muhammad Asif
Department of Computer Science, National Textile University, Faisalabad 37610, Pakistan


DOI: https://doi.org/10.18520/cs%2Fv117%2Fi6%2F1045-1053