
Design of Categorical Data Clustering Using Machine Learning Ensemble


Affiliations
1 Institute of Computer Science and Information Science, Srinivas University, India
     






Authors

N. Yuvaraj
Institute of Computer Science and Information Science, Srinivas University, India
A. Jayanthiladevi
Institute of Computer Science and Information Science, Srinivas University, India

Abstract


Cluster analysis is a crucial tool for discovering and making sense of the underlying structure of a dataset, and it has been applied successfully in many contexts and fields. New developments over the last decade have also drawn the interest of clinical researchers, scientists, and biologists. The main problem with conventional ensemble clustering is that, as the number of dimensions in a dataset grows, its consensus function often fails to generate the final clusters. To address this, this study proposes an improved ensemble framework for clustering categorical datasets, which uses a similarity measure between links to identify the clusters to which unknown data belong. More specifically, the framework applies ensemble machine learning methods, incorporating multiple machine learning algorithms, to categorize the data. Objective performance indicators are used to compare the proposed method with more traditional approaches and to determine how effective it is.
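The abstract says only that a similarity measure between links is used to assign data to clusters. As one possible reading, here is a minimal sketch loosely following the weighted connected-triple (WCT) idea from link-based cluster ensembles: clusters from different base clusterings are linked by the Jaccard overlap of their members, two clusters count as similar when they share strongly linked neighbours, and that similarity refines the co-association matrix before an average-linkage consensus step. Treating each categorical attribute as one base clustering, the `decay` factor, the toy data, and every function name are assumptions for illustration, not details taken from the paper.

```python
from itertools import combinations


def base_clusters(labelings):
    """Hypothetical helper: list every cluster of every base clustering
    as (clustering_index, label, member_set)."""
    clusters = []
    for p, labels in enumerate(labelings):
        groups = {}
        for i, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(i)
        clusters.extend((p, lab, members) for lab, members in groups.items())
    return clusters


def wct_similarity(clusters):
    """Link-based similarity between clusters via weighted connected triples.

    Link weight w(x, z) is the Jaccard overlap of the two member sets;
    x and y score highly when they share strongly linked neighbours z.
    Scores are normalised so the largest one equals 1.
    """
    n = len(clusters)
    w = [[0.0] * n for _ in range(n)]
    for x, y in combinations(range(n), 2):
        a, b = clusters[x][2], clusters[y][2]
        w[x][y] = w[y][x] = len(a & b) / len(a | b)
    sim = [[0.0] * n for _ in range(n)]
    for x, y in combinations(range(n), 2):
        s = sum(min(w[x][z], w[y][z]) for z in range(n) if z not in (x, y))
        sim[x][y] = sim[y][x] = s
    top = max(max(row) for row in sim)
    if top > 0:
        sim = [[v / top for v in row] for row in sim]
    return sim


def refined_coassociation(labelings, decay=0.8):
    """Co-association matrix in which a 'different cluster' vote still earns
    decay * (link-based similarity of the two clusters) instead of 0.
    Assumption: decay=0.8 is an arbitrary illustrative value."""
    clusters = base_clusters(labelings)
    index = {(p, lab): k for k, (p, lab, _) in enumerate(clusters)}
    sim = wct_similarity(clusters)
    n, m = len(labelings[0]), len(labelings)
    ra = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for p, labels in enumerate(labelings):
                if labels[i] == labels[j]:
                    total += 1.0
                else:
                    cx, cy = index[(p, labels[i])], index[(p, labels[j])]
                    total += decay * sim[cx][cy]
            ra[i][j] = total / m
    return ra


def average_linkage(sim, k):
    """Consensus step (an assumed choice): merge the two groups with the
    highest mean pairwise similarity until k groups remain."""
    groups = [[i] for i in range(len(sim))]
    while len(groups) > k:
        best, pair = -1.0, None
        for a, b in combinations(range(len(groups)), 2):
            s = sum(sim[i][j] for i in groups[a] for j in groups[b])
            s /= len(groups[a]) * len(groups[b])
            if s > best:
                best, pair = s, (a, b)
        a, b = pair
        groups[a].extend(groups[b])
        del groups[b]
    labels = [0] * len(sim)
    for g, members in enumerate(groups):
        for i in members:
            labels[i] = g
    return labels


# Assumption (not from the paper): toy categorical data, with each
# attribute treated as one base clustering.
rows = [
    ("red", "small", "round"), ("red", "small", "round"), ("red", "large", "round"),
    ("blue", "large", "square"), ("blue", "large", "square"), ("blue", "small", "square"),
]
labelings = [[row[a] for row in rows] for a in range(len(rows[0]))]
consensus = average_linkage(refined_coassociation(labelings), k=2)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

Here the size attribute cuts across colour and shape, yet objects 0 to 2 and 3 to 5 still end up together because their clusters are strongly linked through the other attributes. The decay value and the average-linkage consensus are tunable design choices; the abstract does not specify which consensus function the authors use.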

Keywords


Base Clustering, Ensemble Clustering, Clusters, Accuracy, Precision
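The abstract reports objective performance indicators, and the keywords name accuracy and precision. Because cluster ids are arbitrary, such scores first need a mapping from clusters to classes. Below is a minimal sketch, assuming ground-truth class labels are available and using a majority-vote mapping (a one-to-one Hungarian assignment is a common alternative); the function names and example labels are hypothetical, and the abstract does not say how the authors computed their scores.

```python
from collections import Counter


def majority_mapping(pred, truth):
    """Hypothetical helper: map each cluster id to the most frequent
    true class among its members."""
    votes = {}
    for c, t in zip(pred, truth):
        votes.setdefault(c, Counter())[t] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def clustering_accuracy(pred, truth):
    """Fraction of objects whose mapped cluster class equals the true class."""
    mapping = majority_mapping(pred, truth)
    return sum(mapping[c] == t for c, t in zip(pred, truth)) / len(truth)


def macro_precision(pred, truth):
    """Mean per-class precision over the classes that some cluster maps to
    (assumption: classes no cluster maps to are left out of the average)."""
    mapping = majority_mapping(pred, truth)
    mapped = [mapping[c] for c in pred]
    scores = []
    for cls in set(mapped):
        hits = [t == cls for m, t in zip(mapped, truth) if m == cls]
        scores.append(sum(hits) / len(hits))
    return sum(scores) / len(scores)


# Assumption: illustrative labels, not results from the paper.
truth = ["A", "A", "A", "B", "B", "B"]
pred = [0, 0, 1, 1, 1, 1]
print(clustering_accuracy(pred, truth))  # 0.8333333333333334
print(macro_precision(pred, truth))      # 0.875
```

Majority mapping can send two clusters to the same class, which tends to inflate accuracy when there are many small clusters; a one-to-one assignment between clusters and classes avoids that.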
