Clustering Algorithm for a Healthcare Dataset Using Silhouette Score Value


Affiliations
1 School of Computer Science, University of Lincoln, United Kingdom
2 Department of Computer Science, Michael Okpara University of Agriculture Umudike, Abia State, Nigeria
 

The huge amount of healthcare data, coupled with the need for data analysis tools, has made data mining an interesting research area. Data mining tools and techniques help to discover and understand hidden patterns in a dataset that may not be apparent from visualization of the data alone. Selecting an appropriate clustering method and the optimal number of clusters for healthcare data can be confusing and difficult. A large number of clustering algorithms are currently available for clustering healthcare data, but it is very difficult for people with little knowledge of data mining to choose a suitable one. This paper analyzes clustering techniques using a healthcare dataset in order to determine which algorithms yield optimized clusters. The performance of two clustering algorithms (K-means and DBSCAN) was compared using Silhouette score values. First, we analyzed the K-means algorithm using different numbers of clusters (K) and different distance metrics. Second, we analyzed the DBSCAN algorithm using different minimum numbers of points required to form a cluster (minPts) and different distance metrics. The experimental results indicate that both K-means and DBSCAN produce clusters with strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means performed better than DBSCAN in terms of clustering accuracy and execution time.

Keywords

Dataset, Clustering, Healthcare Data, Silhouette Score Value, K-Means, DBSCAN.
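
To illustrate the evaluation measure named in the abstract, the following is a minimal sketch of how a Silhouette score can be computed for a given cluster assignment. It is not the authors' implementation. The toy points, the hand-assigned label lists (stand-ins for the output of K-means or DBSCAN under different settings), and the names `silhouette_score`, `manhattan`, `good_labels` and `mixed_labels` are all assumptions made for illustration. For each point, a is the mean distance to the other members of its own cluster (cohesion) and b is the smallest mean distance to the members of any other cluster (separation); the point's coefficient is (b - a) / max(a, b), and the score is the mean over all points.

```python
import math


def manhattan(p, q):
    """City-block distance, as an example of an alternative distance metric."""
    return sum(abs(x - y) for x, y in zip(p, q))


def silhouette_score(points, labels, dist=math.dist):
    """Mean silhouette coefficient over all points (lies between -1 and 1)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster (cohesion)
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster (separation)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical toy data: two tight, well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

# Two candidate assignments, hand-written here in place of real algorithm output.
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # matches the visible structure
mixed_labels = [0, 0, 1, 1, 0, 0, 1, 1]  # splits both groups across clusters

for name, labels in [("good", good_labels), ("mixed", mixed_labels)]:
    euclid = silhouette_score(points, labels)
    city = silhouette_score(points, labels, dist=manhattan)
    print(f"{name}: euclidean={euclid:.3f} manhattan={city:.3f}")
```

Under these assumptions, the assignment with the higher score is the one with tighter, better-separated clusters. The same comparison could be repeated across values of K for K-means, or of minPts for DBSCAN, and across distance metrics.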
Authors

Godwin Ogbuabor
School of Computer Science, University of Lincoln, United Kingdom
F. N. Ugwoke
Department of Computer Science, Michael Okpara University of Agriculture Umudike, Abia State, Nigeria
