
Automatic Generation of Parameters in Density-Based Spatial Clustering


Affiliations
1 Department of Computer Science, University of Mumbai, India
     



New techniques for collecting scientific data have made it possible to accumulate data at large scale across many fields. Cluster analysis is one data mining method for exploring such data. Among clustering algorithms, density-based clustering stands out for its clustering quality and the way it handles data. It has three advantages over other clustering algorithms: it forms clusters of arbitrary shape, it does not need the number of clusters to be known in advance, and it handles noise. However, two issues are critical in density-based clustering. First, it is not effective on datasets of varied density. Second, the choice of the input parameters ε and MinPts plays a critical role in the quality of clustering. This paper proposes a model, Automatic Generation of Parameters in Density-Based Spatial Clustering (AGPDBSCAN), that aims to improve density-based clustering by generating different candidate parameters. With these candidates, both uniform-density and varied-density datasets can be handled. The experimental results look promising on different clustering datasets.
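
To make the role of ε and MinPts concrete, the sketch below shows one common way of deriving candidate ε values from the data: the sorted k-nearest-neighbour distance heuristic. It is not the AGPDBSCAN procedure, which the abstract does not spell out. The function names (`kth_neighbor_distances`, `candidate_eps`, `core_points`), the quantile-based selection rule and the toy dataset are all assumptions made for illustration.

```python
import math

def kth_neighbor_distances(points, k):
    """For each point, return the distance to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def candidate_eps(points, min_pts, n_candidates=2):
    """Hypothetical helper: take evenly spaced quantiles of the sorted
    k-distances as candidate eps values (an assumed selection rule)."""
    k = min_pts - 1  # assumes the point itself counts towards MinPts
    kd = sorted(kth_neighbor_distances(points, k))
    step = len(kd) / (n_candidates + 1)
    return [kd[int(step * (i + 1))] for i in range(n_candidates)]

def core_points(points, eps, min_pts):
    """Indices of points with at least min_pts points (itself included) within eps."""
    return [
        i for i, p in enumerate(points)
        if sum(math.dist(p, q) <= eps for q in points) >= min_pts
    ]

# Toy dataset with two densities: a tight square and a loose square.
dense = [(0, 0), (0, 1), (1, 0), (1, 1)]
sparse = [(10, 10), (10, 14), (14, 10), (14, 14)]
points = dense + sparse

min_pts = 3
for eps in candidate_eps(points, min_pts):
    print(eps, core_points(points, eps, min_pts))
```

On this toy data, the smaller candidate marks only the tight square as core points, while the larger candidate also covers the loose square. This illustrates why a single ε struggles on varied-density data and why generating several candidates can help.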

Keywords

Clustering Algorithms, Density-based Clustering, Density Parameters, Generation of Parameters







Authors

Jayasree Ravi
Department of Computer Science, University of Mumbai, India
Sushil Kulkarni
Department of Computer Science, University of Mumbai, India

