
Gene Biclustering On Large Datasets Using Fuzzy C-means Clustering


     






Authors

M. Ramkumar
Department of Computer Science and Engineering, HKBK College of Engineering, India
J. Gowrishankar
Department of Computer Science and Engineering, Jain University, India
V. Amirtha Preeya
Department of Computer Science and Engineering, Presidency University, India
T. Pushpa
Department of Computer Science and Engineering, HKBK College of Engineering, India
T. Karthikeyan
Department of Electronics and Telecommunications Engineering, University of Technology and Applied Sciences, Oman

Abstract


This study employs biclustering to address some of the drawbacks of conventional grouping of gene expression data. Several biclustering algorithms are used to detect distinct gene activity under different conditions and to reduce redundancy in broad gene information. Machine learning and heuristic algorithms have become widely used for biclustering because they suit problems in which a population of candidate solutions can explore a larger portion of the search space. Biclusters in gene expression data frequently contain values that remain the same across a variety of expression conditions, so biclustering is particularly effective when the rows and columns of the matrix are grouped simultaneously. Submatrices are identified using the Large Average Submatrix method. A Fuzzy C-Means algorithm is then used so that each submatrix can be expanded to include more rows and columns for further analysis. The precision and strength of the submatrices and their components are factored into the system design, which uses biclustering techniques to differentiate gene expression information. The simulation is implemented in Java and run on the Garber dataset. The approach is evaluated using the average match score for non-overlapping modules, the influence of noise on overlapping modules for constant and additive biclusters, and the overall run time.
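The abstract does not say how Fuzzy C-Means is coupled to the submatrix expansion step, so the following is only a minimal sketch of the standard Fuzzy C-Means update applied to the rows of a small expression matrix. It is written in Python for brevity rather than the Java used in the study. The function name `fuzzy_c_means`, the parameters `m`, `iters`, `tol` and `seed`, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, memberships). memberships[i][j] is the degree to
    which points[i] belongs to cluster j, and each row sums to 1.
    All parameter names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from Euclidean distances to the new centres.
        shift = 0.0
        for i in range(n):
            dist = [
                sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5
                for j in range(c)
            ]
            zeros = [j for j in range(c) if dist[j] == 0.0]
            if zeros:
                # Point sits exactly on one or more centres: split membership among them.
                new = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in dist]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(new[j] - u[i][j]) for j in range(c)))
            u[i] = new

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Toy matrix (hypothetical): six genes (rows) measured under three conditions.
expression = [
    [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1],
    [5.0, 5.1, 4.9], [5.2, 4.9, 5.0], [4.9, 5.0, 5.1],
]
centers, memberships = fuzzy_c_means(expression, c=2)
```

Because every gene receives a graded membership in each cluster instead of a hard label, a gene with a substantial membership in more than one cluster can be considered for more than one group. This is one plausible reason a fuzzy method suits the expansion of candidate submatrices, although the paper's actual procedure may differ.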

Keywords


Heuristic Algorithm, Gene Expression, Data Biclusters, Fuzzy C-Means
