
Illustrating a Scalable Architecture-Powered Disease Prediction Using Machine Learning Techniques


Affiliations
1 Centre for Distance and Online Education, Bharathidasan University, India
2 Department of Computer Science, Government Arts and Science College, Srirangam, Tiruchirappalli, India
3 Department of Computer Applications, Holy Cross College, India
4 Edge AI Division, Reliance Jio Platforms Ltd., Bangalore, India
     



Healthcare information systems collect, store and manage many kinds of data, such as illness details, clinical history, essential body parameters and health insurance plans, so that data processing and analytics can support faster and better-informed decision making. To reduce the mortality rate due to heart disease, it is essential to predict the presence of the disease at an early stage. Manual extraction of useful knowledge from historical data is tedious and time-consuming. Machine learning (ML) algorithms are therefore used to detect patterns and make predictions from both historical and current data. Although ML algorithms are applicable to prediction, their accuracy is significantly influenced by the features used. Moreover, as data sizes grow, suitable technologies for data storage also become essential. Based on these two aspects, a comparative analysis of feature selection has been performed using four filter methods, namely correlation measure, information gain, gain ratio and Relief. Further, a scalable architecture based on the Hadoop framework has been proposed to enable machine learning algorithms to handle larger datasets while performing the prediction task. The impact of the proposed architecture on the performance of the machine learning algorithms has been evaluated with a benchmark dataset and found to improve scalability and accuracy.
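As a rough illustration of two of the filter methods named above, the following minimal Python sketch scores discrete features by information gain and gain ratio and ranks them. It is not the authors' implementation: the function names, the toy feature columns (`chest_pain`, `smoker`) and the labels are hypothetical, and continuous attributes would need to be discretised before they could be scored this way.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        # A constant feature carries no information about the class.
        return 0.0
    return information_gain(feature, labels) / split_info


def rank_features(columns, labels, score=information_gain):
    """Return feature names sorted from highest to lowest score."""
    return sorted(columns, key=lambda name: score(columns[name], labels),
                  reverse=True)


# Hypothetical toy data, used only to exercise the functions.
toy_columns = {
    "chest_pain": ["yes", "yes", "no", "no", "yes", "no"],
    "smoker": ["yes", "no", "yes", "no", "yes", "no"],
}
toy_labels = [1, 1, 0, 0, 1, 0]

ranking = rank_features(toy_columns, toy_labels)
print(ranking)
```

In a filter approach such as this, features are scored independently of any classifier, so the top-ranked subset can be passed on to whichever learning algorithm is used for prediction.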

Keywords

Disease Prediction, Hadoop Distributed File System, Machine Learning, Random Forest, Support Vector Machine, Scalable Architecture.



Authors

Chellammal Surianarayanan
Centre for Distance and Online Education, Bharathidasan University, India
Sharmila Rengasamy
Department of Computer Science, Government Arts and Science College, Srirangam, Tiruchirappalli, India
M. Baby Nirmala
Department of Computer Applications, Holy Cross College, India
Pethuru Raj Chelliah
Edge AI Division, Reliance Jio Platforms Ltd., Bangalore, India

