
On the Analysis of Some Machine Learning Algorithms for the Prediction of Diabetes

1 Department of Computer Science, Usmanu Danfodiyo University, Sokoto, Nigeria
2 Department of Computer Science, Waziri Ummaru Federal Polytechnic, Birnin-Kebbi, Nigeria

Diabetes, or diabetes mellitus (DM), is among the most harmful diseases in the world. Contributing factors include obesity, high blood glucose levels, and lack of exercise, among others. It can be managed if it is detected at an early stage. Machine learning is the construction of computer systems or programs that can adapt and learn from experience. The PIMA dataset is used in this research. The dataset contains 9 attributes for 768 patients. There are many kinds of machine learning algorithms, but in this research we chose three supervised learning algorithms: Logistic Regression, Decision Tree, and Random Forest. A model was trained and tested for each algorithm. We then used several measures to compare and analyze the performance of the algorithms: Accuracy, F-measure, Recall, and Precision. Logistic Regression achieved the highest accuracy (77%), the highest precision (0.77), and the highest F-measure (0.64). Decision Tree achieved the highest recall (0.58).


Diabetes, Machine Learning, Logistic Regression, Decision Tree, Random Forest.
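
The four performance measures named in the abstract can be illustrated with a minimal Python sketch. It assumes binary labels with 1 marking a diabetic patient; the function name `score_predictions`, the `Scores` container, and the toy label lists are hypothetical and are not taken from the paper or from the PIMA dataset.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def score_predictions(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return Scores(accuracy, precision, recall, f_measure)


# Toy labels for illustration only (not PIMA data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
scores = score_predictions(y_true, y_pred)
print(scores)
```

Because the F-measure is the harmonic mean of precision and recall, a model can lead on precision and F-measure while another leads on recall, which is the pattern the abstract reports for Logistic Regression and Decision Tree.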




Bello A. Bodinga
Department of Computer Science, Usmanu Danfodiyo University, Sokoto, Nigeria
Mukhtar A. Abdulsalam
Department of Computer Science, Usmanu Danfodiyo University, Sokoto, Nigeria
Bello A. Buhari
Department of Computer Science, Usmanu Danfodiyo University, Sokoto, Nigeria
Muzzammil Mansur
Department of Computer Science, Waziri Ummaru Federal Polytechnic, Birnin-Kebbi, Nigeria


