Performance evaluation of supervised machine learning algorithms for diabetes prediction
DOI: https://doi.org/10.64497/jssci.126
Keywords: machine learning, diabetes prediction, random forest, class imbalance, performance evaluation
Abstract
This study evaluates six machine learning algorithms for diabetes prediction using a publicly available Kaggle dataset of clinical and demographic features. Preprocessing addressed missing values, outliers, and a pronounced class imbalance (a 38:1 ratio of non-diabetic to diabetic cases), which poses a significant challenge to model performance. The algorithms evaluated were Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), Naive Bayes, and Artificial Neural Network (ANN); they were assessed using balanced accuracy, precision, recall, F1-score, sensitivity, specificity, and detection rate. Random Forest achieved the highest balanced accuracy (93%), followed by SVM (83%) and Decision Tree (82%). The findings underscore the potential of machine learning to enhance diabetes diagnosis and management, and the study concludes that model selection should be guided by a careful balance of performance metrics, especially when dealing with imbalanced healthcare datasets.
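To illustrate why the abstract stresses balanced accuracy under a 38:1 imbalance, the following minimal sketch computes the listed metrics from confusion-matrix counts. The counts are hypothetical and are not results from the study. The class name `ConfusionCounts`, the function `imbalance_metrics`, and the definition of detection rate as true positives divided by all cases (the convention used by R's caret package) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'diabetic' as the positive class."""
    tp: int  # diabetic cases predicted diabetic
    fn: int  # diabetic cases predicted non-diabetic
    fp: int  # non-diabetic cases predicted diabetic
    tn: int  # non-diabetic cases predicted non-diabetic


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(c: ConfusionCounts) -> dict:
    total = c.tp + c.fn + c.fp + c.tn
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # identical to recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # Assumed definition: true positives over all cases.
        "detection_rate": _ratio(c.tp, total),
    }


# Hypothetical test set at a 38:1 ratio: 3800 non-diabetic, 100 diabetic.
# A model that always predicts "non-diabetic" looks strong on plain accuracy
# (about 0.974) but has a balanced accuracy of exactly 0.5.
majority_only = imbalance_metrics(ConfusionCounts(tp=0, fn=100, fp=0, tn=3800))

# A hypothetical model that finds 80 of the 100 diabetic cases at the cost of
# 380 false alarms has lower plain accuracy but higher balanced accuracy.
screening = imbalance_metrics(ConfusionCounts(tp=80, fn=20, fp=380, tn=3420))

for name, m in (("majority_only", majority_only), ("screening", screening)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this sketch the always-negative model wins on plain accuracy yet detects no diabetic cases, which is the kind of trade-off that balanced accuracy, sensitivity, and specificity expose on imbalanced data.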
License
Copyright (c) 2026 Babagana Modu, Kale Kawu Kale

This work is licensed under a Creative Commons Attribution 4.0 International License.

