Comparative Evaluation of Machine Learning Algorithms for Early Diabetes Prediction


Authors

  • Aniq Astofa, Universitas Pamulang, Tangerang Selatan, Indonesia
  • Perani Rosyani, Universitas Pamulang, Tangerang Selatan, Indonesia
  • Rahmawati Rahmawati, Universitas Pamulang, Tangerang Selatan, Indonesia
  • Sopiyan Apandi, Universitas Pamulang, Tangerang Selatan, Indonesia

DOI:

https://doi.org/10.47065/bulletincsr.v6i1.859

Keywords:

Machine Learning; Early Prediction; Diabetes; Clinical Data; Classification

Abstract

Diabetes is a non-communicable disease that is often detected only at an advanced stage, which increases the risk of serious complications. Machine learning can support early diabetes detection; however, most previous studies have focused on large-scale datasets and high predictive accuracy, while methodological evaluations on small clinical datasets remain limited. This study evaluates and compares several machine learning algorithms for early diabetes prediction on a limited clinical dataset, with particular emphasis on how data characteristics affect model performance. The dataset consists of 22 samples with eight clinical features and one target variable, divided into 17 training samples and 5 testing samples. The research stages are data preprocessing, training–testing data splitting, model training, and performance evaluation using accuracy, precision, recall, F1-score, and ROC-AUC. The algorithms evaluated are Logistic Regression, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost. The experimental results indicate that none of the evaluated models detected the diabetes class: precision, recall, and F1-score were zero for all models. Although Random Forest and XGBoost achieved an accuracy of 0.6 (three of the five test samples classified correctly), this value was largely driven by the dominance of the non-diabetes class in the very small test set. Correlation analysis further shows that Glucose, BMI, and Diabetes Pedigree Function are the features most strongly associated with diabetes status.
The main contribution of this study is a realistic methodological evaluation of machine learning models applied to small clinical datasets, showing that limited sample size and the training–testing partition substantially affect both model performance and the interpretation of evaluation metrics. These findings offer a methodological reference for future studies that aim to develop more reliable early diabetes prediction models under constrained clinical data conditions.
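
To illustrate how an accuracy of 0.6 can coexist with zero precision, recall, and F1-score on a five-sample test set, the following is a minimal sketch in plain Python. The abstract does not list the actual test labels or predictions, so `y_true` and `y_pred` are hypothetical: they show one class composition (three non-diabetes, two diabetes) that would produce the reported numbers if a model predicted only the majority class. The helper names and the convention of returning 0.0 for an undefined ratio are likewise assumptions made for this example.

```python
# Illustrative only: the label and prediction lists are hypothetical,
# not the study's data.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical 5-sample test set: 3 non-diabetes (0) and 2 diabetes (1).
y_true = [0, 0, 0, 1, 1]
# A model that always predicts the majority (non-diabetes) class.
y_pred = [0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With only five test samples, each individual prediction shifts accuracy by 0.2, so accuracy alone says little here; the zero recall is what shows that no diabetes case was identified.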





ARTICLE HISTORY

Published: 2025-12-31


How to Cite

Astofa, A., Rosyani, P., Rahmawati, R., & Apandi, S. (2025). Evaluasi Komparatif Algoritma Machine Learning untuk Prediksi Dini Diabetes. Bulletin of Computer Science Research, 6(1), 558-565. https://doi.org/10.47065/bulletincsr.v6i1.859
