Comparative Evaluation of Machine Learning Algorithms for Early Diabetes Prediction
DOI: https://doi.org/10.47065/bulletincsr.v6i1.859
Keywords: Machine Learning; Early Prediction; Diabetes; Clinical Data; Classification
Abstract
Diabetes is a non-communicable disease that is often detected only at an advanced stage, which increases the risk of serious complications. Machine learning has the potential to support early diabetes detection; however, most previous studies have focused on large-scale datasets and high predictive accuracy, while methodological evaluations on small clinical datasets remain limited. This study evaluates and compares the performance of several machine learning algorithms for early diabetes prediction on a limited clinical dataset, with particular emphasis on how data characteristics affect model performance. The dataset consists of 22 samples with eight clinical features and one target variable, split into 17 training samples and 5 testing samples. The research stages include data preprocessing, training–testing splitting, model training, and performance evaluation using accuracy, precision, recall, F1-score, and ROC-AUC. The algorithms evaluated are Logistic Regression, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost. The experimental results show that none of the evaluated models correctly identified any diabetes case, as reflected by precision, recall, and F1-score values of zero for all models. Although Random Forest and XGBoost achieved an accuracy of 0.6, this value was largely driven by the dominance of the non-diabetes class in the very small test set. Correlation analysis further shows that Glucose, BMI, and Diabetes Pedigree Function are the features most strongly associated with diabetes status.
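The gap between an accuracy of 0.6 and zero precision, recall, and F1-score follows directly from confusion-matrix arithmetic. The sketch below is a minimal, dependency-free illustration, not the authors' evaluation code. The five test labels are hypothetical (three non-diabetes, two diabetes), chosen only because they are one class composition consistent with a model that labels every sample as non-diabetes and still scores 0.6. The helper name `binary_metrics` is illustrative, and scores with a zero denominator are reported as 0.0, following the `zero_division=0` convention available in scikit-learn's metric functions.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict:
    """Accuracy plus precision, recall, and F1 for the positive (diabetes) class.

    Hypothetical helper: precision, recall, and F1 fall back to 0.0 when
    their denominator is zero (no positive predictions or no true positives).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# HYPOTHETICAL 5-sample test set: 3 non-diabetes (0) and 2 diabetes (1).
y_true = [0, 0, 0, 1, 1]
# A model that always predicts the majority (non-diabetes) class.
y_pred = [0, 0, 0, 0, 0]

print(binary_metrics(y_true, y_pred))
# → {'accuracy': 0.6, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

Because every prediction is the majority class, accuracy here reflects only the class ratio of the test set, while the diabetes-class precision, recall, and F1-score expose that no positive case was found.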
The main contribution of this study is a realistic methodological evaluation of machine learning models on small clinical datasets, showing that limited sample size and the training–testing split substantially affect both model performance and the interpretation of evaluation metrics. These findings offer a methodological reference for future studies that aim to develop more reliable early diabetes prediction models under constrained clinical data conditions.
Copyright (c) 2025 Aniq Astofa, Perani Rosyani, Rahmawati Rahmawati, Sopiyan Apandi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).