A Hybrid K-Means SMOTE and Logistic Regression Approach for Early Detection of Diabetes Mellitus on Imbalanced Data
DOI: https://doi.org/10.47065/bulletincsr.v5i3.502
Keywords: Diabetes Mellitus; K-Means SMOTE; Logistic Regression; Medical Classification; Imbalanced Data
Abstract
The rising global prevalence of Diabetes Mellitus calls for more accurate early detection, particularly through machine learning-based approaches. A main challenge in medical classification is class imbalance, where diabetic cases are far fewer than non-diabetic ones. This study develops a hybrid model that integrates Logistic Regression with K-Means SMOTE to improve the sensitivity of early Diabetes Mellitus detection, especially for the minority class. Logistic Regression is chosen for its computational efficiency and interpretability, while K-Means SMOTE balances the class distribution by generating synthetic samples in a structured manner based on clusters of minority-class data. The dataset consists of 2,000 records with 9 health-related features, obtained from the Kaggle platform. Evaluation results indicate that the model using K-Means SMOTE achieves the best performance, with an accuracy of 82.00%, an F1-score of 72.73% for the Diabetes class, and the highest ROC-AUC score of 87.48%. Compared with models trained without oversampling and with standard SMOTE, this approach improves generalization and sensitivity to positive cases. These findings have practical implications for developing fairer and more effective machine learning-based early detection systems, particularly in healthcare facilities with limited resources.
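The cluster-based oversampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it is deliberately simplified: it clusters only the minority rows with a plain k-means and interpolates between random pairs of rows inside one cluster, whereas full K-Means SMOTE also filters clusters by class ratio and weights them by sparsity. The function names `kmeans` and `cluster_smote`, the toy data, and all parameter values are assumptions made for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by interpolating within clusters."""
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c]
                for c in range(k)]
    # Interpolation needs at least two rows in a cluster.
    clusters = [c for c in clusters if len(c) >= 2]
    if not clusters:
        raise ValueError("no cluster has two or more minority rows")
    synthetic = []
    while len(synthetic) < n_new:
        a, b = rng.sample(rng.choice(clusters), 2)
        t = rng.random()  # position on the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Toy data (hypothetical): 6 minority rows with 2 features, 10 majority rows.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
            [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
n_majority = 10

# Generate just enough synthetic rows to equalise the two class counts.
synthetic = cluster_smote(minority, n_new=n_majority - len(minority))
balanced_minority = minority + synthetic
```

Because every synthetic row is a point on the segment between two real minority rows of the same cluster, new samples stay close to regions where minority data already exists. In a full pipeline the balanced training set would then be passed to a Logistic Regression classifier, with oversampling applied to the training split only so that the test set keeps its original class distribution.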
Copyright (c) 2025 Abdus Salam, Lukman Azhari, Ri Sabti Septarini, Nofitri Heriyani

This work is licensed under a Creative Commons Attribution 4.0 International License.