Application of the ADASYN Method to Address Imbalanced Data in Stroke Classification Using Support Vector Machine

Alwaliyanto Alwaliyanto; Siska Kurnia Gusti; Iis Afrianty; Fadhilah Syafria

doi:10.47065/bulletincsr.v5i4.612

Authors

Alwaliyanto Alwaliyanto, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Siska Kurnia Gusti, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Iis Afrianty, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Fadhilah Syafria, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia

Keywords:

Adaptive Synthetic Sampling Approach; Imbalanced Data; K-fold Cross Validation; Stroke; Support Vector Machine

Abstract

Stroke is one of the leading causes of death and disability worldwide, so classification models that support early and accurate diagnosis are essential. This study implements the Support Vector Machine (SVM) algorithm with linear, polynomial, and Radial Basis Function (RBF) kernels to classify stroke data. The Adaptive Synthetic Sampling (ADASYN) method is used to address class imbalance, and the models are trained and evaluated with 5-fold cross-validation to obtain stable and reliable results. The findings indicate that ADASYN improves the model's sensitivity to stroke cases (the minority class), as reflected by higher recall and F1-score, at the cost of a slight decrease in overall accuracy, a common trade-off when handling imbalanced data. After ADASYN was applied, the linear kernel performed best, with an average AUC-ROC of 0.8333, recall of 0.7827, and F1-score of 0.2181 for the stroke class. Although the F1-score remains relatively low, it is higher than before ADASYN, indicating better detection of stroke cases. The implementation was carried out in Google Colab, which also supported efficient data processing and visualization. Overall, the results show that combining SVM with ADASYN enhances the model's sensitivity to the minority class and is well suited to medical data classification tasks, particularly the early diagnosis of stroke with machine learning.
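
The abstract does not show how the oversampling was implemented, so the following is only a minimal pure-Python sketch of the ADASYN idea as it is generally described: minority points whose neighbourhoods contain more majority points receive more synthetic samples, and each synthetic sample is an interpolation between a minority point and one of its minority neighbours. The function name `adasyn_sketch`, its parameters (`k`, `beta`, `seed`), and the toy data are all hypothetical and are not taken from the study, which most likely relied on a library implementation.

```python
import math
import random


def adasyn_sketch(minority, majority, k=5, beta=1.0, seed=0):
    """Return synthetic minority points generated in an ADASYN-like way.

    minority, majority: lists of equal-length numeric tuples.
    k: neighbourhood size; beta: desired balance level in (0, 1].
    Hypothetical helper for illustration, not the study's code.
    """
    rng = random.Random(seed)
    # Number of synthetic points needed to reach the target balance.
    total_needed = int((len(majority) - len(minority)) * beta)
    if total_needed <= 0 or len(minority) < 2:
        return []

    # Minority points come first, so index i refers to the same point
    # in both `minority` and `labelled`.
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for i, x in enumerate(minority):
        others = [
            (math.dist(x, p), label)
            for j, (p, label) in enumerate(labelled)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:k]
        ratios.append(sum(1 for _, label in nearest if label == 0) / len(nearest))

    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        return []  # no minority point has majority neighbours

    synthetic = []
    for i, x in enumerate(minority):
        # Harder points (larger r_i) get a larger share of the new samples.
        g_i = round(ratios[i] / ratio_sum * total_needed)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        for _ in range(g_i):
            z = rng.choice(neighbours)
            lam = rng.random()
            # Linear interpolation between x and a minority neighbour z.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy data (made up for illustration): one minority point sits inside
# the majority cluster and therefore attracts more synthetic samples.
minority = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
majority = [
    (10.0, 11.0), (11.0, 10.0), (10.0, 9.0), (9.0, 10.0),
    (11.0, 11.0), (9.0, 9.0), (20.0, 20.0),
]
synthetic_points = adasyn_sketch(minority, majority, k=3)
print(len(synthetic_points), "synthetic minority points")
```

When oversampling is combined with k-fold cross-validation, a common precaution is to generate synthetic points from the training folds only, so that the validation fold contains real records alone. Whether the study followed this order is not stated in the abstract.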
