OPTIMALISASI DIAGNOSIS STROKE DENGAN ALGORITMA C4.5 DAN STRATEGI IMPUTASI k-NN UNTUK MENGATASI MISSING VALUE

Zainal Abidin; Teguh Tamrin; Vilanda Harsono; Duwi Nur Aziza; Istianti Kansania

doi:10.34001/jdpt.v15i2.6701

Abstract

Stroke is a disease that attacks the blood vessels in the brain, obstructing the flow of blood and oxygen to the brain. Stroke symptoms vary, but they generally include weakness or numbness in the face, arms, or legs, confusion, difficulty speaking, and difficulty walking. A fast and accurate diagnosis is essential for obtaining appropriate treatment and preventing complications. One way to diagnose stroke quickly and accurately is to apply the C4.5 algorithm, a classification algorithm that is effective for building decision trees for prediction. This study uses the Kaggle stroke prediction data, with 15,000 records, 22 attributes, and 2,500 missing values. The results show that the C4.5 algorithm can be used to build an accurate system for diagnosing stroke symptoms. With k-NN imputation, the system classifies stroke patients with an accuracy of 91.40%. The C4.5 decision tree can also be used to identify the important factors in stroke diagnosis.
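As a rough illustration of the k-NN imputation idea mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes numeric attributes, marks missing entries with `None`, and fills each one with the mean of that attribute over the k nearest records. The function name `knn_impute`, the three columns (age, glucose, BMI), and the sample values are hypothetical; in practice the attributes would normally be rescaled before distances are computed.

```python
import math


def knn_impute(rows, k=3):
    """Return a copy of rows where each None is replaced by the mean of
    that column over the k nearest rows that have the column observed.

    Distance is Euclidean over the columns both rows have observed,
    rescaled by the share of columns compared (an illustrative choice).
    """
    n_cols = len(rows[0])

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * n_cols / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical records: [age, average glucose, BMI]; one BMI is missing.
patients = [
    [67.0, 228.0, 36.0],
    [65.0, 220.0, None],
    [30.0, 90.0, 22.0],
    [66.0, 225.0, 34.0],
    [32.0, 95.0, 24.0],
]
filled = knn_impute(patients, k=2)
print(filled[1][2])  # prints 35.0
```

The missing BMI is filled from the two most similar patients (BMI 34.0 and 36.0), while the original list is left unchanged so the imputed data can be compared against it.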

OPTIMIZATION OF STROKE DIAGNOSIS USING THE C4.5 ALGORITHM AND A k-NN IMPUTATION STRATEGY TO OVERCOME MISSING VALUES

Stroke is a disease that attacks the blood vessels in the brain, obstructing the flow of blood and oxygen to the brain. Stroke symptoms can vary but generally include weakness or numbness in the face, arms, or legs, confusion, difficulty speaking, and difficulty walking. Rapid and accurate diagnosis of stroke is crucial to obtain appropriate treatment and prevent complications. One way to diagnose stroke quickly and accurately is to apply the C4.5 algorithm. The C4.5 algorithm is a classification algorithm that is effective for building decision trees for prediction. This study uses the Kaggle stroke prediction dataset, with a total of 297,520 records, 22 attributes, and 2,500 missing values. The results of the study indicate that the C4.5 algorithm can be used to build an accurate stroke symptom diagnosis system. Using k-NN imputation, this system can classify stroke patients with an accuracy of 91.40%. The C4.5 decision tree can also be used to identify important factors in stroke diagnosis.
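C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below shows that computation for a single categorical attribute; it is an illustrative example rather than the study's code, and the `hypertension` values and `stroke` labels are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute with respect to class labels."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the attribute separates the classes perfectly.
hypertension = ["yes", "yes", "no", "no"]
stroke = [1, 1, 0, 0]
print(gain_ratio(hypertension, stroke))
```

When building a tree, this score would be computed for every candidate attribute at a node, and the attribute with the highest gain ratio would be used for the split.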

Keywords

Stroke; Classification; C4.5 Algorithm; k-NN

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

DISPROTEK: Journal of Informatics Engineering, Information Systems, Electrical Engineering, Industrial Engineering, Civil Engineering, and Aquaculture is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
