OPTIMIZATION OF STROKE DIAGNOSIS USING THE C4.5 ALGORITHM AND A k-NN IMPUTATION STRATEGY TO OVERCOME MISSING VALUES

Zainal Abidin, Teguh Tamrin, Vilanda Harsono, Duwi Nur Aziza, Istianti Kansania

Abstract


Stroke is a disease that attacks the blood vessels in the brain, obstructing the flow of blood and oxygen to the brain. Stroke symptoms can vary but generally include weakness or numbness in the face, arms, or legs, confusion, difficulty speaking, and difficulty walking. Rapid and accurate diagnosis of stroke is crucial for obtaining appropriate treatment and preventing complications. One way to diagnose stroke quickly and accurately is to apply the C4.5 algorithm, a classification algorithm that is effective for building decision trees for prediction. This study uses data from the Kaggle stroke prediction dataset with 15,000 records (297,520 in the English version of the original abstract), 22 attributes, and 2,500 missing values. The results indicate that the C4.5 algorithm can be used to build an accurate system for diagnosing stroke symptoms. With k-NN imputation of the missing values, the system classifies stroke patients with an accuracy of 91.40%. The C4.5 decision tree can also be used to identify the factors that matter most in stroke diagnosis.
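The abstract names the two techniques but gives no implementation details. As a minimal sketch, assuming numeric attributes, Euclidean distance over the columns two records both have observed, and the mean of the neighbours as the imputed value (none of which the source specifies), k-NN imputation and the gain-ratio criterion that C4.5 uses to choose splits might look like the following. The function names `knn_impute`, `entropy`, and `gain_ratio` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that column over the k nearest rows.

    Assumption: distance is Euclidean over columns observed in both rows,
    normalised by the number of shared columns.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which column j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split info."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / total * math.log2(len(g) / total)
                      for g in groups.values())
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0
```

In a pipeline of the kind the abstract describes, `knn_impute` would be applied to the records first, and a tree builder would then pick, at each node, the attribute with the highest `gain_ratio`. Continuous attributes, pruning, and the choice of k are left out of this sketch.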


Keywords


Stroke; Classification; C4.5 Algorithm; k-NN






DOI: https://doi.org/10.34001/jdpt.v15i2.6701


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

DISPROTEK: Journal of Informatics Engineering, Information Systems, Electrical Engineering, Industrial Engineering, Civil Engineering, and Aquaculture is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.