A Robust Methodology for Stroke Disease Prediction using a Hard Voting Classifier

Nouralhuda Ali Abdulsamad; Ali A. Yassin; Zaid Ameen Abduljabbar; Mohammed S. Hashim; Vincent Omollo Nyangaresi

doi:10.48084/etasr.10292

Authors

Nouralhuda Ali Abdulsamad Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq
Ali A. Yassin Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq
Zaid Ameen Abduljabbar Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq | Department of Business Management, Al-Imam University College, Balad 34011, Iraq | Shenzhen Institute, Huazhong University of Science and Technology, Shenzhen 518000, China
Mohammed S. Hashim Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq
Vincent Omollo Nyangaresi Department of Computer Science and Software Engineering, Jaramogi Oginga Odinga University of Science and Technology, Bondo 40601, Kenya | Department of Applied Electronics, Saveetha School of Engineering, SIMATS, Chennai, Tamil Nadu 602105, India

Volume: 15 | Issue: 3 | Pages: 22830-22836 | June 2025 | https://doi.org/10.48084/etasr.10292

Received: 19 January 2025 | Revised: 20 February 2025 and 09 March 2025 | Accepted: 13 March 2025

Corresponding author: Ali A. Yassin

Abstract

Most strokes are caused by an unexpected occlusion of the arteries that supply the brain and the heart. Early detection of the warning symptoms of stroke can reduce its severity and save the patient's life. Although researchers have proposed a variety of diagnostic methods to detect this disease, the methods currently in use still need improvement. In this paper, we propose a methodology built on a hard voting classifier that combines three Machine Learning (ML) models: Random Forest (RF), K-Nearest Neighbors (KNN), and Extra Trees Classifier (ETC). First, data quality was improved through a series of preprocessing procedures, using the Synthetic Minority Oversampling Technique (SMOTE) to balance the classes so that training is not biased toward the majority class. Next, we divided the dataset into a training part and a testing part and fed them to the models. Finally, we implemented four ML algorithms, evaluated their effectiveness, and selected the three most effective models for integration into the proposed hard voting classifier. The hard voting classifier outperformed the results of recent studies, with an accuracy of 97.48%, a precision of 0.9802, a recall of 0.9691, and an F1 score of 0.9747. Furthermore, we applied K-fold cross-validation (K=10), which systematically partitions the dataset into K subsets and evaluates the model on each subset in turn. This guards against overfitting to a single split and provides a robust estimate of model performance across different data splits. The mean cross-validated accuracy was 97.1%.
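The building blocks named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names (`hard_vote`, `smote_interpolate`, `k_fold_indices`, `binary_metrics`) are hypothetical, the three prediction lists are invented toy values standing in for RF, KNN, and ETC outputs, and the SMOTE helper shows only the interpolation step, with the neighbour and the gap passed in rather than chosen at random.

```python
from collections import Counter


def hard_vote(votes):
    """Return the label predicted by the most base models for one sample."""
    return Counter(votes).most_common(1)[0][0]


def hard_vote_batch(per_model_predictions):
    """Combine per-model prediction lists (equal length) sample by sample."""
    return [hard_vote(votes) for votes in zip(*per_model_predictions)]


def smote_interpolate(sample, neighbor, gap):
    """Core SMOTE step: a synthetic point on the segment sample -> neighbor.

    Full SMOTE draws `neighbor` from the k nearest minority-class neighbours
    and `gap` uniformly from [0, 1]; both are passed in here for clarity.
    """
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels and predictions (assumed values, not taken from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
rf_pred = [1, 0, 1, 0, 0, 0, 1, 1]
knn_pred = [1, 0, 0, 1, 0, 1, 1, 1]
etc_pred = [1, 1, 1, 1, 0, 0, 0, 0]

ensemble_pred = hard_vote_batch([rf_pred, knn_pred, etc_pred])
print(binary_metrics(y_true, ensemble_pred))
```

With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of base models. In practice the same pipeline would more likely be assembled from library components, for example scikit-learn's `VotingClassifier` with `voting="hard"` and the `SMOTE` class from imbalanced-learn; whether the authors used these libraries is not stated in the abstract.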

Keywords:

stroke disease, machine learning, prediction, hard voting classifier, SMOTE, cross-validation
