A Robust Methodology for Stroke Disease Prediction using a Hard Voting Classifier
Received: 19 January 2025 | Revised: 20 February 2025 and 09 March 2025 | Accepted: 13 March 2025 | Online:
Corresponding author: Ali A. Yassin
Abstract
Most strokes are caused by a sudden occlusion of the arteries that supply the brain and the heart. Detecting the warning symptoms of stroke early can reduce its severity and save the patient's life. Although researchers have proposed a variety of diagnostic methods to detect this disease, the methods currently in use still need improvement. In this paper, we propose a methodology that uses a hard voting classifier built on three Machine Learning (ML) models, namely Random Forest (RF), K-Nearest Neighbors (KNN), and Extra Trees Classifier (ETC). First, a series of data quality improvement procedures was performed, with the Synthetic Minority Oversampling Technique (SMOTE) used to balance the classes so that training is not dominated by the majority class. Next, we divided the dataset into a training part and a testing part and used them to train and evaluate the models. In the last phase, we implemented four ML algorithms, evaluated their effectiveness, and selected the three most effective models for integration into the proposed hard voting classifier. The hard voting classifier outperformed results reported in recent studies, with an accuracy of 97.48%, a precision of 0.9802, a recall of 0.9691, and an F1 score of 0.9747. Furthermore, we applied K-fold cross-validation (K=10), which partitions the dataset into ten folds and evaluates the model on each fold in turn. This helps to detect overfitting and gives a more robust estimate of model performance across different data splits. The mean accuracy under cross-validation was 97.1%.
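As a minimal sketch of the hard voting rule described above, the following Python snippet combines the label predictions of three classifiers by majority vote and computes precision, recall, and F1 score for the combined output. It is an illustration only, not the authors' implementation: the function names `hard_vote` and `precision_recall_f1` are hypothetical, and the label lists are made-up toy values, not results from the stroke dataset.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by majority (hard) voting."""
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    voted = []
    for labels in zip(*predictions):
        # With three voters and two classes there is always a strict majority.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy predictions standing in for the RF, KNN, and ETC outputs (assumed values).
rf_pred = [1, 0, 1, 1, 0, 0]
knn_pred = [1, 1, 0, 1, 0, 0]
etc_pred = [0, 0, 1, 1, 0, 1]
y_true = [1, 0, 1, 1, 0, 1]

ensemble_pred = hard_vote([rf_pred, knn_pred, etc_pred])
precision, recall, f1 = precision_recall_f1(y_true, ensemble_pred)
print(ensemble_pred, precision, recall, round(f1, 4))
```

Hard voting uses only the predicted labels, so the three base models do not need to produce calibrated probabilities; an odd number of voters avoids ties in the binary case.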
Keywords:
stroke disease, machine learning, prediction, hard voting classifier, SMOTE, cross-validation
License
Copyright (c) 2025 Nouralhuda Ali Abdulsamad, Ali A. Yassin, Zaid Ameen Abduljabbar, Mohammed S. Hashim, Vincent Omollo Nyangaresi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.