Heart Disease Predictive Modeling with XGBoost and SMOTE-Driven Class Imbalance Mitigation
Received: 25 August 2025 | Revised: 14 September 2025, 16 September 2025, and 20 September 2025 | Accepted: 21 September 2025 | Online: 24 October 2025
Corresponding author: T. Roopa
Abstract
Heart disease is one of the leading causes of death worldwide, which makes early and accurate diagnosis important. Machine learning techniques can analyze clinical and demographic data to detect heart disease. This study combines XGBoost with the Synthetic Minority Oversampling Technique (SMOTE) to build a prediction framework that addresses class imbalance in the dataset. The study used 303 patient records with 13 clinical features from the UCI Heart Disease dataset. Preprocessing comprised handling missing values, one-hot encoding of categorical variables, and standardization of numeric features, followed by hyperparameter optimization with GridSearchCV. In the experiments, the model achieved an overall accuracy of 90%, with 96 true positives and 66 true negatives. The classification report shows precision of 0.87–0.93, recall of 0.83–0.95, and F1-scores of 0.88–0.91, with low misclassification rates. Feature importance analysis identifies age, chest pain type, maximum heart rate, and cholesterol level as important predictors. These results indicate that the XGBoost+SMOTE pipeline performs well on binary heart disease classification and could support early therapeutic intervention, potentially improving patient outcomes.
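The abstract names three preprocessing steps (handling missing values, one-hot encoding of categorical variables, and standardization of numeric features) but does not give their settings. The following is a minimal plain-Python sketch of those steps under assumptions that are not from the paper: median imputation for numeric columns, most-frequent imputation for categorical columns, and z-scoring with the population standard deviation. The `preprocess` helper, the column names, and the records it is applied to are hypothetical.

```python
from statistics import mean, median, mode, pstdev


def preprocess(rows, numeric, categorical):
    """Impute, one-hot encode, and standardize a list of dict records.

    Hypothetical helper. Assumptions not taken from the paper: median
    imputation for numeric columns, most-frequent imputation for
    categorical columns, and z-scores using the population standard
    deviation. Missing values are represented as None.
    """
    columns = list(numeric) + list(categorical)

    # 1. Learn a fill value per column from its observed (non-None) entries.
    fill = {c: median(r[c] for r in rows if r[c] is not None) for c in numeric}
    fill.update({c: mode(r[c] for r in rows if r[c] is not None) for c in categorical})
    filled = [{c: (fill[c] if r[c] is None else r[c]) for c in columns} for r in rows]

    # 2. Mean and standard deviation per numeric column (guard against zero spread).
    stats = {}
    for c in numeric:
        values = [r[c] for r in filled]
        sd = pstdev(values)
        stats[c] = (mean(values), sd if sd > 0 else 1.0)

    # 3. Sorted list of observed levels per categorical column, for one-hot encoding.
    levels = {c: sorted({r[c] for r in filled}) for c in categorical}

    # 4. One feature vector per record: z-scores first, then 0/1 indicators.
    features = []
    for r in filled:
        vec = [(r[c] - stats[c][0]) / stats[c][1] for c in numeric]
        for c in categorical:
            vec.extend(1.0 if r[c] == level else 0.0 for level in levels[c])
        features.append(vec)
    return features
```

For example, `preprocess(rows, ["age", "chol"], ["cp"])` returns one vector per record: two standardized columns followed by one indicator per observed `cp` level. Here the statistics are computed on whatever rows are passed in; in a train/test setting they would normally be learned on the training split and reused on the test split, which is what scikit-learn's `SimpleImputer`, `OneHotEncoder`, and `StandardScaler` do when fitted on training data.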
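SMOTE balances the classes by creating synthetic minority-class records instead of duplicating existing ones. The abstract does not report the SMOTE settings, so the sketch below assumes k = 5 nearest neighbors and Euclidean distance. It is a stdlib-only illustration of the SMOTE interpolation rule, not the authors' implementation; libraries such as imbalanced-learn provide a `SMOTE` class for practical use.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class points by SMOTE interpolation.

    For a randomly chosen minority sample x and one of its k nearest
    minority neighbors nn, the synthetic point is
        x_new = x + lam * (nn - x),  with lam drawn uniformly from [0, 1).
    k=5 and Euclidean distance are assumptions, not settings from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Indices of the k nearest minority neighbors of each minority sample.
    neighbors = []
    for i, x in enumerate(minority):
        by_distance = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbors.append([j for _, j in by_distance[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x, nn = minority[i], minority[rng.choice(neighbors[i])]
        lam = rng.random()
        # Point on the segment between x and its chosen neighbor.
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

To fully balance a training set, `n_synthetic` would be the majority-class count minus the minority-class count. Because each synthetic point lies between two real minority records, the oversampled class stays inside the region that class already occupies. When SMOTE is combined with cross-validation, oversampling is usually applied inside each training fold only, so that no validation record contributes to a synthetic training point.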
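GridSearchCV evaluates every combination in a hyperparameter grid and keeps the one with the best mean cross-validation score. The abstract does not list the grid that was searched, so the sketch below is generic: it enumerates a `{name: [values]}` grid, scores each combination over unshuffled k-fold splits, and delegates model fitting to a hypothetical `fit_and_score` callback. For classifiers, scikit-learn's GridSearchCV uses stratified folds by default, which this plain k-fold split does not reproduce.

```python
from itertools import product


def param_combinations(param_grid):
    """Yield every combination of a {name: [values]} grid as a dict."""
    names = sorted(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for unshuffled k-fold CV."""
    start = 0
    for fold in range(k):
        # The first n_samples % k folds receive one extra sample.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def grid_search(param_grid, n_samples, k, fit_and_score):
    """Return (best_params, best_mean_score) over all grid combinations.

    fit_and_score(params, train_idx, val_idx) is a hypothetical callback
    that would fit a model (e.g. an XGBoost classifier) on train_idx and
    return a score on val_idx; higher is better.
    """
    best_params, best_score = None, float("-inf")
    for params in param_combinations(param_grid):
        scores = [fit_and_score(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

A hypothetical grid such as `{"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1], "n_estimators": [100, 200]}` (all three are real `XGBClassifier` parameters, but these values are illustrative) yields 3 × 2 × 2 = 12 combinations, each trained k times.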
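The abstract reports 96 true positives and 66 true negatives (162 correct predictions) and gives precision, recall, and F1-score as ranges. The ranges arise because a binary classification report computes each metric once per class: the positive class uses TP and FP for precision and TP and FN for recall, while the negative class uses TN with FN for precision and TN with FP for recall. The helper below encodes these standard definitions. The abstract does not state the false-positive and false-negative counts, so the usage example relies on small hypothetical counts rather than the paper's results.

```python
def binary_report(tp, fp, fn, tn):
    """Accuracy and per-class (precision, recall, F1) from a 2x2 confusion matrix."""

    def prf(correct, predicted, actual):
        # precision = correct / predicted, recall = correct / actual,
        # F1 = harmonic mean of precision and recall.
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Positive class: predicted positive = TP + FP, actually positive = TP + FN.
        "positive": prf(tp, tp + fp, tp + fn),
        # Negative class: predicted negative = TN + FN, actually negative = TN + FP.
        "negative": prf(tn, tn + fn, tn + fp),
    }
```

For instance, the hypothetical call `binary_report(tp=8, fp=2, fn=1, tn=9)` gives an accuracy of 0.85, positive-class precision of 0.80 and recall of about 0.89, and negative-class precision of 0.90 and recall of about 0.82, so a report over both classes would show ranges in the same way the abstract does.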
Keywords:
heart disease, machine learning, XGBoost, SMOTE, clinical diagnosis
License
Copyright (c) 2025 T. Roopa, Ganesh Dalappagari Ramanjinappa

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
