Heart Disease Predictive Modeling with XGBoost and SMOTE-Driven Class Imbalance Mitigation

Authors

  • T. Roopa, School of Computing & Information Technology, REVA University, Bengaluru, Karnataka, India | SSIT, SSAHE University, Tumkur, India
  • Ganesh Dalappagari Ramanjinappa, School of Computing and Information Technology, REVA University, Bengaluru, Karnataka, India
Volume: 15 | Issue: 6 | Pages: 29914-29918 | December 2025 | https://doi.org/10.48084/etasr.14301

Abstract

Heart disease is one of the leading causes of death worldwide, which makes early and accurate diagnosis important. Machine learning techniques can analyze clinical and demographic data to detect heart disease. This study combines XGBoost with the Synthetic Minority Oversampling Technique (SMOTE) in a prediction framework that handles class imbalance in the dataset. The study used 303 patient records with 13 clinical features from the UCI Heart Disease dataset. Preprocessing comprised handling missing values, one-hot encoding of categorical variables, and standardization of numeric features, followed by hyperparameter optimization using GridSearchCV. In the experiments, the model achieved an overall accuracy of 90%, with true positive and true negative counts of 96 and 66, respectively. The classification report shows good precision (0.87–0.93), recall (0.83–0.95), and F1-score (0.88–0.91), with low misclassification rates. The feature importance evaluation highlights age, chest pain type, maximum heart rate, and cholesterol level as important predictors. The results indicate that the XGBoost+SMOTE pipeline is effective for binary heart disease classification and can support early therapeutic intervention, which may lead to better patient outcomes.
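
To illustrate the oversampling idea behind SMOTE, the following is a minimal sketch that creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is not the authors' code and not the imbalanced-learn implementation; the function name `smote_oversample`, its parameters, and the toy data are assumptions made only for illustration.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: build n_synthetic points by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy, made-up data (NOT taken from the UCI dataset): 6 majority vs 3 minority
# rows, each row holding two numeric features that are assumed to be standardized.
majority = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.5, 0.3), (0.4, 0.5), (0.0, 0.3)]
minority = [(1.0, 1.2), (1.3, 0.9), (1.1, 1.5)]

# Generate just enough synthetic rows to match the majority class size.
new_points = smote_oversample(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 6 6
```

Because every synthetic point lies on a line segment between two real minority samples, the new rows stay inside the region already occupied by the minority class. In practice, oversampling is usually applied to the training split only, so that evaluation data contain no synthetic records.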

Keywords:

heart disease, machine learning, XGBoost, SMOTE, clinical diagnosis

How to Cite

T. Roopa and G. D. Ramanjinappa, "Heart Disease Predictive Modeling with XGBoost and SMOTE-Driven Class Imbalance Mitigation," Eng. Technol. Appl. Sci. Res., vol. 15, no. 6, pp. 29914–29918, Dec. 2025.