Optimizing Machine Learning Models for Class Imbalance in Heart Disease Prediction

Authors

  • Majid Rahardi, Department of Informatics, Faculty of Computer Science, Universitas Amikom Yogyakarta, Sleman, 55283, Indonesia
  • Bima Pramudya Asaddulloh, Department of Informatics, Postgraduate Program, Universitas Amikom Yogyakarta, Sleman, 55283, Indonesia
  • Afrig Aminuddin, Department of Information Systems, Faculty of Computer Science, Universitas Amikom Yogyakarta, Sleman, 55283, Indonesia
  • Ferian Fauzi Abdulloh, Department of Informatics, Faculty of Computer Science, Universitas Amikom Yogyakarta, Sleman, 55283, Indonesia
  • Ilham Saifudin, Department of Informatics Engineering, Faculty of Engineering, Universitas Muhammadiyah Jember, Jember, 68121, Indonesia
  • Faqih Putra Kusumawijaya, School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, 4001, Australia
Volume: 15 | Issue: 3 | Pages: 23599-23604 | June 2025 | https://doi.org/10.48084/etasr.10407

Abstract

Machine learning models are a powerful tool for predicting heart disease. However, class imbalance in datasets, where healthy individuals far outnumber those with heart disease, can markedly degrade the performance of these models. This study presents a machine learning pipeline that combines resampling methods, including SMOTE, ADASYN, and Random Oversampling (ROS), with commonly used classifiers, such as Random Forest (RF), k-Nearest Neighbors (kNN), Gradient Boosting, and AdaBoost. Using the 2022 CDC Indicators of Heart Disease dataset, we evaluate these methods in terms of prediction accuracy, precision, recall, F1-score, and AUC. In comparison with several previous studies, RF with ROS achieves the highest overall performance, with 95.75% accuracy, 99.84% recall, 95.91% F1-score, and 99.59% AUC. The findings demonstrate that oversampling approaches can mitigate class imbalance and improve heart disease prediction.
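
To illustrate the idea behind Random Oversampling and the threshold-based metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation, which the abstract does not describe. The function names `random_oversample` and `binary_metrics` and the toy data are hypothetical, chosen only for illustration. In practice, libraries such as imbalanced-learn provide `RandomOverSampler`, `SMOTE`, and `ADASYN` classes for this purpose.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement to close the gap to the majority size.
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy imbalanced data (made up): 8 healthy (0), 2 with heart disease (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y, seed=42)
balanced_counts = Counter(y_bal)

# A model that always predicts "healthy" looks accurate on the imbalanced
# labels but never detects the disease class.
always_healthy = binary_metrics(y, [0] * len(y))
```

On the toy data, the always-healthy baseline has high accuracy but zero recall, which is why the abstract reports recall, F1-score, and AUC alongside accuracy. When this kind of resampling is used in a real pipeline, it is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.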

Keywords:

ML, heart disease classification, resampling methods

How to Cite

Rahardi, M., Asaddulloh, B.P., Aminuddin, A., Abdulloh, F.F., Saifudin, I. and Kusumawijaya, F.P. 2025. Optimizing Machine Learning Models for Class Imbalance in Heart Disease Prediction. Engineering, Technology & Applied Science Research. 15, 3 (Jun. 2025), 23599–23604. DOI:https://doi.org/10.48084/etasr.10407.