Comparative Analysis of Oversampling and Undersampling Techniques in Predicting Customer Churn for Dqlab Telco

Authors

  • Bima Pramudya Asaddulloh, Department of Informatics, Postgraduate Program, Universitas Amikom Yogyakarta, Sleman, 55283, Indonesia
  • Kusrini, Department of Informatics, Postgraduate Program, Universitas Amikom Yogyakarta, Sleman, 55283, Indonesia
  • Dhani Ariatmanto, Department of Informatics, Postgraduate Program, Universitas Amikom Yogyakarta, Sleman, 55283, Indonesia
Volume: 15 | Issue: 3 | Pages: 22257-22261 | June 2025 | https://doi.org/10.48084/etasr.10396

Abstract

Customer churn prediction is a critical task in the telecommunications (telecom) industry, where it guides retention efforts and helps reduce customer attrition. This paper presents a churn prediction model built with Machine Learning (ML) techniques, focusing on handling imbalanced data through resampling methods. A novel approach is proposed that pairs Gradient Boosting (GB) with Random Undersampling (RUS), denoted GB+RUS, and Random Forest (RF) with the Synthetic Minority Oversampling Technique (SMOTE), denoted RF+SMOTE. Model performance is evaluated on a real-world telecom dataset. The RF+SMOTE method obtains an accuracy of 79.23%, precision of 79.32%, recall of 80.15%, F1-score of 79.73%, and AUC of 87.25%, outperforming traditional approaches such as RF and Support Vector Machines (SVM). The results highlight the importance of advanced resampling techniques for addressing data imbalance and improving churn prediction models.
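
The two resampling ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names `random_undersample` and `smote_like_oversample`, the toy data, and the parameter `k` are assumptions made for illustration, and the oversampler is a simplified SMOTE-style interpolation for a binary problem rather than a full SMOTE implementation.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like_oversample(X, y, minority, k=3, seed=0):
    """Add synthetic minority rows until a binary dataset is balanced.

    Each synthetic row lies on the segment between a random minority row
    and one of its k nearest minority neighbours (SMOTE-style interpolation).
    Assumes two classes and at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_needed = max(n_majority - len(minority_rows), 0)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [r for r in minority_rows if r is not base]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


# Toy imbalanced data (hypothetical): 4 churners (label 1), 16 non-churners (label 0).
X = [[float(i), float(i % 5)] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]

X_rus, y_rus = random_undersample(X, y)
X_sm, y_sm = smote_like_oversample(X, y, minority=1)
```

Undersampling shrinks the data to the size of the minority class, while the SMOTE-style step keeps every original row and adds interpolated minority rows. In practice, resampling is normally applied to the training split only, so that evaluation metrics reflect the original class distribution.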

Keywords:

machine learning, customer churn classification, resampling method


How to Cite

Asaddulloh, B.P., Kusrini and Ariatmanto, D. 2025. Comparative Analysis of Oversampling and Undersampling Techniques in Predicting Customer Churn for Dqlab Telco. Engineering, Technology & Applied Science Research. 15, 3 (Jun. 2025), 22257–22261. DOI: https://doi.org/10.48084/etasr.10396.
