Addressing the Coupled Optimization of Feature Selection and Hyperparameter Tuning Using a TPE-Driven XGBoost-RFE Framework

Authors

  • N. Mohamed Abdul Kader Jailani, School of Computer Science & Applications, REVA University, Bangalore, India
  • Geeta C. Mara, School of Computing & Information Technology, REVA University, Bangalore, India
Volume: 16 | Issue: 1 | Pages: 32357-32362 | February 2026 | https://doi.org/10.48084/etasr.15024

Abstract

This study presents a framework that addresses the coupled problem of feature selection and hyperparameter optimization in machine learning. The proposed TPE-XGBoost-RFE algorithm integrates the Tree-structured Parzen Estimator (TPE), a sequential model-based optimization technique, with Recursive Feature Elimination (RFE), a wrapper feature selection method, using XGBoost as the underlying model. The approach searches concurrently for a globally optimal combination of predictive features and model hyperparameters, rather than tuning one while holding the other fixed. The efficacy of the framework is demonstrated on the task of predicting long-term tropospheric ozone concentrations. The integrated process identifies an optimal 22-feature subset, reducing dimensionality by 37%, while simultaneously tuning nine key XGBoost hyperparameters. The robustness of this subset is validated across multiple machine learning models, all of which achieve lower error than models trained on the full feature set or on a subset chosen by a simpler filter-based method. This study demonstrates that a unified optimization strategy is critical for developing high-performing predictive models.
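The joint search described in the abstract can be illustrated with a deliberately simplified sketch of the TPE idea. This is not the authors' implementation. It is a standard-library toy that assumes a search space of only two variables (the number of features kept, `k`, and the base-10 logarithm of a learning rate, `log_lr`) and replaces the cross-validated error of an XGBoost model trained on the top-`k` RFE-ranked features with a made-up analytic function, `toy_loss`. All names, ranges, and constants (`N_TOTAL`, `toy_loss`, `suggest`, `tpe_search`, the bandwidth rule) are hypothetical. The sketch shows the core mechanism: past trials are split into a "good" and a "bad" group, a Parzen (kernel) density is fitted to each, and the next trial is the candidate with the highest ratio of good density to bad density.

```python
import math
import random

N_TOTAL = 20  # hypothetical number of candidate features (not from the paper)


def toy_loss(k, log_lr):
    """Made-up stand-in for the cross-validated error of a model trained on
    the top-k ranked features with learning rate 10**log_lr."""
    return ((k - 12) / N_TOTAL) ** 2 + 0.1 * (log_lr + 1.0) ** 2


def kde(x, centers, bw):
    """Unnormalized Parzen estimate: mean of Gaussian kernels on past values."""
    total = sum(math.exp(-0.5 * ((x - c) / bw) ** 2) for c in centers)
    return total / len(centers) + 1e-12  # small floor avoids division by zero


def suggest(good, bad, low, high, rng, n_candidates=24):
    """Draw candidates around 'good' values; keep the one maximizing l(x)/g(x)."""
    bw = (high - low) / 5.0  # fixed bandwidth, an assumption made for brevity
    best_x, best_ratio = None, -1.0
    for _ in range(n_candidates):
        x = min(high, max(low, rng.gauss(rng.choice(good), bw)))
        ratio = kde(x, good, bw) / kde(x, bad, bw)
        if ratio > best_ratio:
            best_x, best_ratio = x, ratio
    return best_x


def tpe_search(n_trials=60, n_startup=10, gamma=0.25, seed=0):
    """Jointly search (k, log_lr); returns the list of (loss, k, log_lr) trials."""
    rng = random.Random(seed)
    history = []
    for t in range(n_trials):
        if t < n_startup:
            # Random warm-up trials before any density can be fitted.
            k = rng.randint(1, N_TOTAL)
            log_lr = rng.uniform(-3.0, 0.0)
        else:
            ranked = sorted(history)  # ascending loss
            n_good = max(1, int(gamma * len(ranked)))
            good, bad = ranked[:n_good], ranked[n_good:]
            # Each dimension is modeled independently in this sketch.
            k = round(suggest([g[1] for g in good], [b[1] for b in bad],
                              1, N_TOTAL, rng))
            log_lr = suggest([g[2] for g in good], [b[2] for b in bad],
                             -3.0, 0.0, rng)
        history.append((toy_loss(k, log_lr), k, log_lr))
    return history


if __name__ == "__main__":
    best_loss, best_k, best_log_lr = min(tpe_search())
    print(f"best k={best_k}, learning rate={10 ** best_log_lr:.4f}, "
          f"loss={best_loss:.5f}")
```

In a real pipeline, `toy_loss` would be replaced by an objective that runs feature elimination and model training for each trial, and a maintained optimizer such as Optuna (listed in the keywords) would normally take the place of the hand-written `suggest` function. Because both variables are proposed in the same trial and scored by one objective, the feature count and the hyperparameter are tuned together rather than in sequence.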

Keywords:

feature selection, hyperparameter optimization, XGBoost, recursive feature elimination (RFE), tree-structured Parzen estimator (TPE), ozone prediction, Optuna

How to Cite

[1]
N. M. A. K. Jailani and G. C. Mara, “Addressing the Coupled Optimization of Feature Selection and Hyperparameter Tuning Using a TPE-Driven XGBoost-RFE Framework”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 1, pp. 32357–32362, Feb. 2026.
