Enhancing the Prediction of Multiple Ozone Metrics Using Genetic Algorithm-Based Feature Selection for the Multi-Target Regression of the Environmental AQ-Bench Dataset
Received: 21 September 2025 | Revised: 29 October 2025 | Accepted: 6 November 2025 | Online: 14 December 2025
Corresponding author: N. Mohamed Abdul Kader Jailani
Abstract
Predicting multiple air quality metrics from high-dimensional environmental datasets is hampered by the "curse of dimensionality". The present study introduces and evaluates three novel Genetic Algorithm (GA)-based feature selection methodologies designed specifically for Multi-Target Regression (MTR) tasks using the AQ-Bench dataset. The proposed wrapper-based approach integrates GAs with MTR models to identify optimal feature subsets. The results demonstrate a reduction in feature dimensionality of up to 61.6%, together with improved predictive performance over baseline models. This research establishes a practical framework for practitioners, showing that a common feature subset (GA-FS-MTR) is effective for correlated targets, whereas a per-target approach (GA-FS-TARGET) excels when precision for heterogeneous targets is required. A key finding is a structural sensitivity in complex models such as the Ensemble of Regressor Chains (ERC), where global optimization can inadvertently remove features vital to the chained architecture. This work validates GA-based feature selection as an effective tool for optimizing MTR models in environmental science and provides a strategic guide for its implementation.
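The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the GA settings or how the base learners are configured, so the population size, mutation rate, tournament selection, uniform crossover, the ordinary least-squares regressor, and the synthetic data below are all assumptions chosen for illustration. The sketch corresponds to the common-subset case (one binary mask shared by all targets), scored by average RRMSE on a held-out split.

```python
import numpy as np

def arrmse(y_true, y_pred, y_train_mean):
    """Average relative RMSE over targets (lower is better)."""
    num = ((y_true - y_pred) ** 2).sum(axis=0)
    den = ((y_true - y_train_mean) ** 2).sum(axis=0)
    return float(np.mean(np.sqrt(num / den)))

def fitness(mask, X_tr, Y_tr, X_va, Y_va):
    """Fit a multi-output least-squares model on the masked features
    (a stand-in for the MTR learner) and score it on the validation split."""
    if not mask.any():
        return np.inf  # an empty feature subset is never acceptable
    A_tr = np.c_[np.ones(len(X_tr)), X_tr[:, mask]]
    A_va = np.c_[np.ones(len(X_va)), X_va[:, mask]]
    W, *_ = np.linalg.lstsq(A_tr, Y_tr, rcond=None)
    return arrmse(Y_va, A_va @ W, Y_tr.mean(axis=0))

def ga_select(X_tr, Y_tr, X_va, Y_va, pop_size=20, generations=15,
              p_mut=0.05, seed=0):
    """Search for one feature mask shared by all targets."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    pop = rng.random((pop_size, n)) < 0.5          # random binary chromosomes
    best_mask = np.ones(n, dtype=bool)             # start from the full feature set
    best_score = fitness(best_mask, X_tr, Y_tr, X_va, Y_va)
    for _ in range(generations):
        scores = np.array([fitness(m, X_tr, Y_tr, X_va, Y_va) for m in pop])
        i = scores.argmin()
        if scores[i] < best_score:
            best_mask, best_score = pop[i].copy(), float(scores[i])
        # Binary tournament selection
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(scores[a] <= scores[b], a, b)]
        # Uniform crossover with a shifted copy of the parent pool
        partners = np.roll(parents, 1, axis=0)
        take_first = rng.random((pop_size, n)) < 0.5
        children = np.where(take_first, parents, partners)
        # Bit-flip mutation
        children ^= rng.random((pop_size, n)) < p_mut
        children[0] = best_mask                    # elitism
        pop = children
    return best_mask, best_score

# Synthetic demo: 12 features, 2 targets that depend only on the first 3.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 12))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.5 * rng.normal(size=(200, 2))
X_tr, X_va, Y_tr, Y_va = X[:150], X[150:], Y[:150], Y[150:]

baseline = fitness(np.ones(12, dtype=bool), X_tr, Y_tr, X_va, Y_va)
mask, score = ga_select(X_tr, Y_tr, X_va, Y_va)
print("features kept:", int(mask.sum()), "of", mask.size)
print("aRRMSE full set:", round(baseline, 4), "| selected:", round(score, 4))
```

A per-target variant would, under the same assumptions, call `ga_select` once per column of `Y` and keep a separate mask for each target. Note that selecting and scoring on the same validation split gives an optimistic estimate; a separate test split would be needed for an unbiased comparison.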
Keywords:
feature selection, genetic algorithm, Multi-Target Regression (MTR), air quality prediction, Single Target (ST), Stacked Single Target (SST), Ensemble of Regressor Chains (ERC), Average RRMSE
License
Copyright (c) 2025 N. Mohamed Abdul Kader Jailani, Geeta C. Mara

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
