Enhancing Air Quality Index Classification Based on Ensemble Machine Learning Techniques
Received: 5 August 2025 | Revised: 8 September 2025 | Accepted: 15 September 2025 | Online: 8 December 2025
Corresponding author: Ahmed Fahim
Abstract
Accurate classification of the Air Quality Index (AQI) is critical for environmental monitoring and public health protection. In this paper, we used a publicly available daily air quality dataset covering U.S. counties, labeled with six AQI categories: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous. The dataset was preprocessed by imputing missing values and balancing the classes with the Synthetic Minority Over-sampling Technique (SMOTE). Several machine learning and deep learning models were trained and evaluated on the dataset, including Random Forest (RF), Extra Trees (ET), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), a Multi-Layer Perceptron (MLP) neural network, and a stacking ensemble. The models were assessed using cross-validation accuracy, test set accuracy, macro-averaged recall, F1-score, and ROC-AUC. The ensemble methods (RF and ET) and the MLP classifier outperformed the traditional models: RF achieved a test accuracy of 99.3% and the MLP classifier 99.0%. The stacking ensemble model achieved a test accuracy of 99.99%, a macro-averaged recall of 87.12%, and an ROC-AUC of 1.0000, highlighting the strong potential of ensemble learning for multi-class AQI classification.
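The six class labels correspond to the U.S. EPA's standard AQI bands (Good 0-50, Moderate 51-100, Unhealthy for Sensitive Groups 101-150, Unhealthy 151-200, Very Unhealthy 201-300, Hazardous 301 and above). Assuming a label is derived from an integer daily AQI value, the mapping can be sketched as below; the names `AQI_BANDS` and `aqi_category` are illustrative and not taken from the paper.

```python
# U.S. EPA AQI category bands: (inclusive upper bound, label), integer AQI values.
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(aqi: int) -> str:
    """Map a daily AQI value to one of the six EPA categories (hypothetical helper)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    # Everything above 300 (the EPA scale nominally runs to 500) is Hazardous.
    return "Hazardous"
```

For example, `aqi_category(42)` returns `"Good"` and `aqi_category(151)` returns `"Unhealthy"`. Because the bands are ordered, a linear scan over six entries is sufficient.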
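The abstract does not specify which imputation method was used. As one common choice, and purely as an assumption for illustration, column-wise median imputation replaces each missing entry with the median of the observed values in the same feature column. The function `impute_median` is hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics
from typing import List, Optional


def impute_median(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace None entries with the median of the observed values in that column."""
    if not rows:
        return []
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        medians.append(statistics.median(observed))
    # Keep observed values; fill gaps with the column median.
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]
```

In a train/test setting, the medians would normally be computed on the training split only and then reused for the test split, so that no test-set statistics influence preprocessing.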
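SMOTE balances classes by synthesizing new minority-class samples instead of duplicating existing ones. Each synthetic point lies on the segment between a minority sample x and one of its k nearest minority-class neighbors x_nn, i.e. x_new = x + λ(x_nn - x) with λ drawn uniformly from [0, 1). The pure-Python sketch below shows only that interpolation step; the function name `smote`, the default k = 5, and the random choice of base samples are illustrative assumptions, not the authors' configuration. In practice a library implementation such as the `SMOTE` class from imbalanced-learn would normally be used.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic samples for one minority class (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbors of x within the minority class, excluding x itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbors)]
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Oversampling should be applied only to the training data (or to the training folds inside cross-validation). Synthesizing points before the split lets near-copies of training samples reach the evaluation data and inflates the measured scores.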
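Macro-averaged recall and F1 give every class equal weight, whereas accuracy equals the support-weighted average of per-class recalls. A large gap between accuracy and macro recall can therefore only arise when the classes with lower recall have few test samples. The sketch below computes both macro metrics from label lists; the function name `macro_recall_f1` is hypothetical, and classes with no predictions are assigned a precision of 0.

```python
from collections import Counter
from typing import Sequence, Tuple


def macro_recall_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (macro recall, macro F1), averaging per-class scores with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)
    recalls, f1s = [], []
    for c in labels:
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)
```

For instance, if 98 of 100 test days are "Good" and 2 are "Hazardous", a model that always predicts "Good" scores 98% accuracy but only 0.5 macro recall, because the "Hazardous" class contributes a recall of 0 with the same weight as "Good".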
Keywords:
air pollution, Air Quality Index (AQI), environmental monitoring, machine learning, air quality classification, ensemble machine learning
License
Copyright (c) 2025 Ahmed Fahim, Ahmed M. Osman, Zahraa Tarek, Ahmed M. Elshewey

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
