Enhancing Air Quality Index Classification Based on Ensemble Machine Learning Techniques

Ahmed Fahim; Ahmed M. Osman; Zahraa Tarek; Ahmed M. Elshewey

doi:10.48084/etasr.13875

Authors

Ahmed Fahim Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia | Department of Computer Science, Faculty of Computers and Information, Suez University, Suez, Egypt
Ahmed M. Osman Department of Information Systems, Faculty of Computers and Information, Suez University, Suez, Egypt https://orcid.org/0009-0002-0527-533X
Zahraa Tarek Department of Computer Engineering and Information, College of Engineering, Wadi Ad Dwaser, Prince Sattam Bin Abdulaziz University, Al-Kharj 16273, Saudi Arabia | Computer Science Department, Faculty of Computers and Information, Mansoura University, Mansoura, 35561, Egypt
Ahmed M. Elshewey Department of Computer Science, Faculty of Computers and Information, Suez University, P.O. Box 43221, Suez, Egypt | Applied Science Research Center, Applied Science Private University, Amman, Jordan https://orcid.org/0000-0002-3048-1920

Volume: 15 | Issue: 6 | Pages: 29325-29333 | December 2025 | https://doi.org/10.48084/etasr.13875

Received: 5 August 2025 | Revised: 8 September 2025 | Accepted: 15 September 2025 | Online: 8 December 2025

Corresponding author: Ahmed Fahim

Abstract

The accurate classification of the Air Quality Index (AQI) is critical for environmental monitoring and public health protection. In this paper, we used a publicly available daily air quality dataset from U.S. counties, comprising six classification categories: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous. The dataset was preprocessed through missing value imputation and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). Several machine learning and deep learning models were trained and evaluated on the dataset, including Random Forest (RF), Extra Trees (ET), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), and a Multi-Layer Perceptron (MLP) neural network. The models were assessed using cross-validation accuracy, test set accuracy, macro-averaged recall, F1-score, and ROC-AUC. The ensemble methods (RF and ET) and the MLP classifier outperformed the traditional models (KNN, NB, and LR). The RF model achieved a test accuracy of 99.3%, while the MLP classifier achieved 99.0%. The stacking ensemble model achieved a test accuracy of 99.99%, a macro-averaged recall of 87.12%, and an ROC-AUC of 1.0000, highlighting the strong potential of ensemble learning techniques for improving multi-class AQI classification.
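The abstract reports both test accuracy and macro-averaged recall, and the two can differ noticeably when some AQI categories are rare. The following is a minimal sketch of how that gap can arise, not the authors' code: the function names and the toy data are hypothetical, and the category thresholds are the standard U.S. EPA AQI breakpoints, which are assumed here to match the dataset's labels.

```python
from collections import Counter

# Standard U.S. EPA AQI upper bounds (inclusive); anything above 300 is Hazardous.
AQI_BOUNDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(aqi):
    """Map a numeric AQI value to one of the six category names."""
    for upper, name in AQI_BOUNDS:
        if aqi <= upper:
            return name
    return "Hazardous"


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical, heavily imbalanced toy sample (not the paper's data):
# 990 Good days, 8 Unhealthy-for-Sensitive-Groups days, 2 Hazardous days.
true_aqi = [30] * 990 + [120] * 8 + [350] * 2
y_true = [aqi_category(v) for v in true_aqi]

# Assume a classifier that is right everywhere except one Hazardous day.
y_pred = list(y_true)
y_pred[-1] = "Very Unhealthy"

print(f"accuracy:     {accuracy(y_true, y_pred):.4f}")
print(f"macro recall: {macro_recall(y_true, y_pred):.4f}")
```

In this toy sample a single error on a two-sample class barely moves accuracy but cuts that class's recall in half, which pulls the macro average down. This is one plausible reason a model can show near-perfect accuracy alongside a lower macro-averaged recall.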

Keywords:

air pollution, Air Quality Index (AQI), environmental monitoring, machine learning, air quality classification, ensemble machine learning
