Prediction of Vehicle-induced Air Pollution based on Advanced Machine Learning Models


  • Caroline Matara, Department of Civil & Construction Engineering, University of Nairobi, Kenya; School of Civil and Resource Engineering, Technical University of Kenya, Kenya
  • Simpson Osano, Department of Civil & Construction Engineering, University of Nairobi, Kenya
  • Amir Okeyo Yusuf, Department of Chemistry, University of Nairobi, Kenya
  • Elisha Ochungo Aketch, Department of Civil, Faculty of Engineering and Technology (FoET), Multimedia University, Kenya
Volume: 14 | Issue: 1 | Pages: 12837-12843 | February 2024


Vehicle-induced air pollution is a major issue of the 21st century, with detrimental effects on human health. Predicting vehicle-emitted air pollutants and evaluating the diverse factors that contribute to them are therefore of the utmost importance. This study employed advanced tree-based machine learning models to predict vehicle-induced air pollutant levels, with a particular focus on fine particulate matter (PM2.5). In addition to a benchmark statistical model, the models employed were Gradient Boosting (GB), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), Extra Tree (ET), and Random Forest (RF). For PM2.5 prediction, the ET model outperformed the others, with an MAE of 1.69, MSE of 5.91, RMSE of 2.43, and R2 of 0.71. The optimal ET model was then interpreted using SHAP analysis to overcome its lack of explainability. The SHAP analysis identified temperature, humidity, and wind speed as the primary determinants in forecasting PM2.5 levels.
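
For readers unfamiliar with the four scores quoted in the abstract, the following is a minimal sketch of how MAE, MSE, RMSE, and R2 are conventionally computed from paired observations and predictions. It is not the authors' code: the function name `regression_metrics` and the sample PM2.5 values are hypothetical, chosen only for illustration. The final lines check that the reported RMSE is consistent with the reported MSE, since RMSE is by definition the square root of MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    # R2 compares the residual sum of squares with the spread of the observations.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are identical")
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical PM2.5 readings, NOT data from the study.
observed = [12.0, 15.0, 9.0, 20.0, 14.0]
predicted = [11.0, 17.0, 10.0, 18.0, 14.0]
metrics = regression_metrics(observed, predicted)

# Consistency check on the values reported in the abstract.
reported_mse = 5.91
reported_rmse = 2.43
rmse_from_mse = math.sqrt(reported_mse)
```

Rounded to two decimals, `rmse_from_mse` agrees with the reported RMSE of 2.43. MAE and RMSE are expressed in the same units as the predicted quantity, whereas R2 is unitless.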


Keywords: air pollutants, machine learning, SHAP analysis






How to Cite

C. Matara, S. Osano, A. O. Yusuf, and E. O. Aketch, "Prediction of Vehicle-induced Air Pollution based on Advanced Machine Learning Models," Eng. Technol. Appl. Sci. Res., vol. 14, no. 1, pp. 12837–12843, Feb. 2024.

