An Outlook of Ozone Air Pollution Through Comparative Analysis of Artificial Neural Network, Regression, and Sensitivity Models

Air pollution and atmospheric ozone can damage human health and the environment. This study explores the potential of an artificial neural network (ANN) model and compares it with a regression model for predicting ozone concentration using different parameters and functions measured by the Climate Prediction Center of the US National Weather Service. In addition, this study compares the economic viability of the ANN with that of other measuring methods. Results showed that the ANN-based model exhibited better performance. Such models can be beneficial to government agencies. By predicting ozone concentration, government agencies can take preventive measures to avoid significant health effects, protect local populations, and help preserve a sustainable environment.

Keywords-ozone pollution; environment; sustainability


I. INTRODUCTION
Ozone is a reactive gas, considered a secondary pollutant as it is not discharged directly into the atmosphere. Ozone is observed in two different regions of the atmosphere: ground-level ozone ("bad ozone") in the troposphere and "good ozone" in the stratosphere, both with the same chemical composition, O3. Stratospheric ozone protects against harmful solar radiation, while tropospheric ozone is the main component of smog [1], caused by automobile emissions, especially in urban areas. Ozone is formed by chemical reactions of oxides of nitrogen (NOx) and volatile organic compounds (VOCs) in the presence of sunlight. The ozone concentration in urban areas is relatively high compared to rural areas; it rises in the morning, peaks in the afternoon, and decreases at night [2]. The US Environmental Protection Agency (USEPA) has set standards, called the air quality index (AQI), for ozone (Table I) [3]. The ozone concentration depends significantly on atmospheric temperature, UV index, and emissions from different sources. USEPA has defined the overall percentage of the different sources of emission. Figures 1 and 2 show the sources of NOx and VOC in Mississippi State [4]. Ozone affects the lungs and causes asthma and lung cancer [5]. The extent of respiratory illness depends on various factors such as concentration and duration of exposure, climate characteristics, individual sensitivity, preexistent respiratory diseases, and socioeconomic status [6,7].

This study aims to predict ozone for Jackson, Mississippi. Jackson is the largest urban area in Mississippi, with a population of 173,514. Data were collected from the Climate Prediction Center of the National Weather Service [8] and were used in an ANN to predict ozone and to find the correlation between some variables. The daily measured ozone data were averaged to observe the trend of ozone pollution. As shown in Figure 3, a comparison with the AQI shows very unhealthy and hazardous levels of health concern [8].

II. ANN MODEL DEVELOPMENT

An ANN model was developed based on the available data. Initially, the ANN architecture was developed using data from 2010 to 2014. The data were numbered by the day of each year to identify the intensity of the UV index on that day. However, some data were missing. Because a large number of data sets was available, the incomplete data sets were removed in order to maintain accuracy. The model architecture was developed to work as a computational model using Neuronets [9]. ANNs are often used as black-box models and can approximate any continuous function to arbitrary accuracy, provided the number of nodes is large enough [10]. In the model development, the dependent and independent variable categories were identified and arranged in the required ANN format. The data were also divided into training, testing, and validation sets. Correlation and regression were carried out to determine the correspondence between the variables involved.
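The data preparation described above (removing incomplete data sets, then checking the correlation between variables) can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the field names, the use of `None` to mark a missing value, and the sample records are all assumptions made for the example.

```python
import math

def drop_incomplete(rows):
    """Return only the rows in which no field is missing (None marks a gap)."""
    return [row for row in rows if all(value is not None for value in row.values())]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical daily records; the values are invented for illustration only.
records = [
    {"day": 1, "clear_sky_uvi": 2.1, "ozone": 310.0},
    {"day": 2, "clear_sky_uvi": None, "ozone": 305.0},  # incomplete, dropped
    {"day": 3, "clear_sky_uvi": 2.4, "ozone": 298.0},
    {"day": 4, "clear_sky_uvi": 2.9, "ozone": 285.0},
]

clean = drop_incomplete(records)
r = pearson([row["clear_sky_uvi"] for row in clean],
            [row["ozone"] for row in clean])
print(len(clean), round(r, 3))
```

Applying `pearson` to every pair of columns would give a correlation matrix of the kind used to screen the input variables.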
In the first phase, the ANN was run on the training and testing data sets to obtain the number of hidden nodes and training iterations required for the optimal model. In the next phase, the best network obtained from the previous phase was verified on the validation sets of the database. The best performing network was then retrained on all available patterns in the database in order to account for all the information embodied in the database. This retraining provides more reliable prediction and better accuracy. Research studies carried out using this approach have shown that the train-all stage is recommended to obtain a better performing model [11].
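The three phases above (select on training and testing data, verify on validation data, retrain on all patterns) can be sketched in a few lines. The sketch below is not the Neuronets tool used in the study: `train_mlp` is a hypothetical, deliberately tiny one-hidden-layer network trained by stochastic gradient descent, and the synthetic two-input data set, learning rate, and candidate settings are assumptions chosen only to keep the example self-contained.

```python
import math
import random

def train_mlp(data, hidden, iterations, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh network by SGD; returns a predict function."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Each hidden row holds n_in input weights followed by one bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]  # last entry is the bias

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w1]
        return h, sum(w * hi for w, hi in zip(w2, h)) + w2[-1]

    for _ in range(iterations):
        x, target = rng.choice(data)
        h, y = forward(x)
        err = y - target
        for j in range(hidden):
            grad_h = err * w2[j] * (1 - h[j] ** 2)  # uses w2[j] before its update
            w2[j] -= lr * err * h[j]
            for i in range(n_in):
                w1[j][i] -= lr * grad_h * x[i]
            w1[j][-1] -= lr * grad_h
        w2[-1] -= lr * err
    return lambda x: forward(x)[1]

def mse(predict, data):
    """Mean squared error of a predict function over (inputs, target) pairs."""
    return sum((predict(x) - t) ** 2 for x, t in data) / len(data)

def select_verify_retrain(train, test, validation, node_options, iteration_options):
    # Phase 1: choose hidden nodes and iterations by error on the testing set.
    scored = []
    for hidden in node_options:
        for iterations in iteration_options:
            net = train_mlp(train, hidden, iterations)
            scored.append((mse(net, test), hidden, iterations, net))
    _, hidden, iterations, net = min(scored, key=lambda s: s[0])
    # Phase 2: verify the selected network on the validation set.
    val_error = mse(net, validation)
    # Phase 3: retrain the selected architecture on all available patterns.
    final = train_mlp(train + test + validation, hidden, iterations)
    return (hidden, iterations), val_error, final

# Synthetic data (an assumption): target = 0.5*x1 - 0.3*x2 on random inputs.
data_rng = random.Random(1)
def make(n):
    rows = []
    for _ in range(n):
        x = [data_rng.random(), data_rng.random()]
        rows.append((x, 0.5 * x[0] - 0.3 * x[1]))
    return rows

train, test, validation = make(40), make(20), make(20)
best, val_error, final_model = select_verify_retrain(
    train, test, validation, node_options=(1, 2, 3), iteration_options=(500, 2000))
print(best, val_error)
```

Keeping the validation set out of phase 1 is what makes the phase 2 error an honest check before the final train-all step.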

III. ANN MODEL ARCHITECTURE
The data considered in this study were taken from the Climate Prediction Center of the National Weather Service. Additional data on air operation were taken from the Federal Aviation Administration, USA (FAA) [12]. The data were then organized for use by the ANN. In this case, the database was set to include seven independent variables and one dependent variable, the total ozone concentration. Measured ozone was based on 5 inputs: clear sky UVI, cloudy sky UVI, cloud transmission, solar zenith angle, and aerosol transmission. In this study, air operation (flights operated at Jackson international airport) was also used as an input to identify the impact of air operation on ozone pollution. The ANN model was developed and selected based on statistical accuracy measures: the maximum coefficient of determination (R²) and the minimum averaged-squared-error (ASE) and mean absolute relative error (MARE). Collectively, 1059 data sets were used for the ANN modeling: 479 for training, 232 for testing, 232 for validation, and 116 kept for true validation. The training and testing data were used to obtain the optimal network. In this case, the optimum network was achieved using 2 hidden nodes at 20,000 iterations. The accuracy statistics of the optimum network were MARE-tr (MARE on training data) = 0.329%, MARE-ts (MARE on testing data) = 0.318%, R²-tr = 0.997, R²-ts = 0.997, ASE-tr = 0.000034, and ASE-ts = 0.000032.
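The three accuracy measures named above can be computed as in the following sketch. The formulas are the conventional definitions and are an assumption here, since the text does not state them; in particular, the scale on which the study computed ASE (for example, on normalized outputs) is not given. The measured and predicted values are invented for illustration.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def ase(measured, predicted):
    """Averaged squared error between measured and predicted values."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)

def mare(measured, predicted):
    """Mean absolute relative error, expressed in percent."""
    return 100.0 * sum(abs(m - p) / abs(m)
                       for m, p in zip(measured, predicted)) / len(measured)

# Hypothetical values, not taken from the study's database.
measured = [300.0, 280.0, 320.0, 310.0]
predicted = [301.0, 279.0, 321.0, 309.0]

r2 = r_squared(measured, predicted)
ase_value = ase(measured, predicted)
mare_value = mare(measured, predicted)
print(r2, ase_value, mare_value)
```

A better model has a higher R² and lower ASE and MARE, which is the selection rule applied to the candidate networks.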
Figures 4-7 compare the train, test, validate and train-all modeling stages, respectively. These graphs show the best prediction models obtained for all modeling stages.

Fig. 4. Training predicted results versus measured values for ozone concentration.

IV. REGRESSION MODEL

A regression model was developed using the Microsoft Excel data analysis toolkit. All 1059 data sets were used in developing the linear regression prediction model, with the same input and output variables as the ANN model. A linear regression approach was used, and the following regression equation was developed:

From the linear regression model, it can be observed that aerosol transmission has a very high impact on the output. On the other hand, the clear sky UVI has a moderate impact on the output, while the air operation data of Jackson do not have any significant impact. The developed linear regression model produced an R² of 0.79 with a standard error of 13.29. In comparison, an R² value of 0.998 was obtained with the developed ANN model. This translates to about a 25% increase in prediction accuracy relative to the regression model. The regression model comparison graph is shown in Figure 8. As can be noted, Figure 8 shows a wider scatter cloud, in contrast to the very thin cloud depicted in Figures 4-7.
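A multiple linear regression of the kind described above can be sketched with ordinary least squares. This is a stand-in for the Excel toolkit used in the study, solved here through the normal equations; `fit_linear` and the small data set are assumptions for illustration and do not reproduce the study's regression equation. The last lines redo the reported comparison of R² values (0.79 and 0.998, both from the text) as a relative increase.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + list(x) for x in X]  # prepend a column of ones
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [2, 5, 1, 4, 7]
coefficients = fit_linear(X, y)
print(coefficients)

# Relative increase in R² from the regression model to the ANN model,
# using the two values reported in the text.
r2_regression, r2_ann = 0.79, 0.998
improvement = (r2_ann - r2_regression) / r2_regression * 100
print(round(improvement, 1))
```

With the reported values, the relative increase works out to roughly 26%, consistent with the "about 25%" stated above.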

V. SENSITIVITY ANALYSIS
Sensitivity analysis is a systematic study of how a model responds to variation in its inputs and parameters [1]. It can be used to understand whether the predicted model behavior stays constant under all parameters or changes with the independent variables. The basic objective of this research was to identify the factors affecting ozone concentration and to develop an appropriate model for predicting ozone levels in the urban areas of Jackson, Mississippi, based on the UV index data and air operation data. From the correlation matrix, it was observed that some of the independent input variables were highly correlated, namely day, clear sky UVI, cloudy sky UVI, cloud transmission, and solar zenith angle. Therefore, sensitivity analysis was not carried out on these variables. Aerosol transmission and air operation were the only two independent input variables found to have the least correlation values. Therefore, sensitivity analysis was carried out for these two variables by keeping all other variables constant. Accordingly, plots of ANN output versus aerosol transmission and air operation were obtained, as shown in Figures 9 and 10, respectively. It can be observed that neither variable affects ozone concentration levels.

Fig. 9. Sensitivity analysis of aerosol transmission on ozone concentration.
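The one-at-a-time procedure described above, in which a single input is varied while all others are held constant, can be sketched as follows. The trained network is not available here, so `stand_in_model` is a hypothetical placeholder that ignores air operation by construction; the input names, baseline values, and sweep range are likewise assumptions.

```python
def one_at_a_time(model, baseline, name, values):
    """Sweep one input over `values` while every other input stays at baseline."""
    curve = []
    for value in values:
        inputs = dict(baseline)  # copy so the baseline itself is never modified
        inputs[name] = value
        curve.append((value, model(inputs)))
    return curve

def stand_in_model(inputs):
    """Hypothetical placeholder for the trained ANN; NOT the study's model."""
    return 250.0 + 10.0 * inputs["clear_sky_uvi"]

# Assumed baseline operating point, for illustration only.
baseline = {"clear_sky_uvi": 5.0, "aerosol_transmission": 0.9, "air_operation": 150}

curve = one_at_a_time(stand_in_model, baseline, "air_operation", [50, 100, 150, 200])
outputs = [output for _, output in curve]
is_flat = max(outputs) - min(outputs) < 1e-9
print(curve, is_flat)
```

A flat curve, as in this constructed example, is the pattern that indicates the swept variable has no effect on the predicted output.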

VI. RESULTS DISCUSSION
The results obtained from the ANN and regression models clearly show the performance and accuracy of both models. The comparison of the prediction accuracy measures (Table II) shows that the ANN model performs better. Sensitivity analysis indicates that the prevailing values of aerosol transmission or air operation have no impact on ozone levels. The coefficient of determination of the ANN model has nearly the same value at every modeling stage, so the train, test, validate and train-all stages also suggest that the data are smooth. ASE and MARE show that there is very little error between observed and predicted values. A Microsoft Excel interface based on the developed ANN prediction models was built to project ozone concentration levels from a given set of input variables and so facilitate the prediction process (Figure 11). Two ANN models were implemented in the interface: model 1 is based on the training, testing and validation stages only, and model 2 is based on the train-all data. The prediction accuracy of model 2 is expected to be higher than that of model 1. However, since both models yield very high prediction accuracy, the difference in this case will be minimal. Both models are included in the interface to enable the user to assess the predicted values from two different perspectives. Both models have the same architecture of 2 hidden nodes and the same number of connection weights. However, the values of the connection weights differ because the training data sets for each model are different.

VII. ECONOMIC ANALYSIS
There are many ways of measuring ozone in the atmosphere, such as aircraft, high altitude balloons, satellites, and ground-based instruments [13]. Table III shows the relative equipment, cost, expertise and accuracy for atmospheric ozone measurements. Table III implies that the various methods used to measure ozone air pollution cost much more than the ANN calculating technique. ANNs have been widely used in various sectors and have played a vital role in modernizing computer-assisted techniques by handling large data sets and predicting with quite small error.

VIII. CONCLUSION AND RECOMMENDATIONS
In this study, an ANN approach was used to predict, compare and verify ozone concentration at Jackson, Mississippi. The ANN-based models were observed to perform better than the linear regression model. This research also illustrated the economic viability of all models for calculating and measuring ozone in the atmosphere. The developed models can be beneficial (via a user-friendly interface) to government agencies and other stakeholders. By predicting ozone concentration, government agencies can take preventive measures. This study can also be extended by including additional input variables, such as vehicle miles traveled (VMT) and industrial emissions within the Jackson region, that might affect ozone concentration levels.