Predicting Injury Severity of Angle Crashes Involving Two Vehicles at Unsignalized Intersections Using Artificial Neural Networks

In 2015, about 20% of the 52,231 fatal crashes that occurred in the United States took place at unsignalized intersections. The economic cost of these fatalities has been estimated to be in the millions of dollars. In order to mitigate the occurrence of these crashes, it is necessary to investigate their predictability based on the pertinent factors and circumstances that might have contributed to their occurrence. This study focuses on the development of models to predict the injury severity of angle crashes at unsignalized intersections using artificial neural networks (ANNs). The models were developed based on 3,307 crashes that occurred from 2008 to 2015. Twenty-five different ANN models were developed. The most accurate model predicted the severity of an injury sustained in a crash with an accuracy of 85.62%. This model has 3 hidden layers with 5, 10, and 5 neurons, respectively. The activation functions in the hidden and output layers are the rectified linear unit (ReLU) function and the sigmoid function, respectively.

Keywords: crashes; unsignalized intersection; artificial neural network; injury severity


INTRODUCTION
Even though intersections constitute a relatively small proportion of the facilities of transportation systems, a significant number of crashes occur at these locations, especially in urban areas. In California, for instance, an annual average of 1.5 crashes occurs at unsignalized intersections in rural locations, compared to an average of 2.5 crashes per year in urban locations [1]. Data from the World Health Organization (WHO) reveal that 1.25 million people die annually worldwide in road crashes. The economic cost of these deaths is estimated at approximately $260 billion per year [2]. In the United States, a total of 37,456 fatalities in road-related crashes were reported in 2016 [3]. Though most of these crashes occurred on road segments, a significant number occurred at or near intersections. Out of the total of 52,231 fatal crashes in the United States in 2015, approximately 4.4% (2,298) occurred at STOP-controlled intersections, while 7.5% (3,917) occurred at intersections controlled by traffic signals.
Intersections without any type of traffic control device recorded the highest number of fatal crashes (4,227) [4].
Several studies have investigated the causes of these crashes. These causes are either driver-induced, or arise from road geometry, road defects, vehicle defects, and atmospheric or weather conditions. Various countermeasures have been proposed and/or implemented to reduce the occurrence of crashes at intersections, and in some instances they have been successful. In order to effectively reduce the frequency and mitigate the severity of intersection-related crashes, it is necessary to explore the predictability of these crashes based on the pertinent factors and circumstances that might have contributed to their occurrence. Several studies have resulted in the development of mathematical models that predict crashes on roadways in general and, in a few instances, at unsignalized intersections in particular. These mathematical models include linear regression and machine learning methods. Given the varying characteristics of intersections, it is necessary to develop models that are focused on and specific to a particular set of conditions. This study therefore focuses on the development of models to predict the severity of right-angle crashes involving two vehicles at unsignalized intersections in urban centers using ANNs.

A. Contributory Factors for Intersection-Related Crashes
There are many factors that determine the degree of injury sustained by people involved in crashes at unsignalized intersections. However, only certain factors have been shown to be statistically significant predictors. Authors in [5] assessed the degree of injury sustained by drivers involved in angle collisions in relation to the fault status of the drivers. The results of the study showed that drivers who were not at fault tended to sustain more severe injuries than those who were at fault. It was further determined that injury severity was affected by factors including time of year, speed limit, age, gender, restraint/helmet use, and alcohol/drug use. Authors in [6] concluded that the road surface condition (wet or dry) was a significant predictor of injury severity. Additionally, female drivers were found to be more likely to sustain severe injuries than male drivers, and crashes in urban areas were determined to result in less serious injuries than crashes in rural areas [6]. Also, traffic volume on the major road is a significant predictor of crashes at unsignalized intersections [7].

www.etasr.com
Arhin & Gatiba: Predicting Injury Severity of Angle Crashes Involving Two Vehicles at Unsignalized …

The geometric characteristics and features of unsignalized intersections have also been found to be potential explanatory variables in crash prediction models. Authors in [8] predicted the frequency of accidents at unsignalized intersections in urban areas using negative binomial models. It was concluded that, besides traffic exposure functions such as traffic flow, which usually significantly predict crashes, intersection geometrics, the absence of street lighting, and dedicated left-turn lanes are positively correlated with accident frequency at intersections. Typical geometric characteristics included the number of lanes on the major road, lane width, and the presence of a median on the intersecting roads. The study further revealed that T-intersections with Yield control had a much lower accident potential than those with Stop control.

B. Crash Prediction Models
Several modeling techniques have been employed to predict crashes at intersections.

1) Linear Regression Models
Linear regression modeling is an approach to establishing a relationship between a scalar response, also called the dependent variable, and other explanatory (or independent) variables. Model parameters are estimated using a data set of values of the response and explanatory variables. The model is usually fitted to the observed data set using the least squares approach. Linear regression models take the form:

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + … + β_p x_ip + ε_i

where y_i is the i-th dependent variable, β_0, β_1, …, β_p are the estimated parameters, x_i1, x_i2, …, x_ip are the predictor variables of the i-th dependent variable, and ε_i is the error term. The error term is an independent and normally distributed random variable with a mean of zero and a variance greater than zero. Linear regression modeling has been applied in several studies to establish various relationships between the frequency of injury crashes and other traffic characteristics. Authors in [9] investigated the relationship between the number of injury or property damage only (PDO) crashes that occur annually at intersections and traffic and environmental factors. The crash records (from 1984 to 1987) of 2,488 intersections in California were sampled. The linear regression analysis employed in this study was conducted at two levels. In the first level, a simple linear regression model was developed with injury/PDO crashes per year as the response variable and traffic intensity, expressed in millions of vehicles entering the intersection per year from all approaches, as the predictor variable. In the second model, additional information such as design, traffic control, proportion of cross-street traffic, and environmental features was included as predictor variables.
The results of the analysis showed that the accuracy of the model improved as more predictor variables were added.
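The least squares fit described above can be sketched in a few lines of NumPy; the data below are synthetic and purely illustrative.

```python
import numpy as np

# Fit y_i = b0 + b1*x_i1 + b2*x_i2 by ordinary least squares.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))        # two predictor variables
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1]      # known, noise-free relationship
A = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None) # least-squares estimate [b0, b1, b2]
print(np.round(beta, 3))                     # recovers the true coefficients
```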
Though linear regression models are easy to use and interpret, it has been shown that they are not ideal for crash prediction. Crashes are usually sporadic and random in nature and hence are not best fitted by linear relationships. Also, the assumption that the error term is normally distributed is not accurate for crash predictions, which are usually discrete and non-negative. Further, some factors have been determined to strongly correlate with each other, introducing multicollinearity and thereby invalidating such linear models [10]. To overcome the shortcomings of linear regression models, generalized linear models (GLMs) have been used to model crashes at intersections. GLMs are a flexible generalization of ordinary linear regression that can accommodate non-normally distributed error terms. The most common forms of generalized linear models used in crash prediction are the negative binomial (NB) model and the ordered probit model (OPM).

2) Negative Binomial Model
NB models are a generalization of Poisson regression. Unlike Poisson models, where the variance of the distribution of the response variable is equal to its mean, in NB models the variance differs from the mean. NB models have been found to be suitable for crash prediction due to the nature of the dependent variables in such analyses. Usually the response required is the number of crashes at a specific location. Such responses are non-negative integers and generally follow the NB distribution, which is given by the following Poisson-Gamma form:

P(y_i) = [Γ(y_i + 1/α) / (Γ(1/α) y_i!)] · (1/(1 + αu_i))^(1/α) · (αu_i/(1 + αu_i))^(y_i), with u_i = exp(βx_i)

where u_i is the mean of the dependent variable y_i, β is a parameter to be estimated, α is the heterogeneity parameter, and x_i is the i-th predictor variable. Authors in [11] investigated the relationship between crash frequencies and factors such as traffic conditions, geometric and operational characteristics of roadways, and weather conditions using data on crashes that occurred from 2004 to 2010 on a motorway in Auckland. The NB regression model developed had a goodness of fit, ρ², of 0.119. Additionally, several individual predictors such as the length of road segments, AADT, number of lanes, and shoulder width were found to be significant predictors in the model.
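The Poisson-Gamma mixture underlying the NB model can be illustrated with a short simulation; the mean and heterogeneity values below are assumed for illustration, not taken from the cited study.

```python
import numpy as np

# NB counts arise from Poisson draws whose rates are Gamma-distributed,
# which is why the variance (~ u + alpha*u^2) exceeds the mean.
rng = np.random.default_rng(1)
u, alpha = 4.0, 0.5                                   # assumed mean and heterogeneity
lam = rng.gamma(shape=1/alpha, scale=alpha*u, size=100_000)  # Gamma-distributed rates
counts = rng.poisson(lam)                             # Poisson given each rate -> NB counts
print(counts.mean(), counts.var())                    # variance well above the mean
```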

3) Ordered Probit Models
The ordered probit model (OPM) is used in developing models which have an ordered response. This modeling approach employs the probit link function. The latent continuous metric underlying the observed ordinal responses is partitioned into a series of regions corresponding to the ordinal categories. Generally, the probability of obtaining a particular outcome is given by:

P(y_i = j) = Φ(τ_j − βX_i) − Φ(τ_(j−1) − βX_i)

where y_i is an observable ordinal variable, X_i is a vector of exogenous variables, β is a vector of unknown parameters to be estimated, Φ(•) is the standard normal cumulative distribution function, and τ_j is the threshold associated with the j-th ordinal partition interval, the thresholds being assumed to be in ascending order. The OPM has been applied in the development of several crash prediction models which seek to predict injury severity from several factors. Authors in [12] developed an OPM that sought to relate the severity of crashes experienced at freeway exits to their characteristics. Crash data for 326 locations in Florida were sampled. The results of the study indicated that the factors which significantly influenced crash severity included mainline characteristics.

4) Empirical Bayes Refinement of the GLM
Crash estimates made with GLMs are susceptible to regression-to-the-mean. Regression to the mean occurs when a randomly large number of accidents during a period is normally followed by a reduced number of accidents during a similar after period, even if no countermeasure has been implemented. GLMs do not account for this effect. Hence, to improve the accuracy of the predictions made with GLMs, the empirical Bayes (EB) method is usually applied. The EB method compensates for the regression-to-the-mean bias by pulling the crash count towards the mean. Thus, prior data (observed crash counts) are combined with the predicted crash frequency from the GLM to calculate a corrected value. The corrected value is expected to lie somewhere between the observed crash frequency and the predicted frequency from the GLM. This is expressed as:

E = w·µ + (1 − w)·x

where E is the corrected value, µ is the average number of crashes (determined from the GLM), x is the observed crash count, and w is a weight between 0 and 1 [13].
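A minimal numeric sketch of the EB correction follows. The weight formula w = 1/(1 + µ/k), with k the NB dispersion parameter, is one commonly used form; reference [13] may parameterize the weight differently, so treat it as an assumption here.

```python
# Hedged sketch of the empirical Bayes correction: the corrected value is a
# weighted combination of the GLM prediction and the observed count.
def eb_corrected(observed, mu_glm, k):
    """Pull the observed crash count toward the GLM-predicted mean."""
    w = 1.0 / (1.0 + mu_glm / k)     # assumed weight form, not necessarily [13]'s
    return w * mu_glm + (1.0 - w) * observed

# Hypothetical site: 9 observed crashes, GLM predicts 5, dispersion k = 2.
e = eb_corrected(observed=9, mu_glm=5, k=2)
print(round(e, 3))                   # lies between the prediction (5) and observation (9)
```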

5) Artificial Neural Networks (ANNs)
ANNs are mathematical models inspired by the biological neural networks of the human brain. ANNs are used in engineering to perform complex tasks such as pattern recognition, forecasting, data compression, and classification. The effectiveness of an ANN is based on its ability to approximate both linear and nonlinear functions to a required degree of accuracy using a learning algorithm, and to build "piece-wise" approximations of the functions [14]. Classification or forecasting using ANNs involves a training and learning procedure, where historical data (a set of input data with known outputs) are presented to the network. Usually large amounts of such data are required for the training of the network. The network goes through a learning process by constructing a mapping of inputs to outputs, and the weights assigned to each mapping are adjusted at each iteration. The method by which the weights and bias levels of a network are updated is determined by the learning rule used. Thus, the learning rule helps a neural network to learn from the existing conditions and improve its performance. There are several learning rules used in training neural networks; notable among them are the Hebbian, perceptron (error-correction), delta, correlation, and outstar learning rules [15]. The most widely used network architecture, however, is the multilayer perceptron (MLP). An MLP basically consists of three layers: an input layer, a hidden layer, and an output layer. The MLP is a feedforward network in which information flows from the input layer through the hidden layer(s) to the output layer to produce the outcome. These layers have interconnected nodes (neurons). The interconnections are assigned weights (representing information flow) which are computed using mathematical functions. The outputs for specific inputs are obtained by adjusting the weights to minimize the errors between the output produced and the desired output through error back-propagation. The MLP is known as a universal approximator because of its ability to approximate continuous functions on a compact set of real numbers with few assumptions. Activation functions, also called transfer functions, are an essential component of ANNs. They introduce non-linearity into the network: each neuron calculates the weighted sum of its inputs, adds a bias, and the activation function then determines whether (and how strongly) the neuron is activated. The three most common types of activation functions used in ANNs are the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU) [16]. Authors in [17] utilized ANNs to develop a model relating crash severity on urban highways to traffic variables such as traffic volume and flow speed, human factors, and road, vehicle, and weather conditions. The study showed that an MLP with feedforward back-propagation provided the best results compared to other learning methods. A network architecture with 2 hidden layers of 17 and 7 neurons, respectively, was determined to be the best. Mean square errors (MSE) within the acceptable range of 3% to 4% were achieved, along with correlation coefficients of 86% to 87%.
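The three activation functions named above can be written out directly in NumPy:

```python
import numpy as np

# The three common activation functions used in ANNs.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes input to (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes input to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)          # zero for negatives, identity otherwise

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```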

A. Study Area
This study is based on data obtained in the District of Columbia (DC). Washington, DC, the capital of the USA, is divided into four quadrants: Northwest (NW), Northeast (NE), Southeast (SE), and Southwest (SW), which are further divided into eight wards. As of July 2018, the population of DC was about 702,455, with a growth rate of approximately 1.41% [18]. The city is highly urbanized and is ranked the sixth most congested city in the United States, with each driver spending an average of 63 hours in traffic annually [19]. It has a land area of 68.34 mi² and a total of 1,503 miles of roadway comprising local roads, collector roads, minor arterials, principal arterials, freeways, and interstates [20]. Also, the city has about 7,700 intersections, of which 1,450 are signalized [21]. The American Society of Civil Engineers' 2017 infrastructure report card reported that about 95% of the roads in DC are in poor condition [22].

B. The Crash Database System
Crash prediction models are data dependent, and as a result the accuracy of the developed models depends largely on the quality of the available crash data. To ensure that a reliable model was developed, this research utilized traffic crash data from the District Department of Transportation's (DDOT's) crash database, the Traffic Accident Reporting and Analysis System Version 2.0 (TARAS2). The District of Columbia Metropolitan Police Department (MPD) records traffic crash information at the scene of crashes electronically on a Police Department Form number 10 (PD-10) crash reporting form. The crash data are then downloaded through secure servers from MPD into DDOT's database, processed, and made available in TARAS2, which is an Oracle-based application. TARAS2 contains data fields that can be broadly categorized under vehicle characteristics, environmental conditions, roadway characteristics, and traffic exposure characteristics, as well as crash location, date, time, crash type, crash severity, and information on the persons involved.

C. Data Extraction and Encoding
Eight years of crash data (2008-2015) were queried and extracted from TARAS2. The data were then filtered to obtain angle crashes involving two vehicles at unsignalized intersections. Further, the extracted data were cleaned by identifying and removing duplicate and incomplete crash records and irrelevant data fields. In all, 3,307 data points were extracted and used for the analysis. The extracted data set contained the following fields: accident complaint number, main street name, side street name, year of accident, month of accident, time of accident, day of week, quadrant of accident occurrence, type of collision, road surface condition, street lighting condition, lighting condition, weather condition, traffic condition, traffic control type, drivers' age, drivers' gender, contributing circumstances, and injury severity. Only numerical data can be analyzed by ANNs, so qualitative data must be converted to quantitative data; both input and output data must be encoded into either real or integer values. Binary (0/1) encoding has been determined to yield better results, since it minimizes the loss function values with respect to the model's parameters. The loss value determines how well the model fits the data set: the lower the loss function value, the better the fit. Table II presents the variables and coding scheme used in this study.
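The binary encoding step might look like the following pandas sketch; the column names and category labels here are illustrative, not the actual TARAS2 field names.

```python
import pandas as pd

# Hypothetical sketch of the 0/1 encoding of categorical crash attributes.
df = pd.DataFrame({
    "lighting": ["daylight", "dark", "daylight"],
    "weather":  ["clear", "rain", "clear"],
    "injury":   [0, 1, 0],                      # target: 1 = injury crash
})
# Each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["lighting", "weather"], dtype=int)
print(encoded.columns.tolist())
```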

D. Types of Collision
The crash types considered for this study are angle collisions. Three types of angle collisions are specified: right-angle, right turn, and left turn collisions.
• Right-angle collision: This type of collision occurs when the side of one vehicle is impacted by the front of another vehicle traveling in a direction at a right angle to that of the former vehicle. Figure 1 depicts a right-angle collision at an intersection.
• Right turn collision: This type of collision occurs when a vehicle turning right at an intersection is impacted by a vehicle from the other intersecting road. Figure 2 depicts a right turn collision.
• Left turn collision: This type of collision occurs when a vehicle turning left at an intersection is impacted by a vehicle in the oncoming traffic. Figure 3 depicts a left turn collision.

E. Injury Severity
The outcome variable describes the degree of injury severity sustained by persons involved in a crash. The crash database specifies five degrees of injury severity: no injury, complaint, non-disabling injury, disabling injury, and fatal. Due to the insignificant percentage of fatal and disabling injury crashes in the data set, all complaint, injury, and fatal crashes were categorized as injury crashes. Table I shows the levels of crashes used in the analysis.

F. Data Standardization
To achieve accurate predictions from machine learning models, it is necessary that the variables used in developing the models be on the same scale. Also, most optimization algorithms that minimize the loss function converge faster when the variables are of the same scale. The method of scaling used on this data set is standardization. The raw scores (of the encoded data) are converted to standard scores by subtracting the mean of each variable from the raw score of each observation and then dividing the difference by the standard deviation of the variable. By doing so, the variables are transformed to have a mean of zero and unit variance. The standardized value, Z, of each score of each variable is given by (5):

Z = (X − X̄) / σ (5)

where X̄ is the mean of the variable, X is the encoded score of each observation of the variable, and σ is its standard deviation.
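The standardization in (5) can be sketched column-wise in NumPy; the data matrix below is synthetic.

```python
import numpy as np

# Z = (X - mean) / std, applied to each column of an encoded data matrix.
def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # synthetic encoded scores
Z = standardize(X)
print(Z.mean(axis=0), Z.std(axis=0))   # each column: mean 0, standard deviation 1
```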

G. Development of Models
The process of classification by an ANN is an iterative process of weight adjustments based on information flow that mimics the functioning of neurons in the human brain. The steps below describe in detail how the models for crash injury severity classification were developed using ANNs:
• Selection of network architecture.
• Training of the neural network.
• Testing and evaluation of the model.

1) Selection of Network Architecture
The network architecture was first set up. A multi-layer perceptron (MLP) feedforward ANN was adopted to develop the classification models. An MLP consists of at least three layers: an input layer, hidden layer(s), and an output layer. Each layer consists of nodes or neurons. The neurons of each layer are interconnected with those of the succeeding layer, and the neurons of the hidden and output layers are embedded with nonlinear activation functions. The MLP ANN architecture used in this research consists of an input layer with 44 neurons (each neuron representing one of the input variables, X_i, in Table II) and an output layer with 1 neuron, which is the target or dependent variable, Y. The number of hidden layers and neurons was varied over several iterations until the numbers of hidden layers and neurons which produced the best model were obtained. Figure 4 shows the MLP ANN architecture used in developing the model.
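A forward pass through this architecture, using the best-performing hidden-layer configuration reported later (5, 10, and 5 neurons; ReLU hidden activations; sigmoid output), can be sketched in NumPy. The weights below are random and untrained, for illustration only; the study itself built and trained its models with Keras/TensorFlow.

```python
import numpy as np

# Forward pass of a 44-5-10-5-1 MLP with ReLU hidden layers and a sigmoid output.
rng = np.random.default_rng(42)
sizes = [44, 5, 10, 5, 1]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)     # ReLU in the hidden layers
    z = x @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid in the output layer

x = rng.normal(size=(3, 44))               # three standardized 44-element input vectors
p = forward(x)                             # outputs interpretable as injury probabilities
print(p.shape)                             # (3, 1), each value strictly in (0, 1)
```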
• Forward computation: The forward propagation was implemented by multiplying the weights by the values of the input neurons, with the sum of the products stored in the corresponding neurons of the hidden layer. The weighted sums are subsequently passed through an activation function, and based on the output of the function, the neuron is either activated or not. Mathematically this can be expressed as:

v_j^(l)(n) = Σ_i w_ji^(l)(n) · y_i^(l−1)(n)

y_j^(l)(n) = φ(v_j^(l)(n))

where v_j^(l) is the weighted sum in the j-th neuron of the l-th hidden layer, w_ji^(l) is the weight coefficient of the j-th neuron of the l-th layer that is fed from the i-th neuron in layer l−1, y_i^(l−1) is the output of the i-th neuron in the previous layer l−1, y_j^(l) is the output of the j-th neuron in layer l, and φ is the activation function, which is a rectified linear unit (ReLU) function in the hidden layers and a sigmoid function in the output layer. Hence, for the last layer (the output layer), l = L and

O_j(n) = φ(v_j^(L)(n))

where O_j(n) is the output of the j-th output neuron at the n-th iteration.
• Computation of error: The error of the j-th neuron at the n-th iteration is then computed as

e_j(n) = t_j(n) − O_j(n)

where t_j(n) is the target output.
• Backward computation: The weights in the network are adjusted based on a local gradient, δ, which is a function of the error, e, and is computed as:

δ_j^(L)(n) = e_j(n) · φ′(v_j^(L)(n))

for neuron j in the output layer L, and

δ_j^(l)(n) = φ′(v_j^(l)(n)) · Σ_k δ_k^(l+1)(n) · w_kj^(l+1)(n)

for neuron j in a hidden layer l, where k indexes the succeeding neurons in layer l+1 and φ′(•) is the derivative of the function φ(•). The weights in the network are then adjusted by the relations:

Δw_ji^(l)(n) = η · δ_j^(l)(n) · y_i^(l−1)(n) + α · Δw_ji^(l)(n−1)

w_ji^(l)(n+1) = w_ji^(l)(n) + Δw_ji^(l)(n)

where η is the learning-rate parameter and α is the momentum constant.
• Iteration: The procedures in the three previous steps are repeated in batches of 3 observations per iteration until the stopping criterion of 100 epochs is met. Figure 5 illustrates the training process.
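The weight-update rule with momentum can be illustrated with a single numeric step; the values of η, α, the gradient, and the previous update below are illustrative only.

```python
# One step of the update delta_w = eta * grad + alpha * delta_w_prev,
# where eta is the learning rate and alpha is the momentum constant.
eta, alpha = 0.1, 0.9
delta_w_prev = 0.05     # previous iteration's weight change
grad = 0.2              # local gradient times the neuron's input, delta_j * y_i
delta_w = eta * grad + alpha * delta_w_prev
w = 0.5 + delta_w       # weight adjusted from an assumed current value of 0.5
print(round(delta_w, 4), round(w, 4))
```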

3) Model Testing and Evaluation
After training the network for the required number of epochs (100), the model was tested using the test dataset. The accuracy of the model was evaluated using the confusion matrix.
The number of hidden layers and neurons in the network architecture was varied and the training process was repeated. This iterative process was continued until the model with the best performance was achieved.

Fig. 5. ANN training process

4) Model Evaluation
The performance of each of the models was assessed using the test dataset. The results were then evaluated using the data generated by a confusion matrix (CM). A CM contains information about the actual and predicted classifications made by a classification system. Each row of the CM represents the instances of an actual class and each column represents the instances of a predicted class. Table III shows the confusion matrix for a two-class classifier. The entries of the CM are defined as follows: True Positive (TP) instances are positive and correctly classified as positive, True Negative (TN) instances are negative and correctly classified as negative, False Positive (FP) instances are negative but wrongly classified as positive, and False Negative (FN) instances are positive but wrongly classified as negative. Based on the CM, the following measures were computed to evaluate the developed models.
• Accuracy (AC): The accuracy is the proportion of the total number of predictions that were correctly classified. It is computed as:

AC = (TP + TN) / (TP + TN + FP + FN)

• Error rate (ER): The error rate is the proportion of predictions that were misclassified:

ER = (FP + FN) / (TP + TN + FP + FN)

• Sensitivity (S): The proportion of positive cases that were correctly identified:

S = TP / (TP + FN)

• Precision (P): The proportion of the predicted positive cases that were correct:

P = TP / (TP + FP)

• F-measure (F): A measure of the accuracy of the test model computed from S and P. The value of F ranges from 0 to 1, where 1 indicates an excellent model and 0 a bad one. The F-measure is calculated as:

F = 2·P·S / (P + S)
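The five measures can be computed directly from the four CM counts; the counts below are hypothetical, not the paper's results.

```python
# Evaluation measures for a two-class confusion matrix (hypothetical counts).
TP, TN, FP, FN = 40, 45, 8, 7

accuracy    = (TP + TN) / (TP + TN + FP + FN)
error_rate  = (FP + FN) / (TP + TN + FP + FN)
sensitivity = TP / (TP + FN)
precision   = TP / (TP + FP)
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
print(round(accuracy, 3), round(error_rate, 3), round(f_measure, 3))
```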

H. Analysis Software
The classification models were developed using the high-level general-purpose programming language Python. Specifically, the Anaconda Python distribution was used. This is an open-source distribution with standard and robust libraries for data processing, analysis, and machine learning applications. The NumPy and Pandas libraries were imported to facilitate data preprocessing, and the TensorFlow and Keras libraries were imported to develop the ANN models. In addition, the descriptive statistics of the data were obtained using the IBM Statistical Package for the Social Sciences (SPSS).

A. Descriptive Statistics
Tables IV and V present the descriptive statistics of the data set. The frequencies of the categorical variables are presented in Table IV, while Table V presents the mean and standard deviation of the continuous variable, age. It can be observed from Table IV that the highest number of crashes (1,252) occurred during the off-peak period, from 10:00 AM to 3:00 PM, while the lowest number of crashes (176) occurred at night, between 12:00 AM and 6:00 AM. Most of the crashes occurred on Tuesdays, Wednesdays, and Thursdays, while Sundays recorded the fewest crashes. The Northwest quadrant of Washington, DC recorded the highest number of crashes (1,167). Right-angle collision was the most frequently occurring crash type. Most of the crashes occurred under daylight, clear weather, and light traffic conditions. Though most crashes involved no violation on the part of one or both drivers, distracted driving and Stop/Yield sign violations were also reported as comparatively frequent contributing circumstances. Among the drivers involved, 3,936 were male and 2,678 were female. Of the 3,307 recorded crashes, 1,274 resulted in injury. The rate of injury crashes was highest during the night (41.24%), on Fridays (41%), and in the Northeast quadrant (40.44%). The injury-crash rate was also highest for right turn collisions (40.69%), locations with absent street lights (39.52%), rainy weather (50.57%), and light traffic conditions (54.78%). Intersections controlled by Yield signs recorded the highest rate (70.59%) of injury crashes. This is complemented by the fact that the contributing circumstance which resulted in the highest rate of injury crashes (69.94%) was a Stop/Yield sign violation by at least one driver. Crashes in which at least one driver was female recorded the highest rate of injury crashes. A correlation analysis was conducted to investigate the relationship between age and injury severity; the results are presented in Table VI. The Spearman's rho of -0.52 was found to be statistically significant (p=0.03), implying that the severity of a crash increased with decreasing age of the drivers involved.

B. Spatial Distribution of Crashes
This section presents the results of the spatial analysis of the crashes, performed using the ArcGIS Pro software. The spatial analysis included the spatial distribution of crashes based on injury severity and a kernel density analysis of injury crashes. The spatial distribution and density of crashes are shown in Figures 6 and 7, respectively. Figure 6 shows that most of the crashes were located in the NW quadrant, which covers the downtown and central business district of Washington, DC. Figure 7 shows that the higher densities of injury crashes are in the same region.

Tables VII and VIII show the number of models explored and the structure of each neural network. The performance measures (accuracy, error rate, sensitivity, precision, and F-measure) of each model were computed and are also presented. The accuracy, sensitivity, precision, and F-measure range from 0 to 1, with values closer to 1 indicating better performance; conversely, models with error rates (ER) closer to 0 are better than models with error rates closer to 1. The results in Table VII show that, after training, the accuracy of the models ranged from 84.87% to 96.89%. Model 17 produced the best classification accuracy (96.89%) with a corresponding error rate of 3.11%, while Model 15 produced the worst accuracy (84.87%) with a corresponding error rate of 15.13%. Model 7 had the highest sensitivity (S), while Model 15 had the lowest. With regard to precision, Model 17 was the most precise (P) model with a precision of 0.9565, while Model 15 was the least precise. Model 16 recorded the highest F-measure of 0.9490, while the lowest F-measure was recorded by Model 6. The variation of the performance measures across the models is shown in Figure 8.
Table VIII presents the results of the evaluation of the trained models using the test data set. The results show that the accuracy (after testing) of the models ranged from 76.54% to 85.62%. Model 22 produced the best classification accuracy (85.62%) with a corresponding error rate of 14.38%, while Model 6 produced the worst accuracy. Model 14 had the highest sensitivity, while Model 16 had the lowest. With regard to precision, Model 15 was the most precise model with a precision of 0.7850, while Model 18 was the least precise with a precision of 0.6882. Model 15 recorded the highest F-measure of 0.7875, while the lowest F-measure was recorded by Model 6. The variation of the performance measures across the models is shown in Figure 9.

The study sought to develop classification models to predict the injury severity of angle crashes involving two vehicles at unsignalized intersections using ANNs. A total of 3,307 reported crashes from 2008 to 2015 were extracted from a crash database and used in the analysis. Of the total number of crashes, 1,272 resulted in injury and/or fatality, while the remaining 2,035 were non-injury crashes. The spatial distribution of the crashes showed that the downtown area of Washington, DC experienced the highest frequency of crashes. Also, most of the crashes occurred during off-peak periods and under light traffic conditions. Right-angle collisions were the most frequent collision type. The combination of driver contributing circumstances which most often resulted in injury was a Stop/Yield sign violation by one driver and no violation on the part of the other driver.
The accuracy of the classification models developed using ANNs generally tends to increase as the number of hidden layers increases. The models with the highest accuracies were attained with three hidden layers. Model 22 was the most accurate (85.62%) for predicting the injury severity of angle crashes at unsignalized intersections. This model has 3 hidden layers with 5, 10, and 5 neurons, respectively. The activation function in the hidden layers is the rectified linear unit (ReLU) function, and the activation function in the output layer is the sigmoid function. The confusion matrix of this model is presented in Table IX. It shows that 51.5% of the crashes were correctly classified as non-injury crashes, while 10.3% were wrongly classified as injury crashes. Similarly, 29% of the crashes were correctly classified as injury crashes, while 9.2% were wrongly classified as non-injury crashes. The F-measure is a combined measure of both precision and sensitivity. The F-measures of the ANN models generally ranged between 0.7 and 0.8, and the higher values were achieved with two hidden layers. Models 15 and 22 are the most accurate ANN models for predicting the injury severity of angle crashes at unsignalized intersections.

VI. CONCLUSION AND RECOMMENDATION
In conclusion, the most accurate ANN model for predicting the severity of an injury sustained in a crash is a model with 3 hidden layers of 5, 10, and 5 neurons, respectively. The activation functions in the hidden and output layers are the rectified linear unit (ReLU) function and the sigmoid function, respectively. This research explored the ANN machine learning technique; future research can explore other techniques such as decision trees, K-nearest neighbors, and linear discriminants. Also, other types of crashes at unsignalized intersections can be explored. Further, these analyses could be extended to signalized intersections.

Fig. 4. MLP ANN architecture

2) Training of Neural Network
Training of the neural network by backward propagation was carried out in the following sequence:
• Presentation of the training dataset to the network: The training dataset was imported into the network to commence training. The vector of independent variables was fed into each input neuron connected to the neurons of the first hidden layer. The training process was initialized by randomly selecting weights for all interconnections between the neurons of the input and hidden layers.

Fig. 8. Variation of performance measures for the training dataset using ANN

Fig. 9. Variation of performance measures for the testing dataset using ANN

TABLE I. LEVELS OF INJURY SEVERITY

TABLE VIII. RESULTS OF TESTING ANN

TABLE IX. CONFUSION MATRIX OF MODEL 22