Classification of Electrical Power System Conditions with Convolutional Neural Networks

Undesirable operation of a distance relay under stressed conditions is a reason for blackouts. A few computational intelligence methods for avoiding relay maloperation are available in the literature. However, variations in the system parameters and expansions of the network can degrade the performance of these techniques. To solve this issue, data mining approaches have been introduced. The existing data mining approaches need improvement in terms of accuracy and error rate when discriminating fault and stressed conditions. In this paper, a Convolutional Neural Network (CNN) based classifier is proposed for identifying various faults and differentiating fault and power swing situations. The data are collected from the IEEE 9-bus system by Phasor Measurement Units (PMUs), and the proposed CNN classifier classifies the system condition as normal, fault, or power swing. The results show that the classifier has higher accuracy and a lower error rate than other classification models such as Naïve Bayes, Decision Tree, and K-Nearest Neighbor. Furthermore, the proposed CNN model is validated using the TensorFlow framework to demonstrate its superior performance.
Keywords: convolutional neural networks; distance relay; phasor measurement units; power swing

I. INTRODUCTION
Increasing power demand requires infrastructure that can transmit power from generation centers to distant areas over long transmission lines. The main objective of any power system is to maintain a continuous power supply and minimize power transmission losses. However, natural events or equipment failures may lead to faults. If a fault persists, it may lead to long-term power loss, blackouts, and permanent damage to some equipment. To prevent such situations, the live system has to be temporarily isolated as soon as possible. Distance relays are generally used for primary and back-up protection of transmission lines. A distance relay makes its decision based on local voltage and current measurements. Under stressed conditions, the relay has difficulty distinguishing a fault from a stressed condition, and it may consequently maloperate.
Power system networks have recently been equipped with synchrophasor-based Wide Area Monitoring Systems (WAMSs). The growing size of the input data and its associated uncertainty, the complex and nonlinear behavior of power systems, the need to support the decision-making process of the distance relay, and the complexity of the network have led to revisions in transmission line protection. One of the major reasons for power system instability is the inability of the protective relay to discriminate faults from other stressed conditions [1]. Generator outages, planned line outages, line faults, and relay maloperations are some of the factors that cause stress in the system [2]. The malfunction and stress caused in the system lead to power system outages [3,4]. Algorithms based on the rate of change of voltage and the high-frequency content of signals have been proposed to distinguish faults from other stressed events [5,6].
Such methods require specialized measurements to estimate the high-frequency content of the waveforms. Discriminating between fault and stressed conditions on transmission lines is therefore a necessity, and it remains a major challenge in the domain of power system protection. This research addresses one of these stressed conditions, the power swing.
Power Swing Blocking (PSB) is commonly used to detect the power swing situation and avoid undesirable tripping of lines [7][8][9]. Several algorithms that use PMU synchrophasor data to accomplish backup protection have been proposed [10][11][12][13][14][15][16]. However, these methods require PMUs on both sides. Numerous power swing detection schemes observe the instantaneous change and the rate of change of the power and current waveforms [17][18][19][20]. The authors in [21] introduced a method for discriminating fault events from stressed conditions based on an examination of voltage in various ways. An algorithm to reduce the number of synchronized devices in the system is discussed in [22]. The authors in [23] presented a method for the back-up protection of transmission lines that uses synchrophasor estimations with fewer PMUs than conventional wide-area protection methods. Various techniques that use PMUs to discriminate stressed events from fault events and avoid zone-3 maloperation are presented in [24][25][26][27][28][29][30][31][32].
Based on parameters such as voltage, angle, frequency, and damping data from WAMS, fuzzy logic and ground-breaking techniques have been presented for avoiding relay malfunctions during a power swing [31][32][33][34][35]. However, the drawback of the fuzzy logic based method is that it relies on explicit knowledge-base fuzzy rules and gives insufficient detail about the fault-symptom relationship. Although the above-mentioned methodologies are able to accomplish backup protection, the complex nonlinear behavior of a power system demands computationally efficient methods. This can be accomplished with data mining techniques.
Data mining is a non-parametric, statistics-based method [36] that is well suited to analyzing the behavior of power systems. Data mining approaches have been proposed that classify fault and stressed conditions based on the DC offset attributes, the fundamental component of the current signal, and differential features [34,39], and on various other parameters [38,39]. The data mining algorithms addressed in the literature are Random Forest, Support Vector Machine (SVM), and Decision Tree. The Random Forest algorithm is complex, requires more computational resources, and is time-consuming. SVM is efficient but not suitable for large datasets, and it may underperform if the target classes overlap. The Decision Tree gives high accuracy, but its accuracy is below 100% for the specified application. This shows the need to test the suitability of other data mining algorithms for identifying faults and discriminating faults from stressed conditions. CNNs are powerful and efficient in terms of memory use, complexity, accuracy, error rate, and the handling of large datasets. Therefore, a data mining approach using a CNN-based classifier for identifying faults and differentiating faults from power swings is proposed in this paper.
In addition, the suitability of a statistical classifier (Naïve Bayes), a distance-based classifier (K-Nearest Neighbor), and a tree-based classifier (Decision Tree) for the above-mentioned research problem is examined.
II. MODELING OF SYSTEM COMPONENTS
The system considered for study is the IEEE 9-bus system, which comprises 9 buses, 3 generators, 3 loads, and 3 transformers. Figure 1 shows the single line diagram of the IEEE 9-bus system. The bus data are presented in Table I. A PMU is an ideal measurement system for protecting, monitoring, and controlling a power system. A Discrete Fourier Transform (DFT) based PMU is modeled in this work. The DFT-based PMU is employed to extract the fundamental frequency components from the complex form of the signals. The DFT evaluates the Fourier coefficients for the given data sequence [40]. In the system under study, 6 PMUs are positioned at buses 4, 5, 6, 7, 8, and 9, and full observability is considered. Voltage and current are obtained from the PMUs at the specified buses and are used to calculate the required information at all buses and lines. The data from the PMUs are received at a 20 kHz sampling frequency.
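As an illustration of the DFT-based phasor extraction described above, the following minimal Python sketch estimates the fundamental-frequency phasor from one cycle of samples. The 50 Hz nominal frequency (which gives 400 samples per cycle at 20 kHz), the function name `dft_phasor`, the RMS scaling, and the test signal are assumptions made for this example; the paper does not state them, and the Simulink PMU model may differ.

```python
import cmath
import math


def dft_phasor(samples):
    """Single-bin DFT over exactly one fundamental cycle.

    Returns the fundamental component as an RMS phasor (complex number).
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(samples))
    return math.sqrt(2) / n * acc


FS = 20_000   # sampling frequency from the text, in Hz
F0 = 50       # assumed nominal system frequency, in Hz
N = FS // F0  # samples per fundamental cycle (400 under this assumption)

# Hypothetical test signal: 100 V peak cosine with a 30 degree phase angle.
amplitude, angle = 100.0, math.pi / 6
cycle = [amplitude * math.cos(2 * math.pi * k / N + angle) for k in range(N)]

phasor = dft_phasor(cycle)
print(abs(phasor), math.degrees(cmath.phase(phasor)))
```

For this test signal the estimated magnitude is close to 100/√2 and the angle is close to 30°, which is the RMS phasor of the cosine that was generated.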

III. PROPOSED METHODOLOGY
The proposed methodology is presented in Figure 2. Voltage and current at the buses are measured using the PMUs placed in the IEEE 9-bus system. These measurements are used to calculate information at all the buses. This information forms the dataset, which is preprocessed and given to the CNN classifier for training and testing. The classifier in turn classifies the system condition as Normal, Fault, or Power Swing. If the condition is a fault, the CNN also identifies the fault type.

A. Dataset
The data are collected for the Normal, Fault, and Power Swing system conditions. The PMUs placed at the respective buses provide the voltage and current of each bus. From these values, eight attributes are calculated for each bus: the voltage magnitude, the phase angle of the voltage, the current magnitude, the phase angle of the current, the impedance magnitude, the phase angle of the impedance, the real power, and the reactive power. Six buses are selected for PMU placement and eight parameters are calculated for each bus. Therefore, a total of 48 (6×8) attributes are obtained, which correspond to 48 columns of stored data. The number of rows in the dataset depends on the simulation time and is approximately 40000 for a simulation time of 2 s at a 20 kHz sampling frequency. The collected dataset is therefore very large and cannot be used directly for classification; it needs to be preprocessed.
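The eight per-bus attributes can be derived from one voltage phasor and one current phasor. The sketch below is a minimal Python illustration under stated assumptions: the function and key names are hypothetical, the phasors are treated as single-phase RMS quantities, the impedance is taken as V/I, and the complex power as V·I*. The paper does not give these formulas explicitly.

```python
import cmath
import math

PMU_BUSES = (4, 5, 6, 7, 8, 9)  # PMU locations given in the text
ATTRIBUTES_PER_BUS = 8


def bus_attributes(v, i):
    """Return the eight attributes for one bus from complex phasors v and i."""
    z = v / i              # apparent impedance (assumed definition)
    s = v * i.conjugate()  # complex power S = P + jQ (assumed definition)
    return {
        "v_mag": abs(v),
        "v_ang_deg": math.degrees(cmath.phase(v)),
        "i_mag": abs(i),
        "i_ang_deg": math.degrees(cmath.phase(i)),
        "z_mag": abs(z),
        "z_ang_deg": math.degrees(cmath.phase(z)),
        "p": s.real,
        "q": s.imag,
    }


# Hypothetical per-unit phasors: 1.0 at 0 degrees and 0.5 at -30 degrees.
v = cmath.rect(1.0, 0.0)
i = cmath.rect(0.5, math.radians(-30.0))
row = bus_attributes(v, i)

n_columns = len(PMU_BUSES) * ATTRIBUTES_PER_BUS  # 6 x 8 = 48 columns
n_rows = int(2.0 * 20_000)                       # 2 s at 20 kHz = 40000 rows
print(row, n_columns, n_rows)
```

One such row per bus and per sample, concatenated across the six buses, gives the 48-column layout described above.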

B. Data Preprocessing
The collected dataset is preprocessed for data reduction using the Probabilistic Principal Component Analysis (PPCA) technique combined with the Partial Least Squares Regression (PLSR) algorithm. In the PPCA model, each data point is described by a probability distribution, and a new model is reconstructed from a linear combination of the principal components and the data. The dimension of the new model is thereby reduced relative to the original data. Finally, the output of the PPCA is further reduced in dimension using the PLSR algorithm. The preprocessed dataset is split into two sub-datasets: training and testing. The training dataset, along with its corresponding labels, is fed to the CNN for training. The test dataset is then sent as input to the fully trained CNN for classification.

C. CNN Based Classifier
A CNN is a powerful machine learning algorithm used in data mining for classification problems. Its main benefit is that it can operate on large datasets with high classification accuracy. A CNN is a neural network with a feed-forward structure that comprises three kinds of layers: the Convolutional Layer (CL), the Sub-sampling Layer (SL), and the Fully Connected Layer (FL). The CL convolves its input. An SL is connected directly after every CL; it reduces the dimension of the input features and the total number of parameters in the network. The FL is a conventional feed-forward neural network that uses the softmax function as the activation function at its output [41,42].
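The three layer types can be illustrated with a small one-dimensional Python sketch. It is illustrative only: the kernel, the pooling size, and the input values are hypothetical, max pooling is assumed for the sub-sampling layer, and the weights of the fully connected layer are omitted so that only its softmax output stage is shown.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D convolution as used in CNNs (sliding dot product, no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values, size):
    """Sub-sampling: keep the maximum of each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def softmax(logits):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = conv1d([1.0, 2.0, 4.0, 3.0, 5.0], [1.0, -1.0])  # convolutional layer
pooled = max_pool1d(features, 2)                           # sub-sampling layer
probs = softmax(pooled)                                    # softmax output stage
print(features, pooled, probs)
```

The sub-sampling step halves the number of features here (four values become two), which is the dimension reduction the SL is described as providing.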
After preprocessing, the values extracted from the simulation are labeled as Normal, Fault, or Power Swing and are given to the CNN for training. The trained CNN classifier is then used to classify the system condition of the test samples.

A. Dataset Collection
The proposed system is modeled and simulated with MATLAB/SIMULINK. The SIMULINK model is presented in Figure 3, which shows the IEEE 9-bus system with six PMUs placed at buses 4, 5, 6, 7, 8, and 9. Initially, the system is simulated under the normal condition, and data for the Normal state are collected and labeled accordingly. Then, data are collected for the following fault conditions at the time intervals 0-0.5 s, 0.5-1 s, 1-1.5 s, and 1.5-2 s:
• Single line to ground fault, i.e. A-G, B-G, and C-G.
• Double line to ground fault, i.e. AB-G, BC-G, and AC-G.

B. Pre-processing
A combination of the PPCA method and the PLSR algorithm is used for preprocessing the collected data. The coefficients of the PPCA model are used to reduce the size of the data attributes. In the studied system, the output of the PPCA has 2 rows and 48 columns, i.e. the data are reduced from 39601 rows to only two, which hold the lower and higher coefficients of each column. Since there are 48 attributes, the size of the dataset is reduced to 2×48. After the PPCA output is extracted, the PLSR algorithm is used to condense the feature values into sum scores. These sum scores are combined over the eight attributes, so that 6 values/columns (48/8) are produced per data sample. These 6 values per case sample form a compact representation and are used in the CNN for classification. The seventh (last) column contains the class labels. There are three class labels (Normal, Fault, and Power Swing). A snapshot of the preprocessing output is shown in Figure 10.

C. Discrimination of Fault and Power Swing with the Proposed CNN-based Classifier
All the required data obtained from the PMU-equipped IEEE 9-bus system are ready, after the preprocessing stage, to be given to the CNN classifier. The implementation and training of the CNN are depicted in Figure 11, which shows its architecture diagram. The architecture includes an input with 6 data values per case, taken from the output of the preprocessing. There are 78 observations/cases in total, as mentioned above. Each input is multiplied by a randomly generated weight, and a bias is added before the activation function is applied. The hidden layer has 10 hidden nodes. From the hidden layer, the values are passed to the output layer for classification, where the weights are multiplied and the bias is added in the same manner. The output layer consists of three nodes (class labels) for the Normal, Fault, and Power Swing conditions.
Fig. 11. Implementation and training of the CNN
The training and testing data comprise all the operating conditions stated above with various time frames. Figure 12 shows the output of the presented classifier during the Normal condition, which is represented as '0' in this system. It indicates that there is no fault or power swing condition in the system. The output displayed for a fault condition is 1 and for a power swing is -1. Figure 13 shows the output of the presented classifier during a Fault condition. When a fault condition occurs, the proposed classifier also identifies the type of the fault, i.e. single-phase, two-phase, or three-phase fault. One such result is shown in Figure 14. A sample screenshot of the implementation of the proposed classifier is shown in Figure 15.
The performance of the CNN classifier is analyzed and evaluated with measures such as accuracy, error rate, and specificity.
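The layer sizes given above for Figure 11 (6 inputs, 10 hidden nodes, 3 output nodes) correspond to the forward pass sketched below in Python. This is an untrained illustration under stated assumptions: the tanh hidden activation, the softmax output, the weight range, the zero biases, and the input sample are all hypothetical, and only the layer sizes and the output codes 0, 1, and -1 come from the text.

```python
import math
import random

CLASS_CODES = (0, 1, -1)  # Normal, Fault, Power Swing, as displayed in the text


def dense(x, weights, bias):
    """Weighted sum plus bias for every node of one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w1, b1, w2, b2):
    hidden = [math.tanh(a) for a in dense(x, w1, b1)]  # tanh is an assumption
    return softmax(dense(hidden, w2, b2))              # softmax is an assumption


rng = random.Random(0)  # fixed seed so the random weights are reproducible


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


w1, b1 = random_matrix(10, 6), [0.0] * 10  # 6 inputs -> 10 hidden nodes
w2, b2 = random_matrix(3, 10), [0.0] * 3   # 10 hidden nodes -> 3 output nodes

sample = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5]  # hypothetical preprocessed case
probs = forward(sample, w1, b1, w2, b2)
predicted_code = CLASS_CODES[probs.index(max(probs))]
print(probs, predicted_code)
```

Because the weights are random and untrained, the predicted code carries no meaning here; training would adjust `w1`, `b1`, `w2`, and `b2` so that the largest output matches the labeled condition.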
In general, accuracy is the ratio of correct predictions to the total number of instances evaluated, and error rate is the ratio of incorrect predictions to the total number of instances evaluated. Specificity is the ability of the classifier to correctly identify negative results. The statistical summary of the proposed classifier is shown in Figure 16. The classification accuracy achieved using the CNN model in MATLAB is 98.72%, the error rate is 0.0128, and the specificity is 1. The figure also shows the detailed classification statistics for all 78 observations that are obtained after preprocessing (as mentioned above) and given as input to the classifier. We can conclude that the classifier is able to discriminate the system conditions effectively.
The preprocessed dataset was also given as input to the Decision Tree (DT), Naïve Bayes (NB), and K-Nearest Neighbor (KNN) classifiers in order to compare their performance with that of the proposed CNN classifier in terms of accuracy, specificity, error rate, and execution time. NB is a simple, effective, probabilistic classification approach based on Bayes' theorem. KNN is one of the simplest machine learning algorithms and is used for classification applications. A DT has the structure of a tree and is a simple machine learning approach in which the data are classified on attribute values by repeatedly splitting according to a certain parameter [43,44]. The results of these classifiers for the same dataset are shown in the form of statistical summaries in Figures 17-19. Figure 23 shows the comparison of the proposed CNN with the other classification methods in terms of execution time in seconds. It shows that the CNN takes less time to execute than the Decision Tree, KNN, and Naïve Bayes classifiers.
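The accuracy and error rate reported above for the MATLAB model can be checked with a short Python calculation. The count of 77 correct predictions out of 78 is an inference from the reported values rather than a figure stated in the paper, and the true-negative count passed to `specificity` is purely hypothetical.

```python
def accuracy(correct, total):
    """Ratio of correct predictions to all evaluated instances."""
    return correct / total


def error_rate(correct, total):
    """Ratio of incorrect predictions to all evaluated instances."""
    return (total - correct) / total


def specificity(true_negatives, false_positives):
    """Share of actual negatives that are classified as negative."""
    return true_negatives / (true_negatives + false_positives)


TOTAL = 78    # observations after preprocessing, from the text
CORRECT = 77  # inferred: one misclassification reproduces the reported values

acc = accuracy(CORRECT, TOTAL)
err = error_rate(CORRECT, TOTAL)
print(f"{acc:.4%}", f"{err:.4f}")

# A specificity of 1 means no false positives; the count 10 is hypothetical.
print(specificity(true_negatives=10, false_positives=0))
```

Under this inference, a single misclassified case among the 78 observations is enough to account for an accuracy of 98.72% and an error rate of 0.0128.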
An ROC (Receiver Operating Characteristic) curve is a graph that shows the performance of a classification model [43,44]. The curve plots the True Positive Rate against the False Positive Rate. The comparison of the proposed CNN with NB, KNN, and DT in terms of the ROC curve is shown in Figure 24. It can be seen that the CNN has better ROC performance than the other classification methods.
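How the points of an ROC curve are obtained can be sketched in Python as follows. The scores and labels are hypothetical, the labels are binary (one class against the rest, which is an assumption for the three-class problem studied here), and the scores are assumed to be distinct so that ties need no special handling.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the decision threshold.

    Assumes labels are 1 (positive) or 0 (negative) and scores are distinct.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores: the first labeling is perfectly separated, the second is not.
perfect = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
mixed = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(perfect, auc(perfect))
print(mixed, auc(mixed))
```

The perfectly separated example reaches a True Positive Rate of 1 while the False Positive Rate is still 0, so its area under the curve is 1; the mixed example falls between that ideal and the diagonal.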
The results show that the proposed CNN classifier has higher accuracy, higher specificity, and a lower error rate than the existing classification methods in discriminating fault and power swing situations. The ROC curves indicate that the CNN performs very well: in general, the ROC curve of a perfect classifier rises straight up the Y axis and then runs horizontally along the top of the plot, the curve of a classifier with little discriminating power lies close to the diagonal, and the curves of most classifiers lie in between.
The model summary, shown in Figure 25, depicts the CNN layer design with various layer types and sizes. The performance of the classifier is found to be efficient using the TensorFlow framework. The class statistics, which show the detailed process of classification, are given in Figure 26. The confusion matrix (a table that summarizes the performance of the classification algorithm) and the overall statistics of the performance of the CNN are shown in Figure 27. From Figures 26 and 27 it is seen that the accuracy achieved for classification using the TensorFlow model of the CNN is 98.7179%. A plot of accuracy and loss against the number of epochs is displayed in Figure 28. This plot shows that accuracy increases and loss decreases as the number of epochs increases.
The accuracy achieved for classification using the TensorFlow and the MATLAB CNN models is 98.7179% and 98.72% respectively. Thus, we can conclude that the proposed CNN classifier performs better than the other data mining models considered for the addressed problem.
The classifier discriminates whether the system is in a normal, fault, or power swing condition. Additionally, it identifies the type of fault that occurs in the system. The proposed CNN-based classification technique gave accurate results for all tested conditions. The presented outcomes established that the proposed classifier has higher classification accuracy and a lower error rate than other known classification methods, namely the Decision Tree, KNN, and Naïve Bayes classifiers. Furthermore, the proposed classifier has been validated with the TensorFlow framework. Thus, the main objective of this research, which was to achieve better efficiency than the approaches reported in the literature using the data mining approach, has been accomplished. Moreover, the present work can be extended to design a new technique that discriminates faults from other stressed conditions. This would enhance the performance of the distance relay and help prevent its maloperation.