Classification of Volcanic Rocks based on Rough Set Theory

Classification of volcanic rocks is a fundamental task in geologic studies. Volcanic rocks are igneous rocks that cooled rapidly on the Earth's surface. They are classified according to their oxide chemical content, and they can also be classified numerically by statistical means. However, these methods depend heavily on expert decision making and are costly. In this paper, a novel approach to the classification of volcanic rocks is proposed, based on rough set theory. First, the continuous data of the information system are discretized using the information loss method. Second, the discretized decision table is reduced and the decision rule sets are extracted. The results are consistent with those of previous methods and show that the proposed method reduces time and calculation costs.

Keywords: decision rules; information loss discretization; rough set; volcanic rocks

I. INTRODUCTION
The study of volcanic rocks is a well-established area of geology. The mineral composition of volcanic rocks is affected by the chemical structure of the magma and by the physical-chemical conditions during crystallization. Volcanic rocks are classified by genetic, textural, or chemical approaches [1]. The genetic method classifies rocks according to their form. It is only an initial solution: it says nothing about mineralogy or rock chemistry and cannot discriminate between basalt and andesite. The textural method depends on the shape, size, and structure of the grains of the different rock minerals, and it has the same limitations as the genetic classification [2]. Chemical classification requires a complete chemical analysis of the rock. Volcanic rocks are categorized on the basis of their mineral or chemical composition. They are classified into basic, acidic, intermediate, and various types of ultrabasic rocks [3]. On the basis of mineral composition, igneous rocks are classified into silicic, intermediate, mafic, and ultramafic rocks [4]. Several methods for building classifiers have been suggested, including Artificial Neural Networks (ANNs) [5][6][7], decision trees [8], statistical techniques [9][10], and decision rules [11][12]. Rough Set (RS) theory is one of the most stimulating areas of computational intelligence research and has become increasingly popular in geologic applications. One of the main advantages of RS is that it needs no additional information about the data, such as a probability distribution or a grade of membership [13], and it does not require the calculation of means and covariance matrices. ANNs take time to train to an acceptable accuracy, and decision trees take more computation time because they depend on entropy, whereas RS requires no such training.
In this paper, the collected features are first discretized using the information loss technique. RS is then applied to the discretized decision table, both as a feature reduction approach and as a rule extraction method, to perform the classification. The proposed technique ends with a minimal set of chemical components that directly affect the classification of volcanic rocks. The results indicate that the method significantly reduces the feature dimension and increases classification accuracy, remains consistent with previous approaches, and reduces time and calculation cost.

II. ROUGH SET THEORY
To extract information on volcanic rocks effectively, a large number of characteristic data must be filtered objectively. Once the best combination of characteristic parameters is found, it can be used to identify volcanic rocks precisely. Among the many nonlinear computational methods evaluated, RS theory was found to need no additional data or prior knowledge [14]. It can exclude redundant or unimportant characteristics and thus reduce a decision system while preserving its classification ability [15]. Studying geological and volcanic rock information with RS is a relatively new solution to high-dimensional, complex NP (Nondeterministic Polynomial) problems in geology.

A. Information System
The identification and extraction problems of mineral materials, such as the classification of volcanic rocks, are expressed as an information system given by the pair $(U, A)$, where $U$ is the non-empty finite set of samples, called the universe, and $A$ is the non-empty finite set of parameters.

B. Indiscernible Relation
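Let $B \subseteq A$ be a subset of parameters. If two samples $x_i, x_j \in U$ take the same value on every parameter in $B$, that is, $a(x_i) = a(x_j)$ for all $a \in B$, then $x_i$ and $x_j$ are indiscernible, and the equivalence relation $R_B$ is given by:

$$R_B = \{ (x_i, x_j) \in U \times U : a(x_i) = a(x_j) \ \text{for all } a \in B \}$$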
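
As an illustration, the equivalence classes induced by $R_B$ can be computed by grouping the samples that share the same values on the chosen parameters. The following minimal Python sketch assumes a hypothetical, already discretized table; the sample names, the values, and the function `indiscernibility_classes` are illustrative and do not come from the paper.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group sample ids whose values agree on every attribute in `attributes`."""
    groups = defaultdict(list)
    for sample_id, row in table.items():
        key = tuple(row[a] for a in attributes)  # value signature on B
        groups[key].append(sample_id)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical discretized samples (illustrative values only).
toy_table = {
    "x1": {"A1": 0, "A10": 1, "A11": 0},
    "x2": {"A1": 0, "A10": 1, "A11": 1},
    "x3": {"A1": 1, "A10": 2, "A11": 0},
    "x4": {"A1": 0, "A10": 1, "A11": 0},
}

classes_B = indiscernibility_classes(toy_table, ["A1", "A10"])
classes_all = indiscernibility_classes(toy_table, ["A1", "A10", "A11"])
print(classes_B)    # coarser partition: x1, x2 and x4 are indiscernible
print(classes_all)  # finer partition: A11 separates x2 from x1 and x4
```

Using fewer parameters can only merge classes, never split them, which is why a reduced parameter set is acceptable only when it leaves the partition unchanged.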

C. Attribute Reduction
One way to reduce dimensions is to keep only the attributes that preserve the indiscernibility relation, and hence the accuracy of classification. The selected set of attributes yields the same set of equivalence classes as the entire attribute set. The other attributes are redundant and can be removed without affecting the precision of classification. Typically there are many such subsets of attributes, known as reducts. Mathematically, a subset $B \subseteq A$ is a reduct if $R_B = R_A$ and no proper subset of $B$ has this property. The core is the set of all attributes of the decision table that cannot be removed in the reduction process; it is the intersection of all reducts,

$$\mathrm{CORE}(A) = \bigcap \mathrm{RED}(A),$$

where $\mathrm{RED}(A)$ is the set of all reducts of $A$.

III. DISCRETIZATION OF DATA
RS theory requires quantized data, so real-valued attributes must be discretized before an information system is processed. Discretization means dividing the range of a continuous attribute into several intervals and replacing each interval with a discrete value. Several discretization methods exist, such as the frequency algorithm, the clustering method, and the Naive Scaler algorithm. This paper uses a discretization method based on information loss. The steps of the algorithm are:
Step 1: Let $S$ be the universe set and $X$ a feature. Sort $S$ in ascending order of its $X$ values. The result after sorting is $x_1, x_2, x_3, \ldots, x_n$.
Step 2: Construct the initial interval distribution $I_1, I_2, I_3, \ldots, I_n$ according to the equation [equation missing]. Then merge neighboring intervals that share the same classification value into a single interval.
Step 3: Evaluate the information loss for each m neighboring intervals according to: [equation missing]
Step 4: Select the merger of neighboring intervals that has the least information loss, and thus obtain a new interval.
Step 5: Return to Step 3 while the information loss of the present step is less than $k$ times that of the previous step. Otherwise stop and obtain the discretized samples.
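
The merging procedure of Steps 1 to 5 can be sketched as follows. The information loss formula is not reproduced in the text, so this sketch substitutes a common choice, the increase in sample-weighted class entropy caused by a merge. The function names, the toy data, the threshold k = 2, and the decision to stop before performing the merge that violates the Step 5 condition are all assumptions, and attribute values are assumed to be distinct.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a class-count table."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)

def merge_loss(a, b):
    """Assumed loss: growth of sample-weighted entropy when a and b are merged."""
    merged = a + b
    return (sum(merged.values()) * entropy(merged)
            - sum(a.values()) * entropy(a)
            - sum(b.values()) * entropy(b))

def discretize(values, labels, k=2.0):
    """Return cut points found by bottom-up merging of neighboring intervals."""
    # Steps 1-2: sort, start with one interval per sample, fuse same-class neighbors.
    intervals = []  # each entry: (low, high, class counts)
    for v, y in sorted(zip(values, labels)):
        if intervals and set(intervals[-1][2]) == {y}:
            low, _, counts = intervals[-1]
            intervals[-1] = (low, v, counts + Counter([y]))
        else:
            intervals.append((v, v, Counter([y])))
    prev_loss = None
    while len(intervals) > 1:
        # Step 3: loss of every neighboring pair.
        losses = [merge_loss(intervals[i][2], intervals[i + 1][2])
                  for i in range(len(intervals) - 1)]
        # Step 4: candidate merge with the least loss.
        i = min(range(len(losses)), key=losses.__getitem__)
        # Step 5: continue only while the loss stays below k times the last one.
        if prev_loss is not None and losses[i] >= k * prev_loss:
            break
        low, _, ca = intervals[i]
        _, high, cb = intervals[i + 1]
        intervals[i:i + 2] = [(low, high, ca + cb)]
        prev_loss = losses[i]
    # Cut points are placed midway between neighboring intervals.
    return [(intervals[i][1] + intervals[i + 1][0]) / 2
            for i in range(len(intervals) - 1)]

# Hypothetical attribute values with two classes (illustrative only).
values = [48.0, 49.0, 50.0, 50.5, 51.0, 51.5, 53.0, 53.5, 54.0, 54.5]
labels = [1, 1, 1, 2, 1, 1, 2, 2, 2, 2]
cuts = discretize(values, labels, k=2.0)
print(cuts)
```

On this toy data the isolated class-2 sample at 50.5 is absorbed into the surrounding class-1 interval, and the procedure stops with a single cut point at 52.25 because the next merge would cost far more than k times the previous one.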

A. Extraction and Assessment of Parameters
The volcanic rocks studied are part of the Egyptian basement along the Red Sea coast and in southern Sinai. In this research, 710 samples of volcanic rocks were collected. The condition attributes were selected from the average chemical composition of the samples. Components such as silicon dioxide (SiO₂) A1, titanium dioxide (TiO₂) A2, and water (H₂O) A13 were selected as the condition attributes, while there are 6 decision classes, namely basalt, basaltic andesite, andesite, dacite, rhyodacite, and rhyolite. The samples used are given in Table I. Because the volcanic rock parameters take continuous values, the original information must be converted to quantized data. The values of the condition attributes were therefore divided into several intervals according to the information loss; for instance, the break points of parameter A1 are 51.865, 54.785, 61.725, and 71.92. The discretized attribute values are shown in Table II. Table I is converted into Table III by the quantization algorithm based on information loss. The rock types, as decision attributes, are coded as {1, 2, 3, 4, 5, 6}, as shown in Table II.
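
Once the break points are known, assigning an interval code to a measurement is a simple lookup. The sketch below uses the A1 break points quoted above. The left-closed interval convention follows the form of the conditions that appear in the decision rules (for example 51.865 ≤ A1 < 54.785), while the zero-based codes and the sample SiO₂ values are assumptions for illustration and may differ from the coding used in Table II.

```python
from bisect import bisect_right

# Break points of A1 (SiO2) as quoted in the text.
A1_BREAK_POINTS = [51.865, 54.785, 61.725, 71.92]

def discretize_value(value, break_points):
    """Return the zero-based index of the interval [low, high) containing value."""
    return bisect_right(break_points, value)

# Hypothetical SiO2 contents in weight percent (illustrative only).
sio2 = [48.2, 51.865, 53.0, 58.1, 65.0, 74.3]
codes = [discretize_value(v, A1_BREAK_POINTS) for v in sio2]
print(codes)  # [0, 1, 1, 2, 3, 4]
```

Four break points give five intervals, so a value equal to a break point, such as 51.865, falls into the interval that starts at that point.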

B. Volcanic Rock Type Classification-Rules
Once the decision table is established, the parameters are reduced by applying the proposed approach, and the rules that relate the features to the volcanic rock types are obtained. With rough sets, the most important variables can be filtered from the initial variables, the goal being to obtain the optimal combination of parameters (parameter structural optimization). This filtering reduces the dimension of the space and simplifies the system without losing the essential information that is directly or indirectly related to the objects of study. Based on the RS analysis, the core parameter set is {…}, and the resulting decision rules are listed in Table IV. Take the third rule of Table IV as an example of how the rules are read. When the attribute values satisfy 51.865 ≤ A1 < 54.785, 3.525 ≤ A10, and A11 < 2.015, the rock type is classified as 2 (basaltic andesite). An entry of "-" in the table indicates that the corresponding attribute does not take part in the rule.
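
Two small Python sketches may help to make this section concrete. The first finds reducts and the core by brute force, keeping every minimal attribute subset that induces the same partition of the samples as the full attribute set. The second encodes the third rule of Table IV as quoted above. The toy decision table, the sample values, and all function names are hypothetical, and an exhaustive search like this is practical only for a handful of attributes.

```python
import math
from itertools import combinations

def partition(rows, attrs):
    """Equivalence classes (as sets of row indices) induced by `attrs`."""
    groups = {}
    for index, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(index)
    return {frozenset(group) for group in groups.values()}

def find_reducts(rows, attrs):
    """Minimal attribute subsets that keep the partition of the full set."""
    full = partition(rows, attrs)
    reducts = []
    for size in range(1, len(attrs) + 1):        # smaller subsets first
        for subset in combinations(attrs, size):
            if any(set(r) <= set(subset) for r in reducts):
                continue                          # contains a reduct: not minimal
            if partition(rows, subset) == full:
                reducts.append(subset)
    return reducts

# Hypothetical discretized table (values are illustrative only).
ATTRS = ["A1", "A3", "A10", "A11"]
ROWS = [
    {"A1": 0, "A3": 0, "A10": 0, "A11": 1},
    {"A1": 0, "A3": 0, "A10": 1, "A11": 1},
    {"A1": 1, "A3": 1, "A10": 0, "A11": 0},
    {"A1": 1, "A3": 1, "A10": 1, "A11": 0},
    {"A1": 2, "A3": 1, "A10": 1, "A11": 2},
]
reducts = find_reducts(ROWS, ATTRS)
core = set.intersection(*(set(r) for r in reducts))  # attributes in every reduct

# Third rule of Table IV as quoted in the text, with half-open [low, high) bounds.
RULE_3 = {
    "conditions": {
        "A1": (51.865, 54.785),
        "A10": (3.525, math.inf),
        "A11": (-math.inf, 2.015),
    },
    "decision": 2,  # basaltic andesite
}

def classify(sample, rules, default=None):
    """Return the decision of the first rule whose conditions all hold."""
    for rule in rules:
        if all(low <= sample[attr] < high
               for attr, (low, high) in rule["conditions"].items()):
            return rule["decision"]
    return default

# Hypothetical raw measurements for one sample.
label = classify({"A1": 53.2, "A10": 3.9, "A11": 1.4}, [RULE_3])
print(reducts, core, label)
```

In this toy table the search returns the reducts (A1, A10) and (A10, A11), so the core is {A10}. That outcome reflects only the made-up values and says nothing about the reducts of the real data set.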

V. COMPARING RS AND LINEAR REGRESSION METHOD
The linear regression approach represents the linear relationship between independent and dependent parameters. It expresses this relationship as an equation that combines the condition attributes with the decision parameter. The decision variable is denoted by $Y$ and the condition attributes by $x_1, x_2, x_3, \ldots, x_n$, where $n$ is the number of condition attributes [15]. The relationship between $Y$ and $x_1, x_2, x_3, \ldots, x_n$ is estimated by:

$$Y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n$$

where $a_0, a_1, a_2, \ldots, a_n$ are the regression coefficients. In this article, an attempt was made to determine the best chemical combination for rock type classification, so the dependent parameter is the volcanic rock type and the independent parameters are the volcanic rock characteristics. However, formulating the regression equation remains problematic, since it is technically challenging or even impractical to use all parameters and variables to construct it. Therefore, the stepwise approach is used to find the best combination of attributes. In this approach, condition variables are added one at a time to build the linear relationship with the largest value of $R^2$. First, the correlation coefficient between each independent variable and the dependent variable is computed, in order to find the condition parameter that correlates most strongly with the dependent parameter. The search is then repeated to find the best second variable among the remaining independent attributes. The procedure continues until adding another condition parameter to the model has a negligible effect on $R^2$. The parameters retained in these steps are considered the most important parameters for volcanic rock classification in the linear regression equation. Table V shows the results of the regression equations. The results show that the RS model has the highest $R^2$.
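
For reference, the coefficient of determination that drives the stepwise selection, and the stopping condition described above, can be written as follows. The number of samples $m$, the fitted value $\hat{Y}_i$, the mean $\bar{Y}$, the step index $p$, and the tolerance $\varepsilon$ are notation assumed here rather than taken from the paper.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{m} \left( Y_i - \hat{Y}_i \right)^2}
               {\sum_{i=1}^{m} \left( Y_i - \bar{Y} \right)^2},
\qquad
\hat{Y}_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + \cdots + a_n x_{in}

% Stepwise stopping condition: with p attributes already in the model,
% the next attribute is rejected when it barely improves the fit.
R^2_{p+1} - R^2_{p} < \varepsilon
```

Because $R^2$ never decreases when an attribute is added to a least-squares model, the difference on the left is non-negative, and the tolerance decides how small an improvement counts as negligible.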
Therefore, in the RS method the accuracy and precision of the approximations are better.

[Table: columns A1, A3, A10, A11; entries not recoverable]

VI. CONCLUSIONS
This study proposed a methodology for classifying volcanic rocks based on rough set theory. Using rough set attribute reduction, the factors that influence the classification of volcanic rocks were examined, and four principal variables were identified: silicon dioxide (SiO₂), aluminum oxide (Al₂O₃), sodium oxide (Na₂O), and potassium oxide (K₂O). Applying the approach to volcanic rocks has shown that the method is realistic and workable and that it gives guidelines for future projects. In addition, the results demonstrate that the proposed method reduces time and calculation cost.