An Enhanced Binary Classifier Incorporating Weighted Scores

In this study, we propose an approach that predicts the class of an observation from several parameters using the weighted score classification method. The weighted scores are used for classification by plotting the data points on a graph with respect to a threshold value found through the proposed algorithm. Cluster analysis is then employed to group the observational parameters and verify the approach. The algorithm requires few calculations to arrive at a conclusion and provides high accuracy on large datasets. Combining the weighted score method with curve fitting and cluster analysis improves its performance, and the algorithm is structured so that the intermediate values can be processed for clustering at the same time. The proposed algorithm excels due to its simplicity and achieves an accuracy of 97.72%.

Keywords-weighted score; classification; clustering; deviation; threshold; SVM; decision tree


I. INTRODUCTION
Classification is a data mining technique in which a collection of data is categorized into classes; the training dataset may or may not have class labels, and a dataset may have two or more class labels. In this work we focus on binary classification using a clustering technique based on curve analysis and the weighted score method, followed by verification. To illustrate with an example, suppose we have a dataset containing spam records from a repository. We want to identify the data points above and below a threshold level, which are classified as spam and not spam respectively. The threshold level is obtained by sorting and processing the dataset. The proposed algorithm preprocesses the dataset to compute the weighted scores and to predict the threshold value, which is then represented graphically. After this step, verification is done using cluster analysis. Observations on various datasets were found to be accurate to a high degree. Deviations of the data points from the threshold value were obtained and several inferences were drawn. We also calculated the individual effect of each attribute on the classification. Datasets contain hidden information that may not be known to the user, and an effort has been made to develop a new algorithm that facilitates such mining. The simplicity of the algorithm makes it easy to understand and implement.
The proposed algorithm is based on clustering, which acts as a stable preprocessing method for binary classification. The weighted score method assigns different degrees of importance to the instances of a dataset. The proposed classifier computes the mean of each sample, multiplies it with each attribute value, and sums the products to assign a weight to that sample. A threshold value is then taken, and the plotted data points fall below or above it. The minimum and maximum values among the weighted sample sums are each subtracted from the threshold value and halved to obtain the centers of two clusters. Clustering is performed around these centers, taking as the maximum distance the distance between a center and the threshold value. The resulting clusters correspond to the binary class labels that classify the dataset, and the observations are cross-verified using the clustering method. The weighted score is a simple technique that also captures the individual contribution of an attribute, through its weighted score, to the deviation of a data point from the threshold.
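As a rough illustration of this pipeline, the following Python sketch computes the weighted sample sums, derives the two cluster centers from the threshold and the score extremes, and assigns each point to the nearer center. The exact threshold-estimation procedure of the proposed algorithm is not reproduced here, so the mean of the scores is used as a stand-in; all function names are ours and the toy data is illustrative only.

import numpy as np

def weighted_scores(X):
    # weight of a sample: its mean multiplied by each attribute value, summed
    means = X.mean(axis=1)
    return (means[:, None] * X).sum(axis=1)

def weighted_score_classify(X, threshold=None):
    s = weighted_scores(X)
    if threshold is None:
        threshold = s.mean()  # assumption: stand-in for the paper's threshold
    # cluster centers: midway between the threshold and each score extreme,
    # i.e. the (threshold - extreme) gap halved
    c_low = (s.min() + threshold) / 2.0
    c_high = (s.max() + threshold) / 2.0
    # assign each point to the nearer center; this mirrors the threshold rule
    labels = (np.abs(s - c_high) < np.abs(s - c_low)).astype(int)
    deviations = s - threshold  # per-point deviation used for the inferences
    return labels, deviations

# toy usage with two loose groups of points
X = np.vstack([np.random.normal(1.0, 0.2, (20, 4)),
               np.random.normal(3.0, 0.2, (20, 4))])
labels, dev = weighted_score_classify(X)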

II. LITERATURE REVIEW
Various studies have proposed methods for classifying datasets into two categories, and previous researchers have used different classification approaches. Support vector machines (SVM) are used for classification and regression analysis and employ supervised learning techniques. The SVM algorithm assigns new examples to one of the categories and is therefore regarded as a non-probabilistic classifier. SVM can be thought of as mapping points into a space in which points belonging to one class are distant from points of the other classes; in that space a hyperplane divides the points into groups. The hyperplane selected is the one that maximizes the margin, i.e. the distance to the nearest data points on either of its sides; this is also called a linear classifier. There are several variations of the basic SVM approach, namely the linear kernel, polynomial kernel and radial kernel SVM. An efficient and widely used method for fitting an SVM is sequential minimal optimization (SMO), which breaks the problem into sub-problems that can be solved analytically rather than numerically. SVM has various applications, such as the recognition of standing people in a picture. The authors in [1] used SVM along with k-nearest neighbors (KNN) for visual category recognition, and the authors in [2] used variations of SVM to predict the future popularity of social media messages. A disadvantage of SVM is that the theory only covers the determination of the parameters for given values of the regularization and kernel parameters, so performance depends on the kernel choice. As a result, SVM faces the problem of overfitting when optimizing these parameters during model selection; kernel models can be quite sensitive to overfitting the model selection criterion [3]. In [4], local space-time features were used with SVM for recognizing complex motion patterns.
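For reference, a linear-kernel SVM of the kind described above can be fitted in a few lines; the sketch below uses scikit-learn, whose SVC estimator is backed by libsvm's SMO solver. The dataset chosen here is a standard benchmark for illustration, not the data used in this study.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# linear-kernel SVM; scikit-learn's SVC is fitted internally via SMO
clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))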
A decision tree is a predictive model that goes from observations about an item, and the related choices, to conclusions about the item's target value. It has applications in statistics, data mining and machine learning. In this structure, each internal node denotes a test on an attribute, each leaf represents a class label, and the branches represent conjunctions of features leading to those labels. Besides being simple to interpret and understand, decision trees can handle both categorical and numerical data [5]. To address the problems of fragmentation and replication, the notion of decision graphs, which allows disjunctions (joins), has been introduced. Several assumptions underlie the decision tree algorithm. At the beginning, the whole training set is considered as the root. Feature values are preferably categorical; if the values are continuous, they are discretized before the model is built. Records are then distributed recursively on the basis of attribute values. The decision tree algorithm is sensitive to root selection: if the dataset has n attributes, deciding which attribute to place at the root, or at the internal nodes of the different tree levels, is a complicated step, and a node cannot simply be placed at the root at random. If a random approach is followed, it may give poor results with low accuracy, so attribute placement is done with a statistical approach. A variation, the weighted class based decision tree [6], has been proposed, in which weights are assigned according to the importance of the class labels, which are then classified using a decision tree. Splitting the dataset in this approach can introduce bias, since small changes in the dataset can have a large impact, and decision trees can grow overly complex and fail to generalize well from the training data.
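The statistical placement of attributes mentioned above is what split criteria such as entropy (information gain) implement in practice. A minimal sketch, again using scikit-learn and an illustrative benchmark dataset rather than the data of this study:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

# entropy as the split criterion: a statistical choice of root/internal nodes
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_tr, y_tr)
print(export_text(tree, feature_names=list(data.feature_names)))
print("test accuracy:", tree.score(X_te, y_te))

Capping the depth (max_depth=3 here) is one common guard against the overcomplex trees noted at the end of the paragraph above.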
Studies have shown that classification is often more accurate when a combination of classifiers is used, outperforming a single highly specific classifier [7]. With a combination of classifiers, noisy data can be handled better, with improved accuracy and speed, even though complexity issues may emerge [8]. The weighted score method assigns different degrees of importance to the instances of a dataset and is often used as a preprocessing method. Automated weighted sum (AWSum) uses a weighted sum approach in which feature values are assigned weights that are summed and compared to a threshold in order to classify an example; it also provides insight into the data [9]. The authors in [10] dealt with a weighted score fusion method for classifying fruit based on the diverse and complementary features that describe it. Their algorithm involves preprocessing, multiple feature selection, optimal feature selection and SVM, but the approach requires improvements for real-world environments. A quadratic classifier is used in statistical classification to separate measurements of two or more classes of objects or events by a quadric surface; it is a more general version of the linear classifier. Statistical classification considers a set of observation vectors x of an object or event, each with a known type y, referred to as the training set; the problem is then to determine the class of a new observation vector, and the correct solution may be quadratic in nature. In the special case where each observation consists of two measurements, the surfaces separating the classes are conic sections, so the quadratic model generalizes the linear approach by incorporating conic separating surfaces for classification. Quadratic discriminant analysis (QDA) is closely related to linear discriminant analysis (LDA), where it is assumed that the measurements of each class are normally distributed; unlike LDA, however, QDA does not assume that the covariance of each class is identical. Reported classification error rates range around 20%-30%.
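The LDA/QDA distinction drawn above can be checked directly: the sketch below fits both discriminants to the same illustrative benchmark data and compares cross-validated accuracy. The dataset is an assumption for demonstration only.

from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
# LDA assumes one shared covariance matrix; QDA fits one per class, so its
# separating surfaces are quadrics rather than hyperplanes
for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(type(model).__name__, "mean CV accuracy: %.3f" % acc)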
An artificial neural network consists of units (neurons), arranged in layers, which convert an input vector into some output. Each unit takes an input, applies a (usually nonlinear) function to it, and passes the output on to the next layer. The networks considered here are feed-forward: a unit feeds its output to all the units of the next layer, with no feedback to the previous layer. Weightings are applied to the signals passing from one unit to another, and these weights are tuned in the training (learning) phase to adapt the neural network to the particular problem. The network processes records one at a time and learns by comparing its classification of each record with the known actual classification; the errors from the initial classification are fed back into the network and used to modify the network's weights for further iterations. Neurons are organized into layers: input, hidden and output. The input layer is composed not of full neurons but simply of the record's values, which are fed to the next layer of neurons; next there are one or more hidden layers, and the final layer is the output layer.
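A small feed-forward network of this kind can be trained as follows; the sketch uses scikit-learn's MLPClassifier, which adjusts the weights by backpropagating the classification error as described above. The hidden-layer size and the benchmark dataset are illustrative assumptions, not choices made in this study.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)

# one hidden layer of 20 units; errors are fed back to tune the weights
net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
net.fit(scaler.transform(X_tr), y_tr)
print("test accuracy:", net.score(scaler.transform(X_te), y_te))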
