SLA Management For Virtual Machine Live Migration Using Machine Learning with Modified Kernel and Statistical Approach

Abstract—Application of cloud computing is rising substantially due to its capability to deliver scalable computational power. The system attempts to allocate resources in a manner that ensures that all service level agreements (SLAs) are maintained. Virtualization is considered a core technology of cloud computing. Virtual machine (VM) instances allow cloud providers to utilize datacenter resources more efficiently. Moreover, through dynamic VM consolidation using live migration, VMs can be placed according to their current resource requirements on the minimal number of physical nodes, thereby maintaining SLAs. Conversely, non-optimized and inefficient VM consolidation may lead to performance degradation. Therefore, to ensure acceptable quality of service (QoS) and SLA compliance, a machine learning technique with a modified kernel for VM live migration based on adaptive prediction of utilization thresholds is presented. The efficiency of the proposed technique is validated with different workload patterns from PlanetLab servers.


I. INTRODUCTION
Resource optimization has been improved significantly by virtualization. By introducing isolation between applications and the physical resource [1], virtualization allows live virtual machines (VMs) to seamlessly move between physical hosts. This allows service providers to host high-availability applications and to better commit to a level of service governed by a service level agreement (SLA). Most applications in the telecommunication industry permit a small fraction of downtime or no downtime at all [2]. SLA violations may occur when a server's resources are over-utilized. Therefore, highly available and fault-tolerant systems are crucial in order to maintain such a demanding policy, which is costly in terms of capital and operational expenses. Tightly controlled live migration can address this problem by moving VMs with little or no interruption, although even these interruptions may breach the agreed SLA. Such interruptions can cause performance degradation which varies between applications [3][4]. Thus, predicting live migration at the earliest time possible will significantly contribute to reducing any performance degradation due to SLA violation or the duration of any interruption. Our objective is to provide a machine learning and statistical predictive model that predicts VM migration and consequently maintains the SLA. It is a heuristic-based predictive model in which future SLA violations are predicted and the migration decision is made by a machine learning classifier. CPU utilization, inter-VM bandwidth utilization and memory utilization are used as classification features.

II. RELATED WORK
Resource optimization in cloud-based data centers has been extensively investigated in recent years. Authors in [5] suggested that live migration be handled by global policies applied to redistribute the VMs, a suggestion based on classifying resources under local and global policies. Authors in [6] adopted a priority-based approach to allocate resources in virtualized clusters. In [9] the dynamic consolidation problem was addressed using a heuristic approach to the bin packing problem. In [8] a threshold-based reactive approach to dynamic workload consolidation was proposed; however, it was applicable only to certain types of applications. Popular approaches such as VMware distributed power management [7] have the drawback of operating on fixed threshold values, which is not suitable for dynamic and unpredictable workloads [9]. In our proposed model, we introduce an approach that sets the threshold values dynamically, based on the historical predicted resource usage of each VM, with machine learning as the decision-making approach. Static and dynamic resource assignment policies in virtualized data centers are discussed in [10,11]. Authors in [12][13][14][15] classified VM consolidation as centralized or decentralized. Some suggested that the VM migration trigger point be based on a predefined threshold, whereas other approaches [13,15] trigger migration after the workload is analyzed, based on learned intelligent QoS-based thresholds and predictive heuristic methods [16]. Nevertheless, a few approaches [14,15] studied workload-independent QoS-based threshold approaches for the purpose of SLA violation avoidance and efficient migration management. In [16] the VM placement problem with traffic-aware balancing (VMPPTB) was discussed and a longest processing time based placement algorithm (LPTBP algorithm) was designed to solve it. In addition, a locality-aware VM placement problem with traffic-aware balancing (LVMPPTB) was proposed. Authors in [17] proposed a VM placement algorithm named ATEA (adaptive three-threshold energy-aware algorithm) to reduce energy consumption and SLA violations. It uses historical data of resource usage by VMs to migrate VMs from heavily loaded to lightly loaded hosts. None of these previous works considered the effects of inter-VM bandwidth and memory utilization on the VM consolidation problem and on the SLA definition, especially in applications that do not tolerate any downtime or performance degradation, such as telecommunication applications [18].

III. SLA VIOLATIONS DETECTION
The VMs experience dynamically variable workloads, such that the hardware resources consumed by a VM change arbitrarily over time. During this variation the SLA can be breached or the host can become over-utilized, e.g. if all the VMs request their maximum allowed physical resources. In such a case, the algorithm must know the time when an SLA violation will occur before it actually occurs. Live migration can have a negative impact on the performance of applications in a VM during the migration. The length of a live migration depends on the total amount of memory used by the VM and the available network bandwidth. The migration time and performance degradation experienced by a VM j are expressed in (1) [15]:

T_mj = M_j / B_j,   U_dj = 0.1 * integral from t_0 to t_0 + T_mj of u_j(t) dt    (1)

where U_dj is the performance degradation of VM j during migration, t_0 is the time when the migration starts, T_mj is the time taken to complete the migration, u_j(t) is the CPU utilization of VM j, M_j is the amount of memory used by VM j and B_j is the available network bandwidth [15]. The total performance degradation is then

S_j = U_dj + x_dj    (3)

where x_dj is the performance degradation when the allocated resource utilization for VM j is not aligned with the agreed SLA and S_j is the total performance degradation of VM j. Thus, from (3), in order to minimize the total performance degradation, either U_dj or x_dj should be minimized. In this work we concentrate on minimizing x_dj, as U_dj is influenced not only by the CPU utilization u_j but also by the amount of memory used and the available network bandwidth. Moreover, VM j is more likely to experience performance degradation while the host resource utilization is above the agreed SLA than during the actual live migration. In order to avoid SLA violation and performance degradation, the host should perform regular checks on system utilization by executing an SLA violation detection algorithm. One of the earliest methods relied on setting a static CPU utilization threshold to differentiate between overloaded and non-overloaded states of the host. It is simple but inefficient for dynamic workloads, particularly when different types of applications share a physical node. In such cases the system should be able to automatically adjust its behavior based on the workload patterns of the applications [15].
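The migration-cost model above can be sketched numerically as follows. This is a minimal illustration, not code from the paper: `migration_time` and `degradation` are hypothetical helper names, and the utilization integral is approximated by a discrete sum over fixed measurement intervals.

```python
def migration_time(memory_mb: float, bandwidth_mbps: float) -> float:
    """T_mj = M_j / B_j: time to transfer the VM's memory over the network."""
    return memory_mb / bandwidth_mbps

def degradation(cpu_utilization: list[float], interval_s: float) -> float:
    """U_dj ~ 0.1 * sum(u_j(t) * dt): roughly 10% of the CPU work
    performed during the migration window is counted as degradation."""
    return 0.1 * sum(u * interval_s for u in cpu_utilization)

# Example: a 4 GB VM over a 1 Gbps link migrates in about 4 seconds.
t_m = migration_time(memory_mb=4096, bandwidth_mbps=1024)
```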

A. Local Regression
Our approach, as depicted in Figure 1, relies first on workload prediction based on statistical analysis of historical data collected during the VMs' lifetime. Local regression (LR) has proved its efficiency as a prediction method [17]. It builds a curve from localized subsets of the data that approximates the original input. For each new observation a new trend line is found [15], and this trend line is used to predict the next observation x_{k+1}. The observations can be host resource utilizations such as CPU and memory, and the prediction window is bounded by t_m, the maximum time required for a VM migration.
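The trend-line step can be sketched as follows. This is a simplified sketch assuming an ordinary least-squares line over a sliding window (the full LR method uses locally weighted fitting); `predict_next` is a hypothetical name.

```python
def predict_next(history: list[float], window: int = 10) -> float:
    """Fit a trend line a + b*t to the last `window` observations
    and extrapolate one step ahead (plain least squares)."""
    ys = history[-window:]
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var if var else 0.0     # slope of the trend line
    a = mean_y - b * mean_x           # intercept
    return a + b * n                  # predicted value at the next interval
```

A perfectly linear utilization history, e.g. [0.1, 0.2, 0.3, 0.4], would yield a prediction of 0.5 for the next interval.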

B. Classification Trees
When the class is known a priori in the training samples, classification trees are effective. Let t_p be a parent node and t_l, t_r the left and right child nodes of the parent node, respectively. Assume the learning sample has variable matrix X with M variables x_j and N observations, and let the class vector Y consist of N observations with a total of K classes. A classification tree is based on a splitting rule that splits the learning sample into smaller parts. Since each split divides the data into two parts, maximizing the homogeneity of the left and right child nodes is equivalent to maximizing the change of the impurity function Δi(t) [19]:

Δi(t) = i(t_p) − E[i(t_c)]

where t_c represents the left and right child nodes of the parent node. With P_l and P_r the probabilities of the left and right nodes, we get [19]:

Δi(t) = i(t_p) − P_l i(t_l) − P_r i(t_r)

Therefore, at each node the classification tree solves the following maximization problem [20]:

arg max over x_j ≤ x_j^R, j=1,…,M of [i(t_p) − P_l i(t_l) − P_r i(t_r)]    (8)

From (8), all possible values of all variables in matrix X are searched for the best split question x_j ≤ x_j^R that maximizes the change of the impurity measure Δi(t) [19].
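The impurity-change criterion can be illustrated with the Gini index as the impurity function i(t) (one common choice; the derivation above does not fix a particular impurity function). The helper names below are illustrative.

```python
def gini(labels: list) -> float:
    """Gini impurity i(t) = 1 - sum_k p_k^2 over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def impurity_change(parent: list, left: list, right: list) -> float:
    """Delta i(t) = i(t_p) - P_l * i(t_l) - P_r * i(t_r),
    with P_l, P_r the fractions of samples sent to each child."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))
```

A split that separates [1, 1, 0, 0] into pure children [1, 1] and [0, 0] attains the maximum impurity change of 0.5.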

C. Support Vector Machine
Support vector machines (SVMs) support nonlinear classification and find the hyperplane of maximal margin. Given a training set of N data points {(x_k, y_k)}, k = 1,…,N, where x_k is the kth input pattern and y_k is the kth output pattern, the support vector method constructs a classifier of the form shown in (10). Equation (10) describes a hard-margin SVM in which noise-free training data are correctly classified by a linear function. The data points D (the training set) are represented mathematically as [20,21]

D = {(x_i, y_i) | x_i ∈ R^n, y_i ∈ {−1, 1}}, i = 1,…,N

where x_i is an n-dimensional real vector and y_i is either 1 or −1, indicating the class to which the point x_i belongs. The SVM classification function F(x) takes the form [20]

F(x) = w·x − b

where b is the bias and w is the weight vector, which is calculated during the training process. To correctly classify the training set, F(x) (or w and b) must return negative numbers for negative data points and positive numbers otherwise, for every point x_i in D, as expressed in (12) [20]:

w·x_i − b > 0 if y_i = 1,   w·x_i − b < 0 if y_i = −1    (12)
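The decision rule in (12) can be sketched as follows; `svm_predict` is a hypothetical helper, and the training of w and b (the margin maximization) is omitted for brevity.

```python
def svm_predict(w: list[float], b: float, x: list[float]) -> int:
    """Linear SVM decision: sign(w.x - b), per condition (12).
    Returns +1 for the positive class and -1 for the negative class."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score > 0 else -1

def satisfies_hard_margin(w, b, data) -> bool:
    """Check the hard-margin constraint y_i * (w.x_i - b) >= 1
    for every (x_i, y_i) in the training set."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) - b) >= 1
        for x, y in data)
```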

D. K-Nearest Neighbors
The k-nearest neighbors (KNN) method is one of the simplest methods for pattern classification. When combined with prior knowledge it can produce significant results [22]. In KNN each unlabeled example is classified by the majority label among its k nearest neighbors in the training set; its classification performance therefore depends on the distance metric used to identify the nearest neighbors. In the absence of prior knowledge, most KNN classifiers use Euclidean distances between examples represented as vector inputs to measure similarity. Let {(x_i, y_i)}, i = 1,…,n, denote a training set of n labeled examples with discrete (in our case binary) class labels y_i. We use the binary matrix y_ij ∈ {0,1} to indicate whether or not the labels match. Our goal is to learn a linear transformation L which we will use to compute squared distances as [21]:

D(x_i, x_j) = ||L(x_i − x_j)||^2
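The baseline Euclidean-distance KNN rule described above can be sketched as follows (a minimal illustration; the learned transformation L would replace the plain squared-distance function).

```python
from collections import Counter

def knn_classify(train: list, query: list[float], k: int = 3):
    """Majority vote among the k nearest neighbors by squared
    Euclidean distance. `train` is a list of (vector, label) pairs."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(train, key=lambda pair: sq_dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```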

IV. RESULTS AND ANALYSIS
In this work, the workload was collected from the CoMon project [16]. The extracted data covered the resource utilization of more than a thousand VMs distributed across the world. Samples were selected from 6 servers over a period of one week with a 5-minute measurement interval. Since the collected data did not contain memory and inter-VM bandwidth utilization, CPU and other resource utilizations were adjusted manually for the purpose of this experiment. In this work average TPR, Friedman rank summations and average ranking were used as performance metrics. TPR (true positive rate) is defined as the fraction of VM migrations correctly classified as being due to high utilization. Since the available training set is not very large, cross validation has been used to train, test and validate the classification techniques: the data is divided into 5 folds and each fold is held out in turn for testing while the remaining folds are used for training.
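The 5-fold hold-out scheme can be sketched as follows; `five_fold_indices` is a hypothetical helper, and for simplicity it drops any remainder samples when the sample count is not divisible by the fold count.

```python
def five_fold_indices(n: int, folds: int = 5) -> list:
    """Split sample indices 0..n-1 into `folds` disjoint test folds.
    Returns (train_indices, test_indices) pairs, one per fold;
    each fold is held out once while the rest form the training set."""
    fold_size = n // folds
    splits = []
    for f in range(folds):
        test = list(range(f * fold_size, (f + 1) * fold_size))
        held_out = set(test)
        train = [i for i in range(folds * fold_size) if i not in held_out]
        splits.append((train, test))
    return splits
```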
At the initial stage, data is predicted using local regression, provided that the prediction window is less than or equal to the migration time, i.e. x_{k+1} − x_k ≤ t_m, which is crucial to maintain the SLA; then the different classification techniques are investigated. Sample data were collected for the six servers named 146CS4, cs-planetlab3, cs-planetlab4, Fobos, jupiter_cs and node1. Figure 2 shows sample resource utilization for 146CS4 for one week. For the purpose of the experiment, the SLA thresholds for CPU, memory and inter-VM bandwidth utilization have been set to 90%, 80% and 70% on the 146CS4, cs-planetlab3 and cs-planetlab4 servers, and to 80%, 70% and 60% on the Fobos, jupiter_cs and node1 servers, respectively. Using the workload data described above, all algorithms listed in Table I have been applied and analyzed. TPR results are shown in Tables II and III, and the final ranking is shown in Table IV. The Friedman test was conducted to assess the statistical significance of the obtained results. It was chosen for multi-classifier performance assessment due to its non-parametric nature and wide use in multi-domain analysis [22]. The null hypothesis tested is that all classifiers perform the same and the observed differences are merely random. The algorithms are ranked for each data set separately. Under the null hypothesis all the algorithms are equivalent, and the Friedman statistic is computed from the average ranks R_j of the k algorithms over the N data sets as [22]:

X_F^2 = (12N / (k(k+1))) * (sum over j of R_j^2 − k(k+1)^2 / 4)

The Friedman statistic is distributed according to X_F^2 with k−1 degrees of freedom. A significance level of p < 0.05 is chosen. If the null hypothesis is rejected, we can proceed with a post-hoc test; the Nemenyi test is used to compare classifiers to each other. If the corresponding average ranks differ by at least the critical difference, the performance of two classifiers is considered significantly different. From Table IV we see that the tree-based algorithms show better performance compared to the other algorithms, whereas KNN shows the worst performance. The medium tree has an average TPR of 94.63% with a Friedman rank summation of 13 and an overall Friedman ranking of 2.166667. To measure statistical significance using the Friedman test, X_F^2 was calculated as 154.66 and found to be larger than the critical value of 23.68, which indicates the statistical significance of the obtained TPR values. Accordingly, the null hypothesis that the tested classifiers have the same performance is rejected, and the Nemenyi test is used to pinpoint where the significance lies. q_a is found to be 3.353 and, based on pairwise testing, we failed to reject the null hypothesis between the medium, simple and complex trees, in addition to SVM Cubic, SVM Coarse, SVM Quadratic and KNN Fine. However, the medium tree shows better performance in terms of average TPR, Friedman rank summation and average Friedman ranking. In addition, the medium tree algorithm's results were consistent across all tested data.
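As a check on the ranking arithmetic, the Friedman statistic can be computed directly from the average ranks; `friedman_statistic` is an illustrative helper, not code from the paper.

```python
def friedman_statistic(avg_ranks: list[float], n_datasets: int) -> float:
    """Chi-square Friedman statistic:
    X_F^2 = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4),
    where k is the number of classifiers, N the number of data sets,
    and R_j the average rank of classifier j."""
    k = len(avg_ranks)
    return (12 * n_datasets / (k * (k + 1))
            * (sum(r ** 2 for r in avg_ranks) - k * (k + 1) ** 2 / 4))
```

When every classifier has the same average rank (k+1)/2, the statistic is 0, as expected under the null hypothesis.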

V. CONCLUSION
In this work, a machine learning based approach with a modified kernel, together with Friedman rank summations and average ranking, has been used for dynamic live migration based on adaptive prediction of utilization thresholds. In the analysis of the proposed approach, different classification techniques have been used to predict VM migration. It is shown that the classification trees have higher accuracy compared to SVM and KNN. This approach can be used to manage SLAs in virtualized cloud-based data centers for critical applications such as telecommunication applications, especially those with strict SLAs. Further analysis can be made of other dynamic consolidation problems, such as VM placement, following the approach presented in this paper.
