Statistical Modeling via Bootstrapping and Weighted Techniques Based on Variances: An Oral Health Case Study

Multiple logistic regression is a method for handling dependent variables with a binary outcome, and it is increasingly widespread as a statistical technique that represents a discrete probability model. Many studies have focused on its application but fewer on building the methodology. This study aims to provide an applied alternative to multiple logistic regression analysis, called modified Bayesian logistic regression modeling, which combines the bootstrap method, implemented with a SAS macro, with weighted techniques based on variances, implemented with a SAS algorithm. Oral cancer data are used to illustrate a real scenario in oral health. The data are analyzed with both the multiple logistic regression algorithm and the modified Bayesian logistic regression. Results from both cases are strongly supported by clinical studies. With the proposed algorithm, the researcher has the option of analyzing the data with either the usual or the alternative method. The final results indicate that the modified procedure can provide more efficient results, especially in cases that involve statistical inference.
Keywords-multiple logistic regression; bootstrap; Bayesian and weighted techniques


I. INTRODUCTION
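Logistic regression analyzes the relationship between multiple independent variables and a categorical dependent variable [1]. The multiple logistic response function is
$$E\{Y\} = \frac{\exp(\mathbf{X}'\boldsymbol{\beta})}{1 + \exp(\mathbf{X}'\boldsymbol{\beta})}$$
where $\mathbf{X}'\boldsymbol{\beta} = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}$. The multiple logistic regression model can therefore be stated as follows: the $Y_i$ are independent Bernoulli random variables with expected values $E\{Y_i\} = \pi_i$, where
$$\pi_i = \frac{\exp(\mathbf{X}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{X}_i'\boldsymbol{\beta})}$$
The X observations are considered to be constants. Alternatively, if the X variables are random, $E\{Y_i\}$ is viewed as a conditional mean, given the values of $X_{i1}, X_{i2}, \ldots, X_{i,p-1}$.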
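As a small illustration of the response function above, the following Python sketch evaluates $\pi_i$ for one observation. The function name `logistic_response`, the coefficient values and the predictor values are hypothetical and chosen only for the example; they are not estimates from the study.

```python
import math


def logistic_response(x, beta):
    """Return pi = exp(x'beta) / (1 + exp(x'beta)) for one observation.

    x    : predictor values X_i1, ..., X_i,p-1 (no intercept column)
    beta : coefficients beta_0, beta_1, ..., beta_{p-1}
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one intercept plus one slope per predictor")
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    # Numerically stable evaluation of exp(eta) / (1 + exp(eta)).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients and predictor values, for illustration only.
beta = [-1.0, 0.8, 0.5]
x_i = [1, 0]
pi_i = logistic_response(x_i, beta)
print(f"pi_i = {pi_i:.4f}")
```

Here the linear predictor is -1.0 + 0.8 = -0.2, so the returned probability is slightly below one half.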

A. Bootstrapping, Weighted Techniques and Bayesian Approach with SAS
The authors in [2] introduced the bootstrap method, which is built on the empirical distribution function (EDF). The basic concept is that the bootstrap starts from an original sample taken from the studied population. The second step copies the original sample a number of times to create a pseudo-population. Several samples are then drawn from it at random, giving a new, comprehensive sample derived from the original one. The resampled data are stored and used to build a new distribution for further analysis [2,3].
Bayesian analysis centers on the posterior distribution, which plays an important role in Bayesian estimation, especially for statistical inference. Summary statistics for the posterior samples are produced by default when the analysis is run. The OUTPOST= option saves the posterior samples in a SAS data set for further processing. PROC GENMOD fits generalized linear models and supports Bayesian estimation [4]. The SEED= option fixes the random number seed so that the results are reproducible. By default, the prior on the regression coefficients is uniform: a flat prior that reflects ignorance of the location of the parameter and places equal likelihood on all possible values the regression coefficients can take. When ODS Graphics is enabled, PROC GENMOD also produces Markov chain convergence diagnostics, together with a section on their assessment and interpretation [5].
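To make the roles of the flat prior, the seed and the saved posterior samples concrete, here is a minimal Python sketch. It is not the SAS procedure: it assumes a one-predictor logistic model and swaps in a plain random-walk Metropolis sampler rather than the sampling algorithm PROC GENMOD uses. The data, function names and tuning values (`step`, `n_iter`, `burn_in`) are hypothetical.

```python
import math
import random


def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood for logit(pi) = beta[0] + beta[1] * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        # Stable evaluation of log(1 + exp(eta)).
        if eta > 0:
            log1pexp = eta + math.log1p(math.exp(-eta))
        else:
            log1pexp = math.log1p(math.exp(eta))
        total += y * eta - log1pexp
    return total


def metropolis(xs, ys, n_iter=5000, burn_in=1000, step=0.5, seed=2024):
    """Random-walk Metropolis draws from the posterior under a flat prior."""
    rng = random.Random(seed)  # fixed seed, in the spirit of the SEED= option
    current = [0.0, 0.0]
    current_ll = log_likelihood(current, xs, ys)
    draws = []
    for i in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in current]
        proposal_ll = log_likelihood(proposal, xs, ys)
        # With a flat prior the acceptance ratio is just the likelihood ratio.
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        if i >= burn_in:
            draws.append(tuple(current))  # kept draws play the role of OUTPOST=
    return draws


# Hypothetical toy data (one predictor, binary outcome); not the study data.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -1.0, 1.0, 0.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1]

draws = metropolis(xs, ys)
slope_mean = sum(d[1] for d in draws) / len(draws)
print(f"posterior mean of slope: {slope_mean:.3f}")
```

Because the prior is flat, the posterior is proportional to the likelihood, which is why only the log-likelihood difference enters the acceptance step. Rerunning with the same seed reproduces the same chain.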
Weighting is an important technique that adjusts data to reflect differences in the number of population units that each respondent represents [6]. Several quantities can serve as the basis for the weights, such as the mean, the standard deviation or the variance, depending on the sample population of interest. A weighted technique also allows different weights to be assigned to different cases in the data analysis. The aim of the weighted method is to correct skewness and to make the sample more representative of the true population.
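The text does not state the exact weighting formula, so the following Python sketch shows only one common convention, assumed here for illustration: inverse-variance weights, where a group with smaller sample variance receives a larger weight. The function name `inverse_variance_weights`, the group labels and the values are hypothetical.

```python
from statistics import variance


def inverse_variance_weights(groups):
    """Map each group name to a weight proportional to 1 / sample variance.

    This is an assumed convention, not necessarily the study's formula.
    Weights are normalized to sum to 1.
    """
    variances = {name: variance(values) for name, values in groups.items()}
    if any(v == 0 for v in variances.values()):
        raise ValueError("a group with zero variance has no finite weight")
    raw = {name: 1.0 / v for name, v in variances.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


# Hypothetical measurements for two groups; not the study data.
groups = {
    "site_a": [2.0, 2.5, 3.0, 2.5],
    "site_b": [1.0, 4.0, 2.0, 5.0],
}
weights = inverse_variance_weights(groups)
for name, w in weights.items():
    print(f"{name}: weight = {w:.4f}")
```

The less variable group dominates the weights, which is the intended effect of weighting by variance under this assumption.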

II. METHODOLOGY: ALGORITHM BUILDING AND RESULTS
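We used secondary oral medical data on 23 oral cancer patients from Universiti Sains Malaysia (USM). The selected variables are nerve invasion (nerv_inv), gender (gen), betel quid (bet), tumour site (tum_site) and tumour size (tum_size). To explore the underlying association between nerve invasion and the selected explanatory variables, a regression model is fitted in this section. Define the dichotomous response as $Y_{ij} = 0$ for no nerve invasion and $Y_{ij} = 1$ for nerve invasion. The proposed model is given in equation form as follows:
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1\,\mathrm{gen} + \beta_2\,\mathrm{bet} + \beta_3\,\mathrm{tum\_site} + \beta_4\,\mathrm{tum\_size} \tag{1}$$
Then the logistic regression model for (1) is given as:
$$\pi = P(Y_{ij} = 1) = \frac{\exp(\beta_0 + \beta_1\,\mathrm{gen} + \beta_2\,\mathrm{bet} + \beta_3\,\mathrm{tum\_site} + \beta_4\,\mathrm{tum\_size})}{1 + \exp(\beta_0 + \beta_1\,\mathrm{gen} + \beta_2\,\mathrm{bet} + \beta_3\,\mathrm{tum\_site} + \beta_4\,\mathrm{tum\_size})}$$
Then we obtain (2) as follows:
$$\hat{\pi} = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1\,\mathrm{gen} + \hat{\beta}_2\,\mathrm{bet} + \hat{\beta}_3\,\mathrm{tum\_site} + \hat{\beta}_4\,\mathrm{tum\_size})}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1\,\mathrm{gen} + \hat{\beta}_2\,\mathrm{bet} + \hat{\beta}_3\,\mathrm{tum\_site} + \hat{\beta}_4\,\mathrm{tum\_size})} \tag{2}$$
The estimated model for our case is given in (2). Before the equation is applied, two main steps are needed: bootstrapping and weighting the data.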

A. Multiple Logistic Regression Modeling on Nerve Invasion
• Cancer cell data should be entered as follows in the SAS algorithm to calculate the multiple logistic regression.
    data cancer;
      input Nerv_inv Gen Bet Tum_site Tum_size;
    run;
    ods rtf file='abc.rtf' style=journal;
• Run the analysis using multiple logistic regression. Below is the syntax of multiple logistic regression.
    /* Run the logistic regression through PROC LOGISTIC */
    ods graphics on;
    proc logistic descending data=cancer;
      model Nerv_inv(event='1') = Gen Bet Tum_site Tum_size / rsquare expb lackfit;
      roc 'Gen Bet Tum_site Tum_size' Gen Bet Tum_site Tum_size;
    run;
    ods graphics off;
    ods rtf close;

B. Results: Using Multiple Logistic Regression Modeling on Nerve Invasion
Figure 1 and Table I show the results of the above syntax. None of the variables in the model is significant. The area under the ROC curve (AUC) is 0.70556, so the model ranks a randomly chosen patient with nerve invasion above a randomly chosen patient without it in about 70.6% of such pairs, which is better than the 50% expected by chance.
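The pairwise reading of the area under the ROC curve can be checked with a short Python sketch. It computes the AUC as the share of (event, non-event) pairs in which the event receives the higher predicted probability, counting ties as one half. The function name `auc`, the scores and the labels are hypothetical and are not the study's predictions.

```python
def auc(scores, labels):
    """Area under the ROC curve as the share of correctly ordered pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event ranked above non-event
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and outcomes; not the study data.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(f"AUC = {auc(scores, labels):.4f}")
```

In this toy example eight of the nine pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 corresponds to a model that orders pairs no better than chance.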

C. Modified Multiple Bayesian Logistic Regression on Nerve Invasion
• Add bootstrapping to the calculation.
The bootstrap step is carried out with SAS syntax that resamples the data and prints the resampled values.
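As a language-neutral illustration of the bootstrap step, and not the SAS macro used in the study, the Python sketch below resamples records with replacement, which is equivalent to drawing at random from a pseudo-population made of many copies of the original sample. The function name `bootstrap_samples`, the seed, the number of replicates and the records are all hypothetical.

```python
import random


def bootstrap_samples(rows, n_reps, seed=12345):
    """Draw n_reps resamples of len(rows) records, with replacement."""
    rng = random.Random(seed)  # fixed seed so the resamples are reproducible
    n = len(rows)
    return [[rows[rng.randrange(n)] for _ in range(n)] for _ in range(n_reps)]


# Hypothetical records (Nerv_inv, Gen, Bet, Tum_site, Tum_size); not the study data.
rows = [
    (1, 0, 1, 2, 3),
    (0, 1, 0, 1, 2),
    (1, 1, 1, 3, 4),
    (0, 0, 0, 1, 1),
    (0, 1, 1, 2, 2),
    (1, 0, 0, 3, 3),
]

replicates = bootstrap_samples(rows, n_reps=1000)

# Bootstrap distribution of one statistic: the proportion with nerve invasion.
props = [sum(r[0] for r in rep) / len(rep) for rep in replicates]
boot_mean = sum(props) / len(props)
boot_se = (sum((p - boot_mean) ** 2 for p in props) / (len(props) - 1)) ** 0.5
print(f"bootstrap mean = {boot_mean:.3f}, standard error = {boot_se:.3f}")
```

Each replicate has the same size as the original sample, and the spread of the statistic across replicates gives its bootstrap standard error. In practice a model would be refitted on every replicate rather than a simple proportion being computed.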

Comparison of Multiple Logistic Regression and Modified Multiple Bayesian Logistic Regression
Table III shows the results of both methods, multiple logistic regression and modified Bayesian logistic regression. The modified method's results appear to be more significant, and this is supported in [7].

[TABLE III: comparison of multiple logistic regression and modified Bayesian logistic regression]

To obtain good results, it is necessary to have a sound method of calculation, with some improvement of the proposed strategy. The proposed method may yield better predictions for future decision making. The algorithms in this paper show good potential for identifying the factors that lead to oral cancer. The study findings raise the following recommendations:
• There is a need to explore methodological improvements further in order to optimize the output. This could include a higher-level combination of theory, methodology building and computation, which may lead to higher precision and accuracy of the results.
• Performance measurements can be taken into consideration when measuring the quality of the recommended algorithms.
This knowledge will empower researchers and serve as a roadmap to improve future studies.