Software Risk Analysis with the use of Classification Techniques: A Review

Abstract: Risk analysis and management is a critical aspect of the software development process. Various risks are associated with every phase of the software development lifecycle. Early identification of risks in each phase of software development, coupled with mitigation plans, can help reduce the cost of the product and increase software quality. This study explores the tools and techniques used in the literature for analyzing and managing risks. Most risk analysis techniques have been applied in the requirement analysis phase, and tools supporting automated risk analysis are scarce. Accommodating various types of risk factors when predicting software risks reduces the accuracy of the classifier.

I. INTRODUCTION
A software risk is a threat that could degrade software quality, delay the project, or cause it to exceed its budget. The various types of risk to a software project should be identified during the early phases of the Software Development Life Cycle (SDLC). Software risk poses serious threats to software quality, as the desired functionality may be compromised. The areas of the SDLC most affected by risks are the development, testing, and maintenance phases. Software quality can be safeguarded by forecasting the risks that an existing software project may face; defect prevention, defect reduction, and defect containment are used for this purpose. Defect prediction forecasts the presence of defects in software modules: historical data from past projects are used to estimate how many defects are likely to be present in the current project. The purpose of this study is to identify the merits and demerits of the different classification techniques that have been applied to defect prediction. To this end, this report presents a critical review of the usefulness of these classification techniques on different types of software project datasets.
Developing an efficient software fault/risk prediction method is a highly demanding challenge. Software fault/risk prediction plays an important role in analyzing software quality and balancing software development cost, and early, accurate prediction of software risk has become critical to project success. Many researchers have proposed prediction approaches such as Artificial Neural Networks (ANN), Fuzzy Logic, Decision Trees, and Bayesian Networks (BN) to address risk in software development. As the demand for high software quality grows, increasing software size and complexity may lead to more software faults, which need to be identified at the early stages of the SDLC.

II. LITERATURE REVIEW
The authors of [1] address how risks affect software development efforts. The study explores the causes of different risks using a BN. It combines expert knowledge with a V-structure discovery algorithm to generate a BN that performs causality analysis, in order to manage software project risks and improve prediction accuracy. Causality analysis is more useful than correlation analysis, as the latter does not support effective risk planning. The objective of the study is to combine risk analysis and risk control for effective software risk management. The authors employ expert knowledge to build the BN used to identify risks. The study focuses on three critical risk factors, namely inadequate requirements, lack of user cooperation, and poor project planning. However, 27 equally important risk factors were identified, and a BN is efficient only for small datasets: involving many variables makes it more complex and difficult to understand.
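To make the causal reasoning described for [1] concrete, the following is a minimal sketch of a Bayesian network in which the three critical risk factors are parents of a single project-failure node. The structure X -> Z <- Y is exactly the kind of V-structure (collider) that a V-structure discovery algorithm looks for. All probabilities, the variable names, and the noisy-OR form of the conditional probability table are assumptions for illustration; [1] built its network by combining expert knowledge with V-structure discovery, not from these numbers.

```python
from itertools import product

# Hypothetical prior probabilities that each risk factor named in [1] is present.
PRIORS = {
    "inadequate_requirements": 0.30,
    "lack_of_user_cooperation": 0.20,
    "poor_project_planning": 0.25,
}
FACTORS = list(PRIORS)

# Assumed noisy-OR parameters: a small "leak" probability of failure when no
# factor is present, plus the chance that each factor alone causes failure.
LEAK = 0.05
STRENGTH = {
    "inadequate_requirements": 0.60,
    "lack_of_user_cooperation": 0.40,
    "poor_project_planning": 0.50,
}


def p_failure_given(assign):
    """P(project_failure = True | factor values), modeled as a noisy-OR."""
    p_ok = 1.0 - LEAK
    for factor, present in assign.items():
        if present:
            p_ok *= 1.0 - STRENGTH[factor]
    return 1.0 - p_ok


def joint(assign, failure):
    """Joint probability of one full assignment of all four variables."""
    p = 1.0
    for factor in FACTORS:
        p *= PRIORS[factor] if assign[factor] else 1.0 - PRIORS[factor]
    p_fail = p_failure_given(assign)
    return p * (p_fail if failure else 1.0 - p_fail)


def prob(query, evidence):
    """P(query | evidence) by exact enumeration over all 2**4 worlds."""
    num = den = 0.0
    for values in product([True, False], repeat=len(FACTORS)):
        assign = dict(zip(FACTORS, values))
        for failure in (True, False):
            world = {**assign, "project_failure": failure}
            if all(world[k] == v for k, v in evidence.items()):
                weight = joint(assign, failure)
                den += weight
                if all(world[k] == v for k, v in query.items()):
                    num += weight
    return num / den


# Predictive query (cause -> effect) and diagnostic query (effect -> cause).
print(prob({"project_failure": True}, {"inadequate_requirements": True}))
print(prob({"poor_project_planning": True}, {"project_failure": True}))
```

Exact enumeration grows exponentially with the number of variables, which echoes the limitation noted above: with all 27 risk factors, a practical model would need a dedicated inference engine or approximate inference.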
The way risks affect the analysis and design stages of the SDLC is addressed in [2], which explores a State-Based Risk Assessment approach. The approach estimates risks for the different states of a component and, on the basis of the whole component, estimates the risk of the overall scenario. The study introduces an inter-component State Dependence Graph that is used to estimate complexity and severity for the risk assessment. First, complexity estimates are made for each state of a component within the system. Then, the severity of the component within the scenario is determined using three hazard analysis techniques: functional failure analysis, software failure mode and effect analysis, and software fault tree analysis. The objective of the study is to estimate the overall system risk based on the scenario risks and the State Collaboration Test Model of the system (SCOTEM).
The authors of [3] address the difficulty of managing risk analysis and planning effectively. They explore the empirical integration of an intelligent decision support model for software project risk analysis and planning. The study makes two main contributions: an Integration Framework for Intelligent Software Risk Planning (IF-ISPRP) for minimizing the impact of project risks, and a method called MMAKD (Many-to-Many Actionable Knowledge Discovery) for complex risk planning. The framework and the method are integrated to effectively control software project risks. The objective of the study is to combine the risk analysis and risk planning modules to control risks. The study assumes that different risk actions can be executed simultaneously, whereas in practice this is not possible because risks are managed in a particular order.
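The state-based estimate described for [2] (complexity from a state dependence graph, severity from hazard analysis, combined into a scenario risk) can be sketched as below. The degree-based complexity measure, the severity weights per hazard class, the independence-based aggregation 1 - prod(1 - r), and the state names are all assumptions made for illustration, not the exact formulas of [2].

```python
# Assumed severity weights for the hazard classes produced by FFA, FMEA and FTA.
SEVERITY = {"catastrophic": 0.95, "critical": 0.75, "marginal": 0.50, "minor": 0.25}


def normalized_complexity(dependence_graph):
    """Complexity of each state = number of incident edges in the (hypothetical)
    state dependence graph, divided by the largest such count."""
    degree = {state: 0 for state in dependence_graph}
    for state, targets in dependence_graph.items():
        for target in targets:
            degree[state] += 1
            degree[target] = degree.get(target, 0) + 1
    top = max(degree.values()) or 1  # avoid dividing by zero for an edgeless graph
    return {state: d / top for state, d in degree.items()}


def state_risks(complexity, severity_class):
    """Heuristic risk factor of each state: normalized complexity x severity weight."""
    return {s: complexity[s] * SEVERITY[severity_class[s]] for s in complexity}


def scenario_risk(risks):
    """Combine state risks assuming independent failures: 1 - prod(1 - r)."""
    p_safe = 1.0
    for r in risks.values():
        p_safe *= 1.0 - r
    return 1.0 - p_safe


# Hypothetical three-component scenario.
graph = {"Client.idle": ["Server.wait"],
         "Server.wait": ["Logger.send", "Client.idle"],
         "Logger.send": []}
severity = {"Client.idle": "minor", "Server.wait": "critical", "Logger.send": "catastrophic"}
risks = state_risks(normalized_complexity(graph), severity)
print(risks, round(scenario_risk(risks), 3))
```

The aggregation step is the most debatable choice: taking the maximum state risk instead would give a more conservative, worst-case scenario estimate.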
The authors of [4] explore different types and categories of risk and their effects on software projects, elaborating systematically on the different aspects and characteristics of risk. They also suggest a software tool for risk assessment that supports better and quicker decisions. The study combines the different types and categories of risk with their characteristics. A tool called RISK is used to determine how an event occurs and to evaluate what consequences it can have on project performance. The study explores risk analysis, focusing on quantitative risk analysis, to support better decisions about what kind of risk has occurred. The objective of the study is to identify different types of risks and categorize them.
Reference [5] addresses how project cost and duration make budget estimation difficult for software risk management when developing a high-quality software system. The study improves software process risk management by using a software process model that incorporates risk management and cost. It proposes a measurement model that combines process risk management with trustworthiness metrics. Two metrics are suggested, one for software risk management performance and another for trustworthiness, which help the project manager control risks in every software development process. The objective of the study is to improve software risk management and enhance trustworthiness by avoiding cost overruns and delays in software development, and to use cost control modules that help improve the software risk management process.
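Quantitative risk analysis of the kind attributed to the RISK tool in [4] is commonly performed by Monte Carlo simulation. The sketch below assumes that approach; the risk register, the probabilities, the triangular delay distributions, and the 135-day deadline are invented for illustration, and the code does not reproduce the tool itself.

```python
import random
import statistics

# Hypothetical risk register: (risk, probability it occurs,
# minimum / most likely / maximum schedule impact in days).
RISKS = [
    ("key developer leaves", 0.15, 10, 20, 40),
    ("requirements change", 0.40, 5, 10, 25),
    ("third-party component delivered late", 0.25, 3, 7, 15),
]
BASE_DURATION = 120.0  # planned project duration in days (assumed)


def simulate(runs=10_000, seed=42):
    """Sample total project duration: each risk fires with its probability and,
    if it fires, adds a delay drawn from a triangular distribution."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(runs):
        total = BASE_DURATION
        for _name, p, low, mode, high in RISKS:
            if rng.random() < p:
                total += rng.triangular(low, high, mode)
        outcomes.append(total)
    return outcomes


durations = simulate()
mean = statistics.fmean(durations)
p80 = statistics.quantiles(durations, n=10)[7]  # 80th percentile
late = sum(d > 135 for d in durations) / len(durations)
print(f"mean={mean:.1f} days, P80={p80:.1f} days, P(>135 days)={late:.2%}")
```

Reporting a percentile such as P80 alongside the mean shows how much schedule contingency is needed, which a single expected-value estimate hides.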
The authors of [6] explore risk management using a knowledge management conceptual framework, called Knowledge-Based Risk Management, to identify how knowledge management helps improve the effectiveness of risk management. Timely identification of risk helps improve the overall implementation and the quality of the product, and handling risks early in software development reduces the cost of the product while minimizing delays. The study suggests integrating knowledge management activities into risk management and proposes a framework for this purpose.
Risk occurrence at the early stages of software development and its effect on the success of the final product are explored in [7]. To achieve quality software products in the shortest time, the Goal-Driven Software Development Risk Management (GSRM) model is presented, which is used to identify and analyze goals, risks, and treatment actions at the early stages of the SDLC. The study presents a layer-based modeling framework that supports software development risk management and performs its activities using suitable tasks, methods, and techniques. The framework is categorized into five dimensions. The objective of the study is to explore risk in the early SDLC using GSRM, which applies treatment actions at the early stages of the process, particularly in the requirements phase.
The way faults affect software quality and cost is addressed in [8]. The study uses fuzzy measures together with the associated fuzzy integral to achieve better prediction performance for software modules. The objective of the study is to minimize cost and improve the software development process through early prediction of fault-prone software. The authors employ object-oriented metrics and method-level metrics to construct a model that establishes the relationship between software metrics and fault-proneness and does not require expert knowledge. Using the fuzzy integral for software fault prediction is reported to provide significant advantages. The major limitation of the proposed approach is that it is computationally expensive, and building a model with this technique requires considerable effort.
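The review does not give the exact formulation used in [8], so the following is a generic sketch of the discrete Choquet integral, one standard fuzzy integral, applied to three normalized metrics. The metric names (wmc, cbo, loc), the fuzzy-measure values, and the 0.5 decision threshold are assumptions. Unlike a weighted average, the fuzzy measure can reward metrics that signal faults jointly (synergy) and discount overlapping ones (redundancy).

```python
# Hypothetical fuzzy measure over three normalized metrics. mu(S) expresses how
# strongly the subset S of metrics jointly indicates fault-proneness; values are
# monotone (S within T implies mu(S) <= mu(T)) with mu(empty) = 0 and mu(all) = 1.
MU = {
    frozenset(): 0.0,
    frozenset({"wmc"}): 0.4,
    frozenset({"cbo"}): 0.3,
    frozenset({"loc"}): 0.3,
    frozenset({"wmc", "cbo"}): 0.8,   # synergy: more than 0.4 + 0.3
    frozenset({"wmc", "loc"}): 0.5,   # redundancy: less than 0.4 + 0.3
    frozenset({"cbo", "loc"}): 0.6,
    frozenset({"wmc", "cbo", "loc"}): 1.0,
}


def choquet(scores, mu):
    """Discrete Choquet integral of metric scores in [0, 1] w.r.t. fuzzy measure mu."""
    ordered = sorted(scores, key=scores.get)  # metric names, ascending score
    total, previous = 0.0, 0.0
    for i, name in enumerate(ordered):
        coalition = frozenset(ordered[i:])  # this metric and all ranked above it
        total += (scores[name] - previous) * mu[coalition]
        previous = scores[name]
    return total


module = {"wmc": 0.9, "cbo": 0.7, "loc": 0.2}
score = choquet(module, MU)
print(score, "fault-prone" if score >= 0.5 else "not fault-prone")
```

The measure needs a value for every subset of metrics, which grows as 2^n; this is one concrete source of the modeling effort and computational cost noted above.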
The author of [9] addresses how risky software projects affect overall project success, with software risk resulting in schedule and cost overruns. The study uses accuracy evaluation criteria and performance charts to predict whether a project is risky with a three-layered neural network trained by back propagation. The proposed algorithm can learn complex patterns with this three-layered architecture, and complex datasets are used to build a predictive model of the risky nature of projects. The objective of the study is to control risk at an early stage of software development and to minimize the cost and time of software construction while improving project performance.
The authors of [10] explore different classification approaches for software defect prediction. The objective of the study is to create a cost-sensitive Artificial Neural Network (ANN) using the Artificial Bee Colony (ABC) algorithm for efficient software defect prediction. The study uses the Expected Cost of Misclassification (ECM) to convert an ANN into a cost-sensitive learner, and N-fold cross-validation to determine the performance of the proposed classifier. The authors report classifier performance in two parts, without and with regard to cost sensitivity, and both sets of metrics are used for comparison with other studies.
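The ECM idea in [10] can be illustrated with one common formulation, the normalized expected cost of misclassification NECM = (C_I * N_I + C_II * N_II) / N, where N_I and N_II count Type I and Type II errors. The sketch below uses NECM to make a classifier cost-sensitive by threshold moving, a simpler swapped-in technique: it does not reproduce the ABC-based training of the ANN in [10]. The 10:1 cost ratio, the scores, and the labels are hypothetical.

```python
def necm(y_true, y_pred, cost_fp=1.0, cost_fn=10.0):
    """Normalized expected cost of misclassification.

    A false positive (Type I: clean module flagged defective) costs cost_fp;
    a false negative (Type II: defective module missed) costs cost_fn.
    """
    n_fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    n_fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return (cost_fp * n_fp + cost_fn * n_fn) / len(y_true)


def best_threshold(y_true, scores, cost_fp=1.0, cost_fn=10.0):
    """Pick the score threshold that minimizes NECM on labelled data."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf = flag nothing
    return min(
        candidates,
        key=lambda th: necm(y_true, [s >= th for s in scores], cost_fp, cost_fn),
    )


# Hypothetical classifier scores (e.g. ANN outputs) and true labels.
labels = [True, False, True, False, False, True]
scores = [0.90, 0.40, 0.35, 0.20, 0.60, 0.80]
print(best_threshold(labels, scores))               # -> 0.35 (missed defects cost 10x)
print(best_threshold(labels, scores, cost_fn=1.0))  # -> 0.8 (equal costs)
```

When missed defects are expensive, the chosen threshold drops so that more modules are flagged, trading extra false alarms for fewer escaped defects.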
The authors of [11] explore how estimating risk at the early stages of software development affects reliability and software cost. The study proposes a new fuzzy rule-based model and validates it using data from 20 actual software projects. The objective of the study is to predict faults at the early stages of the SDLC, i.e. the requirements analysis or design phase, in order to develop highly desirable software at optimal cost.
The authors of [12] address how risk and potential failure affect systems, products, and processes. The limitation of the proposed approach is that it involves both subjective and objective analysis.
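The fuzzy rule-based prediction in [11] can be sketched as follows. This is a zero-order Sugeno-style inference chosen for brevity, which may differ from the inference scheme used in [11]; the two inputs (requirements volatility and review effectiveness), the membership functions, and the rules are all hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a <= b <= c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms on a normalized [0, 1] scale.
TERMS = {"LOW": (0.0, 0.0, 0.5), "MEDIUM": (0.0, 0.5, 1.0), "HIGH": (0.5, 1.0, 1.0)}

# Hypothetical rules: (requirements volatility, review effectiveness, predicted
# fault-proneness). None means the rule does not use that input.
RULES = [
    ("HIGH", "LOW", 0.9),
    ("HIGH", "MEDIUM", 0.7),
    ("HIGH", "HIGH", 0.5),
    ("MEDIUM", None, 0.4),
    ("LOW", "LOW", 0.3),
    ("LOW", "MEDIUM", 0.2),
    ("LOW", "HIGH", 0.1),
]


def predict_fault_proneness(volatility, review):
    """Zero-order Sugeno inference: min for AND, weighted average of rule outputs."""
    num = den = 0.0
    for vol_term, rev_term, output in RULES:
        w_vol = tri(volatility, *TERMS[vol_term]) if vol_term else 1.0
        w_rev = tri(review, *TERMS[rev_term]) if rev_term else 1.0
        strength = min(w_vol, w_rev)
        num += strength * output
        den += strength
    if den == 0.0:
        raise ValueError("no rule fired for these inputs")
    return num / den


print(predict_fault_proneness(0.8, 0.3))
```

Because the inputs are available before coding starts, a model of this shape can be applied in the requirements or design phase, which is the point of early prediction.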
The author of [13] explores how fault-proneness affects software quality. A logistic regression (LR) method and six machine learning methods (Decision Tree, Support Vector Machine, Group Method of Data Handling, Artificial Neural Network, Cascade Correlation Network, and Gene Expression Programming) are analyzed and compared to find the relationship between static code metrics and fault-proneness. The objective of the study is to develop good-quality software by using machine learning methods to analyze and predict faults in the early phases of the SDLC. The author compared the predictive capability of the models using the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). The results of LR and the Machine Learning (ML) methods were compared on two datasets, and the ML methods were found to perform better than LR.
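The AUC used in [13] to compare models can be computed directly from its probabilistic meaning, which equals the normalized Mann-Whitney U statistic. The sketch below is O(n*m) for clarity; the two score lists are invented and are not results from [13].

```python
def auc(labels, scores):
    """ROC AUC via its rank interpretation: the probability that a randomly chosen
    faulty module is scored higher than a randomly chosen fault-free one (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one faulty and one fault-free module")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical fault-proneness scores from two models on the same six modules.
labels = [1, 0, 1, 0, 1, 0]
lr_scores = [0.70, 0.60, 0.55, 0.30, 0.40, 0.50]
ml_scores = [0.80, 0.30, 0.65, 0.35, 0.60, 0.60]
print(auc(labels, lr_scores), auc(labels, ml_scores))
```

Because AUC depends only on how modules are ranked, it compares LR and ML models fairly even when their raw scores are on different scales.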
The authors of [14] address how defects affect software maintenance, accuracy, trustworthiness, and evolution. The study explores a defect prediction classification model to reduce the defects that affect the performance of a project or product. It extends ordinal association rules to relational association rules that predict whether or not a software module is defective, and experimentally evaluates the resulting classification model, based on relational association rule mining, against other software defect detection approaches. The objective is to build defect-free, high-quality software by identifying and predicting defects.
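A much simplified sketch of classification with relational association rules, in the spirit of [14], is shown below: rules of the form "metric i < metric j" are mined separately from defective and non-defective training modules, and a new module is assigned to the class whose rules it satisfies best. The restriction to length-2 rules, the confidence threshold, the weighting, the metrics, and the data are assumptions; the mining and classification procedure in [14] is more elaborate.

```python
import operator
from itertools import combinations

# Binary relations between two (normalized) metric values.
RELATIONS = {"<": operator.lt, ">": operator.gt}


def mine_rules(modules, min_conf=0.8):
    """Return rules (i, rel, j, conf): metric i <rel> metric j holds in at least
    min_conf of the given modules. Only length-2 rules are mined in this sketch."""
    rules = []
    for i, j in combinations(range(len(modules[0])), 2):
        for name, rel in RELATIONS.items():
            conf = sum(rel(m[i], m[j]) for m in modules) / len(modules)
            if conf >= min_conf:
                rules.append((i, name, j, conf))
    return rules


def match(module, rules):
    """Confidence-weighted fraction of the rules that a module satisfies."""
    total = sum(conf for *_, conf in rules)
    if total == 0:
        return 0.0
    hit = sum(conf for i, name, j, conf in rules if RELATIONS[name](module[i], module[j]))
    return hit / total


def is_defective(module, defective_rules, clean_rules):
    """Predict defective if the module fits the defective class's rules better (ties -> clean)."""
    return match(module, defective_rules) > match(module, clean_rules)


# Hypothetical training modules: (complexity, coupling, comment density), all in [0, 1].
defective = [(0.8, 0.6, 0.2), (0.9, 0.7, 0.3), (0.7, 0.8, 0.1)]
clean = [(0.2, 0.3, 0.7), (0.3, 0.2, 0.8), (0.1, 0.4, 0.6)]
d_rules, c_rules = mine_rules(defective), mine_rules(clean)
print(d_rules, c_rules)
print(is_defective((0.85, 0.5, 0.2), d_rules, c_rules))
```

Comparing metric values with each other only makes sense once they share a scale, which is why the sketch assumes metrics already normalized to [0, 1].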
The authors of [15] address how two-way decision methods lead to high misclassification error rates and costs. To overcome this problem of two-way decisions, which include only two kinds of actions, the study suggests three-way decisions for software defect prediction. In each experiment, 10-fold cross-validation is performed to show that the proposed method reduces errors and cost.
The authors of [16] propose the TGR model to define risk features in requirements engineering, based on the original Tropos and risk analysis methodologies. The approach encourages the development of numerous cost-based, risk-based, and goal-based solutions, and this analysis can find the best solution that meets the demand for lower costs and risks. The study also implements a prototype application as a proof of concept. The application takes various goals, events, and responses as input and produces combinations of solutions so that end users can select the best one.
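Three-way decisions of the kind used in [15] are commonly derived from decision-theoretic rough sets, where two thresholds follow from a table of losses for flagging, deferring, or accepting a module. The sketch below assumes that formulation, with the standard formulas for alpha and beta; the loss values are hypothetical, and the probability estimate would come from any defect classifier.

```python
def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Acceptance/rejection thresholds (alpha, beta) of decision-theoretic rough sets.

    l_XY is the loss of action X when the module's true state is Y, with actions
    P = flag as defective, B = defer (e.g. inspect further), N = accept as clean,
    and states P = defective, N = not defective.
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("losses do not yield a three-way region (need beta < alpha)")
    return alpha, beta


def decide(p_defective, alpha, beta):
    """Three-way decision on the estimated probability that a module is defective."""
    if p_defective >= alpha:
        return "defective"
    if p_defective <= beta:
        return "not defective"
    return "defer"


# Hypothetical losses: missing a defect (l_np) is far costlier than a false alarm (l_pn).
alpha, beta = thresholds(l_pp=0, l_bp=2, l_np=10, l_pn=4, l_bn=1, l_nn=0)
for p in (0.05, 0.30, 0.70):
    print(p, decide(p, alpha, beta))
```

The deferral region between beta and alpha is what a two-way classifier lacks: uncertain modules are sent for further inspection instead of being forced into a costly wrong label.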
Risk management in Open Source Software (OSS) is addressed in [17]. One of the key technologies for achieving shorter time-to-market and better software quality is the reuse of software components from third-party vendors. Such components, also known as Off-the-Shelf components, come in two types: Commercial-Off-The-Shelf and OSS components. To make effective use of OSS components, it is important to determine how development processes and methods must change. The study suggests many measures for process improvement and risk reduction when using OSS components, though not all of them are necessary for every project.
In [18], machine learning techniques are employed to build a model that predicts potentially faulty modules within a given set of software modules from their metric data. To that end, Naive Bayesian classification is used, as the authors argue that in many complex real-world situations Naive Bayes performs better than techniques such as decision trees. The datasets used in the experiments are organized into two categories, i.e. learning datasets and prediction datasets. Risk factors are analyzed using the proposed model to enhance the risk assessment process.
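A minimal Gaussian Naive Bayes classifier in the spirit of [18] is sketched below, with a learning dataset used for fitting and a prediction dataset classified afterwards. Modeling each metric as normally distributed within a class is an assumption (the review does not state how [18] handles continuous metrics), and the metrics and data are invented.

```python
import math
from collections import defaultdict


def fit(rows, labels, var_floor=1e-6):
    """Estimate class priors and per-metric Gaussian mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def log_posterior(model, label, row):
    """log P(label) + sum of log N(x | mean, var): metrics assumed independent given the class."""
    prior, means, variances = model[label]
    lp = math.log(prior)
    for x, m, v in zip(row, means, variances):
        lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return lp


def predict(model, row):
    """Return the class with the highest (unnormalized) log posterior."""
    return max(model, key=lambda label: log_posterior(model, label, row))


# Hypothetical learning dataset: (size in hundreds of LOC, cyclomatic complexity).
learning_rows = [(1.0, 2), (1.5, 3), (0.8, 1), (6.0, 15), (7.5, 20), (5.5, 12)]
learning_labels = [False, False, False, True, True, True]  # True = faulty
model = fit(learning_rows, learning_labels)

# Hypothetical prediction dataset.
for module in [(7.0, 18), (1.2, 2)]:
    print(module, "faulty" if predict(model, module) else "not faulty")
```

Working in log space avoids numerical underflow when many metrics are multiplied together, and the small variance floor keeps a constant metric from causing a division by zero.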
Risk management has also been applied in project management. The main aim of the authors of [19] was to understand the principles of risk assessment in order to establish a model for the risk management of IT projects. The proposed conceptual model is based on a synthesis of the PMBOK guide and the PMI risk management framework, incorporating different areas of expertise extracted from these manuals. Integrating these principles allows the risk management phase to start early in the project by including key stakeholders in the lifecycle, regularly assessing project threats throughout the lifecycle, and creating risk reduction strategies that better align projects with the company's strategy.
The effects of uncertainty on a project, and risk events as a consequence of uncertainty, are analyzed in [20]. An uncertainty index is proposed as a quantitative measure of project uncertainty, using entropy as an indicator of system disorder and lack of information. With this index, the uncertainty of each activity, its increase due to risk effects, and the change in project uncertainty over time can be assessed. The proposed solution is implemented and analyzed in a case study. The results can help project managers and other stakeholders select the most effective methods for controlling uncertainty and managing risk.
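The entropy-based uncertainty index of [20] can be illustrated with Shannon entropy over an activity's possible outcomes. Normalizing by the maximum entropy to obtain an index between 0 and 1, and the three-scenario duration distributions, are assumptions here; [20] may define and aggregate its index differently.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def uncertainty_index(probabilities):
    """Entropy divided by its maximum, log2(n): 0 = fully certain, 1 = maximal disorder."""
    n = len(probabilities)
    return entropy(probabilities) / math.log2(n) if n > 1 else 0.0


# Hypothetical duration scenarios (10, 15, 20 days) for one activity,
# before and after a risk event spreads the outcome distribution.
baseline = [0.80, 0.15, 0.05]
after_risk = [0.40, 0.35, 0.25]
print(round(uncertainty_index(baseline), 3), round(uncertainty_index(after_risk), 3))
```

Recomputing the index for each activity as new information arrives gives the time profile of project uncertainty that the study describes.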

III. CRITICAL EVALUATION
A critical analysis of selected studies is provided in Table I.

IV. CONCLUSION
In this study, we reviewed recent research on software risk assessment, software fault prediction, and risk management. The aim of the study was to discover the various classifiers that have been tried for defect prediction. Commonly used classification techniques include ANN, the ABC algorithm, the fuzzy integral, the fuzzy hybrid TOPSIS approach, BN, and the fuzzy rule-based model. Various types of software project datasets have been used for classification. It is worth mentioning that some researchers combined risk analysis and risk control mechanisms for effective software risk management.