Healthcare Analytics: A Comprehensive Review

Big data have attracted significant attention in recent years because of their hidden potential to improve human life, especially when applied in healthcare. Big data are collections of useful information that enable new breakthroughs or understandings. This paper reviews the use and effectiveness of data analytics in healthcare, examining secondary data sources such as books, journals, and other reputable publications published between 2000 and 2020, selected with a strict keyword strategy. Large-scale data have proven to be of great importance in healthcare, and there is therefore a need for advanced forms of data analytics, such as diagnostic and descriptive analysis, to improve healthcare outcomes. The utilization of large-scale data can form the backbone of predictive analytics, which is the baseline for future individual outcome prediction.

I. INTRODUCTION
Data have become a topic of particular interest because of their high but often hidden potential [1]. Their importance has been proven in almost all aspects of life. Data give the ability to study the past and predict the future while developing the present. Although profit is not the primary motivator, healthcare organizations should consider acquiring knowledge of the available techniques, tools, and infrastructure to leverage big data [2]. In most sectors, data grow exponentially. Healthcare large-scale data describe the large data volume [3] that burdens a healthcare institution on a daily basis. Existing analytical methods can be applied to the tremendous amount of existing patient-related well-being and clinical information, offering deeper comprehension and improving healthcare [4]. In real-world scenarios, individual information could help each doctor decide on the most appropriate treatment for a given patient. Healthcare has been slower to utilize big data than other sectors. Healthcare big data activities include collecting, analyzing, and holding consumer, patient, physical, and clinical data that are too large and complex to be understood by traditional data processing methods. An example of such big data processing is the use of complex and sophisticated sensors in healthcare. As a result, big data are analyzed and processed by machine learning algorithms and specialized data scientists [5]. Healthcare facilities require new methods to address their recurrent challenges, especially concerning patients' diagnosis and treatment information. To address these challenges, which relate to data volume, velocity, variety, and veracity, healthcare facilities and organizations have to incorporate new technologies capable of data collection, storage, and information analysis to produce actionable insights [3]. Different situations call for the application of different data strategies.
The idea is to use healthcare analytics to create actionable information that can be utilized for improving patient care and the efficiency of processes. This paper examines previous studies and evaluates the utilization of descriptive, diagnostic, and predictive healthcare analytics.

II. HEALTHCARE ANALYTICS
The utilization of data analytics is helpful to healthcare professionals, especially in terms of predicting, diagnosing, and treating diseases, improving the quality of services, and minimizing healthcare costs [6]. It is estimated that the U.S. healthcare system saves more than $450 billion annually through data mining [7]. Many in-depth studies have examined the application of big data in mental health and in other areas, such as the methodological challenges of data mining [8]. This large volume of data is generated by fast-growing industries and attracts the attention of clinicians and scientists [9], creating a need for advanced forms of data analytics. Healthcare professionals could benefit significantly from these techniques, which reduce effort, resources, and time and make operations run more smoothly. These methods could also satisfy professionals by offering them more valuable knowledge and information [10]. Big data analytics includes analysis, observation, influence, and prediction.

A. Observation
Noting and taking records is essential for generating critical values from large-scale data, and a new scientific field called "data science" has evolved [11]. A critical factor is user-friendly data visualization, which is vital for the development of society. Management of these big data becomes challenging due to the organization and reorganization of the same data [12]. Therefore, highly advanced techniques are often employed for efficient and cost-effective data management. Healthcare services can be offered with analytics tools through proper storage [13].

B. Influence
Data collection is important for every company that depends on future predictions [14]. More information causes more deliveries. Gigabytes or terabytes of data are usually collected structurally in health monitoring toolkits, which include sensors, structure, and data acquisition. Large data volumes are often caused by either the number of sensors or the size of the monitoring data structure [15]. The scope is given by the structure, while the sensors provide resolution.

C. Prediction
Prediction becomes more useful when knowledge can be transformed into action. Prediction and intervention need to be integrated into the same system in order to increase the system's efficiency. Prediction is the process of forecasting future outcomes based on historical data [16]. This analysis improves healthcare by improving decisions and personalizing care for each individual. Healthcare organizations use predictive analysis drawn from new data streams coming directly from patients [17].

D. Analysis
Analyzing big data follows a workflow similar to the one described in [8]. A data warehouse stores massive amounts of data produced by various sources. It uses analytic pipelines to deliver smarter and more affordable healthcare options. Modern techniques have evolved to aid in this, increasing the resolution at which biological events are observed and recorded. Therefore, such big data analytics from medical healthcare systems can help provide new strategies [18].

III. DESCRIPTIVE HEALTH ANALYTICS
Healthcare analytics are very important in medicine, as they offer insights into data, including hospital patient records, costs, and diagnoses, among others. Descriptive analytics is a sub-branch of healthcare analytics [19] that aims to analyze raw data and come up with viable solutions to help patients. Big data are first turned into actionable insights, mostly relying on public health and pharmacy, among other factors [9]. Descriptive analytics include various methods, entailing stages or processes whose outcomes enable easy manipulation and utilization to figure out solutions in real life. Descriptive analytics use control data to categorize, characterize, aggregate, and classify data into valuable information that helps healthcare professionals understand and analyze decisions, performance, and outcomes [20]. In addition, data from descriptive analytics are presented using tables and graphs showing the average length of stay in hospitals [4], the rate of discharge, and the rates of hospital occupancy, among other indicators [21]. Descriptive analytics is usually the easiest level of analytics to understand. Data visualization can answer specific questions related to patient care patterns, which ultimately highlight a broad perspective of evidence-based clinical practice [22]. As a result, real-time or near real-time data management has to be enabled in the process. According to the literature, this method can reveal previously unnoticed patterns in hospital readmission, helping to balance cost and capacity [14]. Descriptive analytics help because they connect previous decisions and their implications.
They answer the question "what happened?", giving a good basis for decision making.
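
As a minimal sketch of the indicators mentioned above, the following Python snippet computes the average length of stay, the discharge rate, and bed occupancy from a handful of admission records. The records, the reporting period, and the 10-bed ward size are invented for illustration and are not taken from any of the reviewed studies.

```python
from datetime import date

# Hypothetical admission records: (admitted, discharged); None = still in hospital
admissions = [
    (date(2020, 1, 1), date(2020, 1, 4)),
    (date(2020, 1, 2), date(2020, 1, 9)),
    (date(2020, 1, 3), None),
    (date(2020, 1, 5), date(2020, 1, 7)),
]
BEDS = 10  # assumed ward capacity
period_start, period_end = date(2020, 1, 1), date(2020, 1, 10)

# Average length of stay over completed stays only
completed = [(a, d) for a, d in admissions if d is not None]
average_los = sum((d - a).days for a, d in completed) / len(completed)

# Share of admitted patients who were discharged in the period
discharge_rate = len(completed) / len(admissions)

def bed_days(admitted, discharged):
    """Days a bed was occupied inside the reporting period."""
    start = max(admitted, period_start)
    end = min(discharged or period_end, period_end)
    return max((end - start).days, 0)

occupied = sum(bed_days(a, d) for a, d in admissions)
occupancy_rate = occupied / (BEDS * (period_end - period_start).days)

print(f"average length of stay: {average_los:.1f} days")
print(f"discharge rate: {discharge_rate:.0%}")
print(f"occupancy rate: {occupancy_rate:.1%}")
```

Each figure answers a "what happened?" question about the period; none of them explains causes, which is the role of diagnostic analytics.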

A. Data Classification
Classification falls under supervised machine learning methods. Artificial neural networks constitute one of the most commonly exploited underlying structures [23]. Training and validation sets are used so that adequate outputs are generated for given inputs. Classification is the process of organizing healthcare data into relevant groups and categories that are easier to understand and analyze. This step is important as it allows easy data retrieval when needed. In healthcare applications, the lack of data classification could result in considerable duplication, complicating the tracking of certain events and occurrences related to patients and health facilities. The data classification process in healthcare largely revolves around content, context, and users. Context refers to the location where the data are applicable. User refers to the person who is using the data being classified. People in charge of the classification process have to choose the classification criteria carefully, as wrong criteria can lead to wrong classifications. The categories should include information about the type of data included. Moreover, security and privacy considerations should be examined, with rules for data retrieval, transmission, and storage, and for the possible risks associated with security policy violations [24].
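
The supervised workflow described above (fit on a training set, check on a validation set) can be sketched in a few lines of Python. For brevity the sketch uses a nearest-centroid classifier rather than an artificial neural network; the feature values, the "low"/"high" risk labels, and the 70/30 split are assumptions made only for illustration.

```python
import random

# Hypothetical labelled records: (systolic pressure, fasting glucose) -> risk group
records = [
    ((118, 85), "low"), ((122, 90), "low"), ((115, 88), "low"),
    ((125, 92), "low"), ((120, 87), "low"),
    ((150, 140), "high"), ((160, 150), "high"), ((155, 145), "high"),
    ((148, 138), "high"), ((158, 152), "high"),
]

random.seed(0)                      # fixed seed so the split is reproducible
random.shuffle(records)
split = int(0.7 * len(records))     # assumed 70/30 train/validation split
train, validation = records[:split], records[split:]

def centroids(samples):
    """Mean feature vector per label, computed from the training samples."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((f - c) ** 2 for f, c in zip(features, center))
    return min(model, key=lambda label: sq_dist(model[label]))

model = centroids(train)
accuracy = sum(predict(model, x) == y for x, y in validation) / len(validation)
print(f"validation accuracy: {accuracy:.0%}")
```

The validation set is never used when computing the centroids, so the accuracy figure reflects how the grouping generalizes to records the model has not seen.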

B. Data Mining Algorithms
Heuristic-based data mining algorithms in hospitals can help establish the relationship between different symptoms and a particular disease. The same algorithms can be used to establish the relationships between the causes and effects of human behaviors and diseases [24]. One of their main roles is to ensure the standardization of healthcare practices. Medical algorithms are evidence-based and data-driven [8].
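
One common way to quantify a symptom-disease relationship is the support and confidence of an association rule. The reviewed sources do not state which measure they use, so the sketch below is only an assumed illustration; the patient records and the helper name `rule_metrics` are hypothetical.

```python
# Hypothetical patient records: (set of observed symptoms, diagnosis)
patients = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "cough", "fatigue"}, "flu"),
    ({"cough"}, "cold"),
    ({"fever"}, "flu"),
    ({"fatigue"}, "cold"),
    ({"fever", "cough"}, "cold"),
]

def rule_metrics(symptoms, disease):
    """Support and confidence of the rule: symptoms -> disease."""
    # Diagnoses of all patients who show every symptom in the rule
    matching = [d for s, d in patients if symptoms <= s]
    both = sum(1 for d in matching if d == disease)
    support = both / len(patients)                       # rule holds in this share of all records
    confidence = both / len(matching) if matching else 0.0  # share of matching patients with the disease
    return support, confidence

support, confidence = rule_metrics({"fever", "cough"}, "flu")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```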

C. Methods/Tools
The methods applied are regression, association, clustering, data warehousing, and sequential pattern mining. Data can be analyzed electronically or manually, using traditional methods such as pen and paper, through graphical methods [25]. This information is used by healthcare providers to shed light on facts such as occupancy rates and the financial state of hospitals, the number of in- and out-patients, discharge, and length of stay, among other entities or variables of study. Other data analysis methods are also utilized, including statistical analysis, which features descriptive, inferential, and diagnostic analysis, among others. In healthcare, data are analyzed statistically using appropriate tools [7]. Information on hospital finances, patient numbers, and staffing can be inferred through statistical analysis of the collected data. Current data analysis tools include SPSS, MATLAB, SQL, Java, Weka, RapidMiner, the R analytics suite, and Python scikit-learn [7,20].
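
Of the methods listed above, regression is the simplest to illustrate. The following sketch fits an ordinary least-squares line in plain Python; the age and length-of-stay values are invented, and a real analysis would more likely use one of the tools named above.

```python
from statistics import mean

# Hypothetical data: patient age (years) vs. length of stay (days)
ages = [30, 40, 50, 60, 70]
stays = [2.0, 3.0, 4.0, 5.0, 6.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(ages, stays)
predicted = slope * 55 + intercept  # expected stay for a 55-year-old patient
print(f"stay = {slope:.2f} * age + {intercept:.2f}; age 55 -> {predicted:.1f} days")
```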

D. Problem Domain
Problem domain analytics involve the process of creating a model that describes the problem to be solved. In healthcare, this deals with modeling solutions and curative measures for different diseases [26]. Once there is a healthcare problem domain analysis for a given facility, this can be applied to multiple implementations at different periods as technology advances. Problem domain analysis is fully independent of the platform where the solution is implemented [4].

E. Data Visualization
Data visualization is the process of representing data graphically. The visual elements used in this stage are charts, graphs, and maps. These elements, along with data visualization tools, enable one to see and understand patterns, trends, and outliers in data. Data visualization is important for users in order to understand the enormous amounts of daily generated data quickly and effectively. In healthcare, where data are vital and the volumes of circulating data are huge, data visualization tools and technologies are essential for analyzing information and enabling data-driven decisions. Efficient use of data visualization techniques in healthcare enhances usability and hence the provision of quality care [27]. This stage deserves serious consideration, as effective data visualization has many benefits that lead to better decision-making and hence to improvement of the health sector. Data visualization is incorporated in healthcare in an effort to answer specific questions about patterns of care, providing a broad view of evidence-based clinical practice. The activity of data analysis gives sufficient space for proper data management and for the use and study of operational content, and helps in capturing visual data and information, including patient records. Hence, data not previously observed and patterns not previously studied are brought to light via data visualization [28]. Differences in patient records, financial records, and healthcare provision are captured and recorded through visualization tools. It is hard to imagine a healthcare institution that does not visualize data for its decision making [29].

V. DIAGNOSTIC HEALTH ANALYTICS
Diagnostic health analytics is the second major health analytics method. This method is used in the analysis of healthcare organizations in order to transform them into fully data-driven institutions, offering competitive advantages. Diagnostic analysis answers the "why" of a phenomenon [30], aiming at finding the causes of events. Diagnostic analytics explain "why it happened" [30][31], for example "why did a patient get worse even with the best care". These answers enable healthcare providers to identify problems and solutions. Diagnostic health analytics take a critical look at data, trying to understand why events and behaviors occur [31].

A. Data Discovery, Exploration, and Analysis
Exploration, discovery, and analysis of data are major processes involved in diagnostic health analytics. Discovery brings concealed structures and patterns to light [27]. This process is the act of collecting and examining data from different sources to gain insight from hidden patterns, trends, and outliers. It is the first step to fully exploit a health organization's data and inform critical clinical decisions [14]. Through data discovery, data are gathered, combined, and analyzed in a sequence of steps. Exploration of data is the initial step in data analysis, where healthcare providers explore an extensive data set in an unstructured manner to uncover original patterns and trends, characteristics, and points of interest [32]. During clinical research, and even during decision making by healthcare providers, patient or organizational records are analyzed to gain insight into previous patterns and behaviors. This process is not meant to reveal any patterns in detail, but rather to create a broad picture of important trends, as data exploration extracts data without knowing in advance what to look for [33]. Data exploration uses manual methods and automated tools, such as visualization tools. This process is vital in healthcare as it enables healthcare providers to cut down large sets of data to manageable sizes and focus their efforts to optimize their analysis. Analysis of data follows the discovery and exploration of data and produces viable solutions to a healthcare organization's problems [20].
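
A first exploratory pass of this kind often consists of summary statistics plus a simple outlier screen. The sketch below applies the conventional 1.5 x IQR rule to hypothetical clinic waiting times; both the data and the choice of rule are assumptions made for illustration and are not prescribed by the reviewed studies.

```python
from statistics import mean, median, quantiles

# Hypothetical waiting times in minutes for one clinic day
waiting_times = [12, 15, 14, 13, 16, 15, 14, 90]

# Broad picture first: centre of the distribution
summary = {"mean": mean(waiting_times), "median": median(waiting_times)}

# Quartiles and the conventional 1.5 * IQR fences for flagging outliers
q1, _, q3 = quantiles(waiting_times, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [t for t in waiting_times if t < lower or t > upper]

print(summary)
print(f"fences: [{lower}, {upper}], outliers: {outliers}")
```

The flagged value is a point of interest for later diagnostic work; exploration itself does not explain why the wait was so long.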

ANALYTICS
Qualitative and quantitative analytics are the two major classifications of diagnostic analytics. The qualitative aspect refers to interpreting meanings or processes, while the quantitative aspect deals with numerals and statistical calculations [41]. Qualitative diagnostic analytics focus on meanings rather than quantifiable phenomena. They put more emphasis on the impact of the analysis on patients and healthcare providers than on presumptions of the possibility of value-free inquiry [20]. Therefore, qualitative diagnostic analytics aim at gaining an understanding of underlying motivations and reasons, while uncovering dominant trends in thoughts and opinions. Quantitative diagnostic analytics, on the other hand, aim at quantifying data and generalizing results from a sample to the population of the study group [42].

A. Focus of Qualitative and Quantitative Methods
Qualitative research is widely used. It does not use statistics and mathematical operations to process data. Qualitative research focuses on using research methods to address phenomena by analyzing experiments [34][35], and on the use of words or notes [36]. In contrast, quantitative research uses statistical processes, mathematics, and numerical data processing in systematic experimental research of phenomena [37][38].

B. Data Collection
During the qualitative research, both primary and secondary data collection methods are used. Data could be collected from secondary sources such as social media platforms. LinkedIn was used in [39], where authors looked for healthcare professionals, in order to get relevant data from prominent practitioners in the medical field. The research focused on big data analysis on LinkedIn as well as the management and surveillance of the diseases, while primary data were collected from interviews. People with relevant sources of information were interviewed in order to extract first-hand information. Some questions were developed before the study, while other questions emerged during the study in order to adequately cover all relevant areas.

C. Objective
The primary objective of health analytics is to provide insights for predicting and detecting the outcome of a disease, such as heart-related diseases, HIV, and cancer [39]. Moreover, it gives health professionals opportunities to improve their field of operation through appropriate patient management and diagnosis. Patients are served in the right manner, following the most suitable procedures.

D. Analysis
Healthcare analytics provide the most reliable information for a given patient. They also show a given healthcare unit how its resources are being used [40], boosting accountability. The system is also used to detect the people most vulnerable to a particular infection. As a result, healthcare professionals are able to specify the most appropriate procedures to prevent the spread of a specific contagious disease. Moreover, healthcare analytics play a vital role in the communication process within a particular healthcare center.

E. Findings
Big data strategies under healthcare analytics contribute substantially to research on new drugs [39]. The future of the health sector relies on big data strategies, since they enhance effectiveness, improve relations among the various stakeholders in a given healthcare unit, and help reduce operational costs. Some studies utilizing both qualitative and quantitative methods are shown in Table I.

VII. CLASSIFICATION OF DIAGNOSTIC ANALYTICS
A. Data Classification
Diagnostic data discovery aims at making noisy data clear for classification, free of inconsistencies, and easy to understand. Data discovery in healthcare is beneficial as it enables healthcare providers to gather actionable insights for better decision making, saves time, and makes data reusable. Data discovery consists of five steps: connecting and blending, cleansing and preparation, sharing, analysis and generation of insights, and visualization [24]. Data classification in diagnostic analytics aims at understanding the causes of events and behaviors.
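
The data discovery steps listed above can be sketched as a small pipeline. The two sources, their field names, and the cleansing rules below are hypothetical, and the sharing step is omitted because it is organizational rather than computational.

```python
# Two hypothetical sources keyed by patient id
lab_results = {"p1": {"glucose": "5.4"}, "p2": {"glucose": " 7.9 "}, "p3": {"glucose": "n/a"}}
ward_records = {"p1": {"ward": "A"}, "p2": {"ward": "b"}, "p4": {"ward": "A"}}

# Step 1: connect and blend the sources on the ids they share
shared_ids = lab_results.keys() & ward_records.keys()
blended = {pid: {**lab_results[pid], **ward_records[pid]} for pid in shared_ids}

# Step 2: cleanse and prepare (parse numbers, normalise labels, drop unusable rows)
def cleanse(record):
    try:
        glucose = float(record["glucose"].strip())
    except ValueError:
        return None  # noisy value that cannot be repaired
    return {"glucose": glucose, "ward": record["ward"].upper()}

clean = {}
for pid, record in blended.items():
    prepared = cleanse(record)
    if prepared is not None:
        clean[pid] = prepared

# Step 4: analyse and generate insight (mean glucose per ward)
by_ward = {}
for record in clean.values():
    by_ward.setdefault(record["ward"], []).append(record["glucose"])
summary = {ward: sum(values) / len(values) for ward, values in by_ward.items()}

# Step 5: visualise (a plain-text bar chart keeps the sketch dependency-free)
for ward, average in sorted(summary.items()):
    print(f"ward {ward}: {'#' * round(average)} {average:.1f}")
```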

B. Algorithms
Algorithms aim at answering why an event happened. Because the underlying methods evolve rapidly, these algorithms are revised frequently. Typically, they utilize machine learning methods.

C. Methods/Tools
Statistical methods are used in diagnostic analytics, and data can be analyzed in the same manner as in descriptive analytics. Data can be analyzed electronically or manually using traditional methods [25]. This information is used by healthcare providers to shed light on entities or variables of study. In healthcare, data analysis is performed via statistical analysis using appropriate techniques [32].

D. Problem Domain
In diagnostic analytics, the problem domain analysis has to be more elaborate than in descriptive analytics. It also involves the process of creating a diagnosis model to describe the problems that need to be solved. In diagnosis, this deals with modeling solutions and curative measures for different diseases [32]. Once there is a healthcare problem domain analysis for a given facility, it can be applied to multiple implementations at different periods as technology advances. Problem domain analysis is wholly independent of the platform where the solution is implemented. Apart from descriptive and diagnostic analytics, there are predictive analytics and prescriptive analytics. Predictive analytics is used to predict future values in order to identify unknown events, while prescriptive analytics is used to provide the best solution [19].

VIII. PREDICTIVE HEALTH ANALYTICS
Predictive analytics involve mining information out of raw data in order to predict future trends [43]. Data mining and prediction are increasingly used in the medical field. Support systems based on different prediction frameworks have become key tools in disease diagnosis in order to ensure prevention and the quality of prediction for a possible disease [44]. Disease prediction involves extracting hidden information from medical data and forecasting the course of a disease's development. Many studies have been conducted to develop frameworks for predicting diseases based on machine learning techniques [44]. Predictive health analytics is therefore the process of obtaining this information in the health sector and processing it for use in future predictions. Health is one of the key areas in which predictive analytics has been utilized [43]. Predictive analytics applies known parameters to produce a model that can predict results for a different set of data. The model's output is a prediction representing the probability of the target variable, estimated from the input variables.
The predictive modeling techniques most commonly in use include decision trees, regression, and neural networks. It has been noted that predictive analytics has been very useful in forecasting the COVID-19 pandemic [43]. Healthcare institutions apply predictive analytics in various ways, such as allocating resources based on previous trends, scheduling staff, gauging patient risk to manage readmission costs, and managing pharmaceutical requirements. More than half of healthcare managers used predictive analytics in their organizations, and 57% of them believed that they were beneficial for their institutions [45]. In fact, 26% of them predicted saving over 25% of their total budget as a result of applying such methods.
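
To make the idea of a model that outputs a probability concrete, the sketch below fits a one-variable logistic regression by gradient descent and uses it to gauge readmission risk. The training data, the learning rate, and the number of steps are illustrative assumptions; a production system would rely on an established library and far richer inputs.

```python
import math

# Hypothetical history: number of prior admissions -> readmitted within 30 days (1) or not (0)
prior_admissions = [0, 1, 2, 3, 4, 5]
readmitted = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, rate=0.1, steps=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= rate * sum(errors) / n
    return w, b

w, b = fit_logistic(prior_admissions, readmitted)

def readmission_probability(admissions):
    """Predicted probability of readmission for a new patient."""
    return sigmoid(w * admissions + b)

for k in (0, 5):
    print(f"{k} prior admissions -> p(readmission) = {readmission_probability(k):.2f}")
```

The parameters are learned from known cases and then applied to a patient outside the training data, which is the pattern described above.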

IX. DISCUSSION
This section discusses and compares descriptive and diagnostic health analytics. Diagnostic health analysis mostly answers the question "why did something happen?". Therefore, directed analysis and extensive exploration are required with the help of other tools, such as visualization techniques that help uncover the genesis of problems, in order to help users determine the extent to which these problems affect them [9]. For instance, it is possible to trace an increase in waiting times for several healthcare services back to factors related to the patient, the provider, or the organization [8]. Descriptive analysis usually gives an overview of what happened. It often deals with the discovery and exploration of new information in a given dataset. It connects previous decisions and their implications, providing a sound basis for decision making. It is a branch of health analysis aiming to analyze data and come up with relevant solutions that can help in proper decision making about a given patient. Descriptive analysis quantifies events and transforms them into readable forms. In the health sector, descriptive analysis relies on clinical health support, public health, pharmacy, and mental health, among others, during the decision making process. On the other hand, diagnostic analysis enables any given healthcare unit to rely on data during the process of decision making, gaining competitive advantages over competing organizations. Diagnostic analysis is essential since it gives detailed information about why certain things happened, for instance why a given patient's condition worsened. This shows that diagnostic analysis explains the cause of events and behaviors, involving the discovery and exploration of any new data available. Both descriptive and diagnostic analyses are essential in the decision making process.
Descriptive analysis relies on new information, while diagnostic analysis relies on already existing data. Consequently, the most reliable conclusion emerges from combining these two methods.

X. CONCLUSION
In general, healthcare analytics gives a deeper inspection of data in medical institutions. Generating and accessing data have become easier, since data are now global and mobile. It is evident that big data offer the ability to capture and manage visual data and records. An effective application of health analytics and large-scale data improves health results. The recommended descriptive and diagnostic analysis methods could be used by healthcare sectors in collecting and analyzing patient and clinical data; both are essential in the decision-making process. Predictive analysis uses data in order to predict future trends. Descriptive analysis forms the easiest level of analytics, categorizing, characterizing, aggregating, and classifying data. Utilization of techniques such as data visualization in medical fields improves usability, which in turn enhances healthcare quality. Rather than replacing health professionals in order to advance medical facilities, the use of large-scale data could form the backbone of predictive analytics, which is also the baseline for future individual outcome prediction. Medical institutions should incorporate big data systems for improving health prediction and results.