Sentiment Aware Stock Price Forecasting using the SA-RNN-LBL Learning Model

Abstract-Stock market historical information is often used in technical analysis to identify and evaluate patterns that could be exploited for profitable trading. Although technical analysis based on various measures has proven helpful for forecasting price trends, using it to formulate trading orders and rules in an automated system is complex due to the indeterminate nature of the rules. Moreover, it is hard to define a specific combination of technical measures that identifies better trading rules and points, since stocks might be affected by different external factors. Thus, it is important to incorporate investors' sentiments in forecasting operations, dynamically considering the varying stock behavior. This paper presents a sentiment aware stock forecasting model using a Log BiLinear (LBL) model for learning short-term stock market sentiment patterns and a Recurrent Neural Network (RNN) for learning long-term stock market sentiment patterns. The Sentiment Aware Stock Price Forecasting (SASPF) model achieves better performance than standard deep learning based stock price forecasting models.

Keywords-sentiment analysis; data mining; machine learning; social networking platform; stock price forecasting; time series

I. INTRODUCTION
Analyzing and forecasting future stock prices has attracted much research interest. Success in stock trading relies heavily on determining the right time to sell or buy stocks. However, accurately predicting stock prices is a very challenging task due to the inherently noisy environment and the high volatility of price variations [1]. Stock price movement is related to a number of elements, including government policies, an organization's performance, inflation rates, and the economic environment. Technical analysis examines the variations of a stock through mathematical formulas and charts, called technical indicators, aiming to discover patterns that could be utilized for profitable trading, and it is a common approach in financial engineering for stock market prediction [1]. Despite the advances in technical analysis, it is difficult to accurately predict stock prices and trading points in financial markets, mostly because of volatile investor sentiment during periods of uncertainty. The authors in [2] argued that the stock market is very efficient and that trading prices fully reflect the available data, which is referred to as the Efficient Market Hypothesis (EMH), implying that neither fundamental nor technical analysis would yield consistently above-average profits for investors. Nonetheless, not all researchers agree with the EMH [3], as some conventional methods in financial economics indicate that stock markets have anomalies [4][5]. Most studies use technical analysis, which draws on historical stock volumes and prices, in order to demonstrate trading profits and predict future stock market prices [6].
Research studies focused on discovering native parameters mostly used web data such as Social Networking Services (SNS) [7], investors' sentiments [8], news articles [9], or search engines [10], and explained that search query frequencies and investors' sentiments from SNS platforms provide vital information for predicting upcoming stock prices, especially during global uncertainties such as the recent CoViD-19 pandemic. Initially, from December 2019 to March 2020, CoViD-19 cases were significantly high in China, affecting its stock market. Due to rigid expenditure on rents, wages, and bank interest, the stability of the banking system was affected and small and medium-sized organizations were significantly impacted, as the Chinese stock index dropped by nearly 8% on February 3rd [11][12][13]. On the other hand, Reliance Industries Limited (RIL) stock prices kept growing as Facebook and Google acquired a certain percentage of its stake during the CoViD-19 pandemic.
In previous years, researchers mostly focused on volatility modeling, where volatility is evaluated as the standard deviation of an asset's return and shows the variability in Finance Time Series (FTS) data [14], which plays a crucial role in effective stock forecasting. Extensive surveys have shown the limitations of existing Machine Learning (ML) based stock forecasting models built on random forests, neural networks, and support vector machines, as they proved inefficient under highly volatile market behavior. Furthermore, in order to confront the dynamic nature of stock markets, a number of deep learning models have been presented, such as RNNs with gated recurrent units and long short-term memories, Convolutional Neural Networks (CNN), Deep Neural Networks (DNN), and Generative Adversarial Networks (GAN) [15]. Neural networks and Reinforcement Learning (RL) methods are the most widely utilized ML methods [16,17]. Existing deep learning models have limitations: although they are efficient in modeling long-term market sentiment behaviors, they do not detect short-term market sentiment behaviors. Modern ML methods have been adapted in various domains such as speech recognition, image classification, and natural language processing, and they are designed using either RNNs or CNNs [18]. Long Short Term Memory (LSTM) [19][20][21][22] networks outperform random forests and DNNs. In [23], an RL method was presented using an RNN architecture for addressing gradient descent problems during the training process. Correspondingly, in [15] a forecasting error loss and direction prediction accuracy method was proposed, noting that Generative Adversarial (GA) training can be used to integrate the loss parameter for acquiring long-term outcomes. The GAN-FD used CNN and LSTM [24][25][26] to predict stock prices more accurately. Nonetheless, it failed to achieve a good learning tradeoff when modeling both short-term and long-term market behavioral contexts. Therefore, the GAN-FD suffers considerably when the market's behavior is highly volatile, as the market's short-term and long-term sentiment patterns are not considered [15].

II. THE SENTIMENT AWARE STOCK PRICE FORECASTING MODEL (SASPF)
The highly dynamic stock forecasting environment during the pandemic requires the modeling of both short-term and long-term sentiment behaviors of the market. To the best of our knowledge, no prior work has considered capturing market sentiment in a dynamic manner for forecasting stock prices. This paper presents a Sentiment Aware Stock Price Forecasting (SASPF) model combining short-term and long-term sentiment features. At first, a sentiment analysis was carried out on review datasets, marking each item as "-1" for negative, "0" for neutral, and "1" for positive sentiment. These sentiment forecast data were added to the stock information datasets and used to train a sentiment aware stock forecasting model. The model was built utilizing both an RNN and an LBL. The RNN was used for detecting long-term market features, while the LBL was used for detecting short-term features. The main contributions of this paper are:
• A sentiment aware RNN-LBL (SA-RNN-LBL) learning model is presented for forecasting stock prices considering dynamically varying stock sentiment.
• The SA-RNN-LBL model can capture both short-term and long-term market sentiment features more efficiently.
• The SA-RNN-LBL model can obtain multiple market sentiment behaviors within short-term context for current trading sessions. Thus, it can predict the stock value fluctuation time more efficiently and can be used for forecasting the price of short sells.
• The experimental outcomes show that the proposed sentiment aware RNN-LBL based stock forecasting model attains better RMSRE and DPA performance than the existing deep learning based stock price forecasting model [15].
High-quality one-step forecasting provides significant information for risk assessment and management in trading environments, especially during pandemics. This work aims to forecast the price fluctuation of individual stocks or the market index one step ahead, using historical data and market sentiments derived from tweets.
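The data preparation step described above, in which sentiment labels of -1, 0, and 1 are appended to the stock records, could be sketched as follows. This is a minimal illustration only: the function name `attach_sentiment`, the record layout, the sample values, and the rule of taking the sign of the summed labels per minute are all assumptions, not details taken from the paper.

```python
from collections import defaultdict


def attach_sentiment(prices, tweets):
    """Append an aggregated sentiment label in {-1, 0, 1} to each price record.

    prices: list of (minute, closing_price) tuples.
    tweets: list of (minute, label) tuples, label already in {-1, 0, 1}.
    """
    labels_by_minute = defaultdict(list)
    for minute, label in tweets:
        labels_by_minute[minute].append(label)

    rows = []
    for minute, close in prices:
        total = sum(labels_by_minute.get(minute, []))
        # Assumption: the sign of the summed labels is the minute's sentiment.
        sentiment = (total > 0) - (total < 0)
        rows.append((minute, close, sentiment))
    return rows


# Made-up sample data, for illustration only.
prices = [("09:15", 100.0), ("09:16", 100.4), ("09:17", 100.1)]
tweets = [("09:15", 1), ("09:15", 1), ("09:15", -1), ("09:17", -1)]
rows = attach_sentiment(prices, tweets)
```

Minutes without any tweet receive the neutral label 0 under this rule, which keeps the sentiment column defined for every price record.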

A. Mathematical Formulation
This problem can be mathematically formalized as follows: $X_t$ represents a set of basic indicators including sentiment parameters and $Y_t$ denotes the closing price of a stock for 1-minute intervals at time $t$, where $t = 1, 2, \ldots, T$. The historical basic indicators are described as $X = \{X_1, X_2, \ldots, X_T\}$ and the closing prices as $Y = \{Y_1, Y_2, \ldots, Y_T\}$. The goal is to predict the closing price $Y_{T+1}$ for the next minute considering the market sentiment factor. Consider a set of stock markets and a set of stocks within each stock market. The stock market environment considered in this work exhibits several stock volatility behaviors and, similarly, several behaviors exist in the stock performance pattern. The final task is to predict the future price of a particular stock in a stock market using the sentiment aware stock price forecasting methodology.
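The one-step-ahead formulation above can be illustrated by pairing a window of past indicators and closing prices with the next closing price. The sketch below is an assumption-laden example: the fixed `window` length and the function name `one_step_samples` are hypothetical and are not specified in the paper, which states only that the history up to time T is used to predict the closing price at T + 1.

```python
def one_step_samples(X, Y, window):
    """Pair each window of (indicators, closing price) with the next closing price.

    X: list of indicator lists, one per minute (sentiment label included).
    Y: list of closing prices, one per minute.
    window: number of past minutes fed to the model (hypothetical parameter).
    """
    samples = []
    for end in range(window, len(Y)):
        # Features for minutes end-window .. end-1; the target is minute `end`.
        features = [X[t] + [Y[t]] for t in range(end - window, end)]
        samples.append((features, Y[end]))
    return samples


# Made-up series: a single sentiment indicator and four closing prices.
X = [[1], [0], [-1], [1]]
Y = [10.0, 10.5, 10.2, 10.8]
samples = one_step_samples(X, Y, window=2)
```

Each target is strictly later than every feature in its window, so no future information leaks into the inputs.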

B. Sentiment Analysis Tool for Predicting Market Sentiments from Tweet Data
The sentiment analysis model detects and marks market sentiments as having positive (1), neutral (0), or negative (-1) polarity. However, state-of-the-art sentiment analyzer tools are heavyweight and induce high computation overhead. To address this problem, a sentiment analysis method employing Partial Text Entailment (PTE) was utilized. PTE is used for grouping similar tweets by measuring the semantic similarities among them. This improved the speed of market sentiment prediction, meeting the real-time requirements of stock forecasting models.
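The idea of grouping similar tweets so that each group is classified only once can be sketched as below. Note that this is not PTE itself: word-overlap (Jaccard) similarity is swapped in as a simple stand-in for the semantic similarity measure, and the threshold, the function names, and the toy classifier are all hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a semantic similarity measure."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def group_tweets(tweets, threshold=0.5):
    """Greedily assign each tweet to the first group whose representative is similar."""
    groups = []  # the first tweet of each group acts as its representative
    for tweet in tweets:
        for group in groups:
            if jaccard(tweet, group[0]) >= threshold:
                group.append(tweet)
                break
        else:
            groups.append([tweet])
    return groups


def label_groups(groups, classify):
    """Classify one representative per group and share its label with the group."""
    labels = {}
    for group in groups:
        label = classify(group[0])  # the costly classifier runs once per group
        for tweet in group:
            labels[tweet] = label
    return labels


def toy_classify(text):
    """Hypothetical keyword classifier used only to make the example runnable."""
    if "rising" in text:
        return 1
    if "crash" in text:
        return -1
    return 0


tweets = ["RIL stock is rising fast", "RIL stock is rising", "markets crash on pandemic fears"]
groups = group_tweets(tweets)
labels = label_groups(groups, toy_classify)
```

The saving comes from calling the classifier once per group rather than once per tweet, which is the speed benefit the section attributes to grouping.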

C. Model's Description
This section presents the sentiment aware stock market pattern analysis model, which learns short-term and long-term behaviors using the LBL and the RNN, respectively, similarly to the LSTM model. The RNN model is composed of a set of input layers, multiple hidden layers, and an output layer. In the activation of the hidden layers, given by Equation (8), $i_\ell^{v} \in \mathbb{R}^{d}$ depicts the hidden representation of the stock at position $\ell$ and $\mathcal{W} \in \mathbb{R}^{d}$ represents the previous status. The matrix $\mathcal{D}$ captures the stock's present volatility (i.e. behavior pattern), while $\mathcal{X}$ propagates the time series signals. Equation (8) is executed iteratively to compute the status of each time instance in a time series sequence.
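As a hedged illustration of the recurrence described above, the hidden-layer update can be written in the standard Elman form. Here $i_\ell^{v}$ is the hidden representation at position $\ell$, $\mathcal{W}_\ell^{v}$ is the input at that position, $\mathcal{D}$ is the matrix capturing present volatility, and $\mathcal{X}$ is the matrix propagating time series signals. The activation function $f$, the position index on the input, and the arrangement of terms are assumptions based on the stated roles of the two matrices, not a transcription of Equation (8):

```latex
i_{\ell}^{v} = f\left( \mathcal{X}\, i_{\ell-1}^{v} + \mathcal{D}\, \mathcal{W}_{\ell}^{v} \right)
```

Applying this update for $\ell = 1, 2, \ldots$ yields the status of each time instance in turn, which matches the iterative execution mentioned in the text.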
The RNN is composed of multiple hidden layers. Hidden layer information is dynamic in nature with respect to the stock market behavior sequence, where the pattern is repetitive. Thus, the RNN faces problems in learning short-term patterns in a stock market behavior sequence. To address this limitation, an LBL with a single linear hidden layer was appended, which can be considered a deterministic model. Using the LBL for stock market behavior forecasting, the forecasting representation of a time sequence is constructed from the stock market score/price input and the transition matrices (TMs) at each time instance. The representation of the next time instance is a linear forecast in which $\mathcal{D}_{j} \in \mathbb{R}^{d \times d}$ depicts the TM for the respective time instance in a stock market score behavior sequence and $n$ is the number of elements modeled in a time sequence. Each position in the time sequence is modeled with a specific TM. In general, the LBL has difficulty learning long-term contexts in a stock market behavior sequence. Furthermore, the market behaves dynamically depending on market sentiments (i.e. the moods of customers and investors), so it is important to capture such information in stock price forecasting. The proposed method therefore considers sentiment aware matrices for obtaining feature sets of various aspects of sentiments. In the resulting hidden representation $i_\ell^{v}$ of the respective stock market $v$ at position $\ell$, $\mathcal{N}_{c_{\ell-j}^{v}} \in \mathbb{R}^{d \times d}$ are sentiment aware TMs with respect to the $j$-th stock of stock market $v$. The cold start problem is overcome by assuming $i_0^{v} = v_0$. It should be noted that if any change in sentiments is detected, the proposed method can obtain the underlying feature sets of the different sentiments exhibited in past sessions. However, existing stock forecasting models generally do not consider the session variance between features, which plays a significant role in improving prediction accuracy.
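The two constructions described above can be sketched in formulas as follows. These are plausible forms patterned on recurrent log-bilinear models in general, assuming that $\mathcal{W}_{\ell-j}^{v}$ denotes the input $j$ positions back, $\mathcal{D}_{j}$ the position-specific TM, $\mathcal{N}_{c}$ the TM for sentiment $c$, $\mathcal{X}$ the recurrent matrix, and $n$ the number of modeled elements. The summation limits and the power $\mathcal{X}^{n}$ are assumptions and are not taken from the source.

```latex
% LBL: linear forecast of the next representation from the n most recent inputs
i_{\ell+1}^{v} = \sum_{j=0}^{n-1} \mathcal{D}_{j}\, \mathcal{W}_{\ell-j}^{v}

% Sentiment aware RNN-LBL: recurrent long-term term plus sentiment aware short-term window
i_{\ell}^{v} = \mathcal{X}^{n}\, i_{\ell-n}^{v}
             + \sum_{j=0}^{n-1} \mathcal{D}_{j}\, \mathcal{N}_{c_{\ell-j}^{v}}\, \mathcal{W}_{\ell-j}^{v},
\qquad i_{0}^{v} = v_{0}
```

In this reading, the first term carries the long-term context through the recurrent matrix, while the sum models the short-term window with one TM per position and one per sentiment label.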
This happens because short-term market sentiment variance plays a significant part in future buying or selling of stocks compared to long-term market sentiment information. Thus, this work improves the previous RNN-LBL model by incorporating time based sentiment feature variance information, modeling a sentiment aware RNN-LBL model. The sentiment aware RNN-LBL stock market forecasting model is shown in Figure 1.
For the respective stock market $v$, the hidden representation at position $\ell$ is established using time information: $u_\ell^{v}$ represents the current time, $u_{\ell-1}^{v}$ represents the time information of each stock at each layer of the model, and $U_{u_\ell^{v} - u_{\ell-1}^{v}}$ is the time-specific TM for the sentiment session variation $u_\ell^{v} - u_{\ell-1}^{v}$. The time-specific TMs aid in collecting time-specific sentiment pattern features with respect to recent tweet information. Equation (13) is updated accordingly, with $i_0^{v} = v_0$ as the initial status of the respective stock within the stock market environment. To capture the varying sentiment of the stock market over different time intervals, sentiment aware TMs were established in the model. Finally, the model predicts whether a stock $v$ will exhibit a certain pattern or sentiment $c$ with respect to stock $w$ at the successive position $\ell + 1$, where $s_\ell^{v}$ is the current sentiment of the stock market $v$ with respect to session instance $\ell$, and $v_{v} \in \mathbb{R}^{d}$ and $i_\ell^{v}$ are the static and dynamic hidden feature representations, respectively.
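A small numerical sketch may help to see how the time-specific and sentiment aware TMs interact. Everything below is an assumption made for illustration: the function names, the binning of the elapsed time with `bin_width`, the additive combination of static and dynamic representations, and the bilinear score are modeled on recurrent log-bilinear models in general and are not confirmed by the paper.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def vec_add(a, b):
    return [p + q for p, q in zip(a, b)]


def hidden_state(i_prev, inputs, sentiments, gap, D, N, U, bin_width=60):
    """Time-specific recurrent term plus a sentiment aware short-term window.

    i_prev: hidden state n positions back; inputs: the n most recent input
    vectors, newest first; sentiments: their labels in {-1, 0, 1};
    gap: time elapsed since i_prev (seconds, hypothetical unit).
    """
    # Pick the time-specific TM by binning the gap; the last bin absorbs long gaps.
    k = min(int(gap // bin_width), len(U) - 1)
    state = matvec(U[k], i_prev)
    for j, (w, c) in enumerate(zip(inputs, sentiments)):
        # Position-specific TM D[j] applied after the sentiment aware TM N[c].
        state = vec_add(state, matvec(D[j], matvec(N[c], w)))
    return state


def score(state, static, N_c, w):
    """Bilinear score of sentiment c for stock representation w."""
    combined = vec_add(state, static)  # dynamic plus static representation
    return sum(a * b for a, b in zip(combined, matvec(N_c, w)))


# Made-up 2-dimensional parameters, chosen so the arithmetic is easy to follow.
I2 = [[1.0, 0.0], [0.0, 1.0]]
HALF = [[0.5, 0.0], [0.0, 0.5]]
NEG = [[-1.0, 0.0], [0.0, -1.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]
U = [I2, HALF]                 # one TM per time-gap bin
D = [I2, HALF]                 # one TM per position in the window
N = {1: I2, 0: ZERO, -1: NEG}  # one TM per sentiment label

state = hidden_state([1.0, 0.0], [[1.0, 2.0], [2.0, 0.0]], [1, -1], 90, D, N, U)
prediction = score(state, [0.5, 0.0], N[1], [1.0, 1.0])
```

With these toy matrices, a 90-second gap selects the second time bin, so the older hidden state is halved before the two windowed inputs are added.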

III. RESULTS AND DISCUSSION
This section presents the performance evaluation of the proposed SASPF model against other prediction models. The GAN-FD model [15] was chosen for comparison as it achieved much better results than existing LSTM based stock forecasting methods [21][24][25][26][27][28], and as investors' sentiments are considered a major contributing parameter. The proposed SASPF model was implemented in Python. The dataset for evaluating sentiments was obtained from publicly available sources [29][30][31]. This work evaluated performance using the same datasets used by GAN-FD [15]. The data were composed of 244 trading points ranging from January 1st to December 31st, 2016. A more detailed description of the dataset can be found in [15], while the dataset can be downloaded from [32]. Direction Prediction Accuracy (DPA) and Root Mean Squared Relative Error (RMSRE) were used for performance evaluation. Of the data, 80% was used for training, while the remaining 20% was used for testing. Experiments were conducted for different stocks: considering time $t$, a forecast was performed for the next time interval $t + 1$ using the proposed learning methods. The total time interval for the forecasting was $T$. The actual stock value is $Y_t$ and the forecasted value is $\hat{Y}_t$.
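The 80/20 partition described above can be sketched as below. The paper does not state how the split was drawn, so a chronological split (earliest 80% for training) is assumed here as the usual choice for time series. The function name `chronological_split` is hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into train and test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# 244 placeholder trading points standing in for the 2016 dataset.
trading_points = list(range(244))
train, test = chronological_split(trading_points)
```

For 244 trading points this gives 195 training points and 49 test points, with every test point later than every training point.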

A. RMSRE Performance Evaluation
This section evaluates the RMSRE performance achieved by the SASPF model over existing deep learning stock price forecasting methods. The RMSRE metric is computed as $\mathrm{RMSRE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( \frac{\hat{Y}_t - Y_t}{Y_t} \right)^{2}}$. A low RMSRE value indicates a good forecast (i.e. the forecasted price is closer to the actual stock price). The RMSRE outcomes of the SASPF model and existing deep learning based stock price forecasting methods are shown in Table I. The graphical representation of the RMSRE performance achieved by SASPF and existing deep learning based stock price forecasting methods is shown in Figure 3. In order to further evaluate the performance of the SASPF model, an evaluation was performed under highly dynamic and volatile environments. For such scenarios, this work considered the Reliance Jio communication stock and a corresponding tweet dataset, obtained from the Yahoo Finance [30] and Yahoo news [31] datasets, respectively. The stock price forecasting performance was evaluated between March 1st and June 30th, 2020. The RMSRE performance achieved considering the sentiment index is shown in Figure 4. The results show that the RNN-LBL and SASPF models achieve RMSREs of 0.0085 and 0.0059, respectively. SASPF thus improved forecasting performance by 30.59% over RNN-LBL when considering stock market sentiments. Thus, SASPF can work well under highly dynamic and volatile environments such as the CoViD-19 pandemic scenario.
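As a minimal sketch, the RMSRE can be computed as the square root of the mean squared relative error between forecasted and actual prices, which is the reading implied by the metric's name. The reported 30.59% improvement can also be checked from the two RMSRE values quoted above. The function names and the sample prices are illustrative assumptions.

```python
import math


def rmsre(actual, forecast):
    """Root mean squared relative error between actual and forecasted prices."""
    terms = [((f - a) / a) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(terms) / len(terms))


def relative_improvement(baseline, improved):
    """Percentage by which `improved` lowers the error of `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Made-up prices: both forecasts are off by 10% of the actual value.
example = rmsre([100.0, 200.0], [110.0, 180.0])

# RMSRE values reported in the text for RNN-LBL and SASPF.
gain = relative_improvement(0.0085, 0.0059)
```

The second calculation gives (0.0085 - 0.0059) / 0.0085, which is about 30.59%, in agreement with the figure stated in the text.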

B. DPA Performance Evaluation
This section evaluates the DPA performance achieved by the SASPF model over existing deep learning stock price forecasting methods. A higher DPA value indicates higher revenue for the investor. The DPA metric is computed as the proportion of forecasting steps in which the forecasted direction of the price movement matches the actual direction. The DPA outcomes of the SASPF and existing deep learning based stock price forecasting methods are shown in Table II, while the graphical representation of the results is shown in Figure 5. As in the RMSRE evaluation, the SASPF model was further evaluated under a highly dynamic and volatile environment using the Reliance Jio communication stock and the corresponding tweet dataset [30,31]. Its stock price forecasting performance was evaluated with data between March 1st and June 30th, 2020. The DPA performance achieved considering the sentiment index is shown in Figure 6. The results show that the SASPF and RNN-LBL achieved DPAs of 0.0085 and 0.0059, respectively. The SASPF had better forecasting performance by 18.26% over the RNN-LBL when considering stock market sentiments. Thus, SASPF can work well under highly dynamic and volatile environments such as the CoViD-19 pandemic scenario, providing better profits for an investor.
Comparative analysis of the SASPF and RNN-LBL models in terms of DPA.
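A hedged sketch of a direction prediction accuracy computation is given below. It assumes one common definition, in which a step counts as correct when the actual move $Y_{t+1} - Y_t$ and the forecasted move $\hat{Y}_{t+1} - Y_t$ have the same sign. Whether the paper reports the result as a fraction or as a percentage is not known, so a fraction is returned here. The function name and sample prices are illustrative.

```python
def dpa(actual, forecast):
    """Fraction of steps whose forecasted direction matches the actual direction.

    forecast[t + 1] is the prediction of actual[t + 1] made at step t.
    """
    steps = len(actual) - 1
    hits = 0
    for t in range(steps):
        real_move = actual[t + 1] - actual[t]
        predicted_move = forecast[t + 1] - actual[t]
        # A positive product means both moves point the same way.
        if real_move * predicted_move > 0:
            hits += 1
    return hits / steps


# Made-up prices: the direction is forecast correctly in 2 of 3 steps.
accuracy = dpa([10.0, 11.0, 10.0, 12.0], [10.0, 10.5, 11.5, 11.0])
```

Under this definition a flat actual or forecasted move counts as a miss, since the product of the two moves is then zero.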

IV. CONCLUSION
Predicting stock prices is a challenging task, especially when considering fluctuating external factors such as a pandemic, since stock markets are volatile in nature as user sentiments, behaviors, and patterns vary over time. During the pandemic, some prices such as that of oil fell, while the stock prices of gold and of the mobile communication sector increased significantly. Thus, it is important to capture market sentiments in order to design a better forecasting model. Very few studies have integrated market sentiments into stock price forecasting models. In order to predict better profits, reduce forecasting errors, and achieve higher returns for short selling, this paper presented a sentiment aware stock price forecasting model. The SASPF divides the current session into multiple bins and compares them to establish stock behavior patterns. Results showed that the SASPF improved forecasting performance by 18.26% over RNN-LBL when considering stock market sentiments. Thus, SASPF can work well under highly dynamic and volatile environments such as the CoViD-19 pandemic scenario, providing better profits for investors. Experimental outcomes showed that the SASPF model achieved better RMSRE and DPA performance than existing deep learning based stock price forecasting methods. Future work should consider designing an efficient stock price forecasting framework combining both micro and macroeconomic data with different machine learning and deep learning methods.