Arabic Sentiment Analysis on Chewing Khat Leaves using Machine Learning and Ensemble Methods

Sentiment analysis plays an important role in capturing speakers' opinions or feelings toward events, products, topics, or services, helping businesses improve their products. Governments and organizations also use it to investigate and address current social issues by analyzing people's perspectives and feelings. This study evaluated the habit of chewing Khat (qat) leaves in Yemeni society. Chewing Khat leaves is a common habit in Yemen and East Africa. This paper proposes a model to extract information about the Khat-chewing habit, how people perceive it, and the preference for Khat leaves among Arabic speakers. A dataset of user comments on 18 YouTube videos was prepared using several natural language processing techniques. Several experiments were conducted using six machine learning classifiers and four ensemble methods. Support Vector Machine and Linear Regression achieved almost 80% accuracy, whereas XGBoost was the most accurate ensemble method, reaching 77%.


I. INTRODUCTION
Nowadays, the study of user opinions on services, products, and habits has attracted substantial attention through many data mining applications, recommender systems, and business intelligence applications. The analysis and interpretation of user opinions are collectively known as sentiment analysis, an area of natural language processing that is also known as the voice of the customer in business intelligence [1,2]. Business owners need to be aware of feedback to improve future performance. However, this is a time-consuming and difficult task, because it requires analyzing huge amounts of unstructured data gathered from social media or internet comments. Several studies have classified sentiments as positive, negative, or neutral [3,4,6]. More complex sentiment analysis [7][8][9][10], often referred to as fine-grained, classifies data into five classes: very positive, positive, neutral, negative, and very negative. Moreover, aspect-based sentiment analysis [11][12][13][14] classifies data by extracting entities from the text.
Users' comments often reflect their opinions and can be considered the main factor in evaluating services or products. Some studies focused on education [15,16], while others focused on detecting health misinformation among social media users [18,19]. Khat is a plant with euphoric and stimulant effects, and chewing its leaves is a common habit in Yemen and East Africa [21][22][23]. Although the habit is customary in these countries, several experiments have shown its direct impact on human organs. This paper presents a model to study consumers' opinions on the habit of chewing Khat leaves in Yemeni and East African society. First, the dataset was collected by extracting user comments from 18 YouTube videos. The annotation process then labeled the data as positive or negative. Finally, several NLP processes were executed to prepare the data for Machine Learning Classifiers (MLCs) and Ensemble Methods (EMs).

II. RELATED WORKS
Several studies have applied sentiment analysis in different ways. Multilingual student comments, obtained through student feedback, were used to evaluate the effectiveness of online courses and teachers' performance in [3,15,16,17]. In [3], approximately 4000 student comments were collected through surveys on 25 university courses to evaluate the performance of a professor who had been teaching for 10 years, and the sentiment analysis covered positive and negative polarity plus eight further emotions. Similarly, the authors in [15] proposed a system to evaluate a lecturer's performance by collecting numerical ratings through student surveys. The Naïve Bayes MLC was employed to predict positive and negative student sentiments toward the lectures. A long short-term memory (LSTM) recurrent neural network was used in [16], with a dataset of 3000 positive, negative, and neutral student comments on 30 courses. The performance improved when using the softmax activation function, reaching 89%, 99%, and 90% during training, testing, and validation, respectively. In [17], deep learning was applied to a course evaluation dataset of 3000 student comments with three predefined classes, and the results showed that the ReLU and softmax activation functions performed better.

Sentiment analysis has also been used to identify the main factors affecting the success of businesses, particularly start-ups. In [1], user comments were extracted from Twitter, and topic modeling and a support vector machine were applied to divide the comments into three main classes. The textual analysis was then performed with NVivo software, based on the entities trained in the previous phase. In [4], a massive set of approximately 1.6 million user comments from the Yelp Challenge Dataset was analyzed with four machine learning classifiers, using 80% of the data for training and 20% for testing. The best accuracy rates, 92.6% and 92.3%, were obtained with Stochastic Gradient Descent and Linear Support Vector Classification, respectively. Similarly, the same dataset was used in [2] to analyze restaurant reviews through a hybrid classifier ensemble combining Naïve Bayes, Support Vector Machines, and Genetic Algorithms.
Several studies have also been conducted in the health sector [18,19]. The authors in [18] focused on tweets about breast cancer, collecting approximately 48,000 posts from about 845 cancer patient accounts. A logistic regression classifier and a Convolutional Neural Network were used, and the model reached an accuracy of 97.6%. In addition, posts describing positive experiences were shared more often, raising awareness among the general public. Text mining with descriptive statistics and topic modeling was used in [19], where unstructured data from 3 million Reuters news articles helped identify the 10 major health issues reported in the news from 2007 to 2017. In contrast, [8] prioritized the analysis of user reviews on mobile health applications, collecting approximately 88,125 user reviews from 104 mobile health applications. Each comment was categorized by the functionality it addressed (such as usability, content, customer support, and ethics), polarity was divided into three classes, and five machine learning classifiers were applied. The best accuracy, 89.42%, was achieved with Stochastic Gradient Descent.

III. METHODS
This section describes the main model phases, as shown in Figure 1. There are four phases: data acquisition, preprocessing, machine learning classifiers, and model evaluation.

Figure 1. Model architecture.

A. Phase 1: Data Acquisition
The dataset was collected using the Python 3.8 programming language and the YouTube API (googleapiclient package) to extract information from 18 videos related to chewing Khat. The video selection criteria were: publication date between 2015 and 2020, more than 50K views, more than 10K likes, and a focus on Arabic speakers. Keywords such as "Khat", "Khat is dangerous", and "disadvantages of Khat" were used to locate the videos. The main attributes of the extracted information were commenter_id, commenter_name, comment, video_id, number of views, number of likes, and date. Table I shows the dataset description and the minimum and maximum length of user comments. An initial preprocessing step removed English and duplicate comments. The next step, data annotation, was a manual process carried out by three annotators who were Ph.D. holders, native Arabic speakers, and computer science specialists. The annotators labeled the comments as negative or positive, and unrelated, unclear, or ambiguous comments were removed. A comment received a positive (or negative) label if at least two annotators assigned that label; otherwise, it was removed.
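The acquisition and filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the YouTube Data API v3 `commentThreads().list` endpoint accessed through googleapiclient, collects only top-level comments, and uses hypothetical helper names throughout. The removal of English comments is approximated by a hypothetical heuristic that drops comments containing no Arabic letters, and the two-of-three agreement rule is written as a small majority-vote function. Video-level statistics (views and likes) would need a separate `videos().list` call with `part="statistics"`, which is omitted here.

```python
import re
from collections import Counter

# Arabic letter range (hamza through yeh); used by the hypothetical language filter.
ARABIC_LETTER = re.compile(r"[\u0621-\u064A]")


def parse_comment_threads(response):
    """Flatten one commentThreads.list response (YouTube Data API v3) into dict rows."""
    rows = []
    for item in response.get("items", []):
        thread = item["snippet"]
        top = thread["topLevelComment"]["snippet"]
        rows.append({
            "commenter_id": top.get("authorChannelId", {}).get("value"),
            "commenter_name": top.get("authorDisplayName"),
            # textDisplay is returned to any caller; textFormat="plainText" strips HTML.
            "comment": top.get("textDisplay", ""),
            "video_id": thread.get("videoId"),
            "comment_likes": top.get("likeCount", 0),
            "date": top.get("publishedAt"),
        })
    return rows


def fetch_comments(api_key, video_id):
    """Page through every top-level comment of one video (requires network and an API key)."""
    from googleapiclient.discovery import build  # lazy import: only this function needs it
    youtube = build("youtube", "v3", developerKey=api_key)
    params = {"part": "snippet", "videoId": video_id,
              "maxResults": 100, "textFormat": "plainText"}
    rows = []
    while True:
        response = youtube.commentThreads().list(**params).execute()
        rows.extend(parse_comment_threads(response))
        token = response.get("nextPageToken")
        if not token:  # no more pages
            return rows
        params["pageToken"] = token


def drop_non_arabic_and_duplicates(rows):
    """Hypothetical filter: keep the first copy of each comment that contains Arabic letters."""
    seen, kept = set(), []
    for row in rows:
        text = row["comment"].strip()
        if ARABIC_LETTER.search(text) and text not in seen:
            seen.add(text)
            kept.append(row)
    return kept


def majority_label(votes, min_agree=2):
    """Return the polarity chosen by at least `min_agree` annotators, else None (comment dropped)."""
    counts = Counter(v for v in votes if v in ("positive", "negative"))
    for label, count in counts.most_common():
        if count >= min_agree:
            return label
    return None
```

With three annotators and `min_agree=2`, at most one polarity can reach the threshold, so the vote is never tied.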

B. Phase 2: Pre-processing
The natural language preprocessing steps were data cleaning, tokenization, normalization of Arabic words, lemmatization, deletion of special characters, and removal of repeated characters. The annotation into positive and negative classes was then performed by the three annotators. These preprocessing steps increased accuracy by removing "TASHKEEL" (diacritics), "TATWEEL" (elongation characters), and "HAMZAH", using Python 3.6 and the "tashaphyne" package.
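The cleaning and normalization steps can be illustrated with a short, dependency-free sketch. The paper used the tashaphyne package; the version below instead reimplements comparable rules with plain regular expressions, so the exact character sets, the hamza mapping, and the choice to collapse runs of three or more identical characters are assumptions rather than the authors' settings. Lemmatization is not covered.

```python
import re

# Harakat: tanween (fathatan, dammatan, kasratan), fatha, damma, kasra, shadda, sukun.
TASHKEEL = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), used only to stretch words visually.
TATWEEL = "\u0640"
# Assumed hamza normalization: hamza-carrying letters mapped to their bare forms.
HAMZA_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda -> alef
    "\u0624": "\u0648",  # waw with hamza -> waw
    "\u0626": "\u064A",  # yeh with hamza -> yeh
})
# Anything that is not an Arabic letter or whitespace counts as a special character.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
# Hypothetical threshold: a character repeated three or more times in a row.
REPEATS = re.compile(r"(.)\1{2,}")


def normalize_arabic(text):
    """Apply the cleaning steps in order and return a single-spaced string."""
    text = TASHKEEL.sub("", text)        # remove diacritics
    text = text.replace(TATWEEL, "")     # remove elongation
    text = text.translate(HAMZA_MAP)     # normalize hamza forms
    text = NON_ARABIC.sub(" ", text)     # drop digits, Latin letters, punctuation, emoji
    text = REPEATS.sub(r"\1", text)      # collapse exaggerated repetitions
    return " ".join(text.split())


def tokenize(text):
    """Whitespace tokenization of the normalized text."""
    return normalize_arabic(text).split()
```

Only runs of three or more identical characters are collapsed, so legitimate doubled letters in a word are left untouched.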

IV. RESULTS AND DISCUSSION
This section presents the experiments and their results. Two experiments were conducted on the dataset: one with classic MLCs and one with EMs. In both experiments, the dataset was split into 70% for training and 30% for testing, and 5-fold cross-validation was applied.
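A minimal sketch of this evaluation protocol with scikit-learn follows. It assumes a stratified 70/30 split, 5-fold cross-validation on the training portion only, and a TF-IDF plus linear SVM pipeline as the example model; the paper does not state where cross-validation was applied or which random seed was used, so both are assumptions, and the function names are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def default_model():
    """Example model: TF-IDF unigram features feeding a linear SVM (an assumption)."""
    return make_pipeline(TfidfVectorizer(), LinearSVC())


def evaluate(model, texts, labels, seed=42):
    """Return (mean 5-fold CV accuracy on the 70% split, accuracy on the 30% hold-out)."""
    # Stratified 70/30 split keeps the positive/negative ratio in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.30, stratify=labels, random_state=seed)
    # 5-fold cross-validation on the training portion only (assumed placement).
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    # Refit on the full training portion, then score once on the untouched test split.
    model.fit(X_train, y_train)
    return cv_scores.mean(), model.score(X_test, y_test)
```

Any scikit-learn classifier or pipeline, including an ensemble such as `RandomForestClassifier`, can be passed as `model`, so the same protocol serves both experiments.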

A. Classic Machine Learning Classifiers
As mentioned above, six classic MLCs were used. Unigram, bigram, and trigram features were used with all six classifiers to examine their performance. Table IV shows the MLCs' performance using trigrams. The highest accuracy was 79.51% using SVM, whereas the lowest was obtained by KNN (65.12%). Figure 2 depicts the overall MLC results for unigrams, bigrams, and trigrams. SVM had the highest accuracy, followed closely by LR and SGD, all at almost 80%. The accuracies of NB and DT were near 70%, whereas KNN was less accurate.
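The n-gram comparison can be sketched as a grid over feature settings and classifiers. This sketch assumes TF-IDF features, scikit-learn implementations with default hyperparameters, and 5-fold cross-validation. It interprets "unigram", "bigram", and "trigram" as `ngram_range` values of (1, 1), (2, 2), and (3, 3), and implements LR as logistic regression, since the paper names it Linear Regression but a classifier is required. None of these choices is confirmed by the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Factories so each pipeline gets a fresh, unfitted classifier.
CLASSIFIERS = {
    "SVM": LinearSVC,
    "LR": lambda: LogisticRegression(max_iter=1000),  # assumed: logistic regression
    "SGD": SGDClassifier,
    "NB": MultinomialNB,
    "DT": DecisionTreeClassifier,
    "KNN": KNeighborsClassifier,
}
# Assumed interpretation: each setting uses only n-grams of that exact length.
NGRAMS = {"unigram": (1, 1), "bigram": (2, 2), "trigram": (3, 3)}


def compare(texts, labels, cv=5):
    """Mean cross-validated accuracy for every (n-gram setting, classifier) pair."""
    results = {}
    for gram, ngram_range in NGRAMS.items():
        for name, make_clf in CLASSIFIERS.items():
            pipe = make_pipeline(TfidfVectorizer(ngram_range=ngram_range), make_clf())
            results[(gram, name)] = cross_val_score(pipe, texts, labels, cv=cv).mean()
    return results
```

Fitting the vectorizer inside the pipeline means each cross-validation fold builds its vocabulary from its own training part, which avoids leaking test n-grams into the features.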

B. Ensemble Methods
The four common ensemble methods mentioned above were used. Table IV shows the accuracy of these EMs. The highest accuracy was recorded for XGBoost (XG), while the lowest was recorded for GB. Figure 3 compares the accuracy of the DT classifier with that of RF using unigrams, bigrams, and trigrams; RF outperformed DT.

V. CONCLUSION
This paper presented a study of users' opinions on chewing Khat in Yemen and East Africa, using a dataset collected from YouTube comments. Several natural language processing steps were carried out on the dataset to obtain the best classifier performance. Classic MLCs and EMs were applied. In terms of accuracy, the best performance was achieved by SVM, followed by Linear Regression. Among the EMs, XG achieved the best accuracy.