Enhancing Emotion Detection in Textual Data: A Comparative Analysis of Machine Learning Models and Feature Extraction Techniques

Authors

  • Wedad Q. A. Saif, Faculty of Engineering and Information Technology, Taiz University, Taiz, Yemen
  • Majid Khalaf Alshammari, School of Educational Studies, Universiti Sains Malaysia, 11800 Penang, Malaysia
  • Badiea Abdulkarem Mohammed, College of Computer Science and Engineering, University of Ha'il, Ha'il 81481, Saudi Arabia
  • Amer A. Sallam, Faculty of Engineering and Information Technology, Taiz University, Taiz, Yemen
Volume: 14 | Issue: 5 | Pages: 16471-16477 | October 2024 | https://doi.org/10.48084/etasr.7806

Abstract

The digital age has produced a massive increase in available textual data, including articles, comments, texts, and social network updates. Analyzing this volume of data is valuable across many industries and applications, as it provides insight into customer perspectives, strategic decision-making, and market demands. Detecting emotions in text remains challenging because of varied linguistic patterns and cultural nuances. This study proposes a system for identifying emotions expressed in text using several machine learning models: logistic regression, extremely randomized trees (extra trees), a voting ensemble, stochastic gradient descent (SGD), and LinearSVC. It also compares how different feature extraction techniques, namely TF-IDF, Bag-of-Words, and N-grams, perform with these models. The system was evaluated on two English emotion datasets, ISEAR and AIT-2018, using accuracy, precision, recall, and F1 score. The findings demonstrate that the system detects emotions conveyed in text effectively. The LinearSVC model with N-grams achieved the highest accuracy on the ISEAR dataset, 88.63%, while the extra trees classifier with N-grams achieved 89.14% accuracy on the AIT-2018 dataset. Furthermore, the SGD model with TF-IDF achieved 88.18% and 84.54% accuracy on the ISEAR and AIT-2018 datasets, respectively.
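As a rough illustration of the three feature extraction techniques named in the abstract, the following minimal sketch builds Bag-of-Words counts, N-gram counts, and TF-IDF weights using only the Python standard library. It is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the function names, and the toy sentences are all assumptions made for this example, since the abstract does not specify preprocessing or weighting details.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n_max=1):
    """Count all 1..n_max-grams per document; n_max=1 is plain Bag-of-Words."""
    counts = []
    for doc in docs:
        tokens = tokenize(doc)
        terms = []
        for n in range(1, n_max + 1):
            terms.extend(ngrams(tokens, n))
        counts.append(Counter(terms))
    return counts


def tf_idf(counts):
    """Scale raw term counts by a smoothed inverse document frequency.

    Assumed variant: idf(t) = ln((1 + N) / (1 + df(t))) + 1, with no
    length normalization. Other variants are equally plausible.
    """
    n_docs = len(counts)
    # Iterating a Counter yields each distinct term once per document.
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Toy corpus, invented for illustration only.
docs = [
    "i feel so happy today",
    "i feel so sad today",
    "i am full of joy",
]

bow = bag_of_words(docs)              # unigram counts
bigrams = bag_of_words(docs, n_max=2) # unigrams plus bigrams
weights = tf_idf(bow)                 # TF-IDF over the unigram counts
```

In this sketch, a word that appears in every document (such as "i") receives the minimum IDF, while a word confined to one document (such as "happy") is weighted more heavily. That contrast is the usual motivation for preferring TF-IDF over raw counts; the resulting vectors would then be passed to a classifier such as those compared in the study.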

Keywords:

emotion detection, textual data, machine learning, feature extraction, data encoding

How to Cite

Saif, W.Q.A., Alshammari, M.K., Mohammed, B.A. and Sallam, A.A. 2024. Enhancing Emotion Detection in Textual Data: A Comparative Analysis of Machine Learning Models and Feature Extraction Techniques. Engineering, Technology & Applied Science Research. 14, 5 (Oct. 2024), 16471–16477. DOI:https://doi.org/10.48084/etasr.7806.