AI-Based Approaches for Multi-Class Text Classification in Social Media: A Comparative Study

Oleg Gabrielyan; Mikhail Gasparyan; Ivan Kravchenko; Milada Krapivina

doi:10.48084/etasr.14303

Authors

Oleg Gabrielyan, V. I. Vernadsky Crimean Federal University, Simferopol, Crimea
Mikhail Gasparyan, V. I. Vernadsky Crimean Federal University, Simferopol, Crimea
Ivan Kravchenko, V. I. Vernadsky Crimean Federal University, Simferopol, Crimea
Milada Krapivina, V. I. Vernadsky Crimean Federal University, Simferopol, Crimea

Volume: 15 | Issue: 6 | Pages: 29833-29839 | December 2025 | https://doi.org/10.48084/etasr.14303

Received: 26 August 2025 | Revised: 29 September 2025 | Accepted: 9 October 2025 | Online: 22 October 2025

Corresponding author: Oleg Gabrielyan

Abstract

The growing volume of unstructured text in social networks calls for automated tools for efficient classification and monitoring. This study presents the design and evaluation of an AI-based system for multi-class text classification using a dataset collected from VKontakte. Each text was annotated with one of several semantic categories of the word "hero," which serves as a domain-specific case study for testing classification models under real-world constraints such as class imbalance, overlapping categories, and limited training samples. Three approaches were implemented and compared: a Long Short-Term Memory (LSTM) network, the transformer-based DeBERTa model, and an AutoML solution (LightAutoML). Experimental results show that DeBERTa achieves the most balanced performance across classes, with a macro-F1 score of 0.32, while the AutoML solution provides the highest raw accuracy (approximately 65%) with lower resource requirements. The LSTM network showed limited effectiveness given the size and complexity of the dataset. Additional experiments with class balancing and refined labeling improved performance on underrepresented classes. The findings highlight the trade-off between model complexity, computational cost, and classification performance, and confirm the applicability of transformer-based architectures to text analysis in noisy and imbalanced environments. The proposed system can serve as a foundation for automated monitoring tools in social media and other real-world NLP applications.
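The gap between raw accuracy and macro-F1 reported above is typical of imbalanced data. The following is a minimal sketch, not the authors' evaluation code, that computes both metrics by hand. The labels and predictions are invented toy values (the category names are hypothetical and the numbers are unrelated to the study's results); they only illustrate how a classifier that favors the majority class can score well on accuracy while macro-F1 stays low.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up data: 7 majority samples and 3 minority samples.
# The category names are hypothetical placeholders.
y_true = ["literal"] * 7 + ["ironic"] * 2 + ["fictional"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["literal"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}")  # accuracy = 0.70
print(f"macro-F1 = {mf1:.2f}")  # macro-F1 = 0.27
```

Because macro-F1 averages per-class scores without weighting by class frequency, the two minority classes that are never predicted pull the score down, even though 70% of the toy predictions are correct.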

Keywords

natural language processing, text classification, machine learning, DeBERTa, LSTM, AutoML, social media monitoring
