AI-Based Approaches for Multi-Class Text Classification in Social Media: A Comparative Study
Received: 26 August 2025 | Revised: 29 September 2025 | Accepted: 9 October 2025 | Online: 22 October 2025
Corresponding author: Oleg Gabrielyan
Abstract
The growing volume of unstructured text in social networks calls for automated tools for efficient classification and monitoring. This study presents the design and evaluation of an AI-based system for multi-class text classification using a dataset collected from VKontakte. The dataset was annotated with several semantic categories of the word "hero," serving as a domain-specific case study for testing classification models under real-world constraints such as class imbalance, overlapping categories, and limited training samples. Three approaches were implemented and compared: a Long Short-Term Memory (LSTM) network, the transformer-based DeBERTa model, and an AutoML solution (LightAutoML). Experimental results show that DeBERTa achieves the most balanced performance across classes, with a macro-F1 score of 0.32, while AutoML provides the highest raw accuracy (~65%) with lower resource requirements. The LSTM showed limited effectiveness, which is attributed to the dataset size and complexity. Additional experiments with class balancing and refined labeling improved performance on underrepresented classes. The findings highlight the trade-off between model complexity, computational cost, and classification performance, and confirm the applicability of transformer-based architectures to text analysis in noisy and imbalanced environments. The proposed system can serve as a foundation for automated monitoring tools in social media and other real-world NLP applications.
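The gap between raw accuracy and macro-F1 reported in the abstract is typical of imbalanced data, and a small sketch may help illustrate why the two metrics can rank models differently. The example below is not the study's evaluation code: the labels `a`, `b`, `c`, the toy class distribution, and both sets of predictions are invented for illustration, and the metrics are computed from their standard definitions in plain Python.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical imbalanced ground truth: 8 samples of "a", 1 of "b", 1 of "c".
y_true = ["a"] * 8 + ["b"] + ["c"]

# Hypothetical model 1: always predicts the majority class.
pred_majority = ["a"] * 10

# Hypothetical model 2: misclassifies some "a" samples but recovers "b" and "c".
pred_balanced = ["a"] * 5 + ["b", "b", "c"] + ["b"] + ["c"]

for name, pred in [("majority", pred_majority), ("balanced", pred_balanced)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f}, "
          f"macro-F1={macro_f1(y_true, pred):.3f}")
```

In this toy setting the majority-class predictor has the higher accuracy while the second predictor has the higher macro-F1, because macro-F1 averages per-class scores without weighting them by class frequency. This mirrors, in a simplified way, how one model can lead on accuracy and another on macro-F1.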
Keywords:
natural language processing, text classification, machine learning, DeBERTa, LSTM, AutoML, social media monitoring
License
Copyright (c) 2025 Oleg Gabrielyan, Mikhail Gasparyan, Ivan Kravchenko, Milada Krapivina

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
