Multi-Class Text Classification using Machine Learning Techniques
Received: 20 December 2024 | Revised: 20 February 2025, 5 March 2025, and 11 March 2025 | Accepted: 17 March 2025 | Online: 28 March 2025
Corresponding author: Yu-N Cheah
Abstract
The exponential growth of the World Wide Web has led to an overwhelming flood of information from diverse sources. This surge in data underscores the critical need for automated Text Classification (TC) to effectively manage, organize, and facilitate information discovery. TC plays a pivotal role in real-world applications spanning society, academia, government, and industry, as it eliminates the reliance on manual data classification, which is both costly and time-intensive. Machine learning models have emerged as key enablers, improving the prediction accuracy and efficiency of TC. However, existing models often struggle with multi-class imbalanced TC, where uneven class distributions lead to biased predictions and suboptimal model performance. This issue is further compounded by the lack of comprehensive evaluations on diverse datasets, making it challenging to determine the most effective model under imbalanced conditions. To tackle these challenges, this study systematically evaluates five widely recognized supervised machine learning algorithms, namely Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), K-Nearest Neighbor (KNN), Decision Tree (DT), and Logistic Regression (LR), across 19 benchmark datasets. Based on the average performance across F1-score, Classification Accuracy, and statistical significance tests, LR achieved the highest rank, closely followed by SVM and MNB. In contrast, KNN and DT performed comparatively poorly.
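To illustrate why the abstract pairs F1-score with Classification Accuracy under class imbalance, and how models can be ranked across several datasets, the following is a minimal sketch rather than the authors' procedure. The labels, the model names (`model_a`, `model_b`, `model_c`), and all scores are hypothetical, macro-averaging is an assumed choice of F1 variant, and the average-rank step is only the first ingredient of a rank-based comparison such as a Friedman test; the abstract does not state which averaging or which test was used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_ranks(scores_by_dataset):
    """Mean rank per model across datasets (rank 1 = best, ties share the mean rank)."""
    totals = Counter()
    for scores in scores_by_dataset:
        ordered = sorted(scores, key=scores.get, reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j over the run of models tied with the model at position i.
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
                j += 1
            shared_rank = ((i + 1) + (j + 1)) / 2
            for name in ordered[i:j + 1]:
                totals[name] += shared_rank
            i = j + 1
    return {name: total / len(scores_by_dataset) for name, total in totals.items()}


# Hypothetical imbalanced 3-class sample: 8 "news", 1 "sport", 1 "tech".
y_true = ["news"] * 8 + ["sport", "tech"]
# A degenerate classifier that always predicts the majority class.
y_majority = ["news"] * 10

print("accuracy:", accuracy(y_true, y_majority))
print("macro-F1:", round(macro_f1(y_true, y_majority), 4))

# Hypothetical macro-F1 scores of three placeholder models on two datasets.
hypothetical_scores = [
    {"model_a": 0.9, "model_b": 0.8, "model_c": 0.7},
    {"model_a": 0.6, "model_b": 0.8, "model_c": 0.6},
]
print("average ranks:", average_ranks(hypothetical_scores))
```

In this toy sample the majority-class predictor is right on 8 of 10 documents, yet its macro-F1 is far lower because the two minority classes each score zero. That gap is the kind of bias that accuracy alone can hide on imbalanced multi-class data.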
Keywords:
text classification, text categorization, multi-class dataset, machine learning models
License
Copyright (c) 2025 Osamah Mohammed Alyasiri, Yu-N. Cheah

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.