Multi-Class Text Classification using Machine Learning Techniques

Osamah Mohammed Alyasiri; Yu-N Cheah

doi:10.48084/etasr.9994

Authors

Osamah Mohammed Alyasiri, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia | Karbala Technical Institute, Al-Furat Al-Awsat Technical University, Karbala, Iraq (ORCID: https://orcid.org/0000-0002-2345-2443)
Yu-N Cheah, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia (ORCID: https://orcid.org/0000-0002-5644-9427)

Volume: 15 | Issue: 3 | Pages: 22598-22604 | June 2025 | https://doi.org/10.48084/etasr.9994

Received: 20 December 2024 | Revised: 20 February 2025, 5 March 2025, and 11 March 2025 | Accepted: 17 March 2025 | Online: 28 March 2025

Corresponding author: Yu-N Cheah

Abstract

The exponential growth of the World Wide Web has led to an overwhelming flood of information from diverse sources. This surge in data underscores the critical need for automated Text Classification (TC) to effectively manage, organize, and facilitate information discovery. TC plays a pivotal role in real-world applications spanning society, academia, government, and industry, because it removes the reliance on manual data classification, which is both costly and time-intensive. Machine learning models have emerged as key enablers, improving the prediction accuracy and efficiency of TC. However, existing models often struggle with multi-class imbalanced TC, where uneven class distributions lead to biased predictions and suboptimal model performance. This issue is compounded by the lack of comprehensive evaluations on diverse datasets, which makes it difficult to determine the most effective model under imbalanced conditions. To address these challenges, this study systematically evaluates five widely recognized supervised machine learning algorithms, Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), K-Nearest Neighbor (KNN), Decision Tree (DT), and Logistic Regression (LR), across 19 benchmark datasets. Based on the average performance in terms of F1-score and Classification Accuracy, together with statistical significance tests, LR achieved the highest rank, closely followed by SVM and MNB. In contrast, KNN and DT performed comparatively poorly.

Keywords:

text classification, text categorization, multi-class dataset, machine learning models
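
The abstract compares models by F1-score, Classification Accuracy, and rank across datasets. The following is a minimal Python sketch of those ideas, not the authors' implementation: the function names (`accuracy`, `macro_f1`, `average_ranks`), the choice of macro-averaging for F1, and all toy labels and scores are assumptions made for illustration and are not taken from the paper. It shows why accuracy alone can look good on an imbalanced multi-class set while F1 does not, and how a per-model average rank over several datasets can be computed.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes weigh as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_ranks(scores_by_dataset):
    """Mean rank of each model across datasets (rank 1 = best; ties share the mean rank)."""
    totals = defaultdict(float)
    for scores in scores_by_dataset:
        ordered = sorted(scores.items(), key=lambda kv: -kv[1])
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over all models tied with the model at position i.
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared = ((i + 1) + (j + 1)) / 2  # mean of the tied rank positions
            for name, _ in ordered[i:j + 1]:
                totals[name] += shared
            i = j + 1
    n = len(scores_by_dataset)
    return {name: total / n for name, total in totals.items()}


# Toy imbalanced example (made-up labels): a classifier that always predicts "a".
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))

# Made-up scores for three models on two hypothetical datasets.
toy_scores = [
    {"LR": 0.90, "SVM": 0.80, "KNN": 0.70},
    {"LR": 0.85, "SVM": 0.85, "KNN": 0.60},
]
print(average_ranks(toy_scores))
```

Average ranks of this kind are the usual input to rank-based significance tests such as the Friedman test; which test and post-hoc procedure the study applied is not stated in the abstract.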

Author Biographies

Osamah Mohammed Alyasiri, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia | Karbala Technical Institute, Al-Furat Al-Awsat Technical University, Karbala, Iraq

OSAMAH MOHAMMED ALYASIRI received the B.Sc. degree in computer science from Mustansiriyah University, Iraq, in 2009, and the M.Sc. degree in computer science from Dr. Babasaheb Ambedkar Marathwada University (Dr. BAMU), India, in 2013. He is currently pursuing a Ph.D. degree with the School of Computer Sciences, Universiti Sains Malaysia. He is also a Lecturer at Al-Furat Al-Awsat Technical University, Karbala Technical Institute, Department of Computer Network and Software Techniques, Iraq. His research interests include Artificial Intelligence, Text Mining, Text Classification, Information Retrieval, Machine Learning, Optimization, Pattern Recognition, Feature Selection, and AI Chatbots.

Yu-N Cheah, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia

Yu-N Cheah received his B.Comp.Sc. (Hons.) and Ph.D. degrees from Universiti Sains Malaysia in 1998 and 2002, respectively. He is currently an associate professor at the School of Computer Sciences, Universiti Sains Malaysia. His research interests include sentiment analysis, semantic technologies, knowledge management, intelligent systems, and health informatics.
