A CNN-BiLSTM Hybrid Architecture with Resampling Techniques for Arabic Legal Text Classification

Authors

  • Oussama Tahtah, L3IA Laboratory, Sidi Mohamed Ben Abdellah University, Fes, Morocco
  • Mohammed Bahbib, Engineering Sciences Laboratory, Sidi Mohamed Ben Abdellah University, Fes, Morocco
  • Ahmed Zinedine, L3IA Laboratory, Sidi Mohamed Ben Abdellah University, Fes, Morocco
  • Khalid Fardousse, L3IA Laboratory, Sidi Mohamed Ben Abdellah University, Fes, Morocco
Volume: 15 | Issue: 6 | Pages: 29062-29068 | December 2025 | https://doi.org/10.48084/etasr.13506

Abstract

Arabic legal text classification plays a major role in improving judicial systems by automating the categorization of legal texts and facilitating access to legal information. Despite these benefits, building such a classifier is challenging because of the rich morphology and inherent complexity of the Arabic language, and because legal specialties are unevenly represented in the data. To address these challenges, this paper proposes a hybrid Deep Learning (DL) model that combines Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks, using a pre-trained Arabic Bidirectional Encoder Representations from Transformers version 2 (AraBERTv2) model as the word embedding technique. Extensive experiments were conducted to explore how resampling techniques, applied to equalize the class distribution, affect the legal text classification model. The developed model was evaluated on a newly collected Arabic legal dataset using accuracy, precision, recall, F1-score, and the Matthews Correlation Coefficient (MCC). The proposed model scored above 95% on every metric, and Random Oversampling (RO) gave the best results among the resampling techniques tested.
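
To make the resampling and evaluation steps concrete, the following is a minimal sketch of Random Oversampling and of the multiclass Matthews Correlation Coefficient, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' implementation: the function names, the placeholder documents, and the specialty labels ("civil", "criminal", "commercial") are hypothetical, and the paper may apply resampling to embedding vectors rather than to raw items as done here.

```python
import random
from collections import Counter, defaultdict
from math import sqrt


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap up to the majority size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def matthews_corrcoef(y_true, y_pred):
    """Multiclass MCC computed from label counts; returns 0.0 when the
    denominator is zero (for example, when only one class is predicted)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in classes)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in classes)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in classes)
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Hypothetical imbalanced training set: four "civil" documents, one of each other class.
docs = ["d1", "d2", "d3", "d4", "d5", "d6"]
specialties = ["civil", "civil", "civil", "civil", "criminal", "commercial"]
balanced_docs, balanced_labels = random_oversample(docs, specialties)
print(Counter(balanced_labels))

# MCC on a small hypothetical prediction vector.
print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))
```

Oversampling is usually applied to the training split only, so that duplicated items do not leak into the evaluation data. MCC is a common choice alongside accuracy on imbalanced data because it draws on every cell of the confusion matrix.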

Keywords:

Deep Learning (DL), Arabic Natural Language Processing (NLP), text classification, legal text, resampling

How to Cite

O. Tahtah, M. Bahbib, A. Zinedine, and K. Fardousse, "A CNN-BiLSTM Hybrid Architecture with Resampling Techniques for Arabic Legal Text Classification," Eng. Technol. Appl. Sci. Res., vol. 15, no. 6, pp. 29062–29068, Dec. 2025.
