Chinese Scientific Paper Classification Based on BERT, Graph Convolutional Networks, and Ensemble Learning

Xin Luo; Syaripah Ruzaini Syed Aris; Sofianita Mutalib

doi:10.48084/etasr.13540

Authors

Xin Luo: Faculty of Computer and Mathematical Sciences, Center of Computing Sciences, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia | School of Computer Information Engineering, Hanshan Normal University, Guangdong, China
Syaripah Ruzaini Syed Aris: Faculty of Computer and Mathematical Sciences, Center of Computing Sciences, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia
Sofianita Mutalib: Faculty of Computer and Mathematical Sciences, Center of Computing Sciences, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia

Volume: 15 | Issue: 6 | Pages: 29292-29298 | December 2025 | https://doi.org/10.48084/etasr.13540

Received: 21 July 2025 | Revised: 12 September 2025 and 25 September 2025 | Accepted: 28 September 2025 | Online: 8 December 2025

Corresponding author: Syaripah Ruzaini Syed Aris

Abstract

With the growing number of published papers, high-precision automatic classification that helps researchers find relevant literature more efficiently has become a key research focus. Most current studies rely on a single type of paper metadata, such as abstracts or titles, without comparing the relative importance of different metadata types or using them in combination. In addition, many experiments use non-public datasets, which makes it difficult to compare model performance across studies. This paper examines and compares the classification performance of the TextGCN, BERT, and BERT-GCN models on public Chinese paper datasets. It also investigates the roles of abstracts, keywords, and titles in classification and experimentally identifies the data combination that yields the highest accuracy. Since the output of the BERT-GCN model is a weighted sum of the outputs of BERT and TextGCN, determining the weight that gives the best classification performance is another key focus. To further improve the BERT-GCN model, ensemble voting is used to combine the predictions of multiple models trained on different data sources. Compared with the latest baseline models, the proposed method significantly improves classification accuracy on the Chinese paper datasets, raising the highest accuracy from 83.6% to 87.2%.
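The two combination steps described in the abstract can be sketched in a few lines of NumPy. The sketch assumes the standard BertGCN formulation, Z = λ·Z_GCN + (1 - λ)·Z_BERT, where both terms are per-document class-probability matrices. The abstract only states that a weighted sum is used, so this exact parametrization is an assumption. The sketch also assumes that λ is chosen by a grid search from 0.0 to 1.0 on a validation split, and that the ensemble uses hard (plurality) voting over the label predictions of models trained on different metadata combinations (for example, title only, or title plus keywords plus abstract). The function names `interpolate`, `best_lambda`, and `majority_vote` and the toy arrays are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def interpolate(p_bert: np.ndarray, p_gcn: np.ndarray, lam: float) -> np.ndarray:
    """Combine class probabilities as lam * Z_GCN + (1 - lam) * Z_BERT.

    Assumed BertGCN-style weighting; both inputs have shape (n_docs, n_classes).
    """
    return lam * p_gcn + (1.0 - lam) * p_bert


def accuracy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of documents whose highest-probability class equals the label."""
    return float(np.mean(probs.argmax(axis=1) == labels))


def best_lambda(p_bert, p_gcn, labels, grid=None):
    """Grid-search lam on a validation split (assumed search procedure)."""
    if grid is None:
        grid = np.round(np.linspace(0.0, 1.0, 11), 2)  # 0.0, 0.1, ..., 1.0
    scores = [accuracy(interpolate(p_bert, p_gcn, lam), labels) for lam in grid]
    best = int(np.argmax(scores))  # first (smallest) lam reaching the top score
    return float(grid[best]), scores[best]


def majority_vote(predictions: np.ndarray, n_classes: int) -> np.ndarray:
    """Hard plurality voting (assumed variant).

    predictions has shape (n_models, n_docs) and holds class ids, e.g. one row
    per model trained on a different metadata combination. Ties go to the
    smallest class id, because np.argmax returns the first maximum.
    """
    n_docs = predictions.shape[1]
    votes = np.zeros((n_docs, n_classes), dtype=int)
    for model_preds in predictions:
        # Each document index appears once per model, so += adds one vote.
        votes[np.arange(n_docs), model_preds] += 1
    return votes.argmax(axis=1)


if __name__ == "__main__":
    # Hypothetical toy validation data: 3 documents, 2 classes.
    p_bert = np.array([[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]])
    p_gcn = np.array([[0.2, 0.8], [0.4, 0.6], [0.1, 0.9]])
    labels = np.array([0, 1, 1])
    print(best_lambda(p_bert, p_gcn, labels))  # (0.2, 1.0)

    # Hypothetical predictions from three models on three documents.
    preds = np.array([[0, 1, 2], [0, 2, 2], [1, 1, 0]])
    print(majority_vote(preds, n_classes=3))  # [0 1 2]
```

Selecting λ on held-out data rather than on the test set keeps the reported accuracy from being optimistically biased. Soft voting, which averages the per-model probability matrices before taking the argmax, is a common alternative to the hard voting shown here; the abstract does not say which variant is used.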

Keywords

BERT-GCN, ensemble learning, Chinese scientific paper classification
