Chinese Scientific Paper Classification Based on BERT, Graph Convolutional Networks, and Ensemble Learning
Received: 21 July 2025 | Revised: 12 September 2025 and 25 September 2025 | Accepted: 28 September 2025 | Online: 8 December 2025
Corresponding author: Syaripah Ruzaini Syed Aris
Abstract
With the growing number of published papers, high-precision automatic classification that helps researchers find relevant literature more efficiently has become a key research focus. Most existing studies rely on a single type of paper metadata, such as abstracts or titles, without comparing the importance of different metadata types or using them in combination. In addition, many experiments use non-public datasets, which makes it difficult to compare model performance across studies. This paper examines and compares the classification performance of the TextGCN, BERT, and BERT-GCN models on public Chinese paper datasets. It also investigates the role of abstracts, keywords, and titles in classification and experimentally identifies the data combination that yields the highest accuracy. Because the output of the BERT-GCN model is a weighted sum of the outputs of BERT and TextGCN, determining the weight that gives the best classification performance is also a key focus. To further improve the BERT-GCN model, ensemble voting is used to combine the predictions of multiple models trained on different data sources. Compared with the latest baseline models, the proposed method significantly improves classification accuracy on Chinese paper datasets, with the highest accuracy increasing from 83.6% to 87.2%.
Keywords:
BERT-GCN, ensemble learning, Chinese scientific paper classification
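The two combination steps named in the abstract, the weighted sum of the BERT and TextGCN outputs and the ensemble vote over models trained on different data sources, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each model outputs a class-probability vector, that a single weight `lam` in [0, 1] is applied to the TextGCN side, and that voting is a simple majority over predicted labels. The function names and all numbers are hypothetical.

```python
from collections import Counter


def interpolate(p_bert, p_gcn, lam):
    """Weighted sum of two class-probability vectors.

    Assumption: lam weights the TextGCN output and (1 - lam) the BERT output.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * g + (1.0 - lam) * b for b, g in zip(p_bert, p_gcn)]


def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def majority_vote(predictions):
    """Hard voting over predicted labels; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical probabilities for one paper over three classes.
p_bert = [0.6, 0.3, 0.1]
p_gcn = [0.2, 0.7, 0.1]
combined = interpolate(p_bert, p_gcn, lam=0.7)
label = argmax(combined)

# Hypothetical labels predicted for the same paper by three models, for
# example models trained on abstracts, on keywords, and on titles.
votes = [1, 1, 0]
final_label = majority_vote(votes)
```

In this sketch, searching for the best weight amounts to evaluating validation accuracy for several values of `lam` and keeping the best one. Soft voting, which averages probabilities instead of counting labels, would be an equally plausible reading of "ensemble voting".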
License
Copyright (c) 2025 Xin Luo, Syaripah Ruzaini Syed Aris, Sofianita Mutalib

This work is licensed under a Creative Commons Attribution 4.0 International License.
