Context-Aware Code Summarization Using Multimodal Transformer
Received: 29 October 2025 | Revised: 20 November 2025 and 30 November 2025 | Accepted: 3 December 2025 | Online: 26 March 2026
Corresponding author: Gohar Rahman
Abstract
Modern software systems continue to grow in complexity, making it increasingly difficult for developers to understand code without clear, up-to-date documentation. This study proposes a multimodal transformer architecture based on CodeT5, enhanced with Abstract Syntax Tree (AST) information to improve both code summarization and the detection of semantic bugs. The proposed framework captures token-level, structural, and contextual cues, enabling deeper program comprehension than traditional text-only models. The model was trained and evaluated on the CoNaLa dataset and compared against baseline models and CodeT5. Experimental results show substantial improvements, with a Bilingual Evaluation Understudy (BLEU) score of 81.34 (an improvement of 44.14 points over CodeT5) and a Recall-Oriented Understudy for Gisting Evaluation–Longest Common Subsequence (ROUGE-L) score of 0.89. These findings confirm that incorporating structural awareness significantly enhances summary relevance and bug-identification capability. The study contributes a scalable, context-sensitive model for automated software understanding with strong potential for integration into real-world development tools.
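As a rough illustration of the AST-augmentation idea described in the abstract, Python's built-in ast module can linearize a snippet's syntax tree into a token sequence and concatenate it with the raw source before it is fed to a CodeT5-style encoder. This is only a minimal sketch under stated assumptions: the function names (linearize_ast, build_multimodal_input) and the "<ast>" separator token are hypothetical, and the paper's actual encoding scheme may differ.

```python
import ast

def linearize_ast(source: str) -> str:
    """Linearize a snippet's syntax tree as a sequence of node-type names.

    One simple linearization (breadth-first over the AST); the study's
    actual structural encoding is not specified here and may differ.
    """
    tree = ast.parse(source)
    # ast.walk yields the root node and all descendants in BFS order.
    return " ".join(type(node).__name__ for node in ast.walk(tree))

def build_multimodal_input(source: str) -> str:
    # Concatenate the raw code and its structural view with a separator
    # token, so a sequence-to-sequence model sees both modalities at once.
    return source + " <ast> " + linearize_ast(source)

example = "def add(a, b):\n    return a + b"
print(build_multimodal_input(example))
```

A real pipeline would map this combined string through the model's subword tokenizer; the sketch only shows how token-level and structural cues can share one input sequence.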
Keywords:
code summarization, CodeT5, Abstract Syntax Tree (AST), transformer, software engineering
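The ROUGE-L score reported in the abstract is based on the longest common subsequence (LCS) between a generated summary and its reference. A minimal, stdlib-only sketch of the metric follows; it computes the balanced F1 variant, which is an assumption, as the study's exact ROUGE-L configuration (e.g., its beta weighting) is not stated here.

```python
def lcs_length(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as an LCS-based F1 over whitespace tokens (a simplification)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("return the sum of two numbers",
                 "returns the sum of two values"))
```

An identical summary scores 1.0 and a fully disjoint one scores 0.0; partial word-order overlap falls in between, which is what makes the metric useful for judging summary relevance.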
License
Copyright (c) 2026 Sadia Saif, Muhammad Yaseen, Umar Farooq Khattak, Gohar Rahman

This work is licensed under a Creative Commons Attribution 4.0 International License.