Context-Aware Code Summarization Using Multimodal Transformer
Received: 29 October 2025 | Revised: 20 November 2025 and 30 November 2025 | Accepted: 3 December 2025 | Online: 26 March 2026
Corresponding author: Gohar Rahman
Abstract
Modern software systems continue to grow in complexity, making it increasingly difficult for developers to understand code without clear, up-to-date documentation. This study proposes a multimodal transformer architecture based on CodeT5, enhanced with Abstract Syntax Tree (AST) information to improve both code summarization and the detection of semantic bugs. The proposed framework captures token-level, structural, and contextual cues, enabling deeper program comprehension than traditional text-only models. The model was trained and evaluated on the CoNaLa dataset and compared against baseline models and CodeT5. Experimental results show substantial improvements, with a Bilingual Evaluation Understudy (BLEU) score of 81.34 (an improvement of 44.14 points over CodeT5) and a Recall-Oriented Understudy for Gisting Evaluation–Longest Common Subsequence (ROUGE-L) score of 0.89. These findings confirm that incorporating structural awareness significantly enhances summary relevance and bug-identification capability. The study contributes a scalable, context-sensitive model for automated software understanding with strong potential for integration into real-world development tools.
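As a rough illustration of the AST-augmentation idea described in the abstract, Python's built-in ast module can linearize a snippet's syntax tree into a token sequence and concatenate it with the raw source before it is fed to a CodeT5-style encoder. This is only a minimal sketch under stated assumptions: the function names (linearize_ast, build_multimodal_input) and the "<ast>" separator token are hypothetical, and the paper's actual encoding scheme may differ.

```python
import ast

def linearize_ast(source: str) -> str:
    """Linearize a snippet's syntax tree as a sequence of node-type names.

    One simple linearization (breadth-first over the AST); the study's
    actual structural encoding is not specified here and may differ.
    """
    tree = ast.parse(source)
    # ast.walk yields the root node and all descendants in BFS order.
    return " ".join(type(node).__name__ for node in ast.walk(tree))

def build_multimodal_input(source: str) -> str:
    # Concatenate the raw code and its structural view with a separator
    # token, so a sequence-to-sequence model sees both modalities at once.
    return source + " <ast> " + linearize_ast(source)

example = "def add(a, b):\n    return a + b"
print(build_multimodal_input(example))
```

A real pipeline would map this combined string through the model's subword tokenizer; the sketch only shows how token-level and structural cues can share one input sequence.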
Keywords:
code summarization, CodeT5, Abstract Syntax Tree (AST), transformer, software engineering
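The ROUGE-L score reported in the abstract is based on the longest common subsequence (LCS) between a generated summary and its reference. A minimal, stdlib-only sketch of the metric follows; it computes the balanced F1 variant, which is an assumption, as the study's exact ROUGE-L configuration (e.g., its beta weighting) is not stated here.

```python
def lcs_length(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as an LCS-based F1 over whitespace tokens (a simplification)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("return the sum of two numbers",
                 "returns the sum of two values"))
```

An identical summary scores 1.0 and a fully disjoint one scores 0.0; partial word-order overlap falls in between, which is what makes the metric useful for judging summary relevance.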
License
Copyright (c) 2026 Sadia Saif, Muhammad Yaseen, Umar Farooq Khattak, Gohar Rahman

This work is licensed under a Creative Commons Attribution 4.0 International License.