RetinoFusionNet: A Scalable and Interpretable Vision Transformer Framework for Diabetic Retinopathy Detection
Received: 4 October 2025 | Revised: 27 October 2025 | Accepted: 3 November 2025 | Online: 10 December 2025
Corresponding author: Niranjan C. Kundur
Abstract
Diabetic Retinopathy (DR) is a leading cause of preventable blindness, highlighting the need for automated screening systems that combine accuracy, efficiency, and interpretability. The present study introduces RetinoFusionNet, a prototype-guided Vision Transformer (ViT) that unifies multi-resolution patch embedding, cross-scale attention, and class-specific prototype reasoning to capture both localized lesions and broader retinal structures. By splitting fundus images into patches of several sizes, the model extracts both fine-grained and global features, while cross-scale attention establishes dependencies between spatially distant abnormalities. Prototype-based learning provides interpretable visual anchors that align predictions with clinically recognized disease patterns, enhancing trust in automated decisions. Comprehensive evaluation on the EyePACS, APTOS 2019, and Messidor-2 datasets demonstrates state-of-the-art accuracy with a cross-dataset drop of only 4.1–4.5%, outperforming ViT and ProtoPNet, which show a decline of 8.3–12.1%. RetinoFusionNet also achieves a per-image inference time of 78 ms, reduces memory usage by 42% compared to standard ViTs, and operates at just 14.6 GFLOPs, supporting its robustness and deployment feasibility. By combining precision, computational efficiency, and transparency, RetinoFusionNet offers a practical and scalable solution for large-scale DR screening, particularly in resource-limited clinical settings.
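To make the ideas of multi-resolution patch embedding and prototype matching concrete, the following is a minimal NumPy sketch and not the authors' implementation. The function names, the patch sizes (8, 16, 32), the embedding width, and the use of random projections in place of learned embeddings are all assumptions made for illustration. Cross-scale attention is omitted; the tokens from all scales are simply concatenated.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (rows, cols, p, p, c)
    return patches.reshape(rows * cols, patch_size * patch_size * c)

def multi_resolution_tokens(image, patch_sizes=(8, 16, 32), dim=64, seed=0):
    """Embed patches of several sizes into one shared token space.

    Assumption: a random linear projection per scale stands in for the
    learned patch embedding of a real model.
    """
    rng = np.random.default_rng(seed)
    tokens = []
    for p in patch_sizes:
        patches = extract_patches(image, p)
        proj = rng.standard_normal((patches.shape[1], dim))
        proj /= np.sqrt(patches.shape[1])
        tokens.append(patches @ proj)
    return np.concatenate(tokens, axis=0)  # (total_tokens, dim)

def prototype_scores(tokens, prototypes):
    """Best cosine similarity between any token and each prototype."""
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    return (t @ p.T).max(axis=0)  # one score per prototype

# Usage on a synthetic 64x64 RGB image (a stand-in for a fundus image).
image = np.random.default_rng(1).random((64, 64, 3))
tokens = multi_resolution_tokens(image)
prototypes = np.random.default_rng(2).standard_normal((5, 64))  # hypothetical
scores = prototype_scores(tokens, prototypes)
```

In this sketch, small patches yield many fine-grained tokens and large patches yield a few coarse ones, so a single token sequence covers both lesion-scale and structure-scale content. The per-prototype maximum similarity is one common way to turn token features into an interpretable score, since the best-matching token points to the image region that triggered it.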
Keywords:
diabetic retinopathy, vision transformers, prototype learning, distributed training, medical image analysis, interpretable AI, deep learning, fundus image classification
License
Copyright (c) 2025 K. V. Shanthala, Niranjan C. Kundur

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
