RetinoFusionNet: A Scalable and Interpretable Vision Transformer Framework for Diabetic Retinopathy Detection

K. V. Shanthala; Niranjan C. Kundur

doi:10.48084/etasr.15311

Authors

K. V. Shanthala JSS Academy of Technical Education, Bengaluru, Karnataka, India | Visvesvaraya Technological University, Belagavi, Karnataka, India
Niranjan C. Kundur JSS Academy of Technical Education, Bengaluru, Karnataka, India | Visvesvaraya Technological University, Belagavi, Karnataka, India

Volume: 16 | Issue: 1 | Pages: 31386-31392 | February 2026 | https://doi.org/10.48084/etasr.15311

Received: 4 October 2025 | Revised: 27 October 2025 | Accepted: 3 November 2025 | Online: 10 December 2025

Corresponding author: Niranjan C. Kundur

Abstract

Diabetic Retinopathy (DR) is a leading cause of preventable blindness, highlighting the need for automated screening systems that combine accuracy, efficiency, and interpretability. The present study introduces RetinoFusionNet, a prototype-guided Vision Transformer (ViT) that unifies multi-resolution patch embedding, cross-scale attention, and class-specific prototype reasoning to capture both localized lesions and broader retinal structures. By segmenting fundus images into varied patch sizes, the model effectively extracts fine and global features, while cross-scale attention establishes dependencies across distant abnormalities. Prototype-based learning provides interpretable visual anchors that align predictions with clinically recognized disease patterns, enhancing trust in automated decisions. Comprehensive evaluation on EyePACS, APTOS 2019, and Messidor-2 datasets demonstrates state-of-the-art accuracy with only a 4.1–4.5% cross-dataset drop, outperforming ViT and ProtoPNet, which show a decline of 8.3–12.1%. RetinoFusionNet also achieves a per-image inference time of 78 ms, reduces memory usage by 42% compared to standard ViTs, and operates at just 14.6 GFLOPs, confirming its robustness and deployment feasibility. By combining precision, computational efficiency, and transparency, RetinoFusionNet is established as a practical and scalable solution for large-scale DR screening, particularly in resource-limited clinical settings.
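
The abstract names two ideas that a small example can make concrete: splitting an image into patches of several sizes, and classifying a feature vector by its similarity to class prototypes. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. Mean pooling stands in for the learned patch embedding, cosine similarity stands in for whatever prototype distance the model uses, cross-scale attention is omitted entirely, and every function name, patch size, and prototype value is hypothetical.

```python
import math


def extract_patches(image, patch_size):
    """Split a square 2D image (list of rows) into non-overlapping flattened patches."""
    side = len(image)
    if side % patch_size != 0:
        raise ValueError("image side must be divisible by patch_size")
    patches = []
    for top in range(0, side, patch_size):
        for left in range(0, side, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


def multi_resolution_tokens(image, patch_sizes=(2, 4)):
    """Return one token per patch at every patch size.

    Assumption: each token is the patch mean, a stand-in for a learned
    linear patch embedding.
    """
    return {
        size: [sum(p) / len(p) for p in extract_patches(image, size)]
        for size in patch_sizes
    }


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prototype_scores(feature, prototypes):
    """Score a feature vector against one prototype per class label."""
    return {label: cosine_similarity(feature, proto)
            for label, proto in prototypes.items()}


# Toy 8x8 "image" whose pixel value is its row-major index.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
tokens = multi_resolution_tokens(image)

# Hypothetical two-class prototypes in a 2D feature space.
prototypes = {"no_dr": [1.0, 0.0], "dr": [0.0, 1.0]}
scores = prototype_scores([1.0, 0.0], prototypes)
predicted = max(scores, key=scores.get)
```

Small patches yield many fine-grained tokens and large patches yield a few coarse ones, which is the sense in which varied patch sizes capture both localized lesions and broader structures. The prototype scores double as an explanation of the prediction, since each score reports how closely the input resembles a class-specific reference pattern.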

Keywords:

diabetic retinopathy, vision transformers, prototype learning, distributed training, medical image analysis, interpretable AI, deep learning, fundus image classification
