AURORA-OCR: A Neuroevolutionary Framework with LLM-Guided Correction for Robust Text Recognition Under Degraded Imaging Conditions

T. M. Rakesh; G. S. Girisha; M. N. Renukadevi

doi:10.48084/etasr.18219

Authors

T. M. Rakesh Department of Computer Science and Engineering, Dayananda Sagar University, Bangalore, Karnataka, India
G. S. Girisha Department of Computer Science and Engineering, Dayananda Sagar University, Bangalore, Karnataka, India
M. N. Renukadevi Department of Computer Science and Engineering, Dayananda Sagar University, Bangalore, Karnataka, India

Volume: 16 | Issue: 3 | Pages: 36617-36623 | June 2026 | https://doi.org/10.48084/etasr.18219

Received: 17 February 2026 | Revised: 17 March 2026 and 27 March 2026 | Accepted: 28 March 2026 | Online: 6 June 2026

Corresponding author: T. M. Rakesh

Abstract

The performance of Optical Character Recognition (OCR) is significantly reduced under difficult imaging conditions, including blur, skew, interfering background textures, uneven illumination, and polarity inversion. This study presents AURORA-OCR (Adaptive Universal Recognition and Robustness Architecture), an adaptive, self-optimizing OCR framework that implements: (i) a neuroevolutionary preprocessing engine, (ii) a multi-scale dual-polarity OCR fusion mechanism, and (iii) a lightweight LLM-guided text correction module using continuous local memory. An evolutionary search strategy dynamically determines optimal parameters for (i) gamma correction, (ii) contrast clipping, (iii) adaptive threshold sensitivity, and (iv) Front of Polarity (FOB) to maximize the OCR confidence and structural fidelity of degraded images. Final recognition occurs through a hybrid Transformer/CRNN-inspired fusion that combines multiple OCR hypotheses produced at various spatial scales and polarities to achieve a stable output. An extensive evaluation was conducted on seven publicly available OCR benchmark datasets, namely ICDAR 2013, ICDAR 2015, ICDAR MLT-2019, Street View Text (SVT), IIIT5K, COCO-Text, and TextOCR-2021, along with a custom dataset of 500 real-world smartphone-captured document images, together representing a broad spectrum of photometric and geometric degradation conditions. Results on the Precision, Recall, F1-score, Character Error Rate (CER), Word Error Rate (WER), semantic similarity, and Semantic Drift (SD) metrics indicated that AURORA-OCR consistently outperformed previous OCR pipelines and was substantially superior for documents exhibiting low contrast, noise, and illumination distortion. AURORA-OCR achieved a reduction in CER of 23-41%, an improvement in F1-score of 19-36%, and a decrease in SD of 32%, thereby providing more robust text extraction. The proposed method is lightweight, interpretable, and suitable for deployment in document digitization and embedded applications.
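For readers unfamiliar with the error metrics named above, the following is a minimal sketch of how CER and WER are conventionally computed: the Levenshtein edit distance between the reference and the recognized text, normalized by the reference length in characters or words. It is not the authors' evaluation code. All function names are illustrative assumptions, and the `relative_reduction` helper assumes the reported percentage reductions are relative to a baseline, which the abstract does not state.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(baseline, improved):
    """Hypothetical helper: fractional reduction relative to a baseline rate."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    ref, hyp = "hello world", "hallo world"
    print(f"CER = {cer(ref, hyp):.4f}")
    print(f"WER = {wer(ref, hyp):.4f}")
```

In this toy example one substituted character among eleven gives a CER of 1/11, while the same error makes one of the two words wrong, so the WER is 0.5. This illustrates why the two metrics can diverge sharply on the same output.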

Keywords:

Optical Character Recognition (OCR), neuroevolutionary learning, multi-scale fusion, semantic correction, Large Language Model (LLM), dual-polarity processing, adaptive preprocessing, semantic drift, AURORA-OCR
