A Multi-Stage Deep Learning Model for the Enhancement of the Quality of Camera-Captured Document Images

Pushplata Dubey; D. R. Shashikumar

doi:10.48084/etasr.14469

Authors

Pushplata Dubey, Department of Computer Science and Engineering, Sai Vidya Institute of Technology, Bengaluru, Karnataka, India | Visvesvaraya Technological University, Belagavi, India
D. R. Shashikumar, Department of Computer Science and Engineering, Sai Vidya Institute of Technology, Bengaluru, Karnataka, India | Visvesvaraya Technological University, Belagavi, India

Volume: 15 | Issue: 6 | Pages: 29964-29970 | December 2025 | https://doi.org/10.48084/etasr.14469

Received: 1 September 2025 | Revised: 23 September 2025 and 14 October 2025 | Accepted: 18 October 2025 | Online: 26 October 2025

Corresponding author: Pushplata Dubey

Abstract

This study introduces a multistage deep learning model for enhancing document images captured with handheld or mobile cameras. The model is a modular three-stage pipeline that performs denoising, deblurring, and enhancement in sequence, so that each stage refines the output of the previous one. Each stage uses a pretrained, task-specific network to address a common degradation in camera-captured document images: sensor noise, motion blur, or uneven illumination. The proposed model was trained and evaluated on camera-captured documents and compared with existing models such as GCDRNet and DocEnTr, achieving higher PSNR scores and improved text clarity. The experimental results indicate that the model is well suited to OCR and digital archiving, and that it is robust and generalizes well across real-world document conditions.
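As an illustration of the sequential structure and of the PSNR and MSE metrics named in the abstract, the following is a minimal sketch and not the authors' implementation. The names `run_pipeline`, `mse`, and `psnr` are hypothetical, each stage is assumed to be a callable that maps an image array to an image array, and pixel values are assumed to lie in the 8-bit range 0 to 255.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical type: a stage maps an image array to an image array.
Stage = Callable[[np.ndarray], np.ndarray]


def run_pipeline(image: np.ndarray, stages: Sequence[Stage]) -> np.ndarray:
    """Apply the stages in order, feeding each output to the next stage."""
    out = image
    for stage in stages:
        out = stage(out)
    return out


def mse(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Mean squared error between two images of the same shape."""
    ref = reference.astype(np.float64)  # avoid uint8 wrap-around
    est = estimate.astype(np.float64)
    return float(np.mean((ref - est) ** 2))


def psnr(reference: np.ndarray, estimate: np.ndarray,
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(reference, estimate)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / err))
```

In such a sketch, the denoising, deblurring, and enhancement networks would be passed as the three entries of `stages`, and `psnr` would be computed between a clean reference image and the pipeline output; a higher value indicates a closer match to the reference.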

Keywords:

camera-captured documents, denoising, GCDRNet, DocEnTr, OCR-readiness, PSNR, MSE

References

M. A. Souibgui and Y. Kessentini, "DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1180–1191, Mar. 2022. DOI: https://doi.org/10.1109/TPAMI.2020.3022406

J. Zhang, D. Peng, C. Liu, P. Zhang, and L. Jin, "DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks," in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, Jun. 2024, pp. 15654–15664. DOI: https://doi.org/10.1109/CVPR52733.2024.01482

Z. Yang, B. Liu, Y. Xiong, and G. Wu, "GDB: Gated Convolutions-based Document Binarization," Pattern Recognition, vol. 146, Feb. 2024, Art. no. 109989. DOI: https://doi.org/10.1016/j.patcog.2023.109989

G. D. Fan, B. Fan, M. Gan, G. Y. Chen, and C. L. P. Chen, "Multiscale Low-Light Image Enhancement Network With Illumination Constraint," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 11, pp. 7403–7417, Aug. 2022. DOI: https://doi.org/10.1109/TCSVT.2022.3186880

B. Pan, Y. Du, and X. Guo, "Super-Resolution Reconstruction of Cell Images Based on Generative Adversarial Networks," IEEE Access, vol. 12, pp. 72252–72263, 2024. DOI: https://doi.org/10.1109/ACCESS.2024.3402535

X. Zhang and X. Wang, "MARN: Multi-Scale Attention Retinex Network for Low-Light Image Enhancement," IEEE Access, vol. 9, pp. 50939–50948, 2021. DOI: https://doi.org/10.1109/ACCESS.2021.3068534

J. Zhang, L. Liang, K. Ding, F. Guo, and L. Jin, "Appearance Enhancement for Camera-Captured Document Images in the Wild," IEEE Transactions on Artificial Intelligence, vol. 5, no. 5, pp. 2319–2330, Feb. 2024. DOI: https://doi.org/10.1109/TAI.2023.3321257

X. Wang et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," in Computer Vision – ECCV 2018 Workshops, Munich, Germany, 2019, pp. 63–79. DOI: https://doi.org/10.1007/978-3-030-11021-5_5

V. S. K. Katta, H. Kapalavai, and S. Mondal, "Generating New Human Faces and Improving the Quality of Images Using Generative Adversarial Networks (GAN)," in 2023 2nd International Conference on Edge Computing and Applications (ICECAA), Namakkal, India, Jul. 2023, pp. 1647–1652. DOI: https://doi.org/10.1109/ICECAA58104.2023.10212099

M. Sukesh, M. Muthunayagam, and M. Latha, "Super-Resolution Performance: A Comparative Analysis of SRGAN and ESRGAN Techniques for Single Image Restoration," in 2024 Intelligent Systems and Machine Learning Conference (ISML), Hyderabad, India, May 2024, pp. 128–134. DOI: https://doi.org/10.1109/ISML60050.2024.11007359

S. Das, K. Ma, Z. Shu, D. Samaras, and R. Shilkrot, "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), Oct. 2019, pp. 131–140. DOI: https://doi.org/10.1109/ICCV.2019.00022

J. Pan, Z. Hu, Z. Su, and M. H. Yang, "Deblurring Text Images via L0-Regularized Intensity and Gradient Prior," in 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, Jun. 2014, pp. 2901–2908. DOI: https://doi.org/10.1109/CVPR.2014.371

W. Xiong, X. Jia, J. Xu, Z. Xiong, M. Liu, and J. Wang, "Historical document image binarization using background estimation and energy minimization," in 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, Aug. 2018, pp. 3716–3721. DOI: https://doi.org/10.1109/ICPR.2018.8546099

K. Ntirogiannis, B. Gatos, and I. Pratikakis, "ICFHR2014 Competition on Handwritten Document Image Binarization (H-DIBCO 2014)," in 2014 14th International Conference on Frontiers in Handwriting Recognition, Hersonissos, Greece, Sep. 2014, pp. 809–813. DOI: https://doi.org/10.1109/ICFHR.2014.141

A. Hettiarachchi, S. Rathnayake, and K. Dissanayaka, "A Generative Adversarial Network to Upscale the Resolution of Low-Resolution Galaxy Images," in 2024 6th International Conference on Advancements in Computing (ICAC), Colombo, Sri Lanka, Dec. 2024, pp. 55–60. DOI: https://doi.org/10.1109/ICAC64487.2024.10851119

S. Mashhadani, W. H. Abdulsalam, I. Alhakam, O. A. Hassen, and S. M. Darwish, "An Enhanced Document Source Identification System for Printer Forensic Applications based on the Boosted Quantum KNN Classifier," Engineering, Technology & Applied Science Research, vol. 15, no. 1, pp. 19983–19991, Feb. 2025. DOI: https://doi.org/10.48084/etasr.9420

P. Dubey, "pushplatadubey/Multi-Staged-Document-Enhancement." Oct. 13, 2025, [Online]. Available: https://github.com/pushplatadubey/Multi-Staged-Document-Enhancement.