A Multi-Stage Deep Learning Model for the Enhancement of the Quality of Camera-Captured Document Images
Received: 1 September 2025 | Revised: 23 September 2025 and 14 October 2025 | Accepted: 18 October 2025 | Online: 26 October 2025
Corresponding author: Pushplata Dubey
Abstract
This study introduces a multi-stage deep learning model for enhancing document images captured with handheld or mobile cameras. The model is a modular three-stage pipeline that performs denoising, deblurring, and enhancement in sequence, so that each stage refines the output of the previous one. Each stage uses a pretrained, task-specific network to address one common degradation: sensor noise, motion blur, or uneven illumination. The proposed model was trained and evaluated on camera-captured documents and compared with existing models such as GCDRNet and DocEnTr, achieving higher PSNR scores and improved text clarity. The experimental results indicate that the model is well suited to OCR and digital archiving, and they highlight its robustness and superior generalization across real-world document conditions.
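As a rough illustration of the staged design and of the PSNR and MSE metrics named above, the following minimal sketch chains three stages and scores each intermediate result against a clean reference. It does not reproduce the authors' model. The stage functions (`denoise_stage`, `deblur_stage`, `enhance_stage`) and the `run_pipeline` helper are hypothetical names, and simple classical filters (a 3x3 box filter, unsharp masking, and contrast stretching) stand in for the pretrained networks. Images are assumed to be 2-D grayscale arrays in the range 0 to 255.

```python
import numpy as np


def mse(reference, estimate):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return float(np.mean((reference - estimate) ** 2))


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, estimate)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))


def box_blur(image):
    """3x3 mean filter with edge replication at the borders."""
    image = np.asarray(image, dtype=np.float64)
    padded = np.pad(image, 1, mode="edge")
    height, width = image.shape
    total = np.zeros((height, width), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + height, dx:dx + width]
    return total / 9.0


def denoise_stage(image):
    # Placeholder for a pretrained denoising network (assumption).
    return box_blur(image)


def deblur_stage(image, amount=1.0):
    # Placeholder for a pretrained deblurring network (assumption):
    # unsharp masking adds back the detail removed by a blur.
    image = np.asarray(image, dtype=np.float64)
    sharpened = image + amount * (image - box_blur(image))
    return np.clip(sharpened, 0.0, 255.0)


def enhance_stage(image):
    # Placeholder for a pretrained enhancement network (assumption):
    # linear contrast stretch to the full 0..255 range.
    image = np.asarray(image, dtype=np.float64)
    low, high = image.min(), image.max()
    if high == low:
        return image.copy()
    return (image - low) / (high - low) * 255.0


def run_pipeline(image, stages):
    """Apply stages in order; return the final image and every intermediate."""
    outputs = [np.asarray(image, dtype=np.float64)]
    for stage in stages:
        outputs.append(stage(outputs[-1]))
    return outputs[-1], outputs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((64, 64), 255.0)
    clean[20:24, 8:56] = 0.0  # a dark "text line" on a white page
    noisy = np.clip(clean + rng.normal(0.0, 20.0, clean.shape), 0.0, 255.0)

    stages = [denoise_stage, deblur_stage, enhance_stage]
    _, outputs = run_pipeline(noisy, stages)
    for name, output in zip(["input", "denoise", "deblur", "enhance"], outputs):
        print(f"{name:8s} PSNR = {psnr(clean, output):.2f} dB")
```

Because every stage has the same image-in, image-out signature, any stage can be swapped for a learned model without changing `run_pipeline`, which is the practical benefit of a modular design. The placeholder filters are not expected to match the behavior of trained networks.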
Keywords:
camera-captured documents, denoising, GCDRNet, DocEnTr, OCR-readiness, PSNR, MSE
License
Copyright (c) 2025 Pushplata Dubey, D. R. Sashikumar

This work is licensed under a Creative Commons Attribution 4.0 International License.
