Word Recognition in Degraded Historical Documents Using Deep Neural Networks

Authors

  • B. K. Rajithkumar, Department of Electronics and Communication Engineering, RV College of Engineering, Bengaluru, India
  • H. S. Mohana, Department of Computer Science and Engineering, Rajeev Institute of Technology, Hassan, India
  • B. V. Uma, Department of Electronics and Communication Engineering, RV College of Engineering, Bengaluru, India
  • M. Govinda Raju, Department of Electronics and Communication Engineering, RV College of Engineering, Bengaluru, India
Volume: 16 | Issue: 1 | Pages: 31519-31524 | February 2026 | https://doi.org/10.48084/etasr.15235

Abstract

Document Image Analysis (DIA) converts pixel-based document images into machine-readable formats. While text recognition in printed documents generally achieves high accuracy because of their consistent structure and minimal variation, historical documents such as handwritten manuscripts and stone inscriptions pose distinct challenges, including script variability, skew, and degradation that can reduce legibility. This study introduces a deep neural network approach to improve word recognition in degraded historical documents. By integrating decoding and deslanting methods, the proposed model achieves an accuracy of 90.8%, indicating that it handles degradation and variability in historical document images effectively.
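The abstract does not specify which deslanting or decoding methods the model uses. As an illustration only, the sketch below assumes two common choices: deslanting by trying a range of horizontal shear factors and keeping the one whose column projection is most concentrated, and greedy (best-path) CTC decoding of per-frame character probabilities, which presumes a CTC-trained recognizer. The function names `shear`, `deslant`, and `ctc_greedy_decode`, the shear range of -1 to 1, the squared-projection criterion, and the convention that alphabet index 0 is the CTC blank are all hypothetical, not taken from the paper.

```python
import numpy as np


def shear(img: np.ndarray, alpha: float) -> np.ndarray:
    """Shear a binary word image horizontally.

    Row y is shifted right by round(alpha * (h - 1 - y)) pixels, so with
    positive alpha the top rows move furthest right. The canvas is widened
    so that no foreground pixel is cropped.
    """
    h, w = img.shape
    pad = int(np.ceil(abs(alpha) * (h - 1)))
    out = np.zeros((h, w + pad), dtype=img.dtype)
    for y in range(h):
        shift = int(round(alpha * (h - 1 - y)))
        # For negative alpha every shift is <= 0, so offset it by pad.
        offset = shift if alpha >= 0 else shift + pad
        out[y, offset:offset + w] = img[y]
    return out


def deslant(img: np.ndarray, alphas=None):
    """Return (best_alpha, deslanted_image) for a binary image (1 = ink).

    Assumed heuristic, not necessarily the paper's criterion: a shear that
    makes strokes vertical packs the ink into fewer columns, which
    maximizes the sum of squared column counts.
    """
    if alphas is None:
        alphas = np.linspace(-1.0, 1.0, 21)  # hypothetical search range
    best_alpha, best_img, best_score = None, None, -1.0
    for alpha in alphas:
        sheared = shear(img, float(alpha))
        cols = sheared.sum(axis=0).astype(np.float64)
        score = float(np.dot(cols, cols))
        if score > best_score:  # strict: the first maximum is kept
            best_alpha, best_img, best_score = float(alpha), sheared, score
    return best_alpha, best_img


def ctc_greedy_decode(probs: np.ndarray, alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding.

    probs has shape (T, C): one row of class scores per time frame.
    Take the argmax per frame, merge consecutive repeats, then drop blanks.
    """
    best_path = np.argmax(probs, axis=1).tolist()
    chars = []
    prev = blank
    for idx in best_path:
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

On a synthetic stroke that leans one pixel to the right per row, `deslant` selects alpha = -1.0 and the stroke collapses into a single column. For decoding, the best path `a a - a b b -` yields `"aab"`: the blank between the two `a` frames is what keeps them as separate characters, while the repeated `b` frames merge into one.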

Keywords:

handwriting recognition, neural networks, document image analysis, machine learning, computer vision

How to Cite

B. K. Rajithkumar, H. S. Mohana, B. V. Uma, and M. G. Raju, “Word Recognition in Degraded Historical Documents Using Deep Neural Networks”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 1, pp. 31519–31524, Feb. 2026.