Implementation of Preprocessing Techniques for Precise Classification of Ancient Kannada Epigraphs
Received: 17 March 2025 | Revised: 17 April 2025 and 25 April 2025 | Accepted: 27 April 2025 | Online: 4 June 2025
Corresponding author: L. S. Anusha
Abstract
Kannada, a Dravidian language spoken predominantly in the state of Karnataka, is regarded as a repository of knowledge because of its extensive body of epigraphs, including old manuscripts and inscriptions. To make this knowledge more accessible, efforts are underway to digitize these documents using Optical Character Recognition (OCR) so that they can be used and stored more efficiently. However, the epigraphs are often in poor condition, and the images fed to the OCR model may not be of sufficient quality to achieve high recognition and classification accuracy. Preprocessing techniques are therefore used to improve dataset quality. Methods such as binarization, smoothing, edge detection, and segmentation help to increase the model's interpretability, reduce overfitting, and allow it to be trained more quickly and with fewer resources. When applied to epigraphs, these preprocessing approaches significantly improve image quality and reduce noise, making the text easier to identify and digitize.
Keywords:
data preprocessing, binarization, edge detection, segmentation, smoothing
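As a rough illustration of the four preprocessing steps named in the abstract, the following is a minimal pure-Python sketch and not the authors' implementation. It assumes an 8-bit grayscale image stored as a list of rows, dark text on a light background, a 3x3 mean filter for smoothing, Otsu's method for binarization, a Sobel operator for edge detection, and a horizontal projection profile for line segmentation. All function names and the toy image are hypothetical.

```python
# Illustrative sketch only: function names, the toy image, and the choice of
# specific algorithms are assumptions, not taken from the paper.

def smooth(img):
    """3x3 mean filter; border pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) // len(vals)
    return out

def otsu_threshold(img):
    """Return the grey level that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(img, t):
    """Mark dark pixels (<= t) as ink (1); assumes dark text on a light ground."""
    return [[1 if v <= t else 0 for v in row] for row in img]

def sobel_edges(img):
    """Sobel gradient magnitude |gx| + |gy| for interior pixels; borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out

def segment_lines(binary):
    """Split a binary image into text lines using a horizontal projection.
    Returns (start_row, end_row) pairs with end_row exclusive."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines

# Toy 8x6 "page": light background (220) with two dark text bands (20).
page = [[220] * 6 for _ in range(8)]
for y in (1, 2, 5, 6):
    for x in range(1, 5):
        page[y][x] = 20

threshold = otsu_threshold(page)
binary = binarize(page, threshold)
text_lines = segment_lines(binary)
edges = sobel_edges(page)
```

On real epigraph images, one would typically apply `smooth` before thresholding to suppress noise, and a library such as OpenCV or scikit-image would replace these loops. The pure-Python form is used here only to keep the sketch self-contained.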
License
Copyright (c) 2025 L. S. Anusha, Abhay Anandrao Deshpande

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.