Implementation of Preprocessing Techniques for Precise Classification of Ancient Kannada Epigraphs

L. S. Anusha; Abhay Anandrao Deshpande

doi:10.48084/etasr.10996

Authors

L. S. Anusha, Department of ECE, RV College of Engineering, Bangalore, India | Visvesvaraya Technological University, Belagavi, Karnataka, India
Abhay Anandrao Deshpande, Department of ECE, RV College of Engineering, Bangalore, India | Visvesvaraya Technological University, Belagavi, Karnataka, India

Volume: 15 | Issue: 3 | Pages: 23592-23598 | June 2025 | https://doi.org/10.48084/etasr.10996

Received: 17 March 2025 | Revised: 17 April 2025 and 25 April 2025 | Accepted: 27 April 2025 | Online: 4 June 2025

Corresponding author: L. S. Anusha

Abstract

Kannada, a Dravidian language spoken mainly in the state of Karnataka, has an extensive library of epigraphs, including old manuscripts and inscriptions, and is therefore regarded as a repository of knowledge. To make this knowledge more accessible, efforts are underway to digitize these documents for efficient use and storage using Optical Character Recognition (OCR). However, the epigraphs are often in poor condition, and the images fed to the OCR model may not be of sufficient quality to achieve high recognition and classification accuracy. Preprocessing techniques are used to improve dataset quality. Methods such as binarization, smoothing, edge detection, and segmentation help to increase the model's interpretability, reduce overfitting, and allow it to be trained more quickly and with fewer resources. When applied to epigraphs, these preprocessing approaches significantly improve image quality and reduce noise, making the text easier to identify and digitize.

Keywords:

data preprocessing, binarization, edge detection, segmentation, smoothing
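
As an illustration of two of the preprocessing steps named in the abstract, the following is a minimal sketch of smoothing followed by binarization. It is not the authors' implementation: the abstract does not say which algorithms were used, so a 3x3 mean filter and Otsu's global threshold are assumed here as common choices. The function names (mean_smooth, otsu_threshold, binarize) and the list-of-rows image representation are hypothetical and chosen only to keep the example self-contained.

```python
from typing import List

# Assumed representation: a grayscale image as rows of integers in 0..255.
Image = List[List[int]]


def mean_smooth(img: Image) -> Image:
    """Apply a 3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = round(sum(vals) / len(vals))
    return out


def otsu_threshold(img: Image) -> int:
    """Return the gray level that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already at or below t
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Map pixels strictly above t to 255 (white) and all others to 0 (black)."""
    return [[255 if v > t else 0 for v in row] for row in img]


if __name__ == "__main__":
    # Tiny synthetic example: dark strokes (20) next to a bright background (200).
    sample = [[20, 20, 200, 200], [20, 20, 200, 200]]
    smoothed = mean_smooth(sample)
    print(binarize(smoothed, otsu_threshold(smoothed)))
```

A global threshold of this kind is only a starting point. Degraded epigraph images with uneven illumination may call for a locally adaptive method instead, and edge detection and segmentation would follow as separate steps that this sketch does not cover.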


