A Baseline Evaluation of OCR Segmentation and Classification Methods for Printed Javanese Script

Authors

  • Anastasia Rita Widiarti, Informatics Department, Sanata Dharma University, Yogyakarta, Indonesia
  • Susilawati Endah Peni Adji, Department of Indonesian Literature, Sanata Dharma University, Yogyakarta, Indonesia
  • Fransisca Tjandrasih Adji, Department of Indonesian Literature, Sanata Dharma University, Yogyakarta, Indonesia
  • Gerardus Kristha Bayu Indraputra, Informatics Department, Sanata Dharma University, Yogyakarta, Indonesia
  • Yulius Agung Trisnanto, Informatics Department, Sanata Dharma University, Yogyakarta, Indonesia
  • FX Bima Yudha Pratama, Informatics Department, Sanata Dharma University, Yogyakarta, Indonesia
Volume: 16 | Issue: 1 | Pages: 31699-31705 | February 2026 | https://doi.org/10.48084/etasr.15502

Abstract

Optical Character Recognition (OCR) for Javanese script remains challenging because of the script's complex glyph structure and overlapping components. This study presents a pilot investigation of segmentation and transliteration performance on printed Javanese text using a single-page dataset. The workflow begins with preprocessing, followed by segmentation and script-level classification with the k-Nearest Neighbor (k-NN) algorithm. Experiments used 91 and 281 scripts, with 100 and 500 samples per script, to evaluate system performance. The segmentation stage achieved an average accuracy of 63.5%, with some lines exceeding 80%. For transliteration, accuracy comparisons among the segmentation output, model predictions, and ground truth yielded average accuracies of 61.4%, 72.3%, and 39.2%, respectively. Further experiments with 3-NN and 11-NN configurations showed that increasing the number of training samples and the script diversity improved recognition accuracy, which reached up to 68.75% on certain lines. This research provides an initial benchmark dataset and a systematic evaluation framework, establishing a baseline that bridges the gap between handwritten and printed OCR research. The findings offer empirical insights for developing robust OCR systems to support the digital preservation of Indonesia's written cultural heritage.
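
As an illustration of the classification and scoring steps named in the abstract, the following minimal sketch shows k-NN majority voting and a simple accuracy measure. It is not the authors' implementation: the function names `knn_predict` and `accuracy`, the Euclidean distance, the two-dimensional feature vectors, and the labels "ha" and "na" are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption here; the abstract does not state the metric used.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(predicted, truth):
    """Fraction of positions where the predicted label equals the ground truth."""
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


# Hypothetical toy features standing in for glyph descriptors.
train = [
    ((0.0, 0.0), "ha"), ((0.1, 0.2), "ha"), ((0.2, 0.1), "ha"),
    ((1.0, 1.0), "na"), ((0.9, 1.1), "na"), ((1.1, 0.9), "na"),
]
queries = [(0.05, 0.1), (1.0, 0.95)]
predicted = [knn_predict(train, q, k=3) for q in queries]
```

Passing `k=11` instead of `k=3` would correspond to the 11-NN configuration, provided the training set holds at least 11 samples.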

Keywords:

Optical Character Recognition (OCR), k-Nearest Neighbor (k-NN), printed Javanese text, segmentation

How to Cite

A. R. Widiarti, S. E. P. Adji, F. T. Adji, G. K. B. Indraputra, Y. A. Trisnanto, and F. B. Y. Pratama, "A Baseline Evaluation of OCR Segmentation and Classification Methods for Printed Javanese Script," Eng. Technol. Appl. Sci. Res., vol. 16, no. 1, pp. 31699–31705, Feb. 2026.
