IndoVetBERT: A Domain-Adaptive Transformer for Indonesian Veterinary Clinical Text Classification

Agus Fatkhurohman; Nur Rokhman; Sri Mulyana; Ida Tjahajati

doi:10.48084/etasr.19231

Authors

Agus Fatkhurohman: Department of Computer Science and Electronics, Faculty of Mathematics and Natural Science, Universitas Gadjah Mada, Indonesia | Faculty of Computer Science, Universitas Amikom Yogyakarta, Indonesia | ORCID: https://orcid.org/0009-0007-7554-2153
Nur Rokhman: Department of Computer Science and Electronics, Faculty of Mathematics and Natural Science, Universitas Gadjah Mada, Indonesia
Sri Mulyana: Department of Computer Science and Electronics, Faculty of Mathematics and Natural Science, Universitas Gadjah Mada, Indonesia
Ida Tjahajati: Department of Bioresources Technology and Veterinary, Vocational College, Universitas Gadjah Mada, Indonesia | Department of Internal Medicine, Faculty of Veterinary Medicine, Universitas Gadjah Mada, Indonesia

Volume: 16 | Issue: 4 | Pages: 37387-37393 | August 2026 | https://doi.org/10.48084/etasr.19231

Received: 9 April 2026 | Revised: 19 May 2026 | Accepted: 24 May 2026 | Online: 9 June 2026

Corresponding author: Nur Rokhman

Abstract

Medical Health Records (MHRs) contain both structured and unstructured data. The unstructured part, mainly free text, supports the diagnostic process, informs clinical decision-making, and improves healthcare efficiency, yet it is often overlooked in favor of structured data because of its high dimensionality. Natural Language Processing (NLP) offers tools for analyzing and classifying such clinical text. Veterinary MHRs are harder to process than human MHRs because datasets are scarce, English and Latin terms are mixed, and no robust ontology standard exists. To close the language and domain gaps left by prior models, this study proposes IndoVetBERT, a transformer-based model for analyzing veterinary MHRs in Indonesia. Evaluated against the baseline models mBERT, XLM-R, and IndoBERT, IndoVetBERT achieved an accuracy of 87%. The proposed model handles veterinary clinical narratives effectively and captures the veterinary-specific terminology, clinical reasoning patterns, and diagnostic clues found in Indonesian veterinary records.
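
Accuracy, the metric reported above, is the share of records whose predicted class matches the reference label. The following is a minimal Python sketch of that computation for comparing classifiers on one evaluation set. The class names, the `model_a` and `model_b` identifiers, and the toy predictions are hypothetical placeholders, not data or results from the study.

```python
def accuracy(gold, predicted):
    """Return the fraction of records whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical reference labels for four clinical notes (illustrative only).
gold = ["dermatology", "gastroenterology", "dermatology", "respiratory"]

# Hypothetical predictions from two placeholder models (not the paper's models).
predictions = {
    "model_a": ["dermatology", "gastroenterology", "respiratory", "respiratory"],
    "model_b": ["dermatology", "dermatology", "respiratory", "respiratory"],
}

# Score every model against the same reference labels.
scores = {name: accuracy(gold, preds) for name, preds in predictions.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2%}")
```

Scoring every model against the same held-out labels keeps the comparison like-for-like. On imbalanced class distributions, accuracy alone can be misleading, so per-class precision and recall are commonly reported alongside it.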

Keywords:

Natural Language Processing (NLP), clinical text classification, veterinary, IndoVetBERT, transformer-based model
