IndoVetBERT: A Domain-Adaptive Transformer for Indonesian Veterinary Clinical Text Classification
Received: 9 April 2026 | Revised: 19 May 2026 | Accepted: 24 May 2026 | Online: 9 June 2026
Corresponding author: Nur Rokhman
Abstract
Medical Health Records (MHRs) contain both structured and unstructured data. The unstructured part, chiefly free text, supports the diagnostic process, informs clinical decision-making, and improves healthcare efficiency, yet it is often overlooked in favor of structured data because of its high dimensionality. Natural Language Processing (NLP) offers tools for analyzing and classifying such clinical text. Veterinary MHRs are harder to process than human MHRs because datasets are scarce, English and Latin terms are mixed, and no robust ontology standard exists. To close the language and domain gaps left by prior models, this study proposes IndoVetBERT, a transformer-based model for analyzing veterinary MHRs in Indonesia. Evaluated against the baseline models mBERT, XLM-R, and IndoBERT, IndoVetBERT achieved an accuracy of 87%. The proposed model handles veterinary clinical narratives effectively, capturing veterinary-specific terminology, clinical reasoning patterns, and diagnostic clues in veterinary records in Indonesia.
Keywords:
Natural Language Processing (NLP), clinical text classification, veterinary, IndoVetBERT, transformer-based model
License
Copyright (c) 2026 Agus Fatkhurohman, Nur Rokhman, Sri Mulyana, Ida Tjahajati

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
