SAHI-BAR: An Instance Segmentation Model for Medical Prescriptions

G. R. Rekha; S. Siddesha; V. N. Manjunath Aradhya

doi:10.48084/etasr.16418

Authors

G. R. Rekha Department of Computer Applications, JSS Science and Technology University, Mysuru, Karnataka, India https://orcid.org/0000-0002-4901-4193
S. Siddesha Department of Computer Applications, JSS Science and Technology University, Mysuru, Karnataka, India https://orcid.org/0000-0002-5504-7696
V. N. Manjunath Aradhya Department of Computer Applications, JSS Science and Technology University, Mysuru, Karnataka, India https://orcid.org/0000-0003-0680-1338

Volume: 16 | Issue: 2 | Pages: 32899-32906 | April 2026 | https://doi.org/10.48084/etasr.16418

Received: 21 November 2025 | Revised: 12 December 2025, 31 December 2025, and 7 January 2026 | Accepted: 9 January 2026 | Online: 4 April 2026

Corresponding author: S. Siddesha

Abstract

Medical prescription documents pose significant challenges for automated information extraction due to dense layouts, small text, and heterogeneous field structures. This study presents a modular pipeline that augments a YOLO-based segmentation baseline with two lightweight strategies: (i) Slicing Aided Hyper Inference (SAHI) for tiled processing with post-hoc merging and (ii) a Block Aware Routing (BAR) mechanism that fuses baseline and tiled predictions while enforcing a one-entity-per-class-per-block constraint when segmenting prescription parameters. Experiments on a custom prescription dataset with eight semantic classes, namely Block_id, Med_Name, Med_Type, Dose_strength, Frequency, Duration, Quantity, and Instructions, show that the proposed approach improves recall on dense textual regions without sacrificing precision. The newer YOLOv11 architecture was also evaluated, and the results show that inference-time tiling and routing, rather than the choice of backbone, remain the dominant contributors to small-object performance gains. The proposed framework is fully compatible with the Ultralytics ecosystem, does not require retraining to benefit from tiling, and produces class-specific crops for downstream OCR and archival. These results indicate a practical, deployment-friendly approach to document parsing that balances accuracy, interpretability, and efficiency.
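The two inference-time strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how tiles are laid out, how detections are assigned to blocks, or how conflicts are resolved. The sketch assumes fixed-size tiles with a pixel overlap, assigns a detection to the first block whose box contains the detection's center, and keeps the highest-scoring detection per (block, class) pair. The names `Detection`, `tile_windows`, `center_in`, and `route_by_block` are hypothetical and do not come from the paper, SAHI, or Ultralytics.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in full-image pixels


@dataclass(frozen=True)
class Detection:
    label: str    # one of the eight class names, e.g. "Med_Name"
    box: Box      # bounding box already mapped back to full-image coordinates
    score: float  # model confidence in [0, 1]
    source: str   # "baseline" or "tiled" (hypothetical provenance tag)


def tile_windows(width: int, height: int, tile: int,
                 overlap_px: int) -> List[Tuple[int, int, int, int]]:
    """Return overlapping tile windows that cover the whole image.

    Assumption: square tiles of side `tile` with `overlap_px` shared pixels;
    the last tile in each row and column is shifted back to stay in bounds.
    """
    if not 0 <= overlap_px < tile:
        raise ValueError("overlap_px must satisfy 0 <= overlap_px < tile")
    step = tile - overlap_px
    windows = []
    y = 0
    while True:
        y2 = min(y + tile, height)
        x = 0
        while True:
            x2 = min(x + tile, width)
            windows.append((max(0, x2 - tile), max(0, y2 - tile), x2, y2))
            if x2 >= width:
                break
            x += step
        if y2 >= height:
            break
        y += step
    return windows


def center_in(box: Box, region: Box) -> bool:
    """True if the center of `box` lies inside `region` (edges inclusive)."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def route_by_block(blocks: List[Box], baseline: List[Detection],
                   tiled: List[Detection]) -> Dict[int, Dict[str, Detection]]:
    """Pool both prediction sets and keep one detection per (block, class).

    Assumption: the highest-scoring candidate wins; detections whose center
    falls outside every block are discarded.
    """
    routed: Dict[int, Dict[str, Detection]] = {i: {} for i in range(len(blocks))}
    for det in baseline + tiled:
        block_idx = next(
            (i for i, b in enumerate(blocks) if center_in(det.box, b)), None)
        if block_idx is None:
            continue
        best = routed[block_idx].get(det.label)
        if best is None or det.score > best.score:
            routed[block_idx][det.label] = det
    return routed
```

In this sketch, `blocks` would hold the boxes of the detected Block_id regions, `baseline` the full-image predictions, and `tiled` the per-tile predictions after their coordinates have been shifted back to the full image. Each surviving entry in the returned mapping could then be cropped and passed to OCR.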

Keywords

prescription analysis, parameter segmentation, instance segmentation, YOLOv8, YOLOv11, SAHI, routing
