Word Sense Disambiguation applied to Assamese-Hindi Bilingual Statistical Machine Translation
Received: 31 August 2023 | Revised: 8 October 2023 | Accepted: 16 October 2023 | Online: 8 February 2024
Corresponding author: Amitava Nag
Abstract
Word Sense Disambiguation (WSD) is the task of automatically assigning the appropriate sense to an ambiguous word, and it plays a crucial role in many Natural Language Processing (NLP) applications. A Statistical Machine Translation (SMT) system translates a source language into a target language using phrase-based statistical translation. WSD matters for MT because a source-language word may be associated with multiple translations in the target language. This study applies WSD to the input of the MT system to improve disambiguation in the translated output. The most frequent synonym was selected from Hindi WordNet to obtain the most accurate translation. This study also compared Naïve Bayes (NB) and Decision Tree (DT) classifiers for building and testing a WSD model; when evaluated in the Weka machine learning toolkit, NB proved more appropriate for the WSD task than DT. To the best of our knowledge, no such work has yet been carried out for Assamese, an Indo-Aryan language. The MT system with the WSD module achieved better results than the baseline MT system without it. The results were analyzed by linguists. Furthermore, an Assamese-Hindi transliteration system was integrated into the baseline MT system to handle the translation of proper nouns. This study is a notable contribution to NLP for Assamese, an Indian language with limited computational resources.
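The two steps named in the abstract, classifying the sense of an ambiguous word with NB and then choosing the most frequent synonym for that sense, can be illustrated with a minimal sketch. It is not the authors' implementation: the study used Weka and Hindi WordNet, whereas this sketch is a pure-Python multinomial Naïve Bayes with add-one smoothing over bag-of-words context features. The class `NaiveBayesWSD`, the helper `most_frequent_synonym`, the toy English training data, and the synonym frequency table are all hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial Naive Bayes over context words, with add-one smoothing."""

    def fit(self, examples):
        # examples: list of (context_words, sense_label) pairs
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        self.total = sum(self.sense_counts.values())
        return self

    def predict(self, words):
        best_sense, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for sense, count in self.sense_counts.items():
            # log prior of the sense
            score = math.log(count / self.total)
            denom = sum(self.word_counts[sense].values()) + vocab_size
            for w in words:
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy training data for one ambiguous word (invented, not from the study).
TRAIN = [
    (["deposit", "money", "account"], "finance"),
    (["loan", "interest", "money"], "finance"),
    (["river", "water", "boat"], "river"),
    (["fishing", "river", "shore"], "river"),
]

# Hypothetical corpus frequencies of target-language synonyms per sense.
SYNONYM_FREQ = {
    "finance": {"bank_syn_a": 12, "bank_syn_b": 3},
    "river": {"shore_syn_a": 2, "shore_syn_b": 9},
}


def most_frequent_synonym(sense):
    """Return the synonym with the highest frequency for the given sense."""
    candidates = SYNONYM_FREQ[sense]
    return max(candidates, key=candidates.get)


model = NaiveBayesWSD().fit(TRAIN)
sense = model.predict(["money", "loan"])
print(sense, most_frequent_synonym(sense))
```

In a real pipeline the context features, sense inventory, and synonym frequencies would come from an annotated corpus and a WordNet resource rather than from hand-written tables.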
Keywords:
word sense disambiguation, machine translation, machine learning, Assamese, natural language processing
License
Copyright (c) 2023 Subungshri Basumatary, Anup Kumar Barman, Jumi Sarmah, Amitava Nag
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.