Marathi Word Sense Disambiguation through unsupervised K-Means Clustering
Received: 19 December 2024 | Revised: 25 January 2025 | Accepted: 29 January 2025 | Online: 4 June 2025
Corresponding author: Rasika Ransing
Abstract
Word Sense Disambiguation (WSD) is a crucial Natural Language Processing (NLP) task: determining the most suitable meaning of a word from its contextual usage. Marathi presents a particular challenge because it is a low-resource language, primarily due to the scarcity of annotated datasets. This study employs an unsupervised machine learning technique, k-means clustering, to disambiguate Marathi words that have more than one meaning without relying on manually labeled data. Disambiguation is based on the context in which each ambiguous word is used. Rather than running k-means once over all 12 words and their 42 meanings together, clustering is performed separately for each word, with the number of clusters set equal to the number of meanings assigned to that word. For each word, a Silhouette score is calculated to evaluate the quality of the resulting clusters. Nouns showed better-defined semantic boundaries and achieved higher Silhouette scores.
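The per-word procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how contexts are turned into vectors, which distance is used, or how centroids are initialized. The sketch therefore assumes toy 2-D context vectors, Euclidean distance, and a deterministic farthest-first seeding. The names `word_A`, `word_B`, `contexts`, and `n_senses` are hypothetical placeholders.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label per point."""
    points = [tuple(p) for p in points]
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    points = [tuple(p) for p in points]
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: context vectors for two placeholder ambiguous words.
contexts = {
    "word_A": [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)],
    "word_B": [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
               (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
               (10.0, 0.0), (9.8, 0.1), (10.1, 0.2)],
}
n_senses = {"word_A": 2, "word_B": 3}  # k equals the number of listed meanings

# Cluster each word separately rather than all words at once.
results = {}
for word, vectors in contexts.items():
    labels = kmeans(vectors, n_senses[word])
    results[word] = (labels, silhouette(vectors, labels))
    print(word, labels, round(results[word][1], 3))
```

Clustering each word on its own keeps the number of clusters tied to that word's sense inventory, and the per-word Silhouette score (in the range -1 to 1, higher meaning tighter and better separated clusters) then indicates how cleanly its contexts split into that many groups.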
Keywords:
unsupervised learning, k-means clustering, word sense disambiguation, Marathi language, natural language processing
License
Copyright (c) 2025 Rasika Ransing, Archana Gulati

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.