Marathi Word Sense Disambiguation through unsupervised K-Means Clustering

Authors

  • Rasika Ransing, Datta Meghe College of Engineering, Navi Mumbai, Maharashtra, India | Vidyalankar Institute of Technology, Mumbai, Maharashtra, India
  • Archana Gulati, School of Business Management, SVKM's NMIMS University, Navi Mumbai, Maharashtra, India
Volume: 15 | Issue: 3 | Pages: 22837-22843 | June 2025 | https://doi.org/10.48084/etasr.9975

Abstract

Word Sense Disambiguation (WSD) is a crucial Natural Language Processing task: determining the most suitable meaning of a word from its contextual usage. Marathi poses a particular challenge because it is a low-resource language, primarily due to the scarcity of annotated datasets. This study employs an unsupervised machine learning technique, k-means clustering, to disambiguate Marathi words that have more than one meaning without relying on manually labeled data. Disambiguation is based on the context in which these ambiguous words are used. Instead of running k-means clustering concurrently over all 12 words and their 42 meanings, it is run separately for each word. The number of clusters for each word equals the number of meanings assigned to it. For each word, a Silhouette score is calculated to evaluate the quality of the resulting clustering. For nouns, semantic boundaries were better defined, yielding higher Silhouette scores.
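
The per-word procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word names, the two-dimensional "context vectors", and the helper functions kmeans and silhouette are hypothetical stand-ins, and the abstract does not say how contexts are turned into vectors. The sketch only shows the overall shape: one k-means run per ambiguous word, with the number of clusters set to that word's number of meanings, followed by a Silhouette score per word.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on tuples of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, p in enumerate(points):
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = dists.pop(labels[i])
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(own) / len(own)                            # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: word -> (number of meanings, context vectors).
toy_contexts = {
    "word_two_senses": (2, [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]),
    "word_three_senses": (3, [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
                              (0.0, 10.0), (1.0, 10.0), (0.0, 11.0)]),
}

# One independent clustering per word, with k equal to that word's sense count.
scores = {}
for word, (n_senses, vectors) in toy_contexts.items():
    labels = kmeans(vectors, n_senses)
    scores[word] = silhouette(vectors, labels)

for word, score in scores.items():
    print(f"{word}: silhouette = {score:.3f}")
```

Clustering each word on its own keeps the cluster count tied to that word's sense inventory, which a single joint run over all words would not do. A Silhouette score close to 1 indicates well-separated sense clusters, while a score near 0 indicates overlapping ones.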

Keywords:

unsupervised learning, k-means clustering, word sense disambiguation, Marathi language, natural language processing

How to Cite

[1]
Ransing, R. and Gulati, A. 2025. Marathi Word Sense Disambiguation through unsupervised K-Means Clustering. Engineering, Technology & Applied Science Research. 15, 3 (Jun. 2025), 22837–22843. DOI:https://doi.org/10.48084/etasr.9975.
