A Deterministic Finite-State Morphological Analyzer for Urdu Nominal System

Abdulaziz Alblwi; Mohammad Mahyoob; Jeehaan Algaraady; Khateeb Syed Mustafa

doi:10.48084/etasr.5823

Authors

Abdulaziz Alblwi Department of Computer Science, Applied College, Taibah University, Saudi Arabia
Mohammad Mahyoob Department of Languages and Translation, Faculty of Science and Arts, Taibah University, Saudi Arabia
Jeehaan Algaraady Centre of Languages, Taiz University, Yemen
Khateeb Syed Mustafa Department of Linguistics, Aligarh Muslim University, India

Volume: 13 | Issue: 3 | Pages: 11026-11031 | June 2023 | https://doi.org/10.48084/etasr.5823

Received: 28 February 2023 | Revised: 29 April 2023 | Accepted: 2 May 2023 | Online: 24 May 2023

Corresponding author: Mohammad Mahyoob

Abstract

Morphological analysis is a computational process that combines lemmas with other linguistic features to produce new lexical word forms. This paper investigates the processing of the nominal system in Urdu. It focuses on the inflection of noun forms and studies the representation of number, gender, person, and case, using a Finite State Machine (FSM) to analyze and generate all possible forms in the standardized registers. Applying the tool displays all possible structures and their declensions. The study adds the necessary features and values to nouns during lexical concatenation, according to their patterns. The output reaches an accuracy score of 92.7; the actual output depends on the detailed design of the FSM and the specific morphological processes supplied to the finite-state tools.

Keywords:

Urdu natural language processing, computational morphology, morphological analyzer, finite-state automata, inflection, derivation
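
As a rough illustration of the approach the abstract describes, the sketch below builds a tiny deterministic finite-state lexicon in Python: stems are concatenated with case and number suffixes, and each surface form is stored in a trie-shaped automaton whose final states carry feature tags. This is not the authors' implementation. The romanized stems, the two paradigms (marked masculine nouns in -aa and marked feminine nouns in -ii), the tag format, and all names such as `PARADIGMS`, `FSA`, and `build_analyzer` are assumptions made for this example.

```python
# Minimal sketch of a deterministic finite-state noun analyzer.
# Romanized forms and tag names are illustrative assumptions, not the paper's data.

# Suffixes per paradigm, keyed by (number, case).
PARADIGMS = {
    "masc_a": {  # marked masculine, e.g. larkaa 'boy'
        ("sg", "direct"): "aa",
        ("sg", "oblique"): "e",
        ("pl", "direct"): "e",
        ("pl", "oblique"): "on",
    },
    "fem_i": {  # marked feminine, e.g. larkii 'girl'
        ("sg", "direct"): "ii",
        ("sg", "oblique"): "ii",
        ("pl", "direct"): "iyaan",
        ("pl", "oblique"): "iyon",
    },
}
GENDER = {"masc_a": "masc", "fem_i": "fem"}

# Hypothetical lexicon: (stem, paradigm) pairs.
LEXICON = [("lark", "masc_a"), ("lark", "fem_i")]


class FSA:
    """Deterministic automaton over characters; state 0 is the start state."""

    def __init__(self):
        self.delta = {}   # (state, char) -> next state
        self.final = {}   # final state -> list of analyses
        self.n_states = 1

    def add(self, word, analysis):
        """Add one surface form, creating states only where none exist yet."""
        state = 0
        for ch in word:
            key = (state, ch)
            if key not in self.delta:
                self.delta[key] = self.n_states
                self.n_states += 1
            state = self.delta[key]
        self.final.setdefault(state, []).append(analysis)

    def analyze(self, word):
        """Follow one transition per character; return [] if the word is rejected."""
        state = 0
        for ch in word:
            state = self.delta.get((state, ch))
            if state is None:
                return []
        return list(self.final.get(state, []))


def build_analyzer():
    """Concatenate every stem with every suffix of its paradigm."""
    fsa = FSA()
    for stem, cls in LEXICON:
        lemma = stem + PARADIGMS[cls][("sg", "direct")]
        for (number, case), suffix in PARADIGMS[cls].items():
            tag = f"{lemma}+N+{GENDER[cls]}+{number}+{case}"
            fsa.add(stem + suffix, tag)
    return fsa


if __name__ == "__main__":
    analyzer = build_analyzer()
    for form in ["larke", "larkiyaan", "larkon", "kitaab"]:
        print(form, analyzer.analyze(form))
```

A syncretic form such as `larke` returns two analyses (singular oblique and plural direct), which is the kind of ambiguity a full analyzer has to enumerate; a form outside the toy lexicon returns an empty list.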
