A Deterministic Finite-State Morphological Analyzer for Urdu Nominal System
Received: 28 February 2023 | Revised: 29 April 2023 | Accepted: 2 May 2023 | Online: 24 May 2023
Corresponding author: Mohammad Mahyoob
Abstract
A morphological analyzer is a computational tool that combines lemmas with other linguistic features to produce lexical word forms. This paper investigates the processing of the nominal system of the Urdu language. It focuses on the inflection of noun forms and studies the representation of number, gender, person, and case, using a Finite State Machine (FSM) to analyze and generate all the possible forms in the standardized registers. Applying the tool produces and displays all the possible structures and their declensions. The study attaches the necessary features and values to the nouns, which are concatenated lexically according to their patterns. The output reaches an accuracy score of 92.7; the actual output depends on the detailed design of the FSM and on the specific morphological processes provided to the finite-state tools.
Keywords:
Urdu natural language processing, computational morphology, morphological analyzer, finite-state automata, inflection, derivation
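The finite-state approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any particular finite-state toolkit. It assumes a simplified ASCII romanization instead of Urdu script, a hypothetical three-entry lexicon (`LEXICON`), and two hypothetical suffix paradigms (`PARADIGMS`) covering only gender, number, and direct/oblique case. The machine is a deterministic acyclic automaton (a trie) whose final states carry feature tags, so an ambiguous form such as "larke" receives more than one analysis.

```python
from collections import defaultdict

# Hypothetical suffix paradigms (ASCII romanization, illustrative only).
PARADIGMS = {
    "masc_a": [  # marked masculine nouns, e.g. larka "boy"
        ("a", "Masc+Sg+Dir"),
        ("e", "Masc+Sg+Obl"),
        ("e", "Masc+Pl+Dir"),
        ("on", "Masc+Pl+Obl"),
    ],
    "fem_i": [  # marked feminine nouns, e.g. larki "girl"
        ("i", "Fem+Sg+Dir"),
        ("i", "Fem+Sg+Obl"),
        ("iyan", "Fem+Pl+Dir"),
        ("iyon", "Fem+Pl+Obl"),
    ],
}

# Hypothetical lexicon: (lemma, stem, paradigm name).
LEXICON = [
    ("larka", "lark", "masc_a"),
    ("larki", "lark", "fem_i"),
    ("kamra", "kamr", "masc_a"),
]


class NounDFA:
    """Deterministic automaton: one transition per (state, character)."""

    def __init__(self):
        self.transitions = [{}]            # state -> {char: next state}
        self.outputs = defaultdict(list)   # final state -> analyses

    def add(self, surface, analysis):
        """Add one surface form and attach its analysis to the final state."""
        state = 0
        for ch in surface:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.outputs[state].append(analysis)

    def analyze(self, word):
        """Return all analyses of `word`, or [] if it is not accepted."""
        state = 0
        for ch in word:
            state = self.transitions[state].get(ch)
            if state is None:
                return []
        return list(self.outputs.get(state, []))


def build_analyzer():
    """Concatenate each stem with its paradigm suffixes and compile the DFA."""
    dfa = NounDFA()
    for lemma, stem, paradigm in LEXICON:
        for suffix, tags in PARADIGMS[paradigm]:
            dfa.add(stem + suffix, f"{lemma}+N+{tags}")
    return dfa


def generate(lemma):
    """List every (surface form, tags) pair for a lemma in the lexicon."""
    forms = []
    for entry_lemma, stem, paradigm in LEXICON:
        if entry_lemma == lemma:
            for suffix, tags in PARADIGMS[paradigm]:
                forms.append((stem + suffix, tags))
    return forms


if __name__ == "__main__":
    analyzer = build_analyzer()
    print(analyzer.analyze("larke"))
    print(analyzer.analyze("kamron"))
    print(generate("larki"))
```

Storing the tags on final states keeps every transition deterministic while still allowing several readings of one surface form. A full analyzer would also need Urdu script handling, unmarked noun classes, the vocative case, and a much larger lexicon, none of which this sketch attempts.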
License
Copyright (c) 2023 Abdulaziz Alblwi, Mohammad Mahyoob, Jeehaan Algaraady, Khateeb Syed Mustafa
This work is licensed under a Creative Commons Attribution 4.0 International License.