Voting Strategies for Arabic Named Entity Recognition using Annotation Schemes
Received: 6 August 2024 | Revised: 31 August 2024 | Accepted: 14 September 2024 | Online: 2 December 2024
Corresponding author: Ikram Belhajem
Abstract
Named Entity Recognition (NER), an important subtask of information extraction, seeks to identify named entities (NEs) in text and classify them into predefined categories. Many annotation schemes have been proposed to assign suitable labels to multiword NEs within a given text. This study proposes a method that combines the results of different annotation schemes (IOB, IOE, IOBE, IOBS, IOES, and IOBES) for Arabic NER (ANER). Three voting strategies are explored: majority voting, weighted voting, and weighted voting based on Particle Swarm Optimization (PSO). Each strategy is applied to a set of Conditional Random Fields (CRF) classifiers, one per annotation scheme. The experimental results show that majority voting can be considered an effective combination strategy for enhancing the performance of ANER systems.
Keywords:
Information Extraction, Named Entity Recognition, Machine Learning, Conditional Random Fields, annotation schemes, voting strategies
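The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how tags from different schemes are aligned before voting, so this sketch assumes each classifier's output is first re-encoded into a common IOB2 form and then voted on token by token. The function names (to_iob2, majority_vote, weighted_vote), the example tags, and the weights are all hypothetical. In the PSO-based strategy, the weights would be found by the optimizer rather than set by hand.

```python
from collections import Counter


def split_tag(tag):
    """Split 'B-PER' into ('B', 'PER'); the outside tag 'O' becomes ('O', None)."""
    if tag == "O":
        return "O", None
    prefix, _, etype = tag.partition("-")
    return prefix, etype


def to_iob2(tags):
    """Re-encode a tag sequence from IOB, IOE, IOBE, IOBS, IOES, or IOBES as IOB2.

    Assumption: a token opens a new entity if it carries a B or S prefix,
    if the previous token had a different type (or was outside), or if the
    previous token closed an entity with an E or S prefix.
    """
    out = []
    prev_prefix, prev_type = "O", None
    for tag in tags:
        prefix, etype = split_tag(tag)
        if etype is None:
            out.append("O")
        else:
            starts = (
                prefix in ("B", "S")
                or prev_type != etype
                or prev_prefix in ("E", "S")
            )
            out.append(("B-" if starts else "I-") + etype)
        prev_prefix, prev_type = prefix, etype
    return out


def majority_vote(predictions):
    """Token-wise majority vote; ties go to the tag seen first in list order."""
    return [Counter(token_tags).most_common(1)[0][0]
            for token_tags in zip(*predictions)]


def weighted_vote(predictions, weights):
    """Token-wise vote where each classifier's tag counts with its weight."""
    voted = []
    for token_tags in zip(*predictions):
        scores = {}
        for tag, weight in zip(token_tags, weights):
            scores[tag] = scores.get(tag, 0.0) + weight
        voted.append(max(scores, key=scores.get))
    return voted


# Hypothetical outputs of three classifiers for one four-token sentence.
iob_pred = ["B-PER", "I-PER", "O", "B-LOC"]
ioe_pred = ["I-PER", "E-PER", "O", "O"]
iobes_pred = ["B-PER", "E-PER", "O", "S-LOC"]

normalized = [to_iob2(p) for p in (iob_pred, ioe_pred, iobes_pred)]
majority = majority_vote(normalized)
weighted = weighted_vote(normalized, [0.2, 0.7, 0.1])  # made-up weights
print(majority)
print(weighted)
```

With these made-up inputs, two of the three classifiers tag the last token as a location, so the majority vote keeps it, while the weighted vote follows the heavily weighted IOE classifier and drops it. A token-wise vote can produce an inconsistent sequence (for example, an I- tag directly after O), so a real system would likely need a repair step after voting.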
License
Copyright (c) 2024 Ikram Belhajem
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.