Voting Strategies for Arabic Named Entity Recognition using Annotation Schemes

Authors

  • Ikram Belhajem, Faculty of Sciences, Mohammed V University, Rabat, Morocco
Volume: 14 | Issue: 6 | Pages: 17690-17695 | December 2024 | https://doi.org/10.48084/etasr.8645

Abstract

Named Entity Recognition (NER), an important subtask of information extraction, seeks to identify Named Entities (NEs) in text and classify them into predefined categories. Many annotation schemes have been proposed to assign suitable labels to multiword NEs within a given text. This study proposes a method that combines the results of different annotation schemes (IOB, IOE, IOBE, IOBS, IOES, and IOBES) for Arabic NER (ANER). Three voting strategies are explored, namely majority voting, weighted voting, and weighted voting based on Particle Swarm Optimization (PSO). Each strategy is applied to a set of Conditional Random Fields (CRF) classifiers, one per annotation scheme. The experimental results show that majority voting can be considered an effective combination strategy for enhancing the performance of ANER systems.
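
To make the combination idea concrete, the following is a minimal sketch of token-level majority and weighted voting over the outputs of several taggers, each trained on a different annotation scheme. It is an illustration under stated assumptions, not the procedure used in the study: the abstract does not say how predictions from different schemes are aligned, so this sketch first maps every prediction to a common IOB form. The function names (`to_iob`, `majority_vote`, `weighted_vote`), the example tags, and the weights are all hypothetical.

```python
from collections import Counter, defaultdict

def to_iob(tags):
    """Map tags from IOB, IOE, IOBE, IOBS, IOES or IOBES to IOB,
    where every entity starts with B- and continues with I-.
    Assumption: this normalization step is ours, not the paper's."""
    out = []
    open_type = None  # type of the entity currently open, if any
    for tag in tags:
        if tag == "O":
            out.append("O")
            open_type = None
            continue
        prefix, etype = tag.split("-", 1)
        # A new entity starts on B/S, or when no entity of this type is open.
        starts_new = prefix in ("B", "S") or open_type != etype
        out.append(("B-" if starts_new else "I-") + etype)
        # E and S close the entity, so the next token cannot continue it.
        open_type = None if prefix in ("E", "S") else etype
    return out

def majority_vote(sequences):
    """Pick the most frequent label per token. Ties go to the label
    that appears first among the classifiers (Counter keeps insertion order)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*sequences)]

def weighted_vote(sequences, weights):
    """Sum a per-classifier weight for each label and keep the heaviest one.
    The weights are hypothetical, e.g. each classifier's validation F1."""
    voted = []
    for labels in zip(*sequences):
        scores = defaultdict(float)
        for label, weight in zip(labels, weights):
            scores[label] += weight
        voted.append(max(scores, key=scores.get))
    return voted

# Hypothetical predictions for a four-token sentence from three taggers.
pred_iobes = ["B-PER", "E-PER", "O", "S-LOC"]
pred_ioe = ["I-PER", "E-PER", "O", "E-LOC"]
pred_iob = ["B-PER", "O", "O", "B-LOC"]  # this tagger misses the second token

normalized = [to_iob(p) for p in (pred_iobes, pred_ioe, pred_iob)]
combined = majority_vote(normalized)
combined_weighted = weighted_vote(normalized, weights=[0.2, 0.2, 0.9])
```

In this toy case majority voting recovers the two-token PER entity because two of the three taggers agree, while the weighted variant follows the heavily weighted third tagger. Voting token by token can produce sequences that are not well formed (for example an I- tag with no preceding B- tag), so a real system would need a repair or validation step. A PSO-based variant would search for the weight vector rather than fix it by hand.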

Keywords:

Information Extraction, Named Entity Recognition, Machine Learning, Conditional Random Fields, annotation schemes, voting strategies

How to Cite

[1]
Belhajem, I. 2024. Voting Strategies for Arabic Named Entity Recognition using Annotation Schemes. Engineering, Technology & Applied Science Research. 14, 6 (Dec. 2024), 17690–17695. DOI:https://doi.org/10.48084/etasr.8645.
