English-Vietnamese Cross-Lingual Paraphrase Identification Using MT-DNN

Authors

  • H. V. T. Chi Faculty of Information Technology, Vietnam National University, Ho Chi Minh City - University of Science, Vietnam
  • D. L. Anh Faculty of Information Technology, Vietnam National University, Ho Chi Minh City - University of Science, Vietnam
  • N. L. Thanh Faculty of Information Technology, Vietnam National University, Ho Chi Minh City - University of Science, Vietnam
  • D. Dinh Faculty of Information Technology, Vietnam National University, Ho Chi Minh City - University of Science, Vietnam
Volume: 11 | Issue: 5 | Pages: 7598-7604 | October 2021 | https://doi.org/10.48084/etasr.4300

Abstract

Paraphrase identification is a crucial task in natural language understanding, especially in cross-language information retrieval. The Multi-Task Deep Neural Network (MT-DNN) has recently become a state-of-the-art method with outstanding results in paraphrase identification [1]. In this paper, we propose a method based on MT-DNN [2] to detect similarities between English and Vietnamese sentences. We replaced the shared layers of the original MT-DNN, which are based on BERT [3], with other pre-trained multilingual models such as M-BERT [3] and XLM-R [4], so that our model can work on cross-language (in our case, English and Vietnamese) information retrieval. We also added several additional tasks to improve the results. As a result, we gained increases of 2.3% in accuracy and 2.5% in F1. The proposed method was also applied to other language pairs, namely English–German and English–French, yielding improvements of 1.0%/0.7% for English–German and 0.7%/0.5% for English–French.
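MT-DNN [2] trains one shared encoder (here, M-BERT or XLM-R in place of BERT) on several tasks at once. In Liu et al.'s training procedure, each task's data is packed into its own mini-batches, all mini-batches are merged and shuffled, and each step computes the loss of a single task and updates the shared layers together with that task's output layer. The sketch below shows only this scheduling step in plain Python, as an illustration under stated assumptions rather than the authors' code: the task names, the toy English–Vietnamese examples, and the `step_fn` callback are hypothetical placeholders, and a real setup would plug in an actual encoder, task heads, and optimizer.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# One sentence-pair example: (English sentence, Vietnamese sentence, label).
# Label 1 = paraphrase, 0 = not a paraphrase (hypothetical toy data format).
Example = Tuple[str, str, int]
Batch = Tuple[str, List[Example]]  # (task name, examples from that task only)


def make_minibatches(datasets: Dict[str, List[Example]],
                     batch_size: int, seed: int = 0) -> List[Batch]:
    """Pack each task into its own mini-batches, merge them all, then shuffle."""
    merged: List[Batch] = []
    for task, examples in datasets.items():
        for start in range(0, len(examples), batch_size):
            merged.append((task, examples[start:start + batch_size]))
    # Tasks become interleaved, but every mini-batch still holds a single task.
    random.Random(seed).shuffle(merged)
    return merged


def train_one_epoch(datasets: Dict[str, List[Example]], batch_size: int,
                    step_fn: Callable[[str, List[Example]], None],
                    seed: int = 0) -> None:
    """Run one epoch. In a real model, step_fn would compute the loss for
    `task` and update the shared encoder plus that task's head only."""
    for task, batch in make_minibatches(datasets, batch_size, seed):
        step_fn(task, batch)


if __name__ == "__main__":
    # Hypothetical task names and toy data, for illustration only.
    data = {
        "en_vi_paraphrase": [("The cat sleeps.", "Con mèo đang ngủ.", 1)] * 5,
        "auxiliary_task": [("Hello.", "Xin chào.", 1)] * 3,
    }
    steps: Counter = Counter()
    train_one_epoch(data, batch_size=2,
                    step_fn=lambda task, batch: steps.update([task]))
    print(dict(steps))
```

With `batch_size=2`, the paraphrase task contributes three mini-batches and the auxiliary task two, visited in shuffled order. Keeping each mini-batch single-task lets every step use one task-specific loss while the shared multilingual encoder still receives gradients from all tasks over the epoch.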

Keywords:

MT-DNN, BERT, XLM-R, English, Vietnamese, cross-language, paraphrase identification

References

[1] A. Amaral, "Paraphrase Identification and Applications in Finding Answers in FAQ Databases," 2013. [Online]. Available: https://fenix.tecnico.ulisboa.pt/downloadFile/395145918749/resumo.pdf.

[2] X. Liu, P. He, W. Chen, and J. Gao, "Multi-Task Deep Neural Networks for Natural Language Understanding," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, Jul. 2019, pp. 4487–4496.

[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv:1810.04805 [cs], May 2019, Accessed: Aug. 26, 2021. [Online]. Available: http://arxiv.org/abs/1810.04805.

[4] A. Conneau et al., "Unsupervised Cross-lingual Representation Learning at Scale," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 8440–8451.

[5] L. T. Nguyen and D. Dien, "English-Vietnamese Cross-Language Paraphrase Identification Method," in Proceedings of the Eighth International Symposium on Information and Communication Technology, New York, NY, USA, Dec. 2017, pp. 42–49.

[6] D. Dinh and N. Le Thanh, "English–Vietnamese cross-language paraphrase identification using hybrid feature classes," Journal of Heuristics, Apr. 2019.

[7] M. Mohamed and M. Oussalah, "A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics," Language Resources and Evaluation, vol. 54, no. 2, pp. 457–485, Jun. 2020.

[8] U. Khan, K. Khan, F. Hassan, A. Siddiqui, and M. Afaq, "Towards Achieving Machine Comprehension Using Deep Learning on Non-GPU Machines," Engineering, Technology & Applied Science Research, vol. 9, no. 4, pp. 4423–4427, Aug. 2019.

[9] S. Mandava, S. Migacz, and A. F. Florea, "Pay Attention when Required," arXiv:2009.04534 [cs], May 2021, Accessed: Aug. 26, 2021. [Online]. Available: http://arxiv.org/abs/2009.04534.

[10] B. Ahmed, G. Ali, A. Hussain, A. Baseer, and J. Ahmed, "Analysis of Text Feature Extractors using Deep Learning on Fake News," Engineering, Technology & Applied Science Research, vol. 11, no. 2, pp. 7001–7005, Apr. 2021.

[11] R. Mihalcea, C. Corley, and C. Strapparava, "Corpus-based and knowledge-based measures of text semantic similarity," in Proceedings of the 21st National Conference on Artificial Intelligence, Boston, MA, USA, Jul. 2006, vol. 1, pp. 775–780.

[12] W. Yin and H. Schütze, "Convolutional Neural Network for Paraphrase Identification," in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, CO, USA, May 2015, pp. 901–911.

[13] H. Shahmohammadi, M. Dezfoulian, and M. Mansoorizadeh, "Paraphrase detection using LSTM networks and handcrafted features," Multimedia Tools and Applications, vol. 80, no. 4, pp. 6479–6492, Feb. 2021.

[14] R. Caruana, "Multitask Learning," Machine Learning, vol. 28, no. 1, pp. 41–75, Jul. 1997.

[15] M. Crawshaw, "Multi-Task Learning with Deep Neural Networks: A Survey," arXiv:2009.09796 [cs, stat], Sep. 2020, Accessed: Aug. 26, 2021. [Online]. Available: http://arxiv.org/abs/2009.09796.

[16] A. Warstadt, A. Singh, and S. R. Bowman, "Neural Network Acceptability Judgments," Transactions of the Association for Computational Linguistics, vol. 7, pp. 625–641, Mar. 2019.

[17] E. F. Tjong Kim Sang and F. De Meulder, "Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition," in Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 2003, pp. 142–147.

[18] H. T. M. Nguyen, Q. T. Ngo, L. X. Vu, V. M. Tran, and H. T. T. Nguyen, "VLSP Shared Task: Named Entity Recognition," Journal of Computer Science and Cybernetics, vol. 34, no. 4, pp. 283–294, 2018.

[19] A. Breit, A. Revenko, K. Rezaee, M. T. Pilehvar, and J. Camacho-Collados, "WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context," in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, Apr. 2021, pp. 1635–1645.

[20] I. Hendrickx et al., "SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals," in Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden, Jul. 2010, pp. 33–38.

[21] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman, "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding," in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, Nov. 2018, pp. 353–355.

How to Cite

H. V. T. Chi, D. L. Anh, N. L. Thanh, and D. Dinh, “English-Vietnamese Cross-Lingual Paraphrase Identification Using MT-DNN”, Eng. Technol. Appl. Sci. Res., vol. 11, no. 5, pp. 7598–7604, Oct. 2021.