Event Detection and Classification in Tweets using Deep Learning
Received: 14 October 2024 | Revised: 10 December 2024 | Accepted: 14 December 2024 | Online: 24 December 2024
Corresponding author: Malika Noui
Abstract
Online social networks have become important sources of information and contextual data in many areas of life, including finance, elections, social events, health, and sports. The detection and classification of useful events reported in tweets has recently attracted considerable interest. However, because of the challenges inherent in the nature of the events to be detected or classified, traditional approaches have not yielded satisfactory results. Deep learning-based word embedding representations, such as Word2Vec, GloVe, FastText, and BERT, have proven effective in improving detection performance because they take semantic context into account. This study proposes a model that stacks an LSTM on top of BERT representations to detect and classify events in tweets. To this end, a dataset of about 310,000 event-related tweets was collected and categorized into 50 event types based on a selected set of representative keywords. Multiple experiments were carried out on this dataset to evaluate the proposed model, which attained an overall accuracy greater than 94.3% and an F1 score above 90%, achieving state-of-the-art results in the classification of most event categories.
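As a rough illustration of two steps mentioned in the abstract, the sketch below labels tweets by keyword matching and computes accuracy and F1. It is a minimal sketch under stated assumptions, not the authors' pipeline: the keyword sets in `EVENT_KEYWORDS` are hypothetical (the abstract does not list the 50 event types or their keywords), the tie-breaking rule is an arbitrary choice, and macro-averaging is assumed for F1 because the abstract does not say which averaging was used.

```python
import re
from collections import Counter

# Hypothetical keyword sets for illustration only; the paper's 50 event
# types and their representative keywords are not given in the abstract.
EVENT_KEYWORDS = {
    "earthquake": {"earthquake", "aftershock", "magnitude"},
    "election": {"election", "ballot", "vote"},
    "football": {"goal", "penalty", "match"},
}


def tokenize(text):
    """Lowercase a tweet and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def label_tweet(text):
    """Return the event type with the most keyword hits, or None if none match.

    Ties go to the event type listed first in EVENT_KEYWORDS (an assumption).
    """
    counts = Counter(tokenize(text))
    hits = {
        event: sum(counts[word] for word in keywords)
        for event, keywords in EVENT_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In a setup like this, `label_tweet` would supply the reference labels for collected tweets, and `accuracy` and `macro_f1` would compare a classifier's predictions against them. With 50 categories of unequal size, the two metrics can diverge noticeably, which may be why the abstract reports both.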
Keywords:
useful event detection, social media data, deep learning, BERT, LSTM
License
Copyright (c) 2024 Malika Noui, Abdelaziz Lakhfif, Mohamed Amin Laouadi
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.