Analysis of Text Feature Extractors using Deep Learning on Fake News
Social media and easy internet access have allowed the instant sharing of news, ideas, and information on a global scale. However, the same speed and ease of access also allow rumors and fake news to spread easily and rapidly. To monitor and minimize the spread of fake news in the digital community, fake news detection using Natural Language Processing (NLP) has attracted significant attention. In NLP, different text feature extractors and word embeddings are used to process text data. The aim of this paper is to analyze the performance of a neural-network-based fake news detection model using three feature extractors: the TF-IDF vectorizer, GloVe embeddings, and BERT embeddings. The features produced by each extractor were fed to the deep learning model, and accuracy, precision, recall, F1 score, AUC ROC, and AUC PR were computed for each. BERT embeddings delivered the best performance. TF-IDF performed far better than GloVe and, at some stages, was competitive with BERT.
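The abstract names the TF-IDF extractor and six evaluation metrics without showing how they are computed. As a rough illustration only (not the authors' code), the dependency-free Python sketch below computes smoothed, L2-normalized TF-IDF vectors using the weighting formula that scikit-learn's TfidfVectorizer applies by default, but with a simplified whitespace tokenizer, and then computes the six metrics from a binary classifier's predicted scores. The function names, the 0.5 decision threshold, the label convention (1 = fake, 0 = real), and the use of average precision as the AUC PR estimator are assumptions; the paper may have used a different estimator (e.g. trapezoidal). GloVe and BERT embeddings require pretrained models and are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectorize(docs):
    """Return (vocabulary, rows) of L2-normalized, smoothed TF-IDF vectors.

    Weighting: tf = raw term count, idf = ln((1 + n) / (1 + df)) + 1,
    then each row is scaled to unit L2 norm. Tokenization is a
    simplified lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard empty docs
        rows.append([v / norm for v in vec])
    return vocab, rows


def roc_auc(y_true, y_score):
    """AUC ROC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """AUC PR estimated as average precision: sum over thresholds of (R_k - R_{k-1}) * P_k."""
    n_pos = sum(y_true)
    pairs = sorted(zip(y_score, y_true), reverse=True)  # highest score first
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        score = pairs[i][0]
        # Treat all samples tied at this score as a single threshold.
        while i < len(pairs) and pairs[i][0] == score:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute the six metrics for labels (assumed 1 = fake, 0 = real) and scores in [0, 1]."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc_roc": roc_auc(y_true, y_score),
        "auc_pr": average_precision(y_true, y_score),
    }


if __name__ == "__main__":
    vocab, rows = tfidf_vectorize(["breaking news today", "fake news spreads fast"])
    print(vocab)
    # Hypothetical labels and model scores, for illustration only.
    metrics = classification_metrics([1, 0, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
    print({k: round(v, 3) for k, v in metrics.items()})
```

The pairwise AUC ROC computation is quadratic in the number of samples, which is acceptable for illustration; for a real evaluation, scikit-learn's roc_auc_score and average_precision_score compute the same quantities efficiently. Both AUC functions here assume the labels contain at least one positive and, for AUC ROC, at least one negative.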
Keywords: fake news, natural language processing, feature extractors, deep learning
Copyright (c) 2021 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.