Distinguishing Arabic GenAI-generated Tweets and Human Tweets utilizing Machine Learning
Received: 27 June 2024 | Revised: 13 July 2024 and 21 July 2024 | Accepted: 24 July 2024 | Online: 12 August 2024
Corresponding author: Noura Saad Alghamdi
Abstract
Generative Artificial Intelligence (GenAI) tools, such as ChatGPT, have made it easy to create text, music, images, and other types of media. GenAI has rapidly gained popularity for its ability to generate new content. Notably, its applications allow anyone to produce natural-sounding conversations and content, making it increasingly challenging to distinguish between human-written and GenAI-generated material. This research focuses on Arabic content and aims to differentiate GenAI-generated tweets from authentic human-written tweets on the X platform (Twitter). Datasets of both human-written and GenAI-generated tweets were collected. Then, three Machine Learning (ML) models were built to predict whether a tweet is GenAI-generated or human-written. The highest accuracy achieved was 93%.
Keywords:
Twitter, GenAI, Machine Learning (ML), OpenAI, ChatGPT
License
Copyright (c) 2024 Noura Saad Alghamdi, Jalal Suliman Alowibdi
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.