ORBIT-CL: A Semantically-Aware and Lightweight Multimodal Transformer for Cyberbullying Detection
Received: 14 February 2026 | Revised: 28 March 2026 | Accepted: 4 April 2026 | Online: 25 April 2026
Corresponding author: Saed Alqaraleh
Abstract
Cyberbullying appears in complex forms, namely (1) syntactically obfuscated text, (2) context-dependent multimodal memes, and (3) semantically ambiguous text (e.g., sarcasm or reclaimed slurs). The present study proposes ORBIT-CL, a Contrastive and Lightweight Multimodal Transformer, an end-to-end system that addresses all three challenges. It combines lightweight multimodal feature extraction (Transformer-based Optical Character Recognition (TrOCR) and visual object tags) with a RoBERTa-based classifier. The classifier employs a dual-robustness objective, pairing adversarial noise training with semantic contrastive learning to handle both syntactic attacks and intent ambiguity. The system also uses multi-task heads for binary bullying detection and ordinal hostility scoring, together with calibration modules that maintain stable moderation thresholds. The experimental protocol includes challenge sets designed to probe semantic ambiguity. On publicly available datasets (HateXplain and Hateful Memes), ORBIT-CL achieved highly competitive performance (0.88 Macro-F1) on the full HateXplain benchmark. To specifically validate its semantic robustness, it further achieved a 30-point F1 gain on a targeted challenge set of ambiguous content (e.g., sarcasm, reclaimed slurs), addressing a key failure mode of prior models. Furthermore, by fusing lightweight visual tags as text, the introduced model achieved an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.895 on Hateful Memes, offering a competitive alternative to heavy multimodal baselines while retaining the efficiency of a text-only encoder. By unifying lightweight multimodal fusion with a dual (syntactic and semantic) robustness framework, ORBIT-CL contributes to deployable, context-aware, and efficient cyberbullying detection.
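The tag-as-text fusion described above can be sketched in a few lines: OCR output and detected object tags are concatenated with the meme's caption into a single string that a text-only encoder such as RoBERTa can consume directly. The function name, field ordering, and separator formatting below are illustrative assumptions, not the authors' exact implementation; only the `</s>` separator follows RoBERTa's documented convention.

```python
def fuse_as_text(caption: str, ocr_text: str, object_tags: list[str]) -> str:
    """Fuse multimodal signals into one sequence for a text-only encoder.

    The "</s>" separator follows RoBERTa's convention; the field ordering
    and tag formatting here are illustrative assumptions.
    """
    tag_str = " ".join(object_tags)  # e.g. "person wheelchair"
    parts = [caption.strip(), ocr_text.strip(), tag_str.strip()]
    # Drop empty fields so the encoder never sees stray separators.
    return " </s> ".join(p for p in parts if p)

# Example: a meme whose hostility only emerges from image and text together.
fused = fuse_as_text(
    caption="look how hard they try",
    ocr_text="know your place",
    object_tags=["person", "wheelchair"],
)
print(fused)
# look how hard they try </s> know your place </s> person wheelchair
```

Because the visual signal is reduced to short text spans before fusion, no image encoder runs at classification time, which is what keeps the inference cost close to that of a plain text classifier.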
Keywords:
cyberbullying, content moderation, OCR, RoBERTa, multimodal memes
References
E. A. Vogels. "Teens and Cyberbullying 2022." Pew Research Center. https://www.pewresearch.org/internet/2022/12/15/teens-and-cyberbullying-2022/.
S. Arnon et al., "Association of Cyberbullying Experiences and Perpetration With Suicidality in Early Adolescence," JAMA Network Open, vol. 5, no. 6, June 2022, Art. no. e2218746.
T. Nitya Harshitha et al., "ProTect: a hybrid deep learning model for proactive detection of cyberbullying on social media," Frontiers in Artificial Intelligence, vol. 7, Mar. 2024, Art. no. 1269366.
T. Caselli, V. Basile, J. Mitrović, and M. Granitzer, "HateBERT: Retraining BERT for Abusive Language Detection in English," in Proceedings of the 5th Workshop on Online Abuse and Harms, Online, 2021, pp. 17–25.
A. A. Jamjoom, H. Karamti, M. Umer, S. Alsubai, T.-H. Kim, and I. Ashraf, "RoBERTaNET: Enhanced RoBERTa Transformer Based Model for Cyberbullying Detection With GloVe Features," IEEE Access, vol. 12, pp. 58950–58959, 2024.
V. Bansal, M. Tyagi, R. Sharma, V. Gupta, and Q. Xin, "A Transformer Based Approach for Abuse Detection in Code Mixed Indic Languages," ACM Transactions on Asian and Low-Resource Language Information Processing, Nov. 2022.
M. Macas, C. Wu, and W. Fuertes, "Adversarial examples: A survey of attacks and defenses in deep learning-enabled cybersecurity systems," Expert Systems with Applications, vol. 238, Mar. 2024, Art. no. 122223.
H. Kang et al., "Developing continuous toxicity detection against increasing types of perturbed toxic text," Information Processing & Management, vol. 63, no. 2, Part B, Mar. 2026, Art. no. 104470.
M. Li et al., "TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 11, pp. 13094–13102, June 2023.
X.-F. Wang, Z.-H. He, K. Wang, Y.-F. Wang, L. Zou, and Z.-Z. Wu, "A survey of text detection and recognition algorithms based on deep learning technology," Neurocomputing, vol. 556, Nov. 2023, Art. no. 126702.
P. Yi and A. Zubiaga, "Cyberbullying Detection across Social Media Platforms via Platform-Aware Adversarial Encoding," Proceedings of the International AAAI Conference on Web and Social Media, vol. 16, no. 1, pp. 1430–1434, May 2022.
A. Aliyeva et al., "Toward Safer Digital Communication: A Deep Hybrid Model for Detecting Abusive Language on Social Networks," Engineering, Technology & Applied Science Research, vol. 15, no. 5, pp. 27126–27132, Oct. 2025.
A. F. Alqahtani and M. Ilyas, "A Machine Learning Ensemble Model for the Detection of Cyberbullying," International Journal of Artificial Intelligence & Applications, vol. 15, no. 1, pp. 115–129, Jan. 2024.
B. Ogunleye and B. Dharmaraj, "The Use of a Large Language Model for Cyberbullying Detection," Analytics, vol. 2, no. 3, pp. 694–707, Sept. 2023.
A. Hamza et al., "Multimodal Religiously Hateful Social Media Memes Classification Based on Textual and Image Data," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 23, no. 8, Aug. 2024, Art. no. 114.
D. Kiela et al., "The hateful memes challenge: detecting hate speech in multimodal memes," in Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 2611–2624.
P. McCullagh, "Regression Models for Ordinal Data," Journal of the Royal Statistical Society: Series B (Methodological), vol. 42, no. 2, pp. 109–127, Jan. 1980.
C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, "On Calibration of Modern Neural Networks," in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017, pp. 1321–1330.
B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, and A. Mukherjee, "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 17, pp. 14867–14875, May 2021.
Y. Fang et al., "You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection," in Proceedings of the 35th International Conference on Neural Information Processing Systems, Online, 2021, pp. 26183–26197.
M. Züfle, V. Dankers, and I. Titov, "Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study," in Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, Singapore, 2023, pp. 112–129.
J.-B. Alayrac et al., "Flamingo: A Visual Language Model for Few-Shot Learning," in 36th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 2022, pp. 23716–23736.
J. Mei, J. Chen, W. Lin, B. Byrne, and M. Tomalin, "Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning," in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 2024, pp. 5333–5347.
X. Chen et al., "PaLI-X: On Scaling up a Multilingual Vision and Language Model." arXiv, May 29, 2023.
License
Copyright (c) 2026 Saed Alqaraleh

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
