ORBIT-CL: A Semantically-Aware and Lightweight Multimodal Transformer for Cyberbullying Detection
Received: 14 February 2026 | Revised: 28 March 2026 | Accepted: 4 April 2026 | Online: 25 April 2026
Corresponding author: Saed Alqaraleh
Abstract
Cyberbullying appears in complex forms, namely (1) syntactically obfuscated text, (2) context-dependent multimodal memes, and (3) semantically ambiguous text (e.g., sarcasm or reclaimed slurs). The present study proposes ORBIT-CL, a Contrastive and Lightweight Multimodal Transformer, an end-to-end system that addresses all three challenges. It combines lightweight multimodal feature extraction (Transformer-based Optical Character Recognition (TrOCR) and visual object tags) with a RoBERTa-based classifier. The classifier employs a dual-robustness objective, pairing adversarial noise training with semantic contrastive learning to handle both syntactic attacks and intent ambiguity. The system also uses multi-task heads for binary bullying detection and ordinal hostility scoring, together with calibration modules that maintain stable moderation thresholds. The experimental protocol includes challenge sets designed to probe semantic ambiguity. On publicly available datasets (HateXplain and Hateful Memes), ORBIT-CL achieved highly competitive performance (0.88 Macro-F1) on the full HateXplain benchmark. To specifically validate its semantic robustness, it further achieved a 30-point F1 gain on a targeted challenge set of ambiguous content (e.g., sarcasm, reclaimed slurs), addressing a key failure mode of prior models. Furthermore, by fusing lightweight visual tags as text, the introduced model achieved an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.895 on Hateful Memes, offering a competitive alternative to heavy multimodal baselines while retaining the efficiency of a text-only encoder. By unifying lightweight multimodal fusion with a dual (syntactic and semantic) robustness framework, ORBIT-CL contributes to deployable, context-aware, and efficient cyberbullying detection.
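The tag-as-text fusion described above can be sketched in a few lines: OCR output and detected object tags are concatenated with the meme's caption into a single string that a text-only encoder such as RoBERTa can consume directly. The function name, field ordering, and separator formatting below are illustrative assumptions, not the authors' exact implementation; only the `</s>` separator follows RoBERTa's documented convention.

```python
def fuse_as_text(caption: str, ocr_text: str, object_tags: list[str]) -> str:
    """Fuse multimodal signals into one sequence for a text-only encoder.

    The "</s>" separator follows RoBERTa's convention; the field ordering
    and tag formatting here are illustrative assumptions.
    """
    tag_str = " ".join(object_tags)  # e.g. "person wheelchair"
    parts = [caption.strip(), ocr_text.strip(), tag_str.strip()]
    # Drop empty fields so the encoder never sees stray separators.
    return " </s> ".join(p for p in parts if p)

# Example: a meme whose hostility only emerges from image and text together.
fused = fuse_as_text(
    caption="look how hard they try",
    ocr_text="know your place",
    object_tags=["person", "wheelchair"],
)
print(fused)
# look how hard they try </s> know your place </s> person wheelchair
```

Because the visual signal is reduced to short text spans before fusion, no image encoder runs at classification time, which is what keeps the inference cost close to that of a plain text classifier.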
Keywords:
cyberbullying, content moderation, OCR, RoBERTa, multimodal memes
References
E. A. Vogels. "Teens and Cyberbullying 2022." Pew Research Center. https://www.pewresearch.org/internet/2022/12/15/teens-and-cyberbullying-2022/.
S. Arnon et al., "Association of Cyberbullying Experiences and Perpetration With Suicidality in Early Adolescence," JAMA Network Open, vol. 5, no. 6, June 2022, Art. no. e2218746.
T. Nitya Harshitha et al., "ProTect: a hybrid deep learning model for proactive detection of cyberbullying on social media," Frontiers in Artificial Intelligence, vol. 7, Mar. 2024, Art. no. 1269366.
T. Caselli, V. Basile, J. Mitrović, and M. Granitzer, "HateBERT: Retraining BERT for Abusive Language Detection in English," in Proceedings of the 5th Workshop on Online Abuse and Harms, Online, 2021, pp. 17–25.
A. A. Jamjoom, H. Karamti, M. Umer, S. Alsubai, T.-H. Kim, and I. Ashraf, "RoBERTaNET: Enhanced RoBERTa Transformer Based Model for Cyberbullying Detection With GloVe Features," IEEE Access, vol. 12, pp. 58950–58959, 2024.
V. Bansal, M. Tyagi, R. Sharma, V. Gupta, and Q. Xin, "A Transformer Based Approach for Abuse Detection in Code Mixed Indic Languages," ACM Transactions on Asian and Low-Resource Language Information Processing, Nov. 2022.
M. Macas, C. Wu, and W. Fuertes, "Adversarial examples: A survey of attacks and defenses in deep learning-enabled cybersecurity systems," Expert Systems with Applications, vol. 238, Mar. 2024, Art. no. 122223.
H. Kang et al., "Developing continuous toxicity detection against increasing types of perturbed toxic text," Information Processing & Management, vol. 63, no. 2, Part B, Mar. 2026, Art. no. 104470.
M. Li et al., "TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 11, pp. 13094–13102, June 2023.
X.-F. Wang, Z.-H. He, K. Wang, Y.-F. Wang, L. Zou, and Z.-Z. Wu, "A survey of text detection and recognition algorithms based on deep learning technology," Neurocomputing, vol. 556, Nov. 2023, Art. no. 126702.
P. Yi and A. Zubiaga, "Cyberbullying Detection across Social Media Platforms via Platform-Aware Adversarial Encoding," Proceedings of the International AAAI Conference on Web and Social Media, vol. 16, no. 1, pp. 1430–1434, May 2022.
A. Aliyeva et al., "Toward Safer Digital Communication: A Deep Hybrid Model for Detecting Abusive Language on Social Networks," Engineering, Technology & Applied Science Research, vol. 15, no. 5, pp. 27126–27132, Oct. 2025.
A. F. Alqahtani and M. Ilyas, "A Machine Learning Ensemble Model for the Detection of Cyberbullying," International Journal of Artificial Intelligence & Applications, vol. 15, no. 1, pp. 115–129, Jan. 2024.
B. Ogunleye and B. Dharmaraj, "The Use of a Large Language Model for Cyberbullying Detection," Analytics, vol. 2, no. 3, pp. 694–707, Sept. 2023.
A. Hamza et al., "Multimodal Religiously Hateful Social Media Memes Classification Based on Textual and Image Data," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 23, no. 8, Aug. 2024, Art. no. 114.
D. Kiela et al., "The hateful memes challenge: detecting hate speech in multimodal memes," in Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 2611–2624.
P. McCullagh, "Regression Models for Ordinal Data," Journal of the Royal Statistical Society: Series B (Methodological), vol. 42, no. 2, pp. 109–127, Jan. 1980.
C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, "On Calibration of Modern Neural Networks," in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017, pp. 1321–1330.
B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, and A. Mukherjee, "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 17, pp. 14867–14875, May 2021.
Y. Fang et al., "You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection," in Proceedings of the 35th International Conference on Neural Information Processing Systems, Online, 2021, pp. 26183–26197.
M. Züfle, V. Dankers, and I. Titov, "Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study," in Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, Singapore, 2023, pp. 112–129.
J.-B. Alayrac et al., "Flamingo: A Visual Language Model for Few-Shot Learning," in 36th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 2022, pp. 23716–23736.
J. Mei, J. Chen, W. Lin, B. Byrne, and M. Tomalin, "Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning," in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 2024, pp. 5333–5347.
X. Chen et al., "PaLI-X: On Scaling up a Multilingual Vision and Language Model." arXiv, May 29, 2023.
License
Copyright (c) 2026 Saed Alqaraleh

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
