Adaptive Evidential Fusion for Hateful Meme Classification Utilizing the Dempster–Shafer Theory
Corresponding author: Soukaina Fatimi
Abstract
Younger generations now frequently communicate through memes, which combine images and text to express humor, emotions, or opinions. However, memes can become a serious problem on social media when they promote hateful, discriminatory, or offensive content. Detecting such hateful memes remains particularly challenging due to the complex interaction between visual and textual cues. In this work, we propose a simple yet effective multimodal fusion approach, named Multimodal Hateful Meme Classification via Dempster–Shafer Evidence Theory Fusion (MHM-DS), for hateful meme detection. Instead of training a large vision–language model, we perform late fusion of independent unimodal classifiers, Bidirectional Encoder Representations from Transformers (BERT) for text and Contrastive Language–Image Pretraining (CLIP) for images, by combining their probabilistic outputs under Dempster–Shafer Evidence Theory (DST). The proposed method explicitly models uncertainty and conflict between modalities through belief masses and an ignorance term. Experiments on the Facebook AI Hateful Memes dataset (10,000 samples) show that the proposed DST-based fusion achieves 70.2% accuracy and a 70.8% Area Under the Receiver Operating Characteristic Curve (AUROC), outperforming standard late-fusion baselines and unimodal models while remaining computationally efficient and interpretable. These results demonstrate that evidential fusion offers a robust, uncertainty-aware alternative to complex multimodal transformers for hateful meme classification.
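To make the fusion step concrete, the following minimal Python sketch shows how two unimodal probabilities of the kind the abstract describes can be turned into belief masses with an explicit ignorance term and then combined with Dempster's rule. The reliability discount factors and the helper names are illustrative assumptions, not the paper's exact formulation or tuned values.

# Illustrative sketch of DST late fusion on the binary frame {H, N},
# where H = "hateful" and N = "not hateful". The reliability values
# below are assumed discount factors, not the paper's parameters.

def to_mass(p_hateful, reliability):
    # Discount the classifier probability by its assumed reliability;
    # the leftover mass goes to Theta (the whole frame) as ignorance.
    return {
        "H": reliability * p_hateful,
        "N": reliability * (1.0 - p_hateful),
        "Theta": 1.0 - reliability,
    }

def dempster_combine(m1, m2):
    # Conflict K is the mass assigned to contradictory singletons.
    k = m1["H"] * m2["N"] + m1["N"] * m2["H"]
    norm = 1.0 - k  # Dempster's normalization; assumes k < 1
    return {
        "H": (m1["H"] * m2["H"] + m1["H"] * m2["Theta"]
              + m1["Theta"] * m2["H"]) / norm,
        "N": (m1["N"] * m2["N"] + m1["N"] * m2["Theta"]
              + m1["Theta"] * m2["N"]) / norm,
        "Theta": (m1["Theta"] * m2["Theta"]) / norm,
    }

# Example: the text model (e.g., BERT) is fairly confident the meme is
# hateful, while the image model (e.g., CLIP) mildly disagrees.
m_text = to_mass(p_hateful=0.85, reliability=0.9)
m_image = to_mass(p_hateful=0.40, reliability=0.8)
m_fused = dempster_combine(m_text, m_image)
print(m_fused)  # decide "hateful" when m_fused["H"] > m_fused["N"]

The division by 1 - K redistributes the conflicting mass, so disagreement between the text and image classifiers is handled explicitly rather than silently averaged away, which is the behavior the ignorance term is meant to expose.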
Keywords:
multimodal classification, Dempster–Shafer Evidence Theory (DST), late fusion, hateful meme detection
License
Copyright (c) 2026 Soukaina Fatimi, Wafae Sabbar, Abdelkrim Bekkhoucha

This work is licensed under a Creative Commons Attribution 4.0 International License.
