Adaptive Evidential Fusion for Hateful Meme Classification Utilizing the Dempster–Shafer Theory
Corresponding author: Soukaina Fatimi
Abstract
Younger generations now frequently communicate through memes, which combine images and text to express humor, emotions, or opinions. However, memes can become a serious problem on social media when they promote hateful, discriminatory, or offensive content. Detecting such hateful memes remains particularly challenging due to the complex interaction between visual and textual cues. In this work, we propose a simple yet effective multimodal fusion approach, named Multimodal Hateful Meme Classification via Dempster–Shafer Evidence Theory Fusion (MHM-DS), for hateful meme detection. Instead of training a large vision–language model, we perform late fusion of independent unimodal classifiers, Bidirectional Encoder Representations from Transformers (BERT) for text and Contrastive Language–Image Pretraining (CLIP) for images, by combining their probabilistic outputs under Dempster–Shafer Evidence Theory (DST). The proposed method explicitly models uncertainty and conflict between modalities through belief masses and an ignorance term. Experiments on the Facebook AI Hateful Memes dataset (10,000 samples) show that the proposed DST-based fusion achieves 70.2% accuracy and a 70.8% Area Under the Receiver Operating Characteristic Curve (AUROC), outperforming standard late-fusion baselines and unimodal models while remaining computationally efficient and interpretable. These results demonstrate that evidential fusion offers a robust, uncertainty-aware alternative to complex multimodal transformers for hateful meme classification.
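To make the fusion step concrete, the following minimal Python sketch shows how two unimodal probabilities of the kind the abstract describes can be turned into belief masses with an explicit ignorance term and then combined with Dempster's rule. The reliability discount factors and the helper names are illustrative assumptions, not the paper's exact formulation or tuned values.

# Illustrative sketch of DST late fusion on the binary frame {H, N},
# where H = "hateful" and N = "not hateful". The reliability values
# below are assumed discount factors, not the paper's parameters.

def to_mass(p_hateful, reliability):
    # Discount the classifier probability by its assumed reliability;
    # the leftover mass goes to Theta (the whole frame) as ignorance.
    return {
        "H": reliability * p_hateful,
        "N": reliability * (1.0 - p_hateful),
        "Theta": 1.0 - reliability,
    }

def dempster_combine(m1, m2):
    # Conflict K is the mass assigned to contradictory singletons.
    k = m1["H"] * m2["N"] + m1["N"] * m2["H"]
    norm = 1.0 - k  # Dempster's normalization; assumes k < 1
    return {
        "H": (m1["H"] * m2["H"] + m1["H"] * m2["Theta"]
              + m1["Theta"] * m2["H"]) / norm,
        "N": (m1["N"] * m2["N"] + m1["N"] * m2["Theta"]
              + m1["Theta"] * m2["N"]) / norm,
        "Theta": (m1["Theta"] * m2["Theta"]) / norm,
    }

# Example: the text model (e.g., BERT) is fairly confident the meme is
# hateful, while the image model (e.g., CLIP) mildly disagrees.
m_text = to_mass(p_hateful=0.85, reliability=0.9)
m_image = to_mass(p_hateful=0.40, reliability=0.8)
m_fused = dempster_combine(m_text, m_image)
print(m_fused)  # decide "hateful" when m_fused["H"] > m_fused["N"]

The division by 1 - K redistributes the conflicting mass, so disagreement between the text and image classifiers is handled explicitly rather than silently averaged away, which is the behavior the ignorance term is meant to expose.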
Keywords:
multimodal classification, Dempster–Shafer Evidence Theory (DST), late fusion, hateful meme detection
License
Copyright (c) 2026 Soukaina Fatimi, Wafae Sabbar, Abdelkrim Bekkhoucha

This work is licensed under a Creative Commons Attribution 4.0 International License.
