Domain-Adaptive Multitask BERT with Graph Context Modeling for Code-Mixed Hinglish Sentiment Classification
Received: 29 November 2025 | Revised: 27 December 2025 | Accepted: 3 January 2026 | Online: 12 January 2026
Corresponding author: Yogesh H. Bhosale
Abstract
Hinglish, a widely used Hindi–English code-mixed language on social media, presents unique challenges for sentiment analysis due to transliteration variability, script mixing, and inconsistent grammar. To address these issues, this study proposes the Cross-Lingual Domain-Adaptive Multitask Graph-Enhanced Bidirectional Encoder Representations from Transformers (CDMG-BERT), tailored for code-mixed sentiment classification. The model integrates four key components: English-to-Hinglish embedding alignment, domain-adversarial training using 92k English samples, multitask learning with auxiliary POS tagging, and graph-based token relational modeling for long-range contextual refinement. Experimental results on a 12,000-sample IIT Bombay Hinglish dataset show that CDMG-BERT outperforms mBERT, XLM-R, and deep learning baselines, achieving an F1-score of 84.8%. Ablation analysis indicates that each architectural module contributes to overall performance, with cross-lingual alignment and domain adaptation providing the largest gains. These results demonstrate that the model is robust to spelling variation, varying code-mixing intensity, and inconsistencies between the Roman and Devanagari scripts, making CDMG-BERT a strong candidate for sentiment analysis in low-resource multilingual settings.
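The abstract describes a training objective that combines sentiment classification with auxiliary POS tagging and a domain-adversarial term. A minimal sketch of how such a combined multitask loss might be weighted is given below; the task weights (lambda_pos, lambda_adv) and all function names are illustrative assumptions, not values or code from the paper, and the domain-adversarial term is shown only as a forward-pass value (in practice its gradient would be reversed before reaching the encoder):

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def multitask_loss(sent_logits, sent_label,
                   pos_logits, pos_labels,
                   domain_logits, domain_label,
                   lambda_pos=0.3, lambda_adv=0.1):
    """Combined objective: sentiment loss, plus an auxiliary POS-tagging
    loss averaged over tokens, plus a domain-classification loss
    (adversarial via gradient reversal during backprop, not shown here).
    lambda_pos and lambda_adv are hypothetical task weights."""
    l_sent = cross_entropy(sent_logits, sent_label)
    l_pos = np.mean([cross_entropy(l, y)
                     for l, y in zip(pos_logits, pos_labels)])
    l_adv = cross_entropy(domain_logits, domain_label)
    return l_sent + lambda_pos * l_pos + lambda_adv * l_adv
```

In a real implementation the adversarial branch would sit behind a gradient-reversal layer (e.g. a custom autograd function), so that minimizing l_adv in the domain classifier simultaneously pushes the shared encoder toward domain-invariant features.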
Keywords:
Hinglish, sentiment analysis, code-mixing, cross-lingual learning, domain adaptation, transformer models
References
L. Nguyen, O. Mayeux, and Z. Yuan, "Code-Switching Input for Machine Translation: A Case Study of Vietnamese–English Data," International Journal of Multilingualism, vol. 21, no. 4, pp. 2268–2289, Oct. 2024. DOI: https://doi.org/10.1080/14790718.2023.2224013
M. S. U. Miah, M. M. Kabir, T. B. Sarwar, M. Safran, S. Alfarhood, and M. F. Mridha, "A Multimodal Approach to Cross-Lingual Sentiment Analysis with Ensemble of Transformer and LLM," Scientific Reports, vol. 14, no. 1, Apr. 2024, Art. no. 9603. DOI: https://doi.org/10.1038/s41598-024-60210-7
Z. Cao, Y. Zhou, A. Yang, and S. Peng, "Deep Transfer Learning Mechanism for Fine-Grained Cross-Domain Sentiment Classification," Connection Science, vol. 33, no. 4, pp. 911–928, Oct. 2021. DOI: https://doi.org/10.1080/09540091.2021.1912711
R. Nayak and R. Joshi, "L3Cube-HingCorpus and HingBERT: A Code Mixed Hindi-English Dataset and BERT Language Models," arXiv, 2022.
S. K. Singh, A. Sharma, Sahil, D. Singh, S. Pandit, and U. Saghir, "Sentiment Analysis of English-Hindi Code-Mixed Text Using mBERT Model," in 3rd International Conference on Inventive Computing and Informatics, Bangalore, India, June 2025, pp. 552–556. DOI: https://doi.org/10.1109/ICICI65870.2025.11069692
K. Wang, Y. Ding, and S. C. Han, "Graph Neural Networks for Text Classification: A Survey," Artificial Intelligence Review, vol. 57, no. 8, July 2024, Art. no. 190. DOI: https://doi.org/10.1007/s10462-024-10808-0
X. Liu, S. Dai, G. Fiumara, and P. De Meo, "An Adversarial Training Method for Text Classification," Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 8, Sept. 2023, Art. no. 101697. DOI: https://doi.org/10.1016/j.jksuci.2023.101697
A. F. Hidayatullah, R. A. Apong, D. T. C. Lai, and A. Qazi, "Pre-Trained Language Model for Code-Mixed Text in Indonesian, Javanese, and English Using Transformer," Social Network Analysis and Mining, vol. 15, no. 1, Mar. 2025, Art. no. 30. DOI: https://doi.org/10.1007/s13278-025-01444-9
Z. Chi et al., "InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training," in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 2021, pp. 3576–3588. DOI: https://doi.org/10.18653/v1/2021.naacl-main.280
M. Ulčar and M. Robnik-Šikonja, "Cross-Lingual Alignments of ELMo Contextual Embeddings," Neural Computing and Applications, vol. 34, no. 15, pp. 13043–13061, Aug. 2022. DOI: https://doi.org/10.1007/s00521-022-07164-x
F. Hou et al., "Gradient-Aware Domain-Invariant Learning for Domain Generalization," Multimedia Systems, vol. 31, no. 1, Feb. 2025, Art. no. 40. DOI: https://doi.org/10.1007/s00530-024-01613-4
L. Zhang, X. Wei, F. Yang, C. Zhao, B. Wen, and Y. Lu, "Cross-Domain Sentiment Classification with Mere Contrastive Learning and Improved Method," in 3rd International Conference on Artificial Intelligence and Computer Information Technology (AICIT), Yichang, China, Sept. 2024, pp. 1–10. DOI: https://doi.org/10.1109/AICIT62434.2024.10730527
N. Sharma and B. Verma, "Recent Advances in Transfer Learning for Natural Language Processing (NLP)," in A Handbook of Computational Linguistics: Artificial Intelligence in Natural Language Processing, Y. B. Singh, A. D. Mishra, P. Singh, and D. K. Yadav, Eds. Bentham Science Publishers, 2024, pp. 228–254. DOI: https://doi.org/10.2174/9789815238488124020014
S. Zhang, C. Yin, and Z. Yin, "Multimodal Sentiment Recognition with Multi-Task Learning," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 7, no. 1, pp. 200–209, Feb. 2023. DOI: https://doi.org/10.1109/TETCI.2022.3224929
Z. Ye, Y. J. Kumar, G. O. Sing, F. Song, and J. Wang, "A Comprehensive Survey of Graph Neural Networks for Knowledge Graphs," IEEE Access, vol. 10, pp. 75729–75741, 2022. DOI: https://doi.org/10.1109/ACCESS.2022.3191784
A. Kunchukuttan, P. Mehta, and P. Bhattacharyya, "The IIT Bombay English–Hindi Parallel Corpus," in Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan, May 2018.
S. Choo and W. Kim, "A Study on the Evaluation of Tokenizer Performance in Natural Language Processing," Applied Artificial Intelligence, vol. 37, no. 1, Dec. 2023, Art. no. 2175112. DOI: https://doi.org/10.1080/08839514.2023.2175112
S. Perumal and K. Kathirvelu, "Enhancing the Quality of Service in Video Game Live Streaming Using Big Data Analytics with DNN Classification and BERT-Based Sentiment Analysis," Engineering, Technology & Applied Science Research, vol. 15, no. 4, pp. 25426–25431, Aug. 2025. DOI: https://doi.org/10.48084/etasr.11495
R. Socher et al., "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank," in Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, Oct. 2013, pp. 1631–1642. DOI: https://doi.org/10.18653/v1/D13-1170
License
Copyright (c) 2026 Rahul Mishra, A. Kaliappan, D. V. Sarala, T. V. Hyma Lakshmi, Prabhat Kumar Ravi, Y. Harika Devi, Satyajee Srivastava, Deepak Asudani, Sanjana M. Nagaraj, Yogesh H. Bhosale

This work is licensed under a Creative Commons Attribution 4.0 International License.
