Domain-Adaptive Multitask BERT with Graph Context Modeling for Code-Mixed Hinglish Sentiment Classification
Received: 29 November 2025 | Revised: 27 December 2025 | Accepted: 3 January 2026 | Online: 12 January 2026
Corresponding author: Yogesh H. Bhosale
Abstract
Hinglish, a widely used Hindi–English code-mixed language on social media, presents unique challenges for sentiment analysis due to transliteration variability, script mixing, and inconsistent grammar. To address these issues, this study proposes the Cross-Lingual Domain-Adaptive Multitask Graph-Enhanced Bidirectional Encoder Representations from Transformers (CDMG-BERT), tailored for code-mixed sentiment classification. The model integrates four key components: English-to-Hinglish embedding alignment, domain-adversarial training using 92k English samples, multitask learning with auxiliary POS tagging, and graph-based token relational modeling for long-range contextual refinement. Experimental results on a 12,000-sample IIT Bombay Hinglish dataset show that CDMG-BERT outperforms mBERT, XLM-R, and deep learning baselines, achieving an F1-score of 84.8%. Ablation analysis indicates that each architectural module contributes to overall performance, with cross-lingual alignment and domain adaptation providing the largest gains. These results demonstrate that the model is robust to spelling variation, varying code-mixing intensity, and inconsistencies between the Roman and Devanagari scripts, making CDMG-BERT a strong candidate for sentiment analysis in low-resource multilingual settings.
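The abstract describes a training objective that combines sentiment classification with auxiliary POS tagging and a domain-adversarial term. A minimal sketch of how such a combined multitask loss might be weighted is given below; the task weights (lambda_pos, lambda_adv) and all function names are illustrative assumptions, not values or code from the paper, and the domain-adversarial term is shown only as a forward-pass value (in practice its gradient would be reversed before reaching the encoder):

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def multitask_loss(sent_logits, sent_label,
                   pos_logits, pos_labels,
                   domain_logits, domain_label,
                   lambda_pos=0.3, lambda_adv=0.1):
    """Combined objective: sentiment loss, plus an auxiliary POS-tagging
    loss averaged over tokens, plus a domain-classification loss
    (adversarial via gradient reversal during backprop, not shown here).
    lambda_pos and lambda_adv are hypothetical task weights."""
    l_sent = cross_entropy(sent_logits, sent_label)
    l_pos = np.mean([cross_entropy(l, y)
                     for l, y in zip(pos_logits, pos_labels)])
    l_adv = cross_entropy(domain_logits, domain_label)
    return l_sent + lambda_pos * l_pos + lambda_adv * l_adv
```

In a real implementation the adversarial branch would sit behind a gradient-reversal layer (e.g. a custom autograd function), so that minimizing l_adv in the domain classifier simultaneously pushes the shared encoder toward domain-invariant features.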
Keywords:
Hinglish, sentiment analysis, code-mixing, cross-lingual learning, domain adaptation, transformer models
References
L. Nguyen, O. Mayeux, and Z. Yuan, "Code-Switching Input for Machine Translation: A Case Study of Vietnamese–English Data," International Journal of Multilingualism, vol. 21, no. 4, pp. 2268–2289, Oct. 2024. DOI: https://doi.org/10.1080/14790718.2023.2224013
M. S. U. Miah, M. M. Kabir, T. B. Sarwar, M. Safran, S. Alfarhood, and M. F. Mridha, "A Multimodal Approach to Cross-Lingual Sentiment Analysis with Ensemble of Transformer and LLM," Scientific Reports, vol. 14, no. 1, Apr. 2024, Art. no. 9603. DOI: https://doi.org/10.1038/s41598-024-60210-7
Z. Cao, Y. Zhou, A. Yang, and S. Peng, "Deep Transfer Learning Mechanism for Fine-Grained Cross-Domain Sentiment Classification," Connection Science, vol. 33, no. 4, pp. 911–928, Oct. 2021. DOI: https://doi.org/10.1080/09540091.2021.1912711
R. Nayak and R. Joshi, "L3Cube-HingCorpus and HingBERT: A Code Mixed Hindi-English Dataset and BERT Language Models," arXiv, 2022.
S. K. Singh, A. Sharma, Sahil, D. Singh, S. Pandit, and U. Saghir, "Sentiment Analysis of English-Hindi Code-Mixed Text Using mBERT Model," in 3rd International Conference on Inventive Computing and Informatics, Bangalore, India, June 2025, pp. 552–556. DOI: https://doi.org/10.1109/ICICI65870.2025.11069692
K. Wang, Y. Ding, and S. C. Han, "Graph Neural Networks for Text Classification: A Survey," Artificial Intelligence Review, vol. 57, no. 8, July 2024, Art. no. 190. DOI: https://doi.org/10.1007/s10462-024-10808-0
X. Liu, S. Dai, G. Fiumara, and P. De Meo, "An Adversarial Training Method for Text Classification," Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 8, Sept. 2023, Art. no. 101697. DOI: https://doi.org/10.1016/j.jksuci.2023.101697
A. F. Hidayatullah, R. A. Apong, D. T. C. Lai, and A. Qazi, "Pre-Trained Language Model for Code-Mixed Text in Indonesian, Javanese, and English Using Transformer," Social Network Analysis and Mining, vol. 15, no. 1, Mar. 2025, Art. no. 30. DOI: https://doi.org/10.1007/s13278-025-01444-9
Z. Chi et al., "InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training," in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 2021, pp. 3576–3588. DOI: https://doi.org/10.18653/v1/2021.naacl-main.280
M. Ulčar and M. Robnik-Šikonja, "Cross-Lingual Alignments of ELMo Contextual Embeddings," Neural Computing and Applications, vol. 34, no. 15, pp. 13043–13061, Aug. 2022. DOI: https://doi.org/10.1007/s00521-022-07164-x
F. Hou et al., "Gradient-Aware Domain-Invariant Learning for Domain Generalization," Multimedia Systems, vol. 31, no. 1, Feb. 2025, Art. no. 40. DOI: https://doi.org/10.1007/s00530-024-01613-4
L. Zhang, X. Wei, F. Yang, C. Zhao, B. Wen, and Y. Lu, "Cross-Domain Sentiment Classification with Mere Contrastive Learning and Improved Method," in 3rd International Conference on Artificial Intelligence and Computer Information Technology (AICIT), Yichang, China, Sept. 2024, pp. 1–10. DOI: https://doi.org/10.1109/AICIT62434.2024.10730527
N. Sharma and B. Verma, "Recent Advances in Transfer Learning for Natural Language Processing (NLP)," in A Handbook of Computational Linguistics: Artificial Intelligence in Natural Language Processing, Y. B. Singh, A. D. Mishra, P. Singh, and D. K. Yadav, Eds. Bentham Science Publishers, 2024, pp. 228–254. DOI: https://doi.org/10.2174/9789815238488124020014
S. Zhang, C. Yin, and Z. Yin, "Multimodal Sentiment Recognition with Multi-Task Learning," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 7, no. 1, pp. 200–209, Feb. 2023. DOI: https://doi.org/10.1109/TETCI.2022.3224929
Z. Ye, Y. J. Kumar, G. O. Sing, F. Song, and J. Wang, "A Comprehensive Survey of Graph Neural Networks for Knowledge Graphs," IEEE Access, vol. 10, pp. 75729–75741, 2022. DOI: https://doi.org/10.1109/ACCESS.2022.3191784
A. Kunchukuttan, P. Mehta, and P. Bhattacharyya, "The IIT Bombay English–Hindi Parallel Corpus," in Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan, May 2018.
S. Choo and W. Kim, "A Study on the Evaluation of Tokenizer Performance in Natural Language Processing," Applied Artificial Intelligence, vol. 37, no. 1, Dec. 2023, Art. no. 2175112. DOI: https://doi.org/10.1080/08839514.2023.2175112
S. Perumal and K. Kathirvelu, "Enhancing the Quality of Service in Video Game Live Streaming Using Big Data Analytics with DNN Classification and BERT-Based Sentiment Analysis," Engineering, Technology & Applied Science Research, vol. 15, no. 4, pp. 25426–25431, Aug. 2025. DOI: https://doi.org/10.48084/etasr.11495
R. Socher et al., "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank," in Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, Oct. 2013, pp. 1631–1642. DOI: https://doi.org/10.18653/v1/D13-1170
License
Copyright (c) 2026 Rahul Mishra, A. Kaliappan, D. V. Sarala, T. V. Hyma Lakshmi, Prabhat Kumar Ravi, Y. Harika Devi, Satyajee Srivastava, Deepak Asudani, Sanjana M. Nagaraj, Yogesh H. Bhosale

This work is licensed under a Creative Commons Attribution 4.0 International License.
