A Deep Visual Approach to Student Engagement Analysis Using Affective and Behavioral Cues
Received: 4 August 2025 | Revised: 16 September 2025, 8 October 2025, 10 October 2025, and 26 October 2025 | Accepted: 29 October 2025 | Online: 15 January 2026
Corresponding author: Fatima Zahra Jobbid
Abstract
Assessing student engagement in educational environments is essential for supporting adaptive teaching strategies and enhancing learning outcomes. This study presents a deep learning-based approach for automatically predicting student engagement, leveraging both behavioral and emotional cues. The proposed method integrates features derived from facial emotion recognition and head pose estimation, capturing a comprehensive representation of student affect and attention. A multi-layer neural network was trained on the Student Engagement Dataset to classify engagement states from these multimodal inputs. The proposed framework achieves an accuracy of 88% on unseen validation data, demonstrating strong effectiveness in distinguishing between engaged and disengaged students. In addition, an explainability analysis highlights neutral facial expressions and head orientation as key indicators of engagement, supporting the interpretability and practical relevance of the proposed approach for real-world educational settings.
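As a rough illustration of the fusion step described above, the sketch below concatenates per-frame facial-emotion probabilities with head-pose angles and feeds the result to a small multi-layer network. The abstract does not specify the architecture, feature dimensions, or framework, so the seven-class emotion distribution, the (yaw, pitch, roll) pose triple, the layer widths, and the use of PyTorch are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class EngagementClassifier(nn.Module):
    """Minimal sketch of a multi-layer network fusing affective and
    behavioral cues. All dimensions and layer sizes are assumed for
    illustration; the paper does not state them in this excerpt."""

    def __init__(self, num_emotions: int = 7, num_pose: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            # Fused input: emotion probabilities + yaw/pitch/roll angles.
            nn.Linear(num_emotions + num_pose, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            # Two logits: engaged vs. disengaged.
            nn.Linear(32, 2),
        )

    def forward(self, emotion_probs: torch.Tensor, pose_angles: torch.Tensor) -> torch.Tensor:
        # Early fusion: concatenate the per-frame emotion distribution
        # with the head-pose angles into one feature vector.
        x = torch.cat([emotion_probs, pose_angles], dim=-1)
        return self.net(x)

# Example: one frame with a dominant neutral expression and a near-frontal pose.
model = EngagementClassifier()
emotion = torch.tensor([[0.05, 0.02, 0.03, 0.05, 0.70, 0.10, 0.05]])  # hypothetical FER class probabilities
pose = torch.tensor([[2.0, -5.0, 1.0]])                               # yaw, pitch, roll in degrees
logits = model(emotion, pose)
print(logits.softmax(dim=-1))  # predicted engagement distribution
```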
Keywords:
student engagement, emotion recognition, head pose estimation, deep learning, artificial intelligence in education
License
Copyright (c) 2025 Fatima Zahra Jobbid, Aissam Berrahou, Hassan Berbia

This work is licensed under a Creative Commons Attribution 4.0 International License.
