Prediction of Higher Education Student Dropout based on Regularized Regression Models
Received: 6 August 2024 | Revised: 7 September 2024 | Accepted: 22 September 2024 | Online: 7 November 2024
Corresponding author: Bouchra Bouihi
Abstract
This study addresses student dropout in higher education institutions. To allow early and precise interventions and to provide a multifaceted view of student performance, it combines two predictive models, one for dropout classification and one for score prediction. First, a logistic regression model was developed to predict student dropout at an early stage. Then, to enhance dropout prediction, a second-degree polynomial regression model was used to predict student results from the academic variables available in a Moodle course (access, tests, exams, projects, and assignments). Working with a limited dataset is a key challenge because of the high risk of overfitting. To address this issue and balance overfitting, data size, and model complexity, the predictive models were evaluated with L1 (Lasso) and L2 (Ridge) regularization terms. With regularization, the predictive models achieved an accuracy of up to 89% and an R² score of up to 86%.
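The effect of the two penalty terms on a logistic regression model can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the feature names, the function names (`fit_logistic`, `soft_threshold`, `predict`), and the hyperparameters `lam`, `lr`, and `epochs` are all assumptions made for illustration. The L2 (Ridge) penalty shrinks every weight toward zero, while the L1 (Lasso) penalty, applied here by soft-thresholding, can set the weight of an uninformative feature exactly to zero.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of t * |w|, used for the L1 (Lasso) penalty.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_logistic(X, y, penalty="l2", lam=0.1, lr=0.1, epochs=2000):
    """Fit weights w and bias b by (proximal) gradient descent.

    The bias is left unpenalized, which is a common convention.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        if penalty == "l2":
            # Ridge: add the gradient of (lam / 2) * ||w||^2.
            w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        else:
            # Lasso: gradient step on the loss, then soft-thresholding.
            w = [soft_threshold(wj - lr * gj, lr * lam)
                 for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    # Classify as dropout (1) when the predicted probability is at least 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Hypothetical toy data, NOT from the study.
# Columns: scaled course accesses, scaled test score, unrelated noise.
# Label 1 = dropout, 0 = no dropout.
X = [[0.1, 0.2, 0.5], [0.2, 0.1, 0.1], [0.3, 0.3, 0.9],
     [0.8, 0.9, 0.4], [0.9, 0.7, 0.6], [0.7, 0.8, 0.2]]
y = [1, 1, 1, 0, 0, 0]

w_ridge, b_ridge = fit_logistic(X, y, penalty="l2", lam=0.01)
w_lasso, b_lasso = fit_logistic(X, y, penalty="l1", lam=0.05)
print("ridge weights:", w_ridge)
print("lasso weights:", w_lasso)
```

On this toy data the Lasso fit drops the noise column, which is the kind of implicit feature selection that helps when the dataset is small. The same two penalties can be added to a least-squares loss over second-degree polynomial features for the score prediction model.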
Keywords:
logistic regression, polynomial regression, regularization, dropout prediction, lasso, ridge
License
Copyright (c) 2024 Bouchra Bouihi, Abdelmajid Bousselham, Essaadia Aoula, Fatna ENNIBRAS, Adel Deraoui
This work is licensed under a Creative Commons Attribution 4.0 International License.