Big Data in Education: Students at Risk as a Case Study
Received: 11 July 2023 | Revised: 10 August 2023 and 16 August 2023 | Accepted: 21 August 2023 | Online: 13 October 2023
Corresponding author: Ahmed B. Altamimi
Abstract
This paper evaluates several machine learning algorithms for predicting student failure on a specific educational dataset in a specific environment. Prediction is based on students' grades, course difficulty level, and GPA, unlike most studies in the literature, which focus on the surrounding environment. The main aim is the early detection of students at risk of academic underperformance, so that specific interventions can be implemented to improve their academic outcomes. A diverse set of eleven Machine Learning (ML) algorithms was used to analyze the dataset. The data were preprocessed, and features were engineered to capture information that may affect students' academic performance. Model selection and evaluation compared the algorithms on metrics such as accuracy, precision, recall, F-score, specificity, and balanced accuracy. The results show significant variability in performance across the algorithms: Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) achieved the highest overall performance, followed closely by the Gradient Boosting Classifier (GBC), Neuro-Fuzzy, and Random Forest (RF). The remaining algorithms performed at varying levels, with Recurrent Neural Networks (RNNs) showing the weakest recall and F-score. Educational institutions can use the insights from this study to make data-driven decisions and design targeted interventions that help at-risk students succeed academically. Furthermore, the methodology presented in this paper can be generalized and applied to other educational datasets for similar predictive purposes.
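The six evaluation metrics named in the abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, not the paper's evaluation code: it assumes the positive class (label 1) means "at risk", uses the standard textbook definitions, and the function names and toy labels are hypothetical illustrations.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the metrics listed in the abstract from label sequences."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_score": safe_div(2 * precision * recall, precision + recall),
        "specificity": specificity,
        # Mean of recall and specificity; robust to class imbalance.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical toy labels (1 = at risk, 0 = not at risk), not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity and balanced accuracy alongside accuracy matters when at-risk students are a minority of the class: a model that labels every student "not at risk" can still score a high accuracy while its recall is zero.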
Keywords:
machine learning algorithms, big data, accuracy, F-score, precision
License
Copyright (c) 2023 Ahmed B. Altamimi
This work is licensed under a Creative Commons Attribution 4.0 International License.