Ensemble Learning for Diabetes Prediction: An Integration of TabNet and Neural Oblivious Decision Ensembles (NODE)
Received: 28 September 2025 | Revised: 16 October 2025 and 27 October 2025 | Accepted: 29 October 2025 | Online: 8 December 2025
Corresponding author: Majid Rahardi
Abstract
Accurate prediction of diabetes risk is important for advancing healthcare and personalized medicine. This study presents a comparative analysis of advanced deep learning models for structured (tabular) data, focusing on two architectures: Neural Oblivious Decision Ensembles (NODE) and TabNet. The method includes comprehensive data preprocessing, with oversampling applied to address class imbalance in the dataset. The predicted probabilities of the individually trained models were then combined in a soft-voting ensemble. This ensemble achieved a validation accuracy of 93.55, a precision of 92.60, a recall of 94.58, and an F1-score of 93.58. These findings underscore the potential of advanced deep learning techniques, especially when combined in an ensemble, to provide reliable and accurate diabetes risk prediction from tabular data.
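As a minimal sketch of the soft-voting step described in the abstract, the following Python averages the positive-class probabilities of two models and scores the result. The probability values, labels, equal weights, 0.5 threshold, and all function names are hypothetical illustrations, not the authors' implementation or data. The last helper only checks that an F1-score follows from a given precision and recall, assuming both are on the same scale.

```python
from typing import Dict, List, Optional, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-sample positive-class probabilities."""
    n_models = len(prob_sets)
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / n_models] * n_models
    n_samples = len(prob_sets[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_sets))
        for i in range(n_samples)
    ]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 labels (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in probs]


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical outputs standing in for trained TabNet and NODE models.
tabnet_probs = [0.90, 0.40, 0.20, 0.70, 0.55]
node_probs = [0.80, 0.70, 0.10, 0.20, 0.35]
y_true = [1, 1, 0, 0, 1]

ensemble_probs = soft_vote([tabnet_probs, node_probs])
y_pred = classify(ensemble_probs)
metrics = binary_metrics(y_true, y_pred)
```

Soft voting averages probabilities rather than counting hard votes, so a confident model can outweigh an uncertain one on a given sample. Applying `f1_from(92.60, 94.58)` gives a value that rounds to 93.58, which agrees with the F1-score reported in the abstract.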
Keywords:
ensemble learning, deep learning, diabetes prediction, TabNet, NODE
License
Copyright (c) 2025 Majid Rahardi, Ferian Fauzi Abdulloh, Ahlihi Masruro, Bima Pramudya Asaddulloh, Afrig Aminuddin, Nafiatun Sholihah

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
