Lightweight Hybrid Feature Engineering and Fusion with Variational Autoencoder and XGBoost for Nutritional Data Classification

Authors

  • Wiwien Hadikurniawati, Faculty of Information Technology, Satya Wacana Christian University, Salatiga 50715, Indonesia | Faculty of Information Technology and Industry, Universitas Stikubank, Semarang 50241, Indonesia
  • Kristoko Dwi Hartomo, Faculty of Information Technology, Satya Wacana Christian University, Salatiga 50715, Indonesia
  • Irwan Sembiring, Faculty of Information Technology, Satya Wacana Christian University, Salatiga 50715, Indonesia
Volume: 16 | Issue: 1 | Pages: 30859-30868 | February 2026 | https://doi.org/10.48084/etasr.15310

Abstract

Stunting remains a significant global health challenge, particularly among children under five years of age. Early detection of nutritional status is crucial for preventing long-term harm to physical and cognitive development. This study proposes a lightweight approach to stunting status classification, based on hybrid feature engineering and fusion, that does not require complex data balancing strategies. The proposed method combines basic anthropometric features with Rule-Based (RB) derived features (BMI, changes in weight and height since birth, growth rates, and z-score combinations) and latent features generated by a Variational Autoencoder (VAE), which captures non-linear patterns in tabular data. All features are then concatenated and classified using XGBoost. Evaluation with 5-fold cross-validation yielded an average F1-score of 0.9919 ± 0.0066 on the testing data. These findings indicate that a lightweight yet informative approach can serve as a practical solution for early stunting detection, particularly in public health settings with limited resources.
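
As an illustration of the rule-based feature derivation and feature fusion described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The record fields, the function names (`rule_based_features`, `fuse`), the per-month growth rates, and the use of a simple sum as the z-score combination are all assumptions made for this example. The latent vector stands in for the output of a trained VAE encoder, which is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChildRecord:
    # Hypothetical field names; the abstract does not list the exact columns.
    age_months: float
    weight_kg: float
    height_cm: float
    birth_weight_kg: float
    birth_height_cm: float
    haz: float  # height-for-age z-score
    waz: float  # weight-for-age z-score


def rule_based_features(r: ChildRecord) -> List[float]:
    """Derive RB features: BMI, changes since birth, growth rates, z-score mix."""
    height_m = r.height_cm / 100.0
    bmi = r.weight_kg / (height_m ** 2)
    weight_change = r.weight_kg - r.birth_weight_kg
    height_change = r.height_cm - r.birth_height_cm
    # Guard against division by zero for newborns (assumed convention).
    months = max(r.age_months, 1.0)
    weight_rate = weight_change / months
    height_rate = height_change / months
    # Assumed combination: a plain sum of the two z-scores.
    z_combined = r.haz + r.waz
    return [bmi, weight_change, height_change, weight_rate, height_rate, z_combined]


def fuse(basic: List[float], rb: List[float], latent: List[float]) -> List[float]:
    """Concatenate basic, rule-based, and latent features into one vector."""
    return list(basic) + list(rb) + list(latent)


record = ChildRecord(24.0, 10.0, 80.0, 3.0, 50.0, -2.5, -1.5)
basic = [record.age_months, record.weight_kg, record.height_cm]
rb = rule_based_features(record)
latent = [0.1, -0.3]  # placeholder values standing in for VAE encoder output
fused = fuse(basic, rb, latent)
```

The fused vector would then be passed to a classifier such as XGBoost; that step is omitted to keep the sketch free of external dependencies.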

Keywords:

stunting prediction, feature engineering, Variational Autoencoder, XGBoost, lightweight model

How to Cite

[1]
W. Hadikurniawati, K. D. Hartomo, and I. Sembiring, “Lightweight Hybrid Feature Engineering and Fusion with Variational Autoencoder and XGBoost for Nutritional Data Classification”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 1, pp. 30859–30868, Feb. 2026.
