The Impact of Data Preprocessing Order on LASSO and Elastic Net Capabilities
Received: 14 November 2024 | Revised: 15 December 2024 and 29 December 2024 | Accepted: 1 January 2025 | Online: 2 February 2025
Corresponding author: Khuneswari Gopal Pillay
Abstract
The Food Security Index (FSI) evaluates food affordability, accessibility, utilization, and availability. Previous research on food security in Malaysia has focused primarily on production and has not analyzed economic factors in detail. One such factor is the Overnight Policy Rate (OPR), set by Bank Negara Malaysia (BNM), which regulates economic activity by controlling the interest rate at which commercial banks borrow and lend overnight. This study examines how the order of data preprocessing steps affects the performance of LASSO and Elastic Net regression models in predicting Malaysia's FSI. Using macroeconomic data from 2010 to 2023, it evaluates different sequences of outlier detection and missing data imputation. The LASSO model achieves the highest accuracy and the lowest error rates when outlier detection is performed after imputation. The results show that the OPR reduces Malaysia's FSI by 0.151 units, while inflation increases it by 0.022 units. The study underscores the importance of preprocessing order for model reliability, and the LASSO regression model offers a new perspective on the economic factors that influence food security in Malaysia.
Keywords:
preprocessing sequences, missing data, outliers, LASSO, Elastic Net
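The idea that preprocessing order changes the data a model is trained on can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy series, the helper names `impute_mean` and `drop_iqr_outliers`, the mean imputation (a stand-in for multiple imputation), and the 1.5 x IQR fence are all assumptions made for illustration. It also does not reproduce the study's finding about which order performs better; it only shows that the two orders produce different inputs.

```python
from statistics import mean, quantiles

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def drop_iqr_outliers(values, k=1.5):
    """Drop observed values outside [Q1 - k*IQR, Q3 + k*IQR]; keep None entries."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v is None or low <= v <= high]

# Hypothetical indicator with two missing values and one extreme value (9.5).
series = [3.0, 3.1, None, 2.9, 3.2, 9.5, None, 3.0, 3.1]

# Order A: outlier detection first, then imputation.
outliers_first = impute_mean(drop_iqr_outliers(series))

# Order B: imputation first, then outlier detection.
imputation_first = drop_iqr_outliers(impute_mean(series))

# The value filled in at index 2 depends on the order of the two steps.
print(round(outliers_first[2], 3))    # 3.05
print(round(imputation_first[2], 3))  # 3.971
```

Both orders remove the extreme value, but in order B the imputed entries were computed while that value was still in the series, so the two cleaned series differ. Any regularized regression fitted afterwards would therefore see different inputs depending on the order chosen.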
License
Copyright (c) 2025 Geneveive Yii Ven Tang, Khuneswari Gopal Pillay, Aida Binti Mustapha

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.