The Impact of Data Preprocessing Order on LASSO and Elastic Net Capabilities

Authors

  • Geneveive Yii Ven Tang Department of Mathematics and Statistics, Faculty of Applied Science and Technology, University Tun Hussein Onn Malaysia, Campus Pagoh, Muar, Johor, Malaysia
  • Khuneswari Gopal Pillay Department of Mathematics and Statistics, Faculty of Applied Science and Technology, University Tun Hussein Onn Malaysia, Campus Pagoh, Muar, Johor, Malaysia https://orcid.org/0000-0001-9111-0931
  • Aida Mustapha Department of Mathematics and Statistics, Faculty of Applied Science and Technology, University Tun Hussein Onn Malaysia, Campus Pagoh, Muar, Johor, Malaysia https://orcid.org/0000-0002-9077-4995
Volume: 15 | Issue: 1 | Pages: 20264-20270 | February 2025 | https://doi.org/10.48084/etasr.9611

Abstract

The Food Security Index (FSI) evaluates food affordability, accessibility, utilization, and availability. However, previous research on food security in Malaysia has focused primarily on production and has not analyzed economic factors in detail. The Overnight Policy Rate (OPR), set by Bank Negara Malaysia (BNM), regulates economic activity by controlling the interest rate at which commercial banks borrow from and lend to one another overnight. This study explores how the order of data preprocessing steps affects the performance of LASSO and Elastic Net regression models in predicting Malaysia's FSI. Using macroeconomic data from 2010 to 2023, it evaluates different sequences of outlier detection and missing-data imputation. The findings reveal that the LASSO model achieves the highest accuracy and the lowest error rates when outlier detection is performed after imputation. The results also show that OPR reduces Malaysia's FSI by 0.151 units, while inflation increases it by 0.022 units. This study underscores the importance of preprocessing order for model reliability and provides insight into the economic factors that influence food security in Malaysia. The LASSO regression model offers a new perspective on these factors and a more comprehensive understanding of food security in the country.
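To make the idea of preprocessing order concrete, the Python sketch below sends the same toy regression through two pipelines. The first treats outliers and then imputes. The second imputes and then treats outliers. It then fits LASSO and Elastic Net to each result. This is an illustration under stated assumptions, not the authors' implementation. It assumes the following:

- Outliers are detected with Tukey's IQR fences and capped at the fence.
- Missing values are filled with the column mean, as a simple stand-in for multiple imputation.
- The penalty follows the glmnet-style elastic-net parameterization, in which `alpha = 1` gives LASSO.
- The data, `lam = 0.1`, and `alpha = 0.5` are hypothetical.

The helper names (`iqr_clip`, `mean_impute`, `preprocess`, `elastic_net`) are illustrative only, and the reported RMSE is in-sample.

```python
from __future__ import annotations

import math
import random
from statistics import mean, quantiles

# Illustrative sketch only. Assumptions (not taken from the study): outliers are
# capped at Tukey's IQR fences, missing values get the column mean (a stand-in
# for multiple imputation), and predictors are centred but not standardized.


def mean_impute(col: list[float | None]) -> list[float | None]:
    """Replace None with the mean of the observed values."""
    m = mean([v for v in col if v is not None])
    return [m if v is None else v for v in col]


def iqr_clip(col: list[float | None], k: float = 1.5) -> list[float | None]:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR]; None entries are left untouched."""
    q1, _, q3 = quantiles([v for v in col if v is not None], n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [None if v is None else min(max(v, lo), hi) for v in col]


def preprocess(columns: list[list[float | None]], order: str) -> list[list[float | None]]:
    """Run outlier treatment and imputation on every column in the requested order."""
    if order not in ("outliers_first", "impute_first"):
        raise ValueError(f"unknown order: {order}")
    steps = (iqr_clip, mean_impute) if order == "outliers_first" else (mean_impute, iqr_clip)
    result = []
    for col in columns:
        for step in steps:
            col = step(col)
        result.append(col)
    return result


def soft_threshold(z: float, g: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - g, 0)."""
    return math.copysign(max(abs(z) - g, 0.0), z)


def elastic_net(columns, y, lam: float, alpha: float, n_iter: int = 200):
    """Cyclic coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha = 1 is LASSO. X and y are centred, so the intercept is implicit.
    Returns the coefficients and the in-sample RMSE."""
    n = len(y)
    xs = []
    for c in columns:
        m = mean(c)
        xs.append([v - m for v in c])
    y_mean = mean(y)
    r = [v - y_mean for v in y]  # residuals while every coefficient is zero
    beta = [0.0] * len(xs)
    for _ in range(n_iter):
        for j, x in enumerate(xs):
            z = sum(v * v for v in x) / n
            if z == 0.0:
                continue  # a constant column carries no information
            # inner product of x_j with the partial residual that excludes x_j
            rho = sum(a * b for a, b in zip(x, r)) / n + z * beta[j]
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            if new != beta[j]:
                r = [ri - (new - beta[j]) * xi for ri, xi in zip(r, x)]
                beta[j] = new
    return beta, math.sqrt(sum(v * v for v in r) / n)


# Hypothetical toy data (not the study's data): y depends on x1 and x2; x3 is noise.
rng = random.Random(42)
n = 60
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * a - 1.0 * b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]

# Corrupt the predictors with gross outliers and missing values.
raw = [list(x1), list(x2), list(x3)]
raw[0][3], raw[0][10] = 15.0, None
raw[1][7], raw[1][20] = -12.0, None

results = {}
for order in ("outliers_first", "impute_first"):
    cols = preprocess(raw, order)
    for name, alpha in (("LASSO", 1.0), ("Elastic Net", 0.5)):
        beta, rmse = elastic_net(cols, y, lam=0.1, alpha=alpha)
        results[(order, name)] = (beta, rmse)
        print(f"{order:14} {name:11} RMSE={rmse:.3f} beta={[round(b, 3) for b in beta]}")
```

The two orders differ in two ways. In the impute-first pipeline, the mean used to fill the gaps still includes the uncapped outliers. The IQR fences are also computed on different samples: observed values only in the first pipeline, and the imputed column in the second. Both differences can shift the fitted coefficients and the error. Which order works better depends on the data and on the detection and imputation methods chosen. For the study's data, the abstract reports that LASSO performed best with outlier detection after imputation.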

Keywords:

preprocessing sequences, missing data, outliers, LASSO, Elastic Net

How to Cite

Tang, G.Y.V., Pillay, K.G. and Mustapha, A. 2025. The Impact of Data Preprocessing Order on LASSO and Elastic Net Capabilities. Engineering, Technology & Applied Science Research. 15, 1 (Feb. 2025), 20264–20270. DOI: https://doi.org/10.48084/etasr.9611.