Adaptive Method for Feature Selection in the Machine Learning Context
Received: 2 April 2024 | Revised: 11 April 2024 | Accepted: 12 April 2024 | Online: 16 April 2024
Corresponding author: Yamen El Touati
Abstract
Feature selection is a fundamental step in machine learning and is crucial for improving both the accuracy and the efficiency of models. By sifting through abundant data to identify the most informative features, it improves predictive accuracy and reduces the risk of overfitting. The technique not only speeds up model training by lowering computational requirements, but also enhances interpretability, leading to more transparent and reliable predictions. Deliberately discarding irrelevant variables therefore both refines the model and constitutes a crucial step toward more flexible and comprehensible machine learning results. This study assesses the effectiveness of feature selection on regression models, with the impact measured by the Mean Squared Error (MSE). Several regression algorithms were first evaluated, and then feature selection techniques, both statistical and algorithmic, including SelectKBest, PCA, and RFE with Linear Regression and Random Forest, were applied. After feature selection, the linear models showed reduced MSE, highlighting the value of removing unnecessary data. The results emphasize the nuanced impact of feature selection on model performance and call for a tailored strategy to maximize prediction accuracy.
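As an illustration of the kind of comparison described in the abstract, the following is a minimal Python sketch using scikit-learn. It is not the authors' exact pipeline: the synthetic dataset, the choice of eight retained features/components, and the baseline linear estimator are assumptions made for demonstration only.

```python
# Minimal sketch: compare the test MSE of Linear Regression before and after
# several feature-selection/reduction techniques (SelectKBest, PCA, RFE).
# The synthetic data and the value k = 8 are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data with many uninformative features.
X, y = make_regression(n_samples=500, n_features=30, n_informative=8,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

def mse_with(selector=None):
    """Fit Linear Regression on (optionally transformed) features, return test MSE."""
    if selector is not None:
        X_tr = selector.fit_transform(X_train, y_train)
        X_te = selector.transform(X_test)
    else:
        X_tr, X_te = X_train, X_test
    model = LinearRegression().fit(X_tr, y_train)
    return mean_squared_error(y_test, model.predict(X_te))

selectors = {
    "Baseline (all features)": None,
    "SelectKBest (k=8)": SelectKBest(score_func=f_regression, k=8),
    "PCA (8 components)": PCA(n_components=8),
    "RFE + Linear Regression": RFE(LinearRegression(), n_features_to_select=8),
    "RFE + Random Forest": RFE(RandomForestRegressor(n_estimators=100,
                                                     random_state=42),
                               n_features_to_select=8),
}

for name, selector in selectors.items():
    print(f"{name}: MSE = {mse_with(selector):.2f}")
```

In this setup, a lower MSE after selection indicates that discarding uninformative features helped the linear model, mirroring the effect reported in the abstract.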
Keywords:
cloud computing, cyber security, preventive approach, prediction techniques, artificial intelligence
License
Copyright (c) 2024 Yamen El Touati, Jihane Ben Slimane, Taoufik Saidani
This work is licensed under a Creative Commons Attribution 4.0 International License.