Intrusion Detection: Boruta Feature Selection and Semi-Supervised Outlier Clustering with Multi-Dataset Evaluation
Received: 27 August 2025 | Revised: 14 October 2025 | Accepted: 29 October 2025 | Online: 7 November 2025
Corresponding author: Agni Isador Harsapranata
Abstract
Intrusion Detection Systems (IDSs) remain essential as network attacks continue to increase in both volume and sophistication. This study presents a unified, dataset-agnostic preprocessing framework that integrates Boruta-based feature selection with class-wise semi-supervised clustering for outlier reduction before classification. The proposed pipeline standardizes encoding and scaling, prevents label leakage, selects relevant features, filters noise, and maps labels to a binary normal/intrusion classification task. The framework is evaluated on three benchmark datasets (NSL-KDD, UNSW-NB15, and CIC-IDS2017) using five representative classifiers: Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), all under a consistent experimental protocol. Ablation studies and paired statistical significance tests are conducted to quantify the individual effects of feature selection and outlier filtering. On CIC-IDS2017, the full pipeline yields consistent and often statistically significant improvements over a simplified baseline. On NSL-KDD, performance gains are model-dependent, whereas on UNSW-NB15, the framework remains competitive with the baseline. Overall test accuracies range from 90.7% to 99.96%, with the best-performing models achieving an AUC-ROC of approximately 1.00. These findings indicate that combining Boruta with semi-supervised outlier reduction provides an effective and generalizable preprocessing strategy for IDS, particularly in heterogeneous network traffic environments.
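The binary label mapping and class-wise outlier reduction steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, so a single centroid per class stands in for it here, and the function names, the `keep_fraction` parameter, and the label strings `normal` and `benign` are assumptions made for illustration. The filter is meant to be fitted on the training split only, in keeping with the leakage-prevention goal stated above.

```python
import math
from collections import defaultdict


def to_binary(label, normal_names=("normal", "benign")):
    """Map a raw dataset label to 0 (normal) or 1 (intrusion).

    The names in `normal_names` are assumed; adjust them per dataset.
    """
    return 0 if str(label).strip().lower() in normal_names else 1


def classwise_outlier_filter(X, y, keep_fraction=0.95):
    """Keep, within each class, the samples closest to that class's centroid.

    X: list of equal-length numeric feature vectors (training split only).
    y: list of class labels, same length as X.
    Returns the indices of the kept samples, in ascending order.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    kept = []
    for idx in by_class.values():
        dim = len(X[idx[0]])
        # One centroid per class: a simplified stand-in for clustering.
        centroid = [sum(X[i][d] for i in idx) / len(idx) for d in range(dim)]
        dist = {i: math.dist(X[i], centroid) for i in idx}
        # Retain the closest fraction of the class, at least one sample.
        n_keep = max(1, math.ceil(keep_fraction * len(idx)))
        kept.extend(sorted(idx, key=lambda i: dist[i])[:n_keep])
    return sorted(kept)
```

Filtering before the binary mapping keeps the per-class structure of the original attack categories; the retained samples can then be relabeled with `to_binary` and passed to any of the five classifiers.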
Keywords:
intrusion detection, network security, Boruta, outlier reduction, machine learning
License
Copyright (c) 2025 Agni Isador Harsapranata, Eko Sediyono, Hindriyanto Dwi Purnomo

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
