Intrusion Detection: Boruta Feature Selection and Semi-Supervised Outlier Clustering with Multi-Dataset Evaluation

Agni Isador Harsapranata; Eko Sediyono; Hindriyanto Dwi Purnomo

doi:10.48084/etasr.14351

Authors

Agni Isador Harsapranata Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Indonesia
Eko Sediyono Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Indonesia
Hindriyanto Dwi Purnomo Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Indonesia

Volume: 15 | Issue: 6 | Pages: 30283-30289 | December 2025 | https://doi.org/10.48084/etasr.14351

Received: 27 August 2025 | Revised: 14 October 2025 | Accepted: 29 October 2025 | Online: 7 November 2025

Corresponding author: Agni Isador Harsapranata

Abstract

Intrusion Detection Systems (IDSs) remain essential as network attacks continue to increase in both volume and sophistication. This study presents a unified, dataset-agnostic preprocessing framework that integrates Boruta-based feature selection with class-wise semi-supervised clustering for outlier reduction before classification. The proposed pipeline standardizes encoding and scaling, prevents label leakage, selects relevant features, filters noise, and maps labels to a binary normal/intrusion classification task. The framework is evaluated on three benchmark datasets, NSL-KDD, UNSW-NB15, and CIC-IDS2017, using five representative classifiers: Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), all under a consistent experimental protocol. Ablation studies and paired statistical significance tests are conducted to quantify the individual effects of feature selection and outlier filtering. Results on CIC-IDS2017 demonstrate that the full pipeline yields consistent and often statistically significant improvements over a simplified baseline. On NSL-KDD, performance gains are model-dependent, whereas on UNSW-NB15, the framework remains competitive with the baseline. Overall test accuracies range from 90.7% to 99.96%, with the best-performing models achieving an AUC-ROC of approximately 1.00. These findings indicate that combining Boruta with semi-supervised outlier reduction provides an effective and generalizable preprocessing strategy for IDSs, particularly in heterogeneous network traffic environments.

Keywords:

intrusion detection, network security, Boruta, outlier reduction, machine learning

