Performance Comparison of Ensemble Learning and Supervised Algorithms in Classifying Multi-label Network Traffic Flow

Authors

  • M. Machoke, School of Computational and Communication Science and Engineering, Department of Information Technology System Development and Management, NM-AIST, Tanzania
  • J. Mbelwa, University of Dar es Salaam, Tanzania
  • J. Agbinya, School of Information Technology and Engineering, Melbourne Institute of Technology, Australia
  • A. E. Sam, School of Computational and Communication Science and Engineering (DoCSE), Department of Communication Science and Engineering (CoSE), The Nelson Mandela African Institution of Science and Technology, Tanzania

Abstract

Network traffic classification is important because it helps identify network anomalies and supports measures to prevent them. However, classifying network traffic correctly is a challenging task. This study compares ensemble learning methods with conventional supervised classification in order to identify improved classification methods. Three types of network traffic were classified: Benign, Malicious, and Outliers. The data were collected experimentally using the Paessler Router Traffic Grapher (PRTG) software and from an online source, and were analyzed in R. The datasets were used to train five supervised models: k-nearest neighbors, mixture discriminant analysis, Naïve Bayes, the C5.0 classification model, and regularized discriminant analysis. The models were trained on 70% of the samples, and the remaining 30% were used for validation. The same samples were used separately to measure each model's individual accuracy. The results were compared with those of ensemble learning models built from the same datasets. Among the five supervised classifiers, k-nearest neighbors and C5.0 achieved the highest accuracies, 0.868 and 0.761 respectively. The ensemble classifiers, Bagging (Random Forest) and Boosting (eXtreme Gradient Boosting), achieved accuracies of 0.904 and 0.902 respectively. These results show that the ensemble learning methods are more accurate than the conventional supervised classifiers, and can therefore be used to detect malicious activity and anomalies in network traffic with improved accuracy.
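The evaluation protocol summarized above (a 70/30 holdout split, a single supervised classifier, and a bagging ensemble compared on validation accuracy) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the two-feature toy data, the class centers, and the function names (`make_toy_flows`, `holdout_split`, `knn_predict`, `fit_bagging`, `bagging_predict`, `accuracy`) are all hypothetical. It bags k-nearest-neighbor models through bootstrap resampling and majority voting; the study's Random Forest instead bags decision trees grown on random feature subsets, and boosting (eXtreme Gradient Boosting) is not shown.

```python
import math
import random
from collections import Counter

# Hypothetical toy "flows": 2-D points labeled with the study's three classes.
# Real flow features (bytes, packets, ports, durations, ...) are NOT modeled here.
CENTERS = {"benign": (0.0, 0.0), "malicious": (5.0, 5.0), "outlier": (0.0, 8.0)}


def make_toy_flows(n_per_class, seed=0):
    """Generate synthetic, well-separated Gaussian clusters (illustration only)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in CENTERS.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def holdout_split(data, train_frac=0.7, seed=0):
    """Shuffle, then keep train_frac of samples for training and the rest for validation."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k=5):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_bagging(train, n_models=15, seed=0):
    """Bagging step 1: draw n_models bootstrap resamples (same size, with replacement).

    For kNN, "fitting" a base model just means storing its resample.
    """
    rng = random.Random(seed)
    return [[rng.choice(train) for _ in train] for _ in range(n_models)]


def bagging_predict(ensemble, x, k=5):
    """Bagging step 2: each base model votes and the majority label wins."""
    votes = Counter(knn_predict(sample, x, k) for sample in ensemble)
    return votes.most_common(1)[0][0]


def accuracy(predict, validation):
    """Fraction of validation samples whose predicted label equals the true label."""
    correct = sum(1 for x, y in validation if predict(x) == y)
    return correct / len(validation)


if __name__ == "__main__":
    flows = make_toy_flows(n_per_class=50)
    train, valid = holdout_split(flows, train_frac=0.7)
    ensemble = fit_bagging(train, n_models=15)
    print("train/validation sizes:", len(train), len(valid))
    print("single kNN accuracy:", accuracy(lambda x: knn_predict(train, x), valid))
    print("bagged kNN accuracy:", accuracy(lambda x: bagging_predict(ensemble, x), valid))
```

Because the bootstrap resamples are drawn once in `fit_bagging`, every validation sample is scored by the same ensemble. Accuracies obtained on this synthetic data say nothing about the figures reported in the study.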

Keywords:

security, ensemble, malicious, anomalies

How to Cite

[1]
M. Machoke, J. Mbelwa, J. Agbinya, and A. E. Sam, “Performance Comparison of Ensemble Learning and Supervised Algorithms in Classifying Multi-label Network Traffic Flow”, Eng. Technol. Appl. Sci. Res., vol. 12, no. 3, pp. 8667–8674, Jun. 2022.
