Detection of Spam Email by Combining Harmony Search Algorithm and Decision Tree

M. Z. Gashti

Abstract


Spam email is probably the main problem faced by most email users. Spam detection draws on many features, and some of them contribute little to detection and can skew the detection and classification of spam email. Feature Selection (FS) is therefore one of the key topics in spam email detection systems. Choosing the important and effective features for classification can optimize classifier performance. A feature selector has the task of finding a subset of features that improves the accuracy of the classifier's predictions. In this paper, a hybrid of the Harmony Search Algorithm (HSA) and a decision tree is used for selecting the best features and for classification. Results on the Spambase dataset show that the proposed model achieves a recognition accuracy of 95.25%, which is higher than that of models such as SVM, NB, J48 and MLP. The accuracy of the proposed model on the Ling-Spam and PU1 datasets is also higher than that of models such as NB, SVM and LR.
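The feature-selection idea described above can be illustrated with a minimal sketch of a binary harmony search over feature masks. This is not the paper's implementation: the function name `harmony_search_fs`, the parameter values (`hms`, `hmcr`, `par`, `iterations`), the bit-flip form of pitch adjustment, and the toy fitness function are all assumptions made for illustration. In a hybrid like the one the abstract describes, the fitness of a mask would presumably be the accuracy of a decision tree trained on the selected features; here a hypothetical stand-in fitness is used so the example runs with the standard library alone.

```python
import random


def harmony_search_fs(n_features, fitness, hms=10, hmcr=0.9, par=0.3,
                      iterations=200, seed=0):
    """Binary harmony search over feature masks (lists of 0/1).

    fitness: callable taking a mask and returning a score to maximize.
    hms:     harmony memory size (number of stored masks).
    hmcr:    harmony memory considering rate.
    par:     pitch adjusting rate (here: probability of flipping a bit).
    """
    rng = random.Random(seed)
    # Harmony memory: hms random feature masks with their fitness scores.
    memory = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(hms)]
    scores = [fitness(m) for m in memory]

    for _ in range(iterations):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                # Memory consideration: copy bit j from a random stored mask.
                bit = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    bit = 1 - bit  # pitch adjustment, adapted to bits as a flip
            else:
                bit = rng.randint(0, 1)  # random selection
            new.append(bit)

        # Replace the worst stored harmony if the new one scores higher.
        score = fitness(new)
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:
            memory[worst], scores[worst] = new, score

    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical stand-in fitness: rewards three "informative" features and
# penalizes subset size. A real system would score a classifier instead.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for j in INFORMATIVE if mask[j])
    return hits - 0.1 * sum(mask)


mask, score = harmony_search_fs(8, toy_fitness)
print(mask, score)
```

Because the worst stored harmony is replaced only by a strictly better one, the best score in memory never decreases as iterations proceed. Swapping `toy_fitness` for a cross-validated classifier score would turn the sketch into a wrapper-style feature selector.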


Keywords


Spam Email; Harmony Search Algorithm; Decision Tree






eISSN: 1792-8036     pISSN: 2241-4487