Detection of Spam Email by Combining Harmony Search Algorithm and Decision Tree

Authors

  • M. Z. Gashti, Department of Computer Engineering, Payame Noor University, Iran

Abstract

Spam email is probably the main problem faced by most e-mail users. Spam email detection involves many features, and some of them have little effect on detection and can skew the detection and classification of spam email. Feature Selection (FS) is therefore one of the key topics in spam email detection systems: by choosing the important and effective features for classification, classification performance can be improved. A feature selector has the task of finding a subset of features that improves prediction accuracy. In this paper, a hybrid of the Harmony Search Algorithm (HSA) and a decision tree is used to select the best features and to perform classification. The results obtained on the Spambase dataset show that the recognition accuracy of the proposed model is 95.25%, which is higher than that of models such as SVM, NB, J48 and MLP. The proposed model also achieves higher accuracy on the Ling-Spam and PU1 datasets than models such as NB, SVM and LR.
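The abstract does not give the harmony encoding, fitness function or parameter values, so the following is only a minimal, standard-library sketch of the general wrapper idea (harmony search proposing feature subsets, a decision tree scoring them), not the author's implementation. It assumes: each harmony is a 0/1 feature mask; pitch adjustment flips a bit; the classifier is an ID3-style tree (information gain, after Quinlan) on binary features; fitness is training accuracy minus a hypothetical per-feature penalty; and HMS, HMCR and PAR take commonly cited values. The function names (`harmony_search`, `fitness`, `build_tree`, `make_toy_data`) and the toy data are illustrative assumptions.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """ID3-style tree on 0/1 features. Leaf = label, node = (feature, {0: sub, 1: sub})."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return majority
    base = entropy(labels)

    def gain(f):
        # Information gain of splitting the current rows on binary feature f
        remainder = 0.0
        for v in (0, 1):
            sub = [y for x, y in zip(rows, labels) if x[f] == v]
            if sub:
                remainder += len(sub) / len(labels) * entropy(sub)
        return base - remainder

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    children = {}
    for v in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[best] == v]
        # Empty branch falls back to the parent's majority label
        children[v] = (build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  rest, depth + 1, max_depth) if idx else majority)
    return (best, children)


def predict(tree, x):
    """Walk from the root to a leaf following the sample's feature values."""
    while isinstance(tree, tuple):
        feature, children = tree
        tree = children[x[feature]]
    return tree


def fitness(mask, X, y, penalty=0.01):
    """Training accuracy on the selected features minus a (hypothetical) size penalty."""
    feats = [i for i, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    tree = build_tree(X, y, feats)
    acc = sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)
    return acc - penalty * len(feats)


def harmony_search(n_features, score, hms=10, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Binary harmony search: each harmony is a 0/1 feature mask."""
    rng = random.Random(seed)
    memory = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(hms)]
    scores = [score(h) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = memory[rng.randrange(hms)][j]  # memory consideration
                if rng.random() < par:
                    bit = 1 - bit                    # pitch adjustment (bit flip)
            else:
                bit = rng.randint(0, 1)              # random selection
            new.append(bit)
        s = score(new)
        worst = min(range(hms), key=scores.__getitem__)
        if s > scores[worst]:                        # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def make_toy_data(n=200, n_features=6, seed=1):
    """Hypothetical binary data: the label depends only on features 1 and 4."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [x[1] | x[4] for x in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, best = harmony_search(len(X[0]), lambda m: fitness(m, X, y))
    print("selected features:", [i for i, b in enumerate(mask) if b], "fitness:", round(best, 3))
```

Spambase attributes are continuous, so using this sketch on it would require discretizing them first or switching to a threshold-splitting tree such as C4.5 (J48). In practice a held-out or cross-validated accuracy would normally replace the training accuracy used in `fitness` here.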

Keywords:

Spam Email, Harmony Search Algorithm, Decision Tree




How to Cite

M. Z. Gashti, “Detection of Spam Email by Combining Harmony Search Algorithm and Decision Tree”, Eng. Technol. Appl. Sci. Res., vol. 7, no. 3, pp. 1713–1718, Jun. 2017.
