A Comparative Analysis of Classification Algorithms on Diverse Datasets

M. Alghobiri

doi:10.48084/etasr.1952

Authors

M. Alghobiri, Management Information Systems Department, King Khalid University, Abha, Saudi Arabia

Volume: 8 | Issue: 2 | Pages: 2790-2795 | April 2018 | https://doi.org/10.48084/etasr.1952

Corresponding author: M. Alghobiri

Abstract

Data mining is the computational process of discovering patterns in large datasets. Classification, one of the main domains of data mining, generalizes a known structure so that it can be applied to a new dataset to predict the class of its instances. Many classification algorithms are in use for classifying datasets. They rely on different methods, such as probabilistic models, decision trees, neural networks, nearest neighbors, Boolean and fuzzy logic, and kernel-based techniques. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets were selected on the basis of their size and/or the number and nature of their attributes. The results are discussed using performance evaluation measures such as precision, accuracy, F-measure, the Kappa statistic, mean absolute error, relative absolute error, and ROC area. The comparative analysis is carried out using accuracy, precision, and F-measure. We identify the features and limitations of the classification algorithms on datasets of diverse nature.
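The comparison relies mainly on accuracy, precision, and F-measure, with the Kappa statistic also reported. As a minimal sketch (not the paper's own evaluation code), these measures can be computed from true and predicted class labels as shown below, assuming one class of interest is treated as "positive". The function name `evaluation_measures` and the sample labels are hypothetical. Precision, recall, and F-measure are computed here for that single class; many toolkits also report per-class and weighted-average values over all classes.

```python
from collections import Counter


def evaluation_measures(y_true, y_pred, positive):
    """Return accuracy, precision, recall, F-measure and Cohen's kappa.

    Hypothetical helper: precision, recall and F-measure are computed for
    the single class `positive`; accuracy and kappa use all classes.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts for the positive class
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    # Cohen's kappa compares observed agreement (accuracy) with the agreement
    # expected by chance from the marginal class frequencies; it is undefined
    # when chance agreement equals 1 (a single class everywhere).
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else float("nan")

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical labels for a two-class problem
    y_true = ["yes", "yes", "no", "no", "yes", "no"]
    y_pred = ["yes", "no", "no", "yes", "yes", "no"]
    print(evaluation_measures(y_true, y_pred, positive="yes"))
```

For the six sample labels, four predictions are correct, so accuracy is 4/6; precision, recall, and F-measure for "yes" are each 2/3, and with a chance agreement of 0.5 the Kappa statistic is about 0.33, showing how Kappa discounts agreement that could arise by chance.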

Keywords:

data mining, classification algorithms, diverse, dataset
