Constrained K-Means Classification

Authors

  • P. N. Smyrlis, Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Greece
  • D. C. Tsouros, Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Greece
  • M. G. Tsipouras, Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Greece
Volume: 8 | Issue: 4 | Pages: 3203-3208 | August 2018 | https://doi.org/10.48084/etasr.2149

Abstract

Classification-via-clustering (CvC) is a widely used method that employs a clustering procedure to perform classification tasks. In this paper, a novel K-Means-based CvC algorithm is presented, analysed, and evaluated. Two additional techniques are employed to mitigate the limitations of K-Means. First, a hypercube of constraints is defined for each centroid. Second, weights are acquired for each attribute of each class, so that a weighted Euclidean distance can be used as the similarity criterion in the clustering procedure. Experiments are conducted on 42 well-known classification datasets. The experimental results demonstrate that the proposed algorithm outperforms CvC with simple K-Means.
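
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the weights or the hypercube bounds are obtained, so the choices here are assumptions made only for illustration. Weights are taken as normalised inverse per-class variances, each hypercube is centred on the class mean with a half-width of `margin` standard deviations, and there is one centroid per class. The function names (`weighted_euclidean`, `clip_to_box`, `fit`, `predict`) and the parameters `margin`, `iters`, and `eps` are likewise hypothetical.

```python
import math


def weighted_euclidean(x, c, w):
    """Weighted Euclidean distance between a point x and a centroid c."""
    return math.sqrt(sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w)))


def clip_to_box(c, low, high):
    """Project a centroid back into its hypercube [low, high], per attribute."""
    return [min(max(cj, lj), hj) for cj, lj, hj in zip(c, low, high)]


def column_mean(points):
    """Per-attribute mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def nearest(x, centroids, weights):
    """Label of the centroid closest to x under that class's own weights."""
    return min(centroids,
               key=lambda k: weighted_euclidean(x, centroids[k], weights[k]))


def fit(X, y, margin=1.0, iters=10, eps=1e-6):
    """One constrained centroid per class (assumed design, see lead-in)."""
    classes = sorted(set(y))
    centroids, weights, boxes = {}, {}, {}
    for k in classes:
        pts = [x for x, label in zip(X, y) if label == k]
        mean = column_mean(pts)
        var = [sum((p[j] - mean[j]) ** 2 for p in pts) / len(pts)
               for j in range(len(mean))]
        # Assumption: attribute weights are normalised inverse variances.
        raw = [1.0 / (v + eps) for v in var]
        weights[k] = [r / sum(raw) for r in raw]
        # Assumption: the hypercube spans mean +/- margin * std per attribute.
        half = [margin * math.sqrt(v) for v in var]
        boxes[k] = ([m - h for m, h in zip(mean, half)],
                    [m + h for m, h in zip(mean, half)])
        centroids[k] = mean
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        members = {k: [] for k in classes}
        for x in X:
            members[nearest(x, centroids, weights)].append(x)
        # Update step: move to the cluster mean, then enforce the constraint.
        for k in classes:
            if members[k]:
                centroids[k] = clip_to_box(column_mean(members[k]), *boxes[k])
    return centroids, weights


def predict(x, model):
    """Classify x with the label of its nearest constrained centroid."""
    centroids, weights = model
    return nearest(x, centroids, weights)


X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.9, 5.3]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit(X, y)
```

In this sketch the clipping step keeps a centroid from drifting away from the region of its own class during the update, and the per-class weights let attributes that vary little within a class count more in the distance. Because each class uses its own normalised weights, distances to different centroids are only roughly comparable, which a fuller treatment would need to address.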

Keywords:

classification-via-clustering, k-means, supervised learning

How to Cite

P. N. Smyrlis, D. C. Tsouros, and M. G. Tsipouras, “Constrained K-Means Classification”, Eng. Technol. Appl. Sci. Res., vol. 8, no. 4, pp. 3203–3208, Aug. 2018.
