Constrained K-Means Classification

P. N. Smyrlis, D. C. Tsouros, M. G. Tsipouras

Abstract


Classification-via-clustering (CvC) is a widely used approach that applies a clustering procedure to classification tasks. In this paper, a novel K-Means-based CvC algorithm is presented, analysed and evaluated. Two additional techniques are employed to mitigate the limitations of K-Means. First, a hypercube of constraints is defined for each centroid. Second, a weight is obtained for each attribute of each class, so that a weighted Euclidean distance can serve as the similarity criterion in the clustering procedure. Experiments were conducted on 42 well-known classification datasets. The experimental results demonstrate that the proposed algorithm outperforms CvC with simple K-Means.
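The two techniques named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the hypercube bounds or the attribute weights are computed, so the choices below are assumptions made only for illustration: each class gets one centroid, its hypercube spans the per-attribute minimum and maximum of that class's training samples, and its weights are normalised inverse within-class variances. The function names (`fit_constrained_kmeans`, `weighted_sq_dist`, `predict`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def weighted_sq_dist(X, centroids, weights):
    """Squared weighted Euclidean distance from every sample to every centroid."""
    diff = X[:, None, :] - centroids[None, :, :]      # shape (n, k, d)
    return (weights[None, :, :] * diff ** 2).sum(axis=2)

def fit_constrained_kmeans(X, y, n_iter=20, eps=1e-9):
    """Sketch of K-Means-based CvC with one box-constrained centroid per class."""
    classes = np.unique(y)
    per_class = [X[y == c] for c in classes]
    # Assumption: the hypercube is the per-attribute min/max of each class.
    lower = np.array([s.min(axis=0) for s in per_class])
    upper = np.array([s.max(axis=0) for s in per_class])
    # Assumption: weights are inverse within-class variances, normalised per class.
    weights = 1.0 / (np.array([s.var(axis=0) for s in per_class]) + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    # Start each centroid at its class mean.
    centroids = np.array([s.mean(axis=0) for s in per_class])
    for _ in range(n_iter):
        labels = weighted_sq_dist(X, centroids, weights).argmin(axis=1)
        updated = centroids.copy()
        for k in range(len(classes)):
            members = X[labels == k]
            if len(members):                           # keep old centroid if cluster is empty
                updated[k] = members.mean(axis=0)
        updated = np.clip(updated, lower, upper)       # keep centroids inside their hypercubes
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return {"classes": classes, "centroids": centroids,
            "weights": weights, "lower": lower, "upper": upper}

def predict(X, model):
    """Assign each sample the class of its nearest centroid under the weighted distance."""
    d = weighted_sq_dist(X, model["centroids"], model["weights"])
    return model["classes"][d.argmin(axis=1)]
```

The sketch uses squared distances because taking the square root does not change which centroid is nearest. Clipping each centroid to its box after every update is one simple way to enforce a hypercube constraint; the paper's actual rule may differ.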


Keywords


classification-via-clustering; k-means; supervised learning

eISSN: 1792-8036     pISSN: 2241-4487