A Modified Approach of OPTICS Algorithm for Data Streams

M. Shukla, Y. P. Kosta, M. Jayswal

Abstract


Data are continuously generated by a wide variety of applications, in large volume and at high speed. Such data are fast changing and temporally ordered, so mining them has become a field of major interest. Clustering is a mining technique used to process data streams by grouping similar objects together. Outliers produced in this process are noisy data points that show abnormal behavior compared with normal data points. To obtain high-quality clusters, outliers should be efficiently discovered and discarded. In this paper, pruning is applied to the stream OPTICS algorithm together with the identification of real outliers, which reduces memory consumption and speeds up the identification of potential clusters.
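For readers unfamiliar with OPTICS, the two quantities the algorithm is built on, core distance and reachability distance, can be sketched as below. This is a minimal illustration of the standard batch definitions, not the modified stream algorithm of the paper, whose pruning step the abstract does not specify. The function names, the parameters `eps` and `min_pts`, the convention that a point counts as its own neighbor, and the rule used to flag outlier candidates (a non-core point with no core point within `eps`) are all assumptions made for this example.

```python
import math

def core_distance(points, i, eps, min_pts):
    """Distance from points[i] to its min_pts-th nearest neighbor.

    Assumption: the point itself counts as a neighbor. Returns None when
    fewer than min_pts points lie within eps, i.e. points[i] is not core.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts:
        return None
    d = dists[min_pts - 1]
    return d if d <= eps else None

def reachability_distance(points, o, p, eps, min_pts):
    """Reachability distance of points[p] with respect to points[o].

    Undefined (None) when points[o] is not a core point.
    """
    cd = core_distance(points, o, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[o], points[p]))

def outlier_candidates(points, eps, min_pts):
    """Indices of points that are neither core nor within eps of a core point.

    Hypothetical helper: a simple density-based notion of noise, used here
    only to illustrate what discarding outliers could look like.
    """
    core = {i for i in range(len(points))
            if core_distance(points, i, eps, min_pts) is not None}
    return [p for p in range(len(points))
            if p not in core
            and not any(math.dist(points[p], points[o]) <= eps for o in core)]
```

With four points on a unit square and one distant point, `outlier_candidates` with `eps=1.5` and `min_pts=3` flags only the distant point. A stream variant would typically apply such tests to summaries of the data rather than to raw points, but that design is outside what the abstract states.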


Keywords


two phase; cluster quality; clustering technique; pruning; time and space complexity; threshold value

eISSN: 1792-8036     pISSN: 2241-4487