A Modified Approach of OPTICS Algorithm for Data Streams
Data are generated continuously, and in large volumes, by a wide variety of applications. Such data streams change rapidly and are temporally ordered, and mining them has therefore become a field of major interest. Clustering is a mining technique that processes a data stream and groups similar objects together. Outliers encountered in this process are noisy data points that show abnormal behavior compared to normal data points. To obtain high-quality clusters, outliers should be efficiently discovered and discarded. In this paper, pruning is applied to the stream OPTICS algorithm, together with the identification of real outliers, which reduces memory consumption and increases the speed of identifying potential clusters.
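The abstract does not spell out the pruning rule, so the following is only a minimal sketch of one common way such pruning can work in density-based stream clustering, not the method of this paper. It assumes micro-clusters whose weights fade exponentially over time; a micro-cluster whose weight drops below a threshold at a periodic check is pruned and treated as a real outlier. All names and parameters (`MicroCluster`, `process_stream`, `eps`, `lam`, `min_weight`, `prune_every`) are hypothetical.

```python
import math


class MicroCluster:
    """Hypothetical summary of nearby stream points with a fading weight."""

    def __init__(self, point, t):
        self.center = list(point)
        self.weight = 1.0
        self.last_update = t

    def decay(self, t, lam):
        # Exponential fading: weight halves every 1/lam time steps.
        self.weight *= 2.0 ** (-lam * (t - self.last_update))
        self.last_update = t

    def absorb(self, point, t, lam):
        # Fade the old weight first, then add the new point with weight 1
        # and move the center to the new weighted mean.
        self.decay(t, lam)
        self.weight += 1.0
        self.center = [c + (p - c) / self.weight
                       for c, p in zip(self.center, point)]


def process_stream(points, eps=1.0, lam=0.1, min_weight=0.5, prune_every=5):
    """Return (kept micro-clusters, pruned micro-clusters).

    Assumption: a point joins its nearest micro-cluster if the center is
    within eps; otherwise it starts a new micro-cluster. Every prune_every
    points, micro-clusters with faded weight below min_weight are pruned.
    """
    clusters, pruned = [], []
    for t, point in enumerate(points):
        nearest = min(clusters,
                      key=lambda mc: math.dist(mc.center, point),
                      default=None)
        if nearest is not None and math.dist(nearest.center, point) <= eps:
            nearest.absorb(point, t, lam)
        else:
            clusters.append(MicroCluster(point, t))

        if (t + 1) % prune_every == 0:
            for mc in clusters:
                mc.decay(t, lam)
            pruned.extend(mc for mc in clusters if mc.weight < min_weight)
            clusters = [mc for mc in clusters if mc.weight >= min_weight]
    return clusters, pruned


if __name__ == "__main__":
    # Nine points near the origin and one isolated point at (10, 10).
    stream = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (0.0, 0.1), (0.1, 0.1),
              (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.0, 0.0)]
    kept, dropped = process_stream(stream, eps=1.0, lam=0.5)
    print(len(kept), "kept;", len(dropped), "pruned")
```

Because pruned micro-clusters are removed from the working set, later points are compared against fewer summaries, which is the kind of memory and speed saving the abstract refers to. The threshold and fading rate here are illustrative values only.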
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.