Enhancing Data Streaming Clustering Algorithms for AutoML in Cloud Environments: A Novel Design Approach
Received: 23 August 2024 | Revised: 9 November 2024 | Accepted: 20 November 2024 | Online: 4 December 2024
Corresponding author: Madhuri H. Parekh
Abstract
The objective of this research is to enhance the existing AutoCloud clustering technique, which performs best on clusters of particular dimensions and arrangements. AutoCloud uses the Typicality and Eccentricity Data Analytics (TEDA) framework to break the clustering problem down into two smaller subproblems: micro-clustering and macro-clustering. AutoCloud requires no prior knowledge of the dataset, and clusters can emerge and merge as new observations arrive. This study proposes an experimental configuration that generates micro-clusters and data clouds without imposing a predefined topology on static datasets. The proposed MLAutoCloud uses a modified distance-based technique, runs on a big data framework, and incorporates the Adjusted Rand Index (ARI) with the TEDA framework for streaming data. Test results on different datasets show that MLAutoCloud yielded the optimal number of clusters and achieved excellent clustering results. Density estimation remains effective even when the underlying assumptions change, as happens when the process generating the data modifies its variables. MLAutoCloud is therefore an effective method for cloud-based clustering of streaming data.
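The abstract invokes TEDA without giving its equations. As an illustration only, the following Python sketch implements a commonly cited recursive form of TEDA eccentricity for a single data cloud; it is not the authors' MLAutoCloud implementation. The names TedaCloud and is_outlier are hypothetical, and the squared Euclidean distance, the population variance, and the default threshold m = 3 are assumptions. For the k-th sample the mean is updated recursively, the eccentricity is xi_k(x) = 1/k + ||x - mu_k||^2 / (k sigma_k^2), its normalized form is zeta_k = xi_k / 2, and a sample is treated as eccentric when zeta_k(x) > (m^2 + 1) / (2k), which is equivalent to ||x - mu_k|| > m sigma_k.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class TedaCloud:
    """Recursive TEDA statistics for one data cloud (hypothetical helper, not the paper's code)."""
    k: int = 0                                       # number of samples absorbed so far
    mean: List[float] = field(default_factory=list)  # running mean mu_k
    mean_sq_norm: float = 0.0                        # running mean of ||x||^2

    def update(self, x: Sequence[float]) -> float:
        """Absorb sample x and return its normalized eccentricity zeta_k(x)."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = [float(v) for v in x]
        else:
            # mu_k = ((k - 1) / k) * mu_{k-1} + x_k / k
            self.mean = [(k - 1) / k * m + v / k for m, v in zip(self.mean, x)]
        # Running mean of squared norms; variance = E[||x||^2] - ||mu_k||^2 (population form)
        self.mean_sq_norm = (k - 1) / k * self.mean_sq_norm + sum(v * v for v in x) / k
        variance = self.mean_sq_norm - sum(m * m for m in self.mean)
        if variance <= 1e-12:
            # All samples (numerically) coincide: spread eccentricity uniformly
            return 1.0 / k
        dist_sq = sum((m - v) ** 2 for m, v in zip(self.mean, x))
        xi = 1.0 / k + dist_sq / (k * variance)  # eccentricity xi_k(x); sums to 2 over all k samples
        return xi / 2.0                          # normalized eccentricity zeta_k(x); sums to 1


def is_outlier(zeta: float, k: int, m: float = 3.0) -> bool:
    """Chebyshev-style test zeta_k(x) > (m^2 + 1) / (2k), i.e. ||x - mu_k|| > m * sigma_k."""
    return zeta > (m * m + 1) / (2 * k)


if __name__ == "__main__":
    cloud = TedaCloud()
    # A compact 4x4 grid followed by one distant sample
    stream = [[float(i), float(j)] for i in range(4) for j in range(4)] + [[20.0, 20.0]]
    for x in stream:
        zeta = cloud.update(x)
        print(x, round(zeta, 3), is_outlier(zeta, cloud.k))
```

Run on the 4 x 4 grid followed by (20, 20), only the final sample is flagged. With m = 3 nothing can be flagged before the 11th sample, because the squared distance of any sample from the mean never exceeds (k - 1) sigma^2. The mean-of-squares variance used here is compact but can lose precision on large-magnitude data; a Welford-style update is a more robust choice.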
Keywords:
AutoCloud, clustering, data streams, big data, machine learning, Spark, eccentricity, outliers, anomaly
License
Copyright (c) 2024 Madhuri H. Parekh, Madhu Shambhu Shukla
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.