An Ensemble Approach to Improve the Performance of Real Time Data Stream Classification
Received: 30 July 2024 | Revised: 29 August 2024 | Accepted: 8 September 2024 | Online: 24 October 2024
Corresponding author: Dhara Joshi
Abstract
In the era of the Internet of Things (IoT), data stream mining has become important for making accurate and profitable decisions. Various techniques are used to gain insight into data streams, including classification, clustering, and pattern mining. The statistical properties of streaming data can change over time, a phenomenon called concept drift. When this happens, predictive models that assume a static relationship between input and output variables may perform poorly or degrade steadily. This study proposes an ensemble architecture designed to improve classification performance and to detect concept drift effectively in data streams. The proposed architecture combines three classifiers to improve accuracy and robustness against concept drift, and its drift detection component sustains performance by allowing the model to adapt quickly to changing data distributions. Through comprehensive testing, the proposed algorithm was compared with existing methods, and the results show that it outperforms them in classification accuracy, precision, recall, and drift detection capability.
Keywords:
data stream, mining, classification, ensemble, challenges, concept drift
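The abstract describes a three-classifier ensemble paired with drift detection but does not name the base learners, the combination rule, or the detector. The sketch below is therefore only an illustration under stated assumptions: it uses majority voting, a DDM-style error-rate test (flag drift when the running error rate plus its standard deviation exceeds the recorded minimum by three standard deviations), and a full reset of the members on drift. The names `DDMDetector`, `CentroidStump`, `DriftAwareEnsemble`, and `run_demo` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


class DDMDetector:
    """DDM-style detector (an assumed stand-in, not the paper's detector)."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed one prediction outcome; return True if drift is signalled."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return False
        if p + s < self.p_min + self.s_min:      # remember the best level seen
            self.p_min, self.s_min = p, s
        return p + s > self.p_min + self.drift_level * self.s_min


class CentroidStump:
    """Hypothetical base learner: nearest class mean on a single feature."""

    def __init__(self, feature):
        self.feature = feature
        self.stats = {}  # label -> [sum of feature values, count]

    def learn(self, x, y):
        entry = self.stats.setdefault(y, [0.0, 0])
        entry[0] += x[self.feature]
        entry[1] += 1

    def predict(self, x):
        if not self.stats:
            return 0  # arbitrary default before any training
        v = x[self.feature]
        return min(self.stats,
                   key=lambda c: abs(v - self.stats[c][0] / self.stats[c][1]))


def majority_vote(predictions):
    """Return the most common label among the member predictions."""
    return Counter(predictions).most_common(1)[0][0]


class DriftAwareEnsemble:
    """Three-member voting ensemble that rebuilds its members on drift."""

    def __init__(self, make_members):
        self.make_members = make_members
        self.members = make_members()
        self.detector = DDMDetector()
        self.seen = 0
        self.drifts = []  # 1-based sample positions where drift was flagged

    def predict(self, x):
        return majority_vote([m.predict(x) for m in self.members])

    def update(self, x, y):
        """Test-then-train step: predict, check for drift, then learn."""
        y_hat = self.predict(x)
        self.seen += 1
        if self.detector.update(y_hat != y):
            self.drifts.append(self.seen)
            self.members = self.make_members()   # discard the outdated concept
            self.detector.reset()
        for m in self.members:
            m.learn(x, y)
        return y_hat


def run_demo(n_samples=400, flip_at=200):
    """Toy stream whose labels flip abruptly at sample `flip_at`."""
    model = DriftAwareEnsemble(lambda: [CentroidStump(f) for f in range(3)])
    correct = 0
    for i in range(n_samples):
        v = i % 2
        x = (float(v), 1.0 - v, 0.5)             # third feature carries no signal
        y = v if i < flip_at else 1 - v          # abrupt concept drift
        correct += int(model.update(x, y) == y)
    return model.drifts, correct / n_samples


if __name__ == "__main__":
    print(run_demo())
```

On this toy stream the third member sees an uninformative feature, so the vote of the other two carries the prediction, and the detector should fire within a few samples of the label flip, after which the fresh members relearn the new concept. Resetting every member is the simplest possible adaptation policy; replacing only the weakest member or reweighting votes are common alternatives.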
License
Copyright (c) 2024 Dhara Joshi, Madhu Shukla
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.