Multi-Class Imbalanced Data Classification: A Systematic Mapping Study
Received: 7 March 2024 | Revised: 2 April 2024 | Accepted: 4 April 2024 | Online: 1 June 2024
Corresponding author: Marshima Mohd Rosli
Abstract
Multi-class data classification is a significant and challenging research topic in contemporary machine learning, particularly when the datasets are imbalanced. A thorough investigation of multi-class imbalanced data classification is therefore increasingly pertinent. This paper presents an overview of multi-class imbalanced data classification obtained through a systematic mapping study, which analyzes the current state of the field with the primary goal of charting the body of machine learning research on the topic. To this end, 7,164 papers were assessed and 147 prominent ones were selected from five digital libraries and categorized according to techniques, issues, and types of datasets. Based on a thorough review of these papers, a taxonomy of multi-class imbalanced data classification techniques is proposed. The results show that researchers widely employ algorithmic-level, ensemble, and oversampling strategies to address multi-class imbalance in medical datasets, primarily to mitigate the impact of challenging data factors. This research highlights an urgent need for more studies on multi-class imbalanced data classification.
Keywords:
multi-class imbalanced data, systematic mapping study, machine learning
License
Copyright (c) 2024 Yujiang Wang, Marshima Mohd Rosli, Norzilah Musa, Feng Li
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.