Multi-Class Imbalanced Data Classification: A Systematic Mapping Study

Authors

  • Yujiang Wang College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, Malaysia
  • Marshima Mohd Rosli College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, Malaysia
  • Norzilah Musa College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, Malaysia
  • Feng Li College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, Malaysia | College of Computer and Information Engineering, Hebei Finance University, Baoding, China
Volume: 14 | Issue: 3 | Pages: 14183-14190 | June 2024 | https://doi.org/10.48084/etasr.7206

Abstract

Multi-class data classification is a significant and challenging research topic in contemporary machine learning, particularly when the data sets are imbalanced. A thorough investigation of multi-class imbalanced data classification is therefore increasingly pertinent. This paper presents an overview of multi-class imbalanced data classification obtained through a systematic mapping study, which analyzes the current state of the field with the primary goal of identifying the body of machine learning research on the topic. To this end, 7,164 papers from five digital libraries were assessed, and 147 prominent ones were selected and further categorized according to techniques, issues, and types of datasets. Based on a thorough review of these papers, a taxonomy of multi-class imbalanced data classification techniques is proposed. The results show that researchers widely employ algorithmic-level, ensemble, and oversampling strategies to address multi-class imbalance in medical datasets, primarily to mitigate the impact of challenging data factors. This research highlights an urgent need for more studies on multi-class imbalanced data classification.

Keywords:

multi-class imbalanced data, systematic mapping study, machine learning

How to Cite

[1]
Wang, Y., Rosli, M.M., Musa, N. and Li, F. 2024. Multi-Class Imbalanced Data Classification: A Systematic Mapping Study. Engineering, Technology & Applied Science Research. 14, 3 (Jun. 2024), 14183–14190. DOI:https://doi.org/10.48084/etasr.7206.
