Leukemia Diagnosis using Machine Learning Classifiers based on MRMR Feature Selection

Authors

  • Sipan M. Hameed, Computer Information Systems Department, Duhok Polytechnique University, Iraq
  • Walat A. Ahmed, Computer Information Systems Department, Duhok Polytechnique University, Iraq
  • Masood A. Othman, Computer Information Systems Department, Duhok Polytechnique University, Iraq
Volume: 14 | Issue: 4 | Pages: 15614-15619 | August 2024 | https://doi.org/10.48084/etasr.7720

Abstract

Early and accurate diagnosis of leukemia is crucial for effective treatment. Machine Learning (ML) offers promising tools for leukemia classification, but the high-dimensional datasets it requires pose challenges. This study explores the effectiveness of ML algorithms for leukemia classification and investigates the impact of feature selection with the Minimum Redundancy Maximum Relevance (MRMR) technique. MRMR was used to select informative features, and four ML algorithms (Naïve Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Artificial Neural Networks (ANNs)) were evaluated on feature subsets with varying levels of relevance based on MRMR scores. The results demonstrate that MRMR effectively reduced dimensionality while maintaining, and even improving, classification accuracy. KNN and SVM achieved the highest accuracy (100% for the 67-, 30-, and 24-feature subsets), suggesting the benefit of focusing on highly relevant features. NB exhibited consistent accuracy across all feature sets.
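
To make the selection step concrete, the following is a minimal sketch of greedy MRMR in its mutual-information difference form (relevance minus mean redundancy). It is an illustration only, not the authors' implementation: this page does not state which MRMR variant or discretization was used. The function names (`mutual_information`, `mrmr_select`) and the toy data are hypothetical, and features are assumed to be already discretized, whereas real gene expression values are continuous and would need binning first.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices, maximising relevance I(f; y)
    minus the mean redundancy I(f; s) over already-selected features s."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: four classes encoded by two informative binary features.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 0: informative
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 1: exact copy of feature 0 (redundant)
    [0, 0, 1, 1, 0, 0, 1, 1],  # feature 2: informative, independent of feature 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 3: unrelated to the labels
]
selected = mrmr_select(columns, labels, 2)
print(selected)  # prints [0, 2]
```

In this toy case the redundant copy (feature 1) is skipped in favour of feature 2, even though both have the same relevance to the labels. That is the behaviour MRMR is designed to produce: the reduced subset passed to a classifier such as KNN or SVM carries complementary rather than duplicated information.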

Keywords:

machine learning, feature selection, KNN, SVM, leukemia, ANN, Naïve Bayes

How to Cite

Hameed, S.M., Ahmed, W.A. and Othman, M.A. 2024. Leukemia Diagnosis using Machine Learning Classifiers based on MRMR Feature Selection. Engineering, Technology & Applied Science Research. 14, 4 (Aug. 2024), 15614–15619. DOI: https://doi.org/10.48084/etasr.7720.
