Leukemia Diagnosis using Machine Learning Classifiers based on MRMR Feature Selection
Received: 2 May 2024 | Revised: 28 May 2024 | Accepted: 6 June 2024 | Online: 15 June 2024
Corresponding author: Walat A. Ahmed
Abstract
Early and accurate diagnosis of leukemia is crucial for effective treatment. Machine Learning (ML) offers promising tools for leukemia classification, but the high-dimensional datasets it requires pose challenges. This study explores the effectiveness of ML algorithms for leukemia classification and investigates the impact of feature selection with the Minimum Redundancy Maximum Relevance (MRMR) technique. MRMR was used to select informative features, and four ML algorithms (Naïve Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Artificial Neural Networks (ANNs)) were evaluated on feature subsets with varying levels of relevance based on MRMR scores. The results demonstrate that MRMR effectively reduced dimensionality while maintaining and even improving classification accuracy. KNN and SVM achieved the highest accuracy (100% with the 67-, 30-, and 24-feature subsets), suggesting the benefit of focusing on highly relevant features. NB exhibited consistent accuracy across all feature sets.
Keywords:
machine learning, feature selection, KNN, SVM, leukemia, ANN, Naïve Bayes
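The feature selection step summarized in the abstract can be illustrated with a minimal sketch. It assumes the mutual-information "difference" form of MRMR (relevance to the class label minus mean redundancy with the features already chosen) and features that have already been discretized. The abstract does not state which MRMR variant or implementation the study used, so the function names `mutual_information` and `mrmr_select` and the toy data are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy MRMR, difference form (an assumed variant, not confirmed by the paper).

    columns: list of feature columns, each a list of discrete values.
    labels:  list of class labels, same length as each column.
    Returns the indices of up to k selected features, in selection order.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        # The first pick is by relevance alone; later picks are penalized
        # by the mean mutual information with already selected features.
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): feature 1 duplicates feature 0, while
# feature 2 is weaker but carries partly independent information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 0: most relevant
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 1: exact copy of feature 0
    [0, 0, 1, 0, 1, 1, 0, 1],  # feature 2: less relevant, less redundant
]
print(mrmr_select(columns, labels, 2))  # prints [0, 2]
```

On this toy input the redundant copy (feature 1) is skipped in favor of feature 2, which is the behavior MRMR is designed to produce. In a pipeline like the one the abstract describes, the selected columns would then be passed to each classifier (NB, KNN, SVM, ANN) for evaluation. Real gene expression values are continuous and would need discretization or a continuous mutual information estimator first.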
License
Copyright (c) 2024 Sipan M. Hameed, Walat A. Ahmed, Masood A. Othman
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.