A Deep Ensemble Gene Selection and Attention-guided Classification Framework for Robust Cancer Diagnosis from Microarray Data
Received: 2 November 2024 | Revised: 6 December 2024, 17 December 2024, and 20 December 2024 | Accepted: 22 December 2024 | Online: 2 February 2025
Corresponding author: Sara Haddou Bouazza
Abstract
Microarray technology has enabled unprecedented insight into cancer diagnosis through large-scale gene expression analysis. However, the high dimensionality and complexity of microarray datasets pose significant challenges, as only a small subset of genes is typically informative, with the remainder introducing noise and complicating classification. Traditional gene selection methods, including filter, wrapper, and hybrid techniques, have achieved promising results but often fail to capture complex gene interactions, suffer from computational inefficiencies, or lack interpretability. This study presents DEGS-AGC (Deep Ensemble Gene Selection and Attention-Guided Classification), a novel integrated framework for gene selection and classification. DEGS-AGC is designed to address these limitations through two primary components: Deep Ensemble Gene Selection (DEGS), which leverages ensemble learning with Random Forest, XGBoost, and Deep Neural Networks to select relevant genes while reducing redundancy via sparse autoencoders, and Attention-Guided Classification (AGC), where an attention mechanism dynamically assigns weights to genes to improve interpretability and classification precision. The DEGS-AGC framework was evaluated against traditional methods, using consistent classification models for robust comparisons. Evaluation metrics demonstrated the potential of DEGS-AGC as an effective tool for high-dimensional biomedical data analysis. The results highlighted the ability of DEGS-AGC to offer accurate, interpretable, and computationally feasible solutions for cancer diagnosis, advancing the development of data-driven personalized approaches in healthcare.
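The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the Random Forest, XGBoost, and Deep Neural Network scores are combined, so mean-rank aggregation is assumed here. The attention step is shown as a plain softmax over fixed relevance scores, whereas a trained model would learn those scores. All function names and numeric values below are hypothetical.

```python
import math


def mean_rank_scores(importances_by_model):
    """Average each gene's rank across models (rank 1 = most important)."""
    n_genes = len(next(iter(importances_by_model.values())))
    totals = [0.0] * n_genes
    for scores in importances_by_model.values():
        # Order gene indices by descending importance for this model.
        order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
        for rank, gene in enumerate(order, start=1):
            totals[gene] += rank
    n_models = len(importances_by_model)
    return [t / n_models for t in totals]


def select_top_genes(importances_by_model, k):
    """Return the indices of the k genes with the best (lowest) mean rank."""
    mean_ranks = mean_rank_scores(importances_by_model)
    return sorted(range(len(mean_ranks)), key=lambda g: mean_ranks[g])[:k]


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(expression, relevance_scores):
    """Weight expression values by softmax attention and sum them."""
    weights = softmax(relevance_scores)
    pooled = sum(w * x for w, x in zip(weights, expression))
    return weights, pooled


# Hypothetical per-gene importance scores from three models (5 genes).
importances = {
    "random_forest": [0.40, 0.05, 0.30, 0.10, 0.15],
    "xgboost": [0.35, 0.10, 0.30, 0.05, 0.20],
    "dnn": [0.25, 0.05, 0.45, 0.10, 0.15],
}
selected = select_top_genes(importances, k=3)

# Hypothetical expression values and relevance scores for the selected genes.
expression = [2.1, 0.7, 1.4]
relevance = [1.5, 0.2, 0.9]
weights, pooled = attention_pool(expression, relevance)

print("selected genes:", selected)
print("attention weights:", [round(w, 3) for w in weights])
```

Rank aggregation is used here because importance scores from different model families are not on a common scale. The attention weights can be read as a per-gene contribution profile, which is the kind of interpretability the abstract describes.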
Keywords:
machine learning, cancer classification, data mining, pattern recognition, feature selection
License
Copyright (c) 2025 Sara Haddou Bouazza

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.