A Deep Ensemble Gene Selection and Attention-guided Classification Framework for Robust Cancer Diagnosis from Microarray Data

Authors

  • Sara Haddou Bouazza
Volume: 15 | Issue: 1 | Pages: 20235-20241 | February 2025 | https://doi.org/10.48084/etasr.9476

Abstract

Microarray technology has enabled unprecedented insight into cancer diagnosis through large-scale gene expression analysis. However, the high dimensionality and complexity of microarray datasets pose significant challenges, as only a small subset of genes is typically informative, with the remainder introducing noise and complicating classification. Traditional gene selection methods, including filter, wrapper, and hybrid techniques, have achieved promising results but often fail to capture complex gene interactions, suffer from computational inefficiencies, or lack interpretability. This study presents DEGS-AGC (Deep Ensemble Gene Selection and Attention-Guided Classification), a novel integrated framework for gene selection and classification. DEGS-AGC is designed to address these limitations through two primary components: Deep Ensemble Gene Selection (DEGS), which leverages ensemble learning with Random Forest, XGBoost, and Deep Neural Networks to select relevant genes while reducing redundancy via sparse autoencoders, and Attention-Guided Classification (AGC), where an attention mechanism dynamically assigns weights to genes to improve interpretability and classification precision. The DEGS-AGC framework was evaluated against traditional methods, using consistent classification models for robust comparisons. Evaluation metrics demonstrated the potential of DEGS-AGC as an effective tool for high-dimensional biomedical data analysis. The results highlighted the ability of DEGS-AGC to offer accurate, interpretable, and computationally feasible solutions for cancer diagnosis, advancing the development of data-driven personalized approaches in healthcare.
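
The two ideas named in the abstract, combining gene importance scores from several models and then weighting the selected genes with attention, can be illustrated with a minimal sketch. This is not the paper's implementation. The score lists stand in for importances that Random Forest, XGBoost, and a neural network might produce, and the function names, the min-max averaging rule, the choice of k, and the attention logits are all assumptions made for illustration. In a trained model the attention logits would be learned, and the sparse autoencoder step for redundancy reduction is omitted here.

```python
import math

def min_max(scores):
    """Rescale one model's importance scores to [0, 1] so models are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def ensemble_gene_scores(per_model_scores):
    """Average the normalized scores across models (an assumed aggregation rule)."""
    normalized = [min_max(s) for s in per_model_scores]
    n_models = len(normalized)
    n_genes = len(normalized[0])
    return [sum(m[g] for m in normalized) / n_models for g in range(n_genes)]

def top_k_genes(scores, k):
    """Return the indices of the k highest-scoring genes, best first."""
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]

def softmax(logits):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weighted(sample, logits):
    """Scale each selected gene's expression value by its attention weight."""
    weights = softmax(logits)
    return [w * x for w, x in zip(weights, sample)], weights

# Hypothetical importance scores for five genes from three models.
rf_scores = [0.10, 0.40, 0.05, 0.30, 0.15]
xgb_scores = [2.0, 9.0, 1.0, 8.0, 3.0]
dnn_scores = [0.2, 0.7, 0.1, 0.9, 0.3]

combined = ensemble_gene_scores([rf_scores, xgb_scores, dnn_scores])
selected = top_k_genes(combined, k=2)

# Hypothetical expression values of the selected genes for one sample.
# The combined scores are reused as attention logits purely for illustration.
sample = [5.2, 3.1]
logits = [combined[g] for g in selected]
weighted_sample, weights = attention_weighted(sample, logits)
```

Because the attention weights sum to 1, they can be read as the relative contribution of each selected gene to a prediction, which is the kind of interpretability the abstract attributes to the attention component.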

Keywords:

machine learning, cancer classification, data mining, pattern recognition, feature selection

How to Cite

[1]
Haddou Bouazza, S. 2025. A Deep Ensemble Gene Selection and Attention-guided Classification Framework for Robust Cancer Diagnosis from Microarray Data. Engineering, Technology & Applied Science Research. 15, 1 (Feb. 2025), 20235–20241. DOI:https://doi.org/10.48084/etasr.9476.
