Classification of Macromolecules Based on Amino Acid Sequences Using Deep Learning

Authors

  • S. Khan, Department of Computer Science, National Chengchi University, Taiwan
  • I. Ali, Department of Computer Science, University of Swat, Pakistan
  • F. Ghaffar, System Design Engineering Department, University of Waterloo, Canada
  • Q. Mazhar-ul-Haq, National Taipei University of Technology, Taiwan
Volume: 12 | Issue: 6 | Pages: 9491-9495 | December 2022 | https://doi.org/10.48084/etasr.5230

Abstract

The classification of amino acids and their sequence analysis play a vital role in the life sciences and remain challenging tasks. Compared with traditional machine learning techniques, deep learning models offer well-established frameworks for solving a broad spectrum of complex learning problems. This article applies and compares state-of-the-art deep learning models, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), on macromolecule classification problems using amino acid sequences. The amino acid sequences are converted to vectors by means of word embedding, and the CNN extracts features from them. These vectors are fed to the above-mentioned models to train robust classifiers. The results show that word2vec embedding combined with VGG-16 performs better than LSTM and GRU. The proposed approach achieves an error rate of 1.5%.
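
As a rough illustration of the preprocessing step that word embedding relies on, the sketch below turns amino acid sequences into fixed-length lists of integer token ids. It is not the authors' code: the abstract does not say how sequences are tokenized, so the overlapping 3-mer "words", the reserved ids 0 (padding) and 1 (unknown), and the helper names `kmers`, `build_vocab`, and `encode` are all assumptions made for this example.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Split an amino acid sequence into overlapping k-mers ("words").

    Assumption: k-mer tokenization is illustrative; the abstract does not
    specify the tokenization scheme.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences, k=3):
    """Map each k-mer to an integer id; 0 is padding, 1 is unknown."""
    counts = Counter(w for s in sequences for w in kmers(s, k))
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 2 for i, w in enumerate(ordered)}


def encode(sequence, vocab, max_len, k=3):
    """Convert a sequence to exactly max_len ids (truncate, then pad with 0)."""
    ids = [vocab.get(w, 1) for w in kmers(sequence, k)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical toy sequences, not taken from the paper's dataset.
train = ["MKTAYIAKQR", "MKTAYLLKQR"]
vocab = build_vocab(train)
print(encode("MKTAY", vocab, max_len=6))  # -> [4, 3, 5, 0, 0, 0]
```

In a full pipeline, each id would index a row of an embedding matrix (for example, one learned with word2vec), and the resulting sequence of vectors would be the input to a CNN, LSTM, or GRU classifier.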

Keywords:

CNN, LSTM, macromolecules, amino acid

How to Cite

[1]
Khan, S., Ali, I., Ghaffar, F. and Mazhar-ul-Haq, Q. 2022. Classification of Macromolecules Based on Amino Acid Sequences Using Deep Learning. Engineering, Technology & Applied Science Research. 12, 6 (Dec. 2022), 9491–9495. DOI:https://doi.org/10.48084/etasr.5230.
