Classification of Macromolecules Based on Amino Acid Sequences Using Deep Learning
Received: 31 July 2022 | Revised: 26 August 2022 | Accepted: 28 August 2022 | Online: 20 September 2022
Corresponding author: S. Khan
Abstract
The classification of amino acids and the analysis of their sequences play a vital role in life sciences and remain a challenging task. Compared to traditional machine learning techniques, deep learning models offer well-established frameworks for solving a broad spectrum of complex learning problems. This article applies and compares state-of-the-art deep learning models, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), to the macromolecule classification problem using amino acid sequences. Amino acid sequences are converted into vectors through word embedding, and these vectors are fed to the above-mentioned models to train robust classifiers; the CNN additionally extracts features from the embedded sequences. The results show that word2vec embeddings combined with VGG-16 outperform LSTM and GRU, and the proposed approach achieves an error rate of 1.5%.
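The pipeline sketched in the abstract — splitting an amino acid sequence into overlapping k-mer "words", embedding each word as a vector, and stacking the vectors into the matrix a CNN/LSTM/GRU classifier would consume — can be illustrated as follows. This is a minimal sketch, not the paper's implementation: a toy random lookup table stands in for trained word2vec vectors, and names such as `kmerize` and `embed_sequence` are illustrative assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmerize(seq, k=3):
    """Split an amino acid sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_embedding_table(k=3, dim=8, seed=0):
    """Toy stand-in for a trained word2vec model: one fixed random
    vector per possible 3-mer over the 20 standard amino acids."""
    rng = np.random.default_rng(seed)
    return {a + b + c: rng.standard_normal(dim)
            for a in AMINO_ACIDS
            for b in AMINO_ACIDS
            for c in AMINO_ACIDS}

def embed_sequence(seq, table, k=3):
    """Map a sequence to a (num_kmers, dim) matrix -- the input shape
    a CNN, LSTM, or GRU classifier would then be trained on."""
    return np.stack([table[w] for w in kmerize(seq, k)])

seq = "MKTAYIAKQR"                    # toy 10-residue sequence
table = build_embedding_table()
X = embed_sequence(seq, table)
print(X.shape)                        # (8, 8): 8 overlapping 3-mers, 8-dim embeddings
```

In the paper's setting, the random table would be replaced by vectors learned with word2vec over a corpus of sequences, and `X` would be batched and padded to a fixed length before being passed to the network.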
Keywords:
CNN, LSTM, macromolecules, amino acid
References
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, May 2016, pp. 770-778. DOI: https://doi.org/10.1109/CVPR.2016.90
C.-L. Liu, W.-H. Hsiao, and Y.-C. Tu, "Time series classification with multivariate convolutional neural network," IEEE Transactions on Industrial Electronics, vol. 66, no. 6, pp. 4788-4797, Aug. 2018. DOI: https://doi.org/10.1109/TIE.2018.2864702
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 2818–2826. DOI: https://doi.org/10.1109/CVPR.2016.308
Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time series," in The handbook of brain theory and neural networks, Cambridge, MA, USA: MIT Press, 1998, pp. 255–258.
S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997. DOI: https://doi.org/10.1162/neco.1997.9.8.1735
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling." arXiv, Dec. 11, 2014.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84–90, Feb. 2017. DOI: https://doi.org/10.1145/3065386
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998. DOI: https://doi.org/10.1109/5.726791
Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015. DOI: https://doi.org/10.1038/nature14539
M. Hussain, J. J. Bird, and D. R. Faria, "A Study on CNN Transfer Learning for Image Classification," in Advances in Computational Intelligence Systems, 2019, pp. 191–202. DOI: https://doi.org/10.1007/978-3-319-97982-3_16
T. K. Lee and T. Nguyen, "Protein Family Classification with Neural Networks," [Online]. Available: https://cs224d.stanford.edu/reports/LeeNguyen.pdf.
J. Hou, B. Adhikari, and J. Cheng, "DeepSF: deep convolutional neural network for mapping protein sequences to folds," Bioinformatics, vol. 34, no. 8, pp. 1295–1303, Apr. 2018. DOI: https://doi.org/10.1093/bioinformatics/btx780
N. G. Nguyen et al., "DNA Sequence Classification by Convolutional Neural Network," Journal of Biomedical Science and Engineering, vol. 9, no. 5, pp. 280–286, Apr. 2016. DOI: https://doi.org/10.4236/jbise.2016.95021
I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to Sequence Learning with Neural Networks." arXiv, Dec. 14, 2014.
D. Bahdanau, K. Cho, and Y. Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate." arXiv, May 19, 2016.
T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, Red Hook, NY, USA, Sep. 2013, pp. 3111–3119.
A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, "FastText.zip: Compressing text classification models." arXiv, Dec. 12, 2016.
J. Pennington, R. Socher, and C. Manning, "GloVe: Global Vectors for Word Representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Jul. 2014, pp. 1532–1543. DOI: https://doi.org/10.3115/v1/D14-1162
K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition." arXiv, Apr. 10, 2015.
G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger, "Snapshot Ensembles: Train 1, get M for free." arXiv, Mar. 31, 2017.
RCSB Protein Data Bank, "RCSB PDB: Homepage." [Online]. Available: https://www.rcsb.org/.
L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5–32, Oct. 2001. DOI: https://doi.org/10.1023/A:1010933404324
License
Copyright (c) 2022 S. Khan, I. Ali, F. Ghaffar, Q. Mazhar-ul-Haq
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.