Q_YOLOv5m: A Quantization-based Approach for Accelerating Object Detection on Embedded Platforms
Received: 29 October 2024 | Revised: 9 November 2024 | Accepted: 8 December 2024 | Online: 13 December 2024
Corresponding author: Taoufik Saidani
Abstract
Deploying deep learning models on resource-constrained embedded platforms is challenging because computational power, memory, and energy budgets are limited. To address this issue, this study proposes a novel quantization method tailored to accelerating object detection, yielding a quantized version of the YOLOv5m model called Q_YOLOv5m. The method reduces the model's computational complexity and memory footprint, enabling faster inference and lower power consumption and making the model well suited to real-time applications on embedded systems. It combines weight and activation quantization to balance speed against accuracy, adjusting precision dynamically according to hardware capabilities. The evaluation confirmed the efficacy of Q_YOLOv5m, which showed substantially faster inference and a smaller model size with negligible loss in object detection accuracy. The findings underscore the suitability of Q_YOLOv5m for edge applications, including autonomous vehicles, intelligent surveillance, and IoT-based monitoring systems.
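The abstract does not spell out the quantization scheme, so the following is only a generic illustration of what weight or activation quantization involves, not the authors' method. It is a minimal sketch of uniform affine (asymmetric) 8-bit quantization with a single scale and zero point per tensor. The function names `quantize_affine` and `dequantize_affine` and the sample values are hypothetical.

```python
from typing import List, Tuple


def quantize_affine(values: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Map floats to unsigned integers using one scale and zero point per tensor.

    Illustrative sketch only; the scheme used in the paper is not given in the abstract.
    """
    qmin, qmax = 0, (1 << num_bits) - 1
    # Extend the range to include 0.0 so that zero maps exactly to zero_point.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Real-valued step between adjacent integer levels (fallback avoids division by zero).
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    # Integer level that represents the real value 0.0, clamped to the valid range.
    zero_point = max(qmin, min(qmax, int(round(qmin - lo / scale))))
    # Round each value to the nearest level and clamp to [qmin, qmax].
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate real values from the integer levels."""
    return [scale * (x - zero_point) for x in q]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical sample weights
    q, scale, zero_point = quantize_affine(weights)
    restored = dequantize_affine(q, scale, zero_point)
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.4f}")
```

Each value is stored in 8 bits instead of 32, which is where the memory saving of this kind of scheme comes from; the price is a rounding error of at most about one quantization step per value.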
Keywords:
object detection, quantization, embedded systems, deep learning
License
Copyright (c) 2024 Nizal Alshammry, Taoufik Saidani, Nasser S. Albalawi, Sami Mohammed Alenezi, Fahd Alhamazani, Sami Aziz Alshammari, Mohammed Aleinzi, Abdulaziz Alanazi, Mahmoud Salaheldin Elsayed
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.