Q_YOLOv5m: A Quantization-based Approach for Accelerating Object Detection on Embedded Platforms

Nizal Alshammry; Taoufik Saidani; Nasser S. Albalawi; Sami Mohammed Alenezi; Fahd Alhamazani; Sami Aziz Alshammari; Mohammed Aleinzi; Abdulaziz Alanazi; Mahmoud Salaheldin Elsayed

doi:10.48084/etasr.9441

Authors

Nizal Alshammry Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
Taoufik Saidani Center for Scientific Research and Entrepreneurship, Northern Border University, 73213, Arar, Saudi Arabia
Nasser S. Albalawi Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
Sami Mohammed Alenezi Department of Information Technology, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
Fahd Alhamazani Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
Sami Aziz Alshammari Department of Information Technology, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
Mohammed Aleinzi Department of Information Systems, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
Abdulaziz Alanazi Department of Information Systems, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
Mahmoud Salaheldin Elsayed Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia

Volume: 15 | Issue: 1 | Pages: 19749-19755 | February 2025 | https://doi.org/10.48084/etasr.9441

Received: 29 October 2024 | Revised: 9 November 2024 | Accepted: 8 December 2024 | Online: 13 December 2024

Corresponding author: Taoufik Saidani

Abstract

Deploying deep learning models on resource-constrained embedded platforms is challenging because computational power, memory, and energy budgets are limited. To address this issue, this study proposes a novel quantization method tailored to accelerate object detection, producing a quantized version of the YOLOv5m model called Q_YOLOv5m. The method reduces the model's computational complexity and memory footprint, enabling faster inference and lower power consumption, which makes it well suited to real-time applications on embedded systems. It incorporates advanced weight and activation quantization techniques to balance speed with accuracy, dynamically adjusting precision based on hardware capabilities. The efficacy of Q_YOLOv5m was confirmed, with substantial gains in inference speed and a reduction in model size at a negligible loss in object detection accuracy. The findings underscore the suitability of Q_YOLOv5m for edge applications, including autonomous vehicles, intelligent surveillance, and IoT-based monitoring systems.
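
As a rough illustration of what weight and activation quantization involves, the following minimal sketch applies generic uniform affine quantization to a list of floating-point values and maps them back. It is not the authors' Q_YOLOv5m procedure, which the abstract does not detail; the function names, the 8-bit default, and the sample values are all assumptions made for this example.

```python
# Illustrative sketch only: generic uniform affine quantization,
# not the Q_YOLOv5m method. All names and values are hypothetical.

def quant_params(values, num_bits=8):
    """Derive a scale and zero point that map the value range onto unsigned integers."""
    qmax = 2 ** num_bits - 1
    lo = min(min(values), 0.0)   # include 0.0 so it is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax or 1.0   # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(-lo / scale))
    zero_point = max(0, min(qmax, zero_point))
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Round each value to the nearest integer level and clamp to the valid range."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Map integer levels back to approximate floating-point values."""
    return [(q - zero_point) * scale for q in q_values]

# Hypothetical weights standing in for one layer's parameters.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, zero_point = quant_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in one byte instead of four, and the rounding error per value stays on the order of one quantization step (`scale`). Lowering `num_bits` shrinks storage further at the cost of a coarser step, which is the kind of size and accuracy trade-off the abstract refers to.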

Keywords:

object detection, quantization, embedded systems, deep learning
