Enhanced Real-Time Object Detection using YOLOv7 and MobileNetv3
Received: 20 August 2024 | Revised: 12 October 2024 and 13 October 2024 | Accepted: 16 November 2024 | Online: 29 November 2024
Corresponding author: Sara Ennaama
Abstract
Object detection is a crucial element of computer vision and increasingly relies on deep learning techniques. Among the various methods, the YOLO series has gained recognition as an effective solution. This research enhances object detection by combining YOLOv7 with MobileNetv3, a network known for its computational efficiency and feature-extraction capability. The integrated model was evaluated on the COCO dataset, which contains over 164,000 images across 80 categories, achieving a mean Average Precision (mAP) of 0.61. Confusion-matrix analysis further confirmed its accuracy, particularly in detecting common objects such as 'person' and 'car' with minimal misclassification. The results demonstrate the potential of the proposed model to address the complexities of real-world scenarios, highlighting its applicability across scientific and industrial domains.
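For illustration only: a common way to realize the kind of integration the abstract describes is to substitute a MobileNetv3 feature extractor for the detector's original backbone and attach a YOLO-style prediction head. The minimal PyTorch sketch below assumes this backbone-swap reading; the class name MobileNetV3YoloSketch, the single-scale 1x1 head, and the channel counts are illustrative assumptions and do not reproduce the authors' exact YOLOv7 architecture, which uses multi-scale heads and its own neck.

import torch
import torch.nn as nn
import torchvision

class MobileNetV3YoloSketch(nn.Module):
    """Illustrative sketch only: MobileNetV3-Large features feeding a
    single YOLO-style 1x1 convolutional head. The head layout and channel
    counts are assumptions, not the paper's exact architecture."""

    def __init__(self, num_classes: int = 80, num_anchors: int = 3):
        super().__init__()
        # ImageNet-pretrained MobileNetV3-Large; keep only the conv trunk.
        mbv3 = torchvision.models.mobilenet_v3_large(weights="IMAGENET1K_V1")
        self.backbone = mbv3.features  # final feature map: 960 channels, stride 32
        # Per anchor: 4 box coordinates + 1 objectness score + class scores.
        self.head = nn.Conv2d(960, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))

model = MobileNetV3YoloSketch()
preds = model(torch.randn(1, 3, 640, 640))  # COCO-style input resolution
print(preds.shape)  # torch.Size([1, 255, 20, 20])

On a 640x640 input, the sketch yields a (1, 255, 20, 20) tensor: 3 anchors x (5 box/objectness values + 80 COCO classes) predicted on a stride-32 grid.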
Keywords:
real-time object detection, deep learning, YOLOv7, MobileNetv3, computer vision
License
Copyright (c) 2024 Sara Ennaama, Hassan Silkan, Ahmed Bentajer, Abderrahim Tahiri
This work is licensed under a Creative Commons Attribution 4.0 International License.