Fine-Tuning YOLOv8s for Unified Human and Face Detection in Crowded Environments

Oussama Lachihab; Ahmed El Kiram; Latifa Errajy

doi:10.48084/etasr.15307

Authors

Oussama Lachihab, Department of Computer Science, Laboratory of Computer Science and Smart Systems, Faculty of Sciences Semlalia, Cadi Ayyad University, Marrakech, Morocco
Ahmed El Kiram, Department of Computer Science, Laboratory of Computer Science and Smart Systems, Faculty of Sciences Semlalia, Cadi Ayyad University, Marrakech, Morocco
Latifa Errajy, Department of Computer Science, Laboratory of Computer Science and Smart Systems, Faculty of Sciences Semlalia, Cadi Ayyad University, Marrakech, Morocco

Volume: 16 | Issue: 1 | Pages: 32348-32356 | February 2026 | https://doi.org/10.48084/etasr.15307

Received: 4 October 2025 | Revised: 2 November 2025 | Accepted: 12 November 2025 | Online: 9 February 2026

Corresponding author: Oussama Lachihab

Abstract

Accurate detection of human bodies and faces in densely populated scenes remains challenging due to occlusions and overlapping instances. This paper presents a lightweight object detection solution built upon the You Only Look Once version 8 small (YOLOv8s) architecture, fine-tuned for urban scenes where occlusion, high density, and limited computing resources are common. Trained on an enhanced dataset with detailed person and face annotations, our model achieves a mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) of 57.61%, with particularly robust performance on full-body human detection (Average Precision (AP) = 73.5%). Face detection accuracy is moderate (AP = 42.1%), but qualitative results demonstrate solid performance under real-world constraints. The model's compact size and high inference speed make it well suited for deployment on edge devices, such as mobile cameras and embedded Artificial Intelligence (AI) systems. As a use case, we consider crowd monitoring in Jamaa El-Fna square in Marrakech, a bustling, high-density public space that demands real-time situational awareness. This work offers a practical tool for urban analytics and public safety, and it lays the foundation for future improvements in face detection, post-processing, and real-time system integration.
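The mAP@0.5 figure reported above depends on the IoU matching criterion: a predicted box counts as a correct detection only if its overlap with a ground-truth box of the same class reaches 0.5. The following is a minimal sketch of that criterion, not the authors' evaluation code. The function names `iou` and `is_match`, the `(x1, y1, x2, y2)` corner format, and the example boxes are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed format, chosen for this illustration).
    """
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count
    as a correct detection at the given IoU threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: one ground-truth person and two predictions.
ground_truth = (0, 0, 10, 10)
close_prediction = (2, 0, 12, 10)   # shifted slightly to the right
loose_prediction = (5, 0, 15, 10)   # shifted by half the box width

print(is_match(close_prediction, ground_truth))
print(is_match(loose_prediction, ground_truth))
```

In crowded scenes, neighbouring people and faces overlap heavily, so small localization errors can push a prediction below the threshold or make it overlap the wrong ground-truth box.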

Keywords:

YOLOv8s, object detection, human detection, face detection, edge AI, real-time inference, crowd analysis, urban monitoring
