CamoVision: A Dual-Mode Deep Learning Framework for Camouflaged Object Detection in Images and Videos

Jaskaranjeet Singh; Sofia Singh; Dipti Theng; Urvashi Agrawal; Sanjay Balwani; Rahul Dhuture; Rahul Agrawal

doi:10.48084/etasr.15125

Authors

Jaskaranjeet Singh Department of Artificial Intelligence, Amity School of Engineering and Technology, Noida, Uttar Pradesh, India
Sofia Singh Department of Artificial Intelligence, Amity School of Engineering and Technology, Noida, Uttar Pradesh, India
Dipti Theng Department of Computer Science and Engineering, Symbiosis Institute of Technology Pune, Symbiosis International (Deemed University), Pune, India
Urvashi Agrawal Department of Electronics & Telecommunication Engineering, Jhulelal Institute of Technology, Nagpur, India
Sanjay Balwani Department of Electronics and Telecommunication Engineering, Jhulelal Institute of Technology, Nagpur, India
Rahul Dhuture Department of Electronics Engineering, Ramdeobaba University, Nagpur, India
Rahul Agrawal Department of Data Science, IOT, Cybersecurity, G H Raisoni College of Engineering, Nagpur, India

Volume: 15 | Issue: 6 | Pages: 30154-30160 | December 2025 | https://doi.org/10.48084/etasr.15125

Received: 26 September 2025 | Revised: 16 October 2025 and 21 October 2025 | Accepted: 24 October 2025 | Online: 8 December 2025

Corresponding author: Sofia Singh

Abstract

Camouflaged Object Detection (COD) has applications in military surveillance, animal protection, and intelligent security systems. Traditional computer vision methods for COD, such as edge detection and color-based segmentation, frequently fail in real-world scenes that change rapidly over time. CamoVision is a Deep Learning (DL)-based dual-mode framework that locates camouflaged objects in still images (CamoVision 1.0) and video streams (CamoVision 2.0). The design is based on U-Net with a ResNet-50 encoder and is trained with a hybrid loss function that combines Dice and Binary Cross-Entropy (BCE) terms. Mixed-precision training was used to improve efficiency and speed up convergence. The model achieved an Intersection-over-Union (IoU) score of 0.82 and a Dice coefficient of 0.85, which indicates the robustness of the proposed system. In addition, the video pipeline operates in real time at 30 fps, which makes it suitable for time-critical settings.
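
The hybrid loss and the two reported metrics can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the equal weighting (`bce_weight=0.5`), the smoothing constant `eps`, and all function names are assumptions made for illustration, since the abstract does not state how the Dice and BCE terms are weighted.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2*|A and B| / (|A| + |B|), for binary masks or soft probabilities."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    return float((2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps))


def iou_score(pred, target, eps=1e-7):
    """IoU = |A and B| / |A or B|, for binary masks or soft probabilities."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return float((intersection + eps) / (union + eps))


def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a binary mask."""
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(-(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))))


def hybrid_loss(prob, target, bce_weight=0.5):
    """Weighted sum of BCE and (1 - Dice); bce_weight is an assumed value."""
    dice_term = 1.0 - dice_coefficient(prob, target)
    return bce_weight * bce_loss(prob, target) + (1.0 - bce_weight) * dice_term


# Toy 2x2 example: two predicted foreground pixels, one of which is correct.
pred_mask = np.array([[1, 1], [0, 0]])
true_mask = np.array([[1, 0], [0, 0]])
print(round(dice_coefficient(pred_mask, true_mask), 3))  # 0.667
print(round(iou_score(pred_mask, true_mask), 3))         # 0.5
```

The BCE term penalizes each pixel independently, while the Dice term scores the overlap of the whole mask, which is one common motivation for combining the two when the foreground occupies a small part of the image.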

Keywords:

camouflaged object detection, semantic segmentation, deep learning, real-time video analysis, computer vision
