CamoVision: A Dual-Mode Deep Learning Framework for Camouflaged Object Detection in Images and Videos
Received: 26 September 2025 | Revised: 16 October 2025 and 21 October 2025 | Accepted: 24 October 2025 | Online: 8 December 2025
Corresponding author: Sofia Singh
Abstract
Camouflaged Object Detection (COD) is a technology with applications in military surveillance, wildlife protection, and intelligent security systems. Traditional computer vision methods for COD, such as edge detection and color-based segmentation, frequently fail in real-world scenes that change rapidly over time. CamoVision is a Deep Learning (DL)-based dual-mode framework that locates camouflaged objects in still images (CamoVision 1.0) and in video streams (CamoVision 2.0). The design is based on U-Net with a ResNet-50 encoder and is trained with a hybrid loss function that combines Dice loss and Binary Cross-Entropy (BCE). Mixed-precision training was used to improve efficiency and speed up convergence. The system achieved an Intersection-over-Union (IoU) score of 0.82 and a Dice coefficient of 0.85, demonstrating the robustness of the proposed approach. In addition, the video pipeline operates in real time at 30 fps, which makes it suitable for time-critical settings.
Keywords:
camouflaged object detection, semantic segmentation, deep learning, real-time video analysis, computer vision
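The hybrid loss and the two overlap metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the smoothing constant `eps`, and the equal 0.5/0.5 weighting between BCE and Dice are assumptions made for illustration, since the abstract does not state how the two terms are weighted.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for masks or probabilities in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = (pred * target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def iou_score(pred, target, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| for masks or probabilities in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    return float((intersection + eps) / (union + eps))


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a mask."""
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)).mean())


def hybrid_loss(prob, target, bce_weight=0.5):
    """Weighted sum of BCE and Dice loss (1 - Dice).

    bce_weight=0.5 is an assumed value, not taken from the paper.
    """
    dice_loss = 1.0 - dice_coefficient(prob, target)
    return bce_weight * bce(prob, target) + (1.0 - bce_weight) * dice_loss


# Toy example: the prediction covers two pixels, one of which is correct.
target_mask = np.array([[1, 0, 0], [0, 0, 0]])
pred_mask = np.array([[1, 1, 0], [0, 0, 0]])
print(dice_coefficient(pred_mask, target_mask))  # about 0.667
print(iou_score(pred_mask, target_mask))         # about 0.5
```

The BCE term penalizes each pixel independently, while the Dice term scores the overlap of the whole predicted region, which is the usual motivation for combining them when the object occupies only a small part of the image.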
License
Copyright (c) 2025 Jaskaranjeet Singh, Sofia Singh, Dipti Theng, Urvashi Agrawal, Sanjay Balwani, Rahul Dhuture, Rahul Agrawal

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
