Autonomous Defect Classification in Manufacturing: A Novel Few-Shot Vision-Language Modeling Approach

Hoang Long Dinh; Diem Vuong Doan; Ngoc Khoat Nguyen; Duy Trung Nguyen; Huy Hoang Hoang

doi:10.48084/etasr.18859

Authors

Hoang Long Dinh, Faculty of Control and Automation, Electric Power University, Hanoi, Vietnam
Diem Vuong Doan, Faculty of Control and Automation, Electric Power University, Hanoi, Vietnam
Ngoc Khoat Nguyen, Faculty of Control and Automation, Electric Power University, Hanoi, Vietnam
Duy Trung Nguyen, Faculty of Control and Automation, Electric Power University, Hanoi, Vietnam
Huy Hoang Hoang, Faculty of Control and Automation, Electric Power University, Hanoi, Vietnam

Volume: 16 | Issue: 3 | Pages: 36545-36551 | June 2026 | https://doi.org/10.48084/etasr.18859

Received: 21 March 2026 | Revised: 4 May 2026 | Accepted: 11 May 2026 | Online: 15 May 2026

Corresponding author: Ngoc Khoat Nguyen

Abstract

Established Automated Optical Inspection (AOI) systems and recent unsupervised deep learning methods detect anomalies effectively, but they operate as "black boxes": they flag a region as anomalous without indicating which type of defect it contains. This lack of semantic interpretability is an operational bottleneck, because accommodating a new defect type requires costly manual labeling and expert-driven retraining. This research introduces VLM-AOI, a two-stage hybrid framework that integrates large-scale Vision–Language Models (VLMs) into the inspection pipeline. In the localization stage, a state-of-the-art unsupervised model delineates candidate defect regions. In the interpretation stage, the VLM's cross-modal understanding is used to perform zero-shot classification of these localized regions from semantic text prompts. Experimental results indicate that VLM-AOI preserves the pixel-level detection fidelity of current methods while delivering superior zero-shot classification accuracy, identifying defect classes that are entirely unrepresented in the training data. The approach improves operational adaptability and reduces both manufacturing downtime and the associated costs.
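The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the anomaly map, the threshold, the three defect labels, the toy 3-dimensional embeddings, and the temperature value are all assumptions made for illustration. In a real system, the anomaly map would come from an unsupervised localization model, and the embeddings from a VLM's image and text encoders.

```python
import math

def locate_defect(anomaly_map, threshold):
    """Stage 1 stand-in: bounding box (top, left, bottom, right) of all
    pixels whose anomaly score exceeds the threshold, or None if none do."""
    hits = [(r, c)
            for r, row in enumerate(anomaly_map)
            for c, score in enumerate(row)
            if score > threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(region_embedding, prompt_embeddings, temperature=0.07):
    """Stage 2 stand-in: softmax over cosine similarities between the
    region embedding and one text-prompt embedding per defect class."""
    sims = {label: cosine(region_embedding, emb)
            for label, emb in prompt_embeddings.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp((s - top) / temperature)
            for label, s in sims.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs

# Hypothetical anomaly map (assumed output of an unsupervised localizer).
anomaly_map = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
box = locate_defect(anomaly_map, threshold=0.5)

# Hypothetical embeddings; a VLM would produce these from the cropped
# region and from prompts such as "a photo of a short circuit on a PCB".
prompt_embeddings = {
    "open circuit": [1.0, 0.0, 0.0],
    "short circuit": [0.0, 1.0, 0.0],
    "spur": [0.0, 0.0, 1.0],
}
region_embedding = [0.2, 0.9, 0.1]
label, probs = zero_shot_classify(region_embedding, prompt_embeddings)
print(box, label)
```

Because the class set is defined only by the text prompts, adding a defect type in this sketch means adding one entry to `prompt_embeddings`, with no retraining of the localization stage.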

Keywords:

Automated Optical Inspection (AOI), Vision–Language Model (VLM), zero-shot defect classification, smart manufacturing, semantic understanding, anomaly detection
