Autonomous Defect Classification in Manufacturing: A Novel Few-Shot Vision-Language Modeling Approach
Received: 21 March 2026 | Revised: 4 May 2026 | Accepted: 11 May 2026 | Online: 15 May 2026
Corresponding author: Ngoc Khoat Nguyen
Abstract
Established Automated Optical Inspection (AOI) systems and contemporary unsupervised deep learning methods are highly effective at anomaly detection, but their "black box" nature denies them the semantic interpretability needed to determine the class of a detected defect. This deficiency is a significant operational bottleneck: it necessitates costly manual labeling and expert-driven retraining whenever novel defect types appear. This research introduces VLM-AOI, a two-stage hybrid framework that integrates large-scale Vision–Language Models (VLMs) into the inspection pipeline. The localization stage employs a state-of-the-art unsupervised model to delineate candidate defect regions. The interpretation stage then uses the VLM's cross-modal understanding to perform zero-shot classification of these localized regions from semantic text prompts. Experimental results indicate that VLM-AOI preserves the pixel-level detection fidelity of current methods while also delivering superior zero-shot classification accuracy, identifying defect classes entirely unrepresented in the training data. This improves operational adaptability and reduces both manufacturing downtime and associated expenditure.
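The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `candidate_box` and `zero_shot_classify`, the threshold, the temperature, the defect labels, and the toy three-dimensional embeddings are all hypothetical, and a real system would obtain the anomaly map from an unsupervised detector and the embeddings from a VLM's image and text encoders. The sketch only shows the general idea of thresholding an anomaly map to get a candidate region and then choosing the text prompt whose embedding is most similar to the region's embedding.

```python
import numpy as np


def candidate_box(anomaly_map, threshold):
    """Stage 1 (sketch): bounding box of all pixels whose score exceeds threshold.

    Returns (row_min, row_max, col_min, col_max), or None if nothing is flagged.
    """
    rows, cols = np.nonzero(anomaly_map > threshold)
    if rows.size == 0:
        return None
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())


def zero_shot_classify(region_embedding, prompt_embeddings, labels, temperature=0.01):
    """Stage 2 (sketch): pick the label whose prompt embedding is closest in cosine similarity.

    The temperature is an assumed scaling factor, not a value taken from the paper.
    """
    region = region_embedding / np.linalg.norm(region_embedding)
    prompts = prompt_embeddings / np.linalg.norm(prompt_embeddings, axis=1, keepdims=True)
    similarities = prompts @ region
    logits = similarities / temperature
    logits = logits - logits.max()  # subtract the maximum for numerical stability
    probabilities = np.exp(logits) / np.exp(logits).sum()
    return labels[int(np.argmax(probabilities))], probabilities


# Toy anomaly map standing in for the output of an unsupervised detector.
anomaly_map = np.zeros((8, 8))
anomaly_map[2:5, 3:6] = 0.9
box = candidate_box(anomaly_map, threshold=0.5)

# Toy embeddings standing in for VLM text and image encoder outputs.
labels = ["open circuit", "short circuit", "spur"]
prompt_embeddings = np.eye(3)
region_embedding = np.array([0.1, 0.9, 0.2])
label, probabilities = zero_shot_classify(region_embedding, prompt_embeddings, labels)
print(box, label)
```

Because the class set is defined only by the text prompts, adding a new defect type in this sketch means appending a label and its prompt embedding, with no retraining of the classifier.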
Keywords:
Automated Optical Inspection (AOI), Vision–Language Model (VLM), zero-shot defect classification, smart manufacturing, semantic understanding, anomaly detection
License
Copyright (c) 2026 Hoang Long Dinh, Diem Vuong Doan, Ngoc Khoat Nguyen, Duy Trung Nguyen, Huy Hoang Hoang

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
