Yong, G., Jeon, K., Gil, D., and Lee, G. (2022), "Prompt Engineering for Zero-Shot and Few-Shot Defect Detection and Classification Using a Visual-Language Pretrained Model," Computer-Aided Civil and Infrastructure Engineering (CACIE).

Zero-shot learning with vision-language pretrained (VLP) models is expected to offer an alternative to existing deep learning models for defect detection when training datasets are insufficient. However, the performance of VLP models, including contrastive language-image pre-training (CLIP), fluctuates with the prompts (inputs) they are given, which has motivated research on prompt engineering: the optimization of prompts to improve performance. This study therefore aims to identify the features of a prompt that yield the best performance in classifying and detecting building defects using the zero-shot and few-shot capabilities of CLIP. The results reveal the following: (1) domain-specific definitions are better than general definitions and images; (2) a complete sentence is better than a set of core terms; and (3) multimodal information is better than single-modal information. Detection using the proposed prompting method outperformed existing supervised models.
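For readers unfamiliar with how a prompt drives CLIP-style zero-shot classification, the following is a minimal sketch of the general mechanism, not the authors' implementation. It assumes the image and each class prompt have already been encoded into embeddings by CLIP's image and text encoders; here they are replaced by hypothetical 3-dimensional toy vectors. The defect labels (`crack`, `spalling`, `efflorescence`), the temperature of 100, and the `multimodal_prototype` fusion (a weighted average of a text embedding and the mean of a few support-image embeddings) are illustrative assumptions; the paper's actual prompts and fusion scheme may differ.

```python
import math

def normalize(v):
    # Scale a vector to unit length, as CLIP does before comparing embeddings.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # Cosine similarity between two embeddings.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def softmax(scores, temperature=100.0):
    # Turn similarities into class probabilities (temperature is an assumption).
    scaled = [s * temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, class_embs):
    # class_embs maps each label to the embedding of that class's prompt,
    # e.g. a hypothetical domain-specific definition written as a full sentence.
    labels = list(class_embs)
    sims = [cosine(image_emb, class_embs[label]) for label in labels]
    probs = softmax(sims)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))

def multimodal_prototype(text_emb, support_image_embs, alpha=0.5):
    # Hypothetical few-shot fusion: blend the text embedding with the mean
    # of the normalized support-image embeddings, then renormalize.
    t = normalize(text_emb)
    imgs = [normalize(e) for e in support_image_embs]
    mean_img = normalize([sum(col) / len(imgs) for col in zip(*imgs)])
    return normalize([alpha * a + (1 - alpha) * b for a, b in zip(t, mean_img)])

# Toy stand-ins for CLIP text embeddings of three hypothetical defect prompts.
class_embs = {
    "crack": [0.9, 0.1, 0.0],
    "spalling": [0.1, 0.9, 0.1],
    "efflorescence": [0.0, 0.2, 0.9],
}
image_emb = [0.8, 0.2, 0.1]  # toy stand-in for a CLIP image embedding

label, probs = zero_shot_classify(image_emb, class_embs)
print(label)  # crack

# Few-shot variant: replace the "crack" text embedding with a multimodal prototype.
class_embs["crack"] = multimodal_prototype(
    class_embs["crack"], [[0.85, 0.15, 0.05], [0.7, 0.3, 0.0]]
)
label_fs, probs_fs = zero_shot_classify(image_emb, class_embs)
```

Because classification reduces to comparing the image embedding against each prompt's embedding, changing the wording of a prompt (general vs. domain-specific definition, keywords vs. full sentence) moves the text embedding and can change which class wins, which is why prompt engineering matters for this approach.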