[Hands-On] Prompt-based Image Classification with CLIP

Hugman Sangkeun Jung
13 min read · Jul 13, 2024

(You can find the Korean version of the post at this link.)

In the previous post, we looked at a prompt-based approach for text classification.

In this post, we’ll extend that concept and apply it to image classification tasks. Specifically, we’ll implement actual code to perform prompt-based image classification using OpenAI’s CLIP (Contrastive Language–Image Pre-training) model and analyze the results.

What is Prompt-based Image Classification?

Prompt-based image classification is a method of classifying images using vision-language models such as CLIP. These models are trained on large-scale image-text pair data, enabling them to learn deep associations between visual information and linguistic descriptions. Instead of training a dedicated classifier, we describe each candidate class with a natural-language prompt (e.g., “a photo of a dog”) and let the model pick the prompt that best matches the image.

Starting with CLIP, models like Google's ALIGN, DALL-E, Imagen, and Stable Diffusion have emerged, greatly enhancing the ability to jointly understand images and text. These models go beyond simply classifying images: they can describe image content in natural language or generate images from textual descriptions.
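Before we get to the full CLIP implementation, the core scoring mechanism is worth sketching: the image and each class prompt are embedded into a shared vector space, compared by cosine similarity, and the similarities are turned into class probabilities via a softmax. The sketch below uses dummy random embeddings in place of real CLIP encoder outputs (the class names, embedding dimension, and logit scale of 100 are illustrative assumptions, not actual model values):

```python
import numpy as np

def cosine_sim(query, matrix):
    # Cosine similarity between one vector and each row of a matrix.
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ query

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical class labels, each wrapped in a prompt template.
labels = ["cat", "dog", "car"]
prompts = [f"a photo of a {label}" for label in labels]

rng = np.random.default_rng(0)
dim = 512  # placeholder embedding size

# Stand-ins for the text encoder's output, one embedding per prompt.
text_embeddings = rng.normal(size=(len(prompts), dim))
# Stand-in for the image encoder's output: deliberately close to "dog".
image_embedding = text_embeddings[1] + 0.1 * rng.normal(size=dim)

# Score the image against every prompt; the scale factor (assumed here)
# sharpens the softmax, mimicking CLIP's learned logit scale.
probs = softmax(100.0 * cosine_sim(image_embedding, text_embeddings))
print(labels[int(np.argmax(probs))])  # → dog
```

The key design point is that the classifier is defined entirely by the prompts: swapping in different label descriptions changes the classes with no retraining.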


Hugman Sangkeun Jung is a professor at Chungnam National University, with expertise in AI, machine learning, NLP, and medical decision support.