Image to text
Models that generate text prompts and captions from images
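All of the models below are hosted on Replicate and can be run through its HTTP API. As a minimal sketch, the helper below builds the POST request that starts a prediction for one of these image-to-text models using only the Python standard library. The version hash, the `image` input key, and the `Token` authorization scheme are assumptions here; each model's API tab lists its real version hash and input schema.

```python
import json
import urllib.request

# Replicate's predictions endpoint (see https://replicate.com/docs).
API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(version: str, image_url: str, token: str) -> urllib.request.Request:
    """Build the POST request that starts a captioning prediction.

    `version` is the model's version hash and `image` is assumed to be the
    model's input key -- both are placeholders to check per model.
    """
    payload = json.dumps({"version": version, "input": {"image": image_url}}).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires a real API token and version hash):
# with urllib.request.urlopen(build_prediction_request("<version-hash>", "https://…/photo.jpg", "<token>")) as resp:
#     prediction = json.load(resp)
```

The response is a prediction object whose `output` field holds the generated caption or prompt once the prediction finishes.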

salesforce/blip
BLIP: Bootstrapping Language-Image Pre-training for unified vision-language understanding and generation
8.9M runs

methexis-inc/img2prompt
Get an approximate text prompt, including style, that matches an image. Optimized for Stable Diffusion (CLIP ViT-L/14).
860K runs

pharmapsychotic/clip-interrogator
The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models such as Stable Diffusion to create art.
274.7K runs

rmokady/clip_prefix_caption
Simple image captioning model using CLIP and GPT-2
242.9K runs

j-min/clip-caption-reward
Fine-grained Image Captioning with CLIP Reward
49.7K runs

nohamoamary/image-captioning-with-visual-attention
Image captioning with visual attention, trained on the Flickr8k dataset
944 runs

kdexd/virtex-image-captioning
Image captioning with VirTex
328 runs