Image to text
Models that generate text prompts and captions from images

salesforce / blip
Bootstrapping Language-Image Pre-training

andreasjansson / blip-2
Answers questions about images

methexis-inc / img2prompt
Get an approximate text prompt, with style, matching an image. Optimized for Stable Diffusion (CLIP ViT-L/14)

yorickvp / llava-13b
Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

rmokady / clip_prefix_caption
Simple image captioning model using CLIP and GPT-2

pharmapsychotic / clip-interrogator
The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion to create cool art!

daanelson / minigpt-4
Generates text in response to an input image and prompt

j-min / clip-caption-reward
Fine-grained Image Captioning with CLIP Reward

joehoover / instructblip-vicuna13b
An instruction-tuned multimodal model based on BLIP-2 and Vicuna-13B

joehoover / mplug-owl
An instruction-tuned multimodal large language model that generates text based on user-provided prompts and images

nohamoamary / image-captioning-with-visual-attention
Image captioning with visual attention, trained on the Flickr8k dataset
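
Each entry above is an owner/model pair hosted on Replicate. As a minimal sketch (not a definitive recipe), such a model is typically invoked through the Replicate Python client; the helper name `build_request` and the exact input fields are assumptions for illustration — each model's page documents its real inputs, though `image` is the common one:

```python
# Sketch: preparing a call to an image-to-text model via the Replicate
# Python client. Input field names vary per model; "image" is the usual
# key and accepts a public URL or an open file handle. Running the real
# call requires the `replicate` package and a REPLICATE_API_TOKEN.

def build_request(model: str, image: str, **extra):
    """Assemble the (model, inputs) pair that replicate.run() expects.
    `image` is assumed here to be a public URL; file objects also work."""
    return model, {"image": image, **extra}

# Actual call, commented out so the sketch runs without network access:
# import replicate
# model, inputs = build_request("salesforce/blip", "https://example.com/cat.jpg")
# caption = replicate.run(model, input=inputs)  # returns the generated text
# print(caption)
```

The same pattern applies to any model in the list; only the model name and its per-model input fields change.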