gfodor / instructblip
Image captioning via vision-language models with instruction tuning
Want to make some of these yourself?
Run this model