gfodor/instructblip
Image captioning via vision-language models with instruction tuning