gfodor/instructblip

Image captioning via vision-language models with instruction tuning