Instruction-tuned text-to-image diffusion models as vision generalists