nelsonjchen / minigpt-4_vicuna-13b

MiniGPT-4 w/ Vicuna-13B (Image Question/Captioning Use)

  • Public
  • 52K runs
  • GitHub
  • License



Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 27 seconds. The predict time for this model varies significantly based on the inputs.


This is a hacky port of MiniGPT-4 with Vicuna 13B weights to run on Replicate.

It is currently probably leaving a lot of performance/savings on the floor! Prepare for long run times and disappointment.

  • This takes a really long time to cold boot. Alas, big models (two of them, glued together!) just get this treatment. Expect 15 minutes for cold boot!
  • 6 times longer runtime than BLIP-2
  • sometimes makes stuff up 🤦‍♂️, 13B problems?

The port is not faithful to the original chat experience; instead, it's repackaged a bit to be useful for image captioning and question answering.
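If you want to call the model programmatically, a minimal sketch using Replicate's official `replicate` Python client might look like the following. The input field names (`image`, `prompt`) and the example image URL are assumptions — check the model's API tab on Replicate for the actual input schema, and pin a version hash for reproducible results.

```python
# Sketch: querying the model through Replicate's Python client.
# ASSUMPTION: the input keys ("image", "prompt") are guesses -- check
# the model's API tab on Replicate for the real schema.
import os


def build_input(image_url: str, prompt: str) -> dict:
    """Assemble the prediction input for an image question."""
    return {"image": image_url, "prompt": prompt}


if __name__ == "__main__":
    payload = build_input(
        "https://example.com/llama.jpg",  # hypothetical image URL
        "What animal is in this picture?",
    )
    # Only hit the API if a token is configured; otherwise just show the payload.
    if os.environ.get("REPLICATE_API_TOKEN"):
        import replicate  # pip install replicate

        output = replicate.run(
            "nelsonjchen/minigpt-4_vicuna-13b",  # append ":<version>" to pin
            input=payload,
        )
        print(output)
    else:
        print(payload)
```

Given the cold-boot caveat above, the first request may block for many minutes while the model spins up, so set your client timeouts accordingly.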

Please see the README in the GitHub repository.

NOTE: Since there’s LLaMA involved, this is only to be used for non-commercial purposes.

I’m sure there’ll be a free LLaMA-like model that the MiniGPT-4 people will conjure up soon enough. Until then, there’s this hack of an implementation, which really only needs to last until the next bombshell.

The cover image is Vicuna’s favicon run through Stable Diffusion with a prompt like “Realistic llama with glowing realistic eyes” at denoise 0.7.