nelsonjchen / minigpt-4_vicuna-7b

MiniGPT-4 w/ Vicuna-7B (Image Question/Captioning Use)




Run time and cost

This model runs on NVIDIA A100 (40 GB) GPU hardware. Predictions typically complete within 8 seconds, though prediction time varies significantly with the inputs.


This is a hacky port of with Vicuna 7B weights to run on

Consider using, though: its cold start might be worse, but it should perform better than this model. This model was uploaded here for comparison.

This port is probably leaving a lot of performance and cost savings on the table! Prepare for long run times and disappointment.

  • Cold boots take a really long time. Alas, big models (two of them, glued together!) just get this treatment.
  • Runtime is 6 times longer than BLIP-2's.
  • Sometimes makes stuff up 🤦‍♂️ (7B problems?)

The port does not faithfully reproduce the chat experience. Instead, it is repackaged to be useful for image captioning and for asking questions about images.

Please see the README at .

NOTE: Because this model is built on LLaMA, it may only be used for non-commercial purposes.

I’m sure the MiniGPT-4 people will conjure up a free LLaMA-like model soon enough. Until then, there’s this hack of an implementation, which only needs to last until the next bombshell.

The cover image is Vicuna’s favicon run through Stable Diffusion with a prompt like “Realistic llama with glowing realistic eyes” at a denoising strength of 0.7.