nelsonjchen / minigpt-4_vicuna-13b

MiniGPT-4 w/ Vicuna-13B (Image Question/Captioning Use)

  • Public
  • 52K runs
  • A100 (80GB)

Input

image
*file

Input image to discuss

string

Message to send to the bot.

Default: "Please describe the image."

integer
(minimum: 1, maximum: 10)

Number of beams to use in beam search.

Default: 1

number
(minimum: 0.1, maximum: 2)

Sampling temperature.

Default: 1
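As a rough illustration of the inputs listed above, here is a minimal Python sketch that assembles an input payload and checks it against the documented ranges and defaults. Only the `image` field name appears on this page; the keys `message`, `num_beams`, and `temperature`, and the helper `build_input` itself, are hypothetical names chosen for illustration and may not match the model's real schema.

```python
# Hypothetical helper: validates values against the ranges documented above.
# Key names other than "image" are assumptions, not the model's confirmed schema.

DEFAULT_MESSAGE = "Please describe the image."


def build_input(image, message=DEFAULT_MESSAGE, num_beams=1, temperature=1.0):
    """Return a dict of inputs, raising ValueError if a value is out of range."""
    # Beam count is documented as an integer between 1 and 10.
    if isinstance(num_beams, bool) or not isinstance(num_beams, int):
        raise ValueError("num_beams must be an integer")
    if not 1 <= num_beams <= 10:
        raise ValueError("num_beams must be between 1 and 10")
    # Temperature is documented as a number between 0.1 and 2.
    if not 0.1 <= temperature <= 2:
        raise ValueError("temperature must be between 0.1 and 2")
    return {
        "image": image,
        "message": message,
        "num_beams": num_beams,
        "temperature": temperature,
    }


if __name__ == "__main__":
    print(build_input("photo.jpg", message="Why is this photo funny?"))
```

Checking ranges locally before submitting avoids paying for a prediction that would be rejected anyway.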

Output

This photo is funny because it shows a group of men in suits standing in front of a mirror, looking at themselves. The man in the middle is wearing a suit and tie, while the other men are wearing suits and ties as well. They all appear to be looking at themselves in the mirror, which adds to the humor of the photo. The fact that they are all dressed up in suits and ties, but standing in front of a bathroom mirror, is ironic and adds to the humor of the photo.

This output was created using a different version of the model, nelsonjchen/minigpt-4_vicuna-13b:c1f0352f.

Run time and cost

This model costs approximately $0.037 to run on Replicate, or 27 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 27 seconds. The predict time for this model varies significantly based on the inputs.
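To make the figures above concrete, the following sketch works through the arithmetic using the approximate per-run cost ($0.037) and typical prediction time (27 seconds) quoted on this page. The function names are made up for illustration, and the estimate assumes runs execute one after another on a warm model; real costs and times vary with the inputs.

```python
# Back-of-the-envelope estimate based on the approximate figures quoted above.
COST_PER_RUN_USD = 0.037   # approximate cost per run
TYPICAL_RUN_SECONDS = 27   # typical prediction time


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in one dollar."""
    return int(1 / cost_per_run)


def estimate_batch(n_runs, cost_per_run=COST_PER_RUN_USD, seconds_per_run=TYPICAL_RUN_SECONDS):
    """Return (total cost in USD, total seconds) for n_runs sequential predictions."""
    return round(n_runs * cost_per_run, 2), n_runs * seconds_per_run


if __name__ == "__main__":
    cost, seconds = estimate_batch(100)
    print(f"{runs_per_dollar()} runs per dollar")
    print(f"100 runs: about ${cost:.2f} and {seconds / 60:.0f} minutes")
```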

Readme

This is a hacky port of https://github.com/Vision-CAIR/MiniGPT-4 with Vicuna 13B weights to run on Replicate.com.

It is probably leaving a lot of performance and savings on the table right now! Prepare for long run times and disappointment.

  • This takes a really long time to cold boot. Alas, big models (2 of them, glued!) just get this treatment. Expect 15 minutes for a cold boot!
  • 6 times longer runtime than BLIP-2
  • sometimes makes stuff up 🤦‍♂️, 13B problems?

The port is not faithful to the chat experience. Instead, it is repackaged a bit to be useful for image captioning and question answering.

Please see the README at https://github.com/nelsonjchen/MiniGPT-4.

NOTE: Since there’s LLaMA involved, this is only to be used for non-commercial purposes.

I’m sure the MiniGPT-4 people will conjure up a free LLaMA-like model soon enough. Until then, there’s this hack of an implementation, which really only needs to last until the next bombshell.

The cover image is Vicuna’s favicon run through Stable Diffusion with a prompt like “Realistic llama with glowing realistic eyes” at denoise 0.7.