joehoover / mplug-owl
An instruction-tuned multimodal large language model that generates text from user-provided prompts and images.
Run time and cost
This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 8 seconds, though prediction time varies significantly with the inputs.