joehoover / mplug-owl

An instruction-tuned multimodal large language model that generates text based on user-provided prompts and images

Run time and cost

This model runs on NVIDIA A100 (40GB) GPU hardware. Predictions typically complete within 8 seconds, though prediction time varies significantly with the inputs.