Visual instruction tuning towards large language and vision models with GPT-4-level capabilities
An English, monolingual embedding model supporting an 8192-token sequence length (137M version)
An English, monolingual embedding model supporting an 8192-token sequence length (33M version)
LLaVA v1.6: Large Language and Vision Assistant (Mistral-7B)
LLaVA v1.6: Large Language and Vision Assistant (Vicuna-13B)
A ControlNet model designed to enhance the temporal consistency of generated outputs