Run time and cost
This model runs on 4x Nvidia A100 (80GB) GPU hardware.
Predictions typically complete within 78 seconds, though predict time varies significantly with the inputs.
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative sparse mixture-of-experts model. It outperforms Llama 2 70B on most benchmarks we tested.
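To illustrate the sparse mixture-of-experts idea, here is a minimal NumPy sketch of top-2 routing over 8 experts. This is a toy, not Mixtral's actual implementation: the router and the "experts" below are hypothetical random linear maps, and real MoE layers use learned feed-forward networks with per-token routing inside a transformer block.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2  # 8 experts, 2 active per token, as in Mixtral

# Hypothetical stand-ins: a router matrix and 8 tiny "expert" linear maps.
router = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

def moe_layer(x):
    # The router scores every expert for this token...
    logits = x @ router
    # ...but only the top-2 experts are evaluated (the "sparse" part).
    top = np.argsort(logits)[-top_k:]
    weights = softmax(logits[top])
    # The output is the gate-weighted sum of just those experts.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d)
y = moe_layer(x)
```

Because only 2 of the 8 experts run per token, inference cost scales with the active parameters rather than the full parameter count, which is why a 8x7B mixture can be served far more cheaply than a dense model of the same total size.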
For full details of this model, please read our release blog post or the model card on Hugging Face.