Run time and cost
This model costs approximately $0.26 to run on Replicate, or 3 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
This model runs on 4x Nvidia A100 (80GB) GPU hardware.
Predictions typically complete within 46 seconds.
The predict time for this model varies significantly based on the inputs.
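As a rough illustration of the figures above, the sketch below reproduces the "3 runs per $1" arithmetic and scales the quoted cost to a different predict time. It assumes cost grows linearly with predict time, which this page does not state. The function names are hypothetical and not part of any Replicate API.

```python
import math

# Approximate figures quoted on this page.
COST_PER_RUN_USD = 0.26
TYPICAL_PREDICT_SECONDS = 46


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit in $1 at the given per-run cost."""
    return math.floor(1 / cost_per_run)


def estimate_cost(predict_seconds: float) -> float:
    """Estimate the cost of one run from its predict time.

    Assumption (not from the source): cost scales linearly with
    predict time, at the rate implied by the typical run.
    """
    rate_per_second = COST_PER_RUN_USD / TYPICAL_PREDICT_SECONDS
    return predict_seconds * rate_per_second


if __name__ == "__main__":
    print(runs_per_dollar())
    print(round(estimate_cost(92), 2))
```

Because predict time varies significantly with the inputs, treat any such estimate as a ballpark rather than a quote.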
Readme
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts model. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.
For full details of this model, please read our release blog post or the model card on Hugging Face.