replicate / llama-7b
Transformers implementation of the LLaMA language model
Run time and cost
This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 1 second.