replicate / llama-7b

Transformers implementation of the LLaMA language model

Demo API Examples README Train Versions (ac808388)

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 1 seconds.