meta / llama-2-70b-chat

A 70 billion parameter language model from Meta, fine-tuned for chat completions.

This language model is priced by the number of input tokens sent and the number of output tokens generated.

Check out our docs for more information about how per-token pricing works on Replicate.


Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 70 billion parameter chat model, which has been fine-tuned on instructions to make it better at being a chatbot.

Learn more about running Llama 2 with an API and the different models.

See Meta's Llama 2 resources for more information about the model, licensing, and acceptable use.

How to prompt Llama 2 chat

To use this model, you can simply pass a prompt or instruction to the prompt argument. We handle prompt formatting on the backend so that you don’t need to worry about it.
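For example, a single-turn request can be made with the Replicate Python client. This is a minimal sketch: it assumes the `replicate` package is installed and that a `REPLICATE_API_TOKEN` environment variable is set.

```python
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN

# Pass the raw instruction as the prompt; formatting is handled on the backend.
output = replicate.run(
    "meta/llama-2-70b-chat",
    input={"prompt": "Write a haiku about llamas."},
)

# The model streams output tokens; join them into a single string.
print("".join(output))
```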

Formatting prompts for chat interfaces

However, if you're managing dialogue state with multiple exchanges between a user and the model, you need to mark the dialogue turns with instruction tags that indicate the beginning (`[INST]`) and end (`[/INST]`) of user input. For example, a properly formatted dialogue looks like:

prompt = """\
[INST] Hi! [/INST]
Hello! How are you?
[INST] I'm great, thanks for asking. Could you help me with a task? [/INST]"""

In this example, the hypothetical user has first prompted "Hi!" and received the response "Hello! How are you?". Then, the user responded "I'm great, thanks for asking. Could you help me with a task?".
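Building this format by hand is error-prone, so you may want a small helper. The sketch below (the function name is hypothetical) assembles a prompt from completed (user, assistant) turns plus the next user message, following the `[INST]`/`[/INST]` convention shown above:

```python
def format_dialogue(turns, next_user_message):
    """Build a Llama 2 chat prompt from completed (user, assistant)
    turns plus the next user message, using [INST] ... [/INST] tags."""
    parts = []
    for user, assistant in turns:
        parts.append(f"[INST] {user} [/INST]")  # user turn
        parts.append(assistant)                  # model's reply
    # The final user message is left open for the model to answer.
    parts.append(f"[INST] {next_user_message} [/INST]")
    return "\n".join(parts)

prompt = format_dialogue(
    [("Hi!", "Hello! How are you?")],
    "I'm great, thanks for asking. Could you help me with a task?",
)
# Reproduces the example prompt above.
```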

Modifying the system prompt

In addition to supporting dialogue exchanges, this deployment also lets you modify the system prompt that guides model responses. By passing a value to the system_prompt argument, you can inject custom context or instructions that steer the model's output.

To learn more, see this guide to prompting Llama 2.