Official

meta / llama-2-70b-chat

A 70 billion parameter language model from Meta, fine-tuned for chat completions

  • Public
  • 9.2M runs
  • Priced per token
  • GitHub
  • Paper
  • License

Input

prompt
*string

Prompt to send to the model.

system_prompt
string

System prompt to send to the model. This is prepended to the prompt and helps guide system behavior.

Default: "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."

max_tokens
integer
(minimum: 1)

Maximum number of tokens to generate. A word is generally 2-3 tokens.

Default: 512

min_tokens
integer
(minimum: -1)

Minimum number of tokens to generate. To disable, set to -1. A word is generally 2-3 tokens.

temperature
number
(minimum: 0, maximum: 5)

Adjusts randomness of outputs: greater than 1 is random and 0 is deterministic. 0.75 is a good starting value.

Default: 0.7

top_p
number
(minimum: 0, maximum: 1)

When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens.

Default: 0.95

top_k
integer
(minimum: -1)

When decoding text, samples from the top k most likely tokens; lower to ignore less likely tokens.

Default: 0

stop_sequences
string

A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.

length_penalty
number
(minimum: 0, maximum: 5)

A parameter that controls how long the outputs are. If < 1, the model will tend to generate shorter outputs; if > 1, it will tend to generate longer outputs.

Default: 1

presence_penalty
number

A parameter that penalizes repeated tokens regardless of the number of appearances. As the value increases, the model will be less likely to repeat tokens in the output.

Default: 0

seed
integer

Random seed. Leave blank to randomize the seed.

prompt_template
string

Template for formatting the prompt. Can be an arbitrary string, but must contain the substring `{prompt}`.

Default: "<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{prompt} [/INST]"

boolean

Default: false

max_new_tokens
integer
(minimum: 1)

This parameter has been renamed to max_tokens. max_new_tokens only exists for backwards compatibility; we recommend you use max_tokens instead. The two may not both be specified.

min_new_tokens
integer
(minimum: -1)

This parameter has been renamed to min_tokens. min_new_tokens only exists for backwards compatibility; we recommend you use min_tokens instead. The two may not both be specified.
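
Putting these fields together, a request body might look like the sketch below. Every value is illustrative, only prompt is required, and unset fields fall back to the defaults listed above.

model_input = {
    "prompt": "Explain per-token pricing in one paragraph.",  # required
    "system_prompt": "You are a concise technical writer.",   # optional override
    "max_tokens": 256,              # cap on generated tokens
    "temperature": 0.7,             # 0 = deterministic, >1 = random
    "top_p": 0.95,                  # nucleus sampling threshold
    "top_k": 0,                     # default value shown above
    "stop_sequences": "</answer>",  # comma-separated stop strings
    "seed": 42,                     # fix for reproducible sampling
}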

Output

Sure, here's a poem about open source machine learning in the style of E. E. Cummings:

open source machine learning
a gift to the world, a treasure trove
of knowledge, free for all to use
and learn from, like a digital muse

with code that's open and transparent
anyone can see, anyone can tinker
and improve upon, a collaborative dance
of innovation, without a single anchor

the community shares and shapes it
together they build and refine
a tool that's greater than the sum of its parts
a true democracy of mind and design

from deep learning to natural language
it's all here, a boundless expanse
of possibility, a playground for thought
where anyone can explore and advance

so come and join this movement
embrace the power, the beauty, the grace
of open source machine learning
and watch the future unfold with ease.
Generated in
Input tokens: 140
Output tokens: 200
Tokens per second: 44.01

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output tokens, making pricing more predictable.

This language model is priced by how many input tokens are sent and how many output tokens are generated.

Type     Per unit             Per $1
Input    $0.65 / 1M tokens    1.5M tokens / $1
Output   $2.75 / 1M tokens    360K tokens / $1

For example, for $10 you can run around 5,195 predictions where the input is a sentence or two (15 tokens) and the output is a few paragraphs (700 tokens).
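
As a back-of-the-envelope check of that estimate (the 15- and 700-token counts are the example's assumptions):

INPUT_PRICE = 0.65 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 2.75 / 1_000_000  # dollars per output token
def prediction_cost(input_tokens, output_tokens):
    # Dollar cost of a single prediction at the rates above.
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
per_run = prediction_cost(15, 700)  # ~$0.0019 per prediction
print(round(10 / per_run))          # ~5,169, in line with the ~5,195 above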

Check out our docs for more information about how per-token pricing works on Replicate.

Readme

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 70 billion parameter chat model, which has been fine-tuned on instructions to make it better at being a chat bot.

Learn more about running Llama 2 with an API and the different models.

Please see ai.meta.com/llama for more information about the model, licensing, and acceptable use.

How to prompt Llama 2 chat

To use this model, you can simply pass a prompt or instruction to the prompt argument. We handle prompt formatting on the backend so that you don’t need to worry about it.
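
For example, with Replicate's Python client (a minimal sketch; the prompt text is arbitrary):

import replicate
# replicate.run streams language-model output as an iterator of strings.
output = replicate.run(
    "meta/llama-2-70b-chat",
    input={"prompt": "Write a haiku about open source machine learning."},
)
print("".join(output))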

Formatting prompts for chat interfaces

However, if you’re managing dialogue state with multiple exchanges between a user and the model, you need to mark the dialogue turns with instruction tags that indicate the beginning ("[INST]") and end ("[/INST]") of user input. For example, a properly formatted dialogue looks like:

prompt = """\
[INST] Hi! [/INST]
Hello! How are you?
[INST] I'm great, thanks for asking. Could you help me with a task? [/INST]"""

In this example, the hypothetical user has first prompted "Hi!" and received the response "Hello! How are you?". Then, the user responded "I'm great, thanks for asking. Could you help me with a task?".
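
If you are assembling that string programmatically, a small helper keeps the tags consistent. This is a sketch, not part of the API; the function name and turn structure are our own:

def format_dialogue(turns):
    # Build a Llama 2 chat prompt from (user_message, assistant_reply)
    # pairs; pass None as the final reply to leave the prompt open.
    parts = []
    for user_msg, assistant_reply in turns:
        parts.append(f"[INST] {user_msg} [/INST]")
        if assistant_reply is not None:
            parts.append(assistant_reply)
    return "\n".join(parts)

prompt = format_dialogue([
    ("Hi!", "Hello! How are you?"),
    ("I'm great, thanks for asking. Could you help me with a task?", None),
])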

Modifying the system prompt

In addition to supporting dialogue exchanges, this deployment also allows you to modify the system prompt that is used to guide model responses. By altering the input to the system_prompt argument, you can inject custom context or information that will be used to guide model output.
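
For instance (the persona below is just an illustration):

import replicate
output = replicate.run(
    "meta/llama-2-70b-chat",
    input={
        "prompt": "What should I cook tonight?",
        "system_prompt": "You are a terse cooking assistant who answers "
                         "in at most two sentences.",
    },
)
print("".join(output))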

To learn more, see this guide to prompting Llama 2.