Official

meta / meta-llama-3.1-405b-instruct

Meta's flagship 405 billion parameter language model, fine-tuned for chat completions

  • Public
  • 5.1M runs
  • Priced per token
  • GitHub
  • License

Input

prompt (string)

Prompt

Default: ""

system_prompt (string)

System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.

Default: "You are a helpful assistant."

min_tokens (integer)

The minimum number of tokens the model should generate as output.

Default: 0

max_tokens (integer)

The maximum number of tokens the model should generate as output.

Default: 512

temperature (number)

The value used to modulate the next token probabilities.

Default: 0.6

top_p (number)

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).

Default: 0.9

top_k (integer)

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).

Default: 50
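The temperature, top_p, and top_k settings above compose a standard sampling pipeline: temperature rescales the logits, then top-k and nucleus filtering prune the tail of the distribution. A minimal sketch in plain Python (illustrative function and variable names, not the model's actual decoding code):

```python
import math

def sample_distribution(logits, temperature=0.6, top_k=50, top_p=0.9):
    """Turn raw logits into a filtered, renormalized distribution.

    Illustrative sketch of temperature scaling, top-k, and nucleus
    (top-p) filtering; not Meta's actual implementation.
    """
    # Temperature modulates the next-token probabilities: lower values
    # sharpen the distribution, higher values flatten it.
    scaled = [l / temperature for l in logits]

    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda ip: ip[1],
        reverse=True,
    )

    # Top-k: if > 0, keep only the k highest-probability tokens.
    if top_k > 0:
        probs = probs[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability reaches top_p.
    kept, cum = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalize the surviving tokens.
    z = sum(p for _, p in kept)
    return [(i, p / z) for i, p in kept]
```

With a strongly peaked distribution (one token holding most of the mass), nucleus filtering can collapse the candidate set to a single token, which is why low top_p values produce near-deterministic output.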

presence_penalty (number)

Presence penalty

Default: 0

frequency_penalty (number)

Frequency penalty

Default: 0

stop_sequences (string)

A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.

prompt_template (string)

A template to format the prompt with. If not provided, the default prompt template will be used.
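Putting the inputs together, here is a sketch of invoking the model with the official `replicate` Python client (`pip install replicate`). The `build_input` helper is hypothetical, introduced only to show the defaults documented above; the call itself assumes `REPLICATE_API_TOKEN` is set in the environment.

```python
def build_input(prompt, **overrides):
    """Assemble an input payload using the documented defaults.

    Hypothetical helper for illustration; not part of the client library.
    """
    payload = {
        "prompt": prompt,
        "system_prompt": "You are a helpful assistant.",
        "min_tokens": 0,
        "max_tokens": 512,
        "temperature": 0.6,
        "top_p": 0.9,
        "top_k": 50,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    }
    payload.update(overrides)
    return payload

if __name__ == "__main__":
    import replicate  # makes a network call, so only runs when executed directly

    output = replicate.run(
        "meta/meta-llama-3.1-405b-instruct",
        input=build_input("How many sisters does Tina's brother have?",
                          max_tokens=256),
    )
    # The model streams output in chunks; join them into the final string.
    print("".join(output))
```

Any parameter can be overridden per call, e.g. `build_input("...", temperature=0.2, stop_sequences="<stop>")` for more deterministic output with an early stop.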

Output

Tina has one brother and one sister. From the brother's perspective, he has one sister, Tina, and also the other sister. So, Tina's brother has 2 sisters. From the sister's perspective, she also has one sister, Tina. So, Tina's siblings have a total of 2 sisters (from the brother's perspective) and 1 sister (from the sister's perspective).
Input tokens: 26
Output tokens: 84
Tokens per second: 29.74

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by execution time, you're billed by input and output tokens, making pricing more predictable.

This language model is priced by how many input tokens are sent as inputs and how many output tokens are generated.

Type     Per unit            Per $1
Input    $9.50 / 1M tokens   ~105K tokens / $1
Output   $9.50 / 1M tokens   ~105K tokens / $1

For example, for $10 you can run around 1,472 predictions where the input is a sentence or two (15 tokens) and the output is a few paragraphs (700 tokens).
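Since input and output share the same per-token rate, the arithmetic is straightforward. A short worked check (illustrative helper names):

```python
PRICE_PER_TOKEN = 9.50 / 1_000_000  # $9.50 per 1M tokens, input and output

def prediction_cost(input_tokens, output_tokens):
    """Dollar cost of one prediction at the posted per-token rates."""
    return (input_tokens + output_tokens) * PRICE_PER_TOKEN

# A sentence or two in (15 tokens), a few paragraphs out (700 tokens):
cost = prediction_cost(15, 700)  # ~$0.0068 per prediction
runs = int(10 / cost)            # ~1,472 predictions for $10
```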

Check out our docs for more information about how per-token pricing works on Replicate.

Readme

Meta Llama 3.1 405B Instruct is an instruction-tuned generative language model developed by Meta. It is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks. Supported languages are English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

The model was trained on over 15 trillion tokens from a mix of publicly available online data, consisting of multilingual text and code, with a data cutoff of December 2023. Training took 30.84 million GPU hours.

For additional details, please refer to the official model card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md