Official

meta / meta-llama-3.1-405b-instruct

Meta's flagship 405 billion parameter language model, fine-tuned for chat completions

  • Public
  • 5.9M runs
  • GitHub
  • License

Input

prompt
string

Prompt

Default: ""

system_prompt
string

System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.

Default: "You are a helpful assistant."

min_tokens
integer

The minimum number of tokens the model should generate as output.

Default: 0

max_tokens
integer

The maximum number of tokens the model should generate as output.

Default: 512

temperature
number

The value used to modulate the next token probabilities.

Default: 0.6
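
For intuition, temperature is conventionally applied by dividing the logits by T before the softmax: lower values sharpen the next-token distribution, higher values flatten it. A minimal sketch of the standard formulation (not the model server's actual code):

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Standard temperature scaling: divide logits by T before the softmax.
    # T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = logits / max(temperature, 1e-8)  # guard against T == 0
    exps = np.exp(scaled - scaled.max())      # subtract max for numerical stability
    return exps / exps.sum()
```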

top_p
number

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).

Default: 0.9
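
Nucleus filtering, per the Holtzman et al. paper linked above, keeps the smallest set of tokens whose cumulative probability reaches top_p and renormalizes. A reference sketch of the standard technique (not necessarily this server's implementation):

```python
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float) -> np.ndarray:
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = np.argsort(probs)[::-1]                  # tokens, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # first index reaching top_p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()                 # renormalize over kept tokens
```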

top_k
integer

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).

Default: 50
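
Top-k filtering is the simpler counterpart: keep only the k most probable tokens and renormalize. A sketch under the same caveat:

```python
import numpy as np

def top_k_filter(probs: np.ndarray, top_k: int) -> np.ndarray:
    # Zero out everything except the top_k most probable tokens.
    if top_k <= 0:
        return probs                          # 0 disables top-k filtering
    keep = np.argsort(probs)[::-1][:top_k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()
```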

presence_penalty
number

Presence penalty. Penalizes tokens that have already appeared in the output, encouraging the model to introduce new content.

Default: 0

frequency_penalty
number

Frequency penalty. Penalizes tokens in proportion to how often they have already appeared, reducing verbatim repetition.

Default: 0
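
The page does not document the exact penalty formula. A common scheme (the one OpenAI documents for its API) subtracts presence_penalty once for any token that has appeared at all and frequency_penalty once per occurrence; the sketch below assumes that scheme applies here:

```python
import numpy as np
from collections import Counter

def apply_penalties(logits: np.ndarray, generated: list[int],
                    presence_penalty: float, frequency_penalty: float) -> np.ndarray:
    # Assumed OpenAI-style scheme, not confirmed for this model:
    # presence_penalty applies once per distinct token already generated;
    # frequency_penalty scales with how many times the token has occurred.
    counts = Counter(generated)
    penalized = logits.copy()
    for token_id, count in counts.items():
        penalized[token_id] -= presence_penalty           # appeared at least once
        penalized[token_id] -= frequency_penalty * count  # per occurrence
    return penalized
```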

stop_sequences
string

A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.

prompt_template
string

A template to format the prompt with. If not provided, the default prompt template will be used.
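
Putting the inputs together, here is a minimal sketch of calling the model with the official `replicate` Python client (input names as listed above; the prompt approximates the one behind the sample output below). The template constant is reconstructed from Llama 3.1's documented chat format and is an assumption about this deployment's default:

```python
# Requires `pip install replicate` and REPLICATE_API_TOKEN in the environment.
import replicate

# Reconstructed from Llama 3.1's documented chat format; passing it via
# `prompt_template` is optional, and this default is an assumption here.
LLAMA31_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

output = replicate.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={
        "prompt": "Tina has one brother and one sister. How many sisters does Tina's brother have?",
        "system_prompt": "You are a helpful assistant.",
        "min_tokens": 0,
        "max_tokens": 512,
        "temperature": 0.6,
        "top_p": 0.9,
        "top_k": 50,
        "presence_penalty": 0,
        "frequency_penalty": 0,
        "prompt_template": LLAMA31_TEMPLATE,
    },
)
# Language models on Replicate stream output in chunks; join them for the full text.
print("".join(output))
```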

Output

Tina has one brother and one sister. From the brother's perspective, he has one sister, Tina, and also the other sister. So, Tina's brother has 2 sisters. From the sister's perspective, she also has one sister, Tina. So, Tina's siblings have a total of 2 sisters (from the brother's perspective) and 1 sister (from the sister's perspective).
Input tokens: 26
Output tokens: 84
Tokens per second: 29.74

Pricing

Model pricing for meta/meta-llama-3.1-405b-instruct. Looking for volume pricing? Get in touch.

$9.50 per million output tokens (around 105,263 tokens for $1)

$9.50 per million input tokens (around 105,263 tokens for $1)
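
At these rates a prediction's cost is a straightforward function of its token counts. A small sketch using the sample prediction above (the helper is hypothetical, not part of Replicate's client):

```python
# $9.50 per million tokens, input and output alike.
PRICE_PER_TOKEN = 9.50 / 1_000_000

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    # Input and output are billed at the same per-token rate for this model.
    return (input_tokens + output_tokens) * PRICE_PER_TOKEN

# The sample prediction above used 26 input and 84 output tokens:
print(f"${estimate_cost(26, 84):.6f}")  # -> $0.001045
```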

Official models are always on, maintained, and have predictable pricing. Learn more.

Check out our docs for more information about how pricing works on Replicate.

Readme

Meta Llama 3.1 405B Instruct is an instruction-tuned generative language model developed by Meta. It is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks. Supported languages are English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

The model was trained on over 15 trillion tokens of publicly available online data, a mix of multilingual text and code, with a data cutoff of December 2023. Training used 30.84 million GPU hours.

For additional details, please refer to the official model card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md