meta/llama-4-maverick-instruct (Official)

A 17-billion-parameter model with 128 experts.

  • Public
  • 23.6K runs
  • Priced per token
  • Weights
  • License

Input

Prompt (string)
The prompt to send to the model.
Default: ""

Max new tokens (integer, minimum: 2, maximum: 20480)
The maximum number of tokens the model should generate as output.
Default: 1024

Temperature (number)
The value used to modulate the next-token probabilities.
Default: 0.6

Presence penalty (number)
Default: 0

Frequency penalty (number)
Default: 0

Top-p (number)
Top-p (nucleus) sampling.
Default: 1
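The parameter list above can be sketched as an input payload. This is a minimal illustration only: the snake_case keys (`prompt`, `max_tokens`, `temperature`, `presence_penalty`, `frequency_penalty`, `top_p`) are assumed identifiers, so check the model's API schema for the exact names before using them.

```python
# Documented defaults for this model's input parameters.
# Key names are assumptions; verify against the API schema.
DEFAULTS = {
    "prompt": "",
    "max_tokens": 1024,       # integer, allowed range 2..20480
    "temperature": 0.6,       # modulates next-token probabilities
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "top_p": 1,               # nucleus sampling threshold
}

def build_input(prompt, **overrides):
    """Merge user overrides onto the documented defaults and
    validate the documented bounds on max_tokens."""
    payload = {**DEFAULTS, **overrides, "prompt": prompt}
    if not 2 <= payload["max_tokens"] <= 20480:
        raise ValueError("max_tokens must be between 2 and 20480")
    return payload
```

For example, `build_input("Hello!", temperature=0.9)` returns the defaults with only the temperature changed.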

Output

Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?

Example run metrics:
  • Input tokens: 5
  • Output tokens: 24
  • Tokens per second: 40.50

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output, making pricing more predictable.

This language model is priced by how many input tokens are sent as inputs and how many output tokens are generated.

Type     Per unit             Per $1
Input    $0.25 / 1M tokens    4M tokens / $1
Output   $0.95 / 1M tokens    1M tokens / $1

For example, for $10 you can run around 15,038 predictions where the input is a sentence or two (15 tokens) and the output is a few paragraphs (700 tokens).

Check out our docs for more information about how per-token pricing works on Replicate.
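The per-token arithmetic behind the worked example above can be sketched in a few lines. The prices come from the table in this section; the code simply multiplies token counts by the per-token rates and lands at roughly 15,000 predictions for $10, in the same ballpark as the figure quoted above.

```python
# Per-token rates from the pricing table above.
INPUT_PRICE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 0.95 / 1_000_000  # dollars per output token

def prediction_cost(input_tokens, output_tokens):
    """Dollar cost of a single prediction."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# The worked example: a sentence or two in (15 tokens),
# a few paragraphs out (700 tokens).
cost = prediction_cost(15, 700)       # about $0.00067 per prediction
runs_per_10_dollars = int(10 / cost)  # roughly 15,000 predictions
```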

Readme

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series: Llama 4 Scout, a 17-billion-parameter model with 16 experts, and Llama 4 Maverick, a 17-billion-parameter model with 128 experts.
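The mixture-of-experts idea mentioned above can be illustrated with a toy top-k router. This is a didactic sketch under simplified assumptions, not Meta's implementation: in a real model the experts are learned feed-forward networks inside transformer layers and the gate scores come from a trained router, whereas here both are stand-in scalar functions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_scores, k=2):
    """Route a token to the top-k experts by gate score and mix
    their outputs with renormalized softmax weights."""
    topk = sorted(range(len(experts)),
                  key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in topk])
    return sum(w * experts[i](token) for w, i in zip(weights, topk))

# Four toy "experts", each a simple scalar function of the input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
out = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 0.5, -1.0], k=2)
```

Because only k experts run per token, total parameter count (all experts) can be much larger than the compute per token, which is why a 128-expert model can keep only 17B parameters active.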