Official

meta / llama-4-maverick-instruct

A 17 billion parameter model with 128 experts

  • Public
  • 379.8K runs
  • Priced by multiple properties
  • Weights
  • License

Input

  • Prompt (string). Default: ""
  • Maximum output tokens (integer; minimum: 2, maximum: 20480). The maximum number of tokens the model should generate as output. Default: 1024
  • Temperature (number). The value used to modulate the next token probabilities. Default: 0.6
  • Presence penalty (number). Default: 0
  • Frequency penalty (number). Default: 0
  • Top-p (nucleus) sampling (number). Default: 1
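
For reference, a call to this model through the Replicate Python client might look like the sketch below. The exact input field names (max_tokens, temperature, top_p, presence_penalty, frequency_penalty) are assumptions inferred from the parameter descriptions above, not taken from the model's published schema; check the API tab for the authoritative names.

    import replicate

    # Minimal sketch of invoking the model. Field names below are assumed
    # from the parameter list above and may differ from the actual schema.
    output = replicate.run(
        "meta/llama-4-maverick-instruct",
        input={
            "prompt": "Hello",
            "max_tokens": 1024,        # integer, 2 to 20480
            "temperature": 0.6,        # modulates next-token probabilities
            "top_p": 1,                # nucleus sampling
            "presence_penalty": 0,
            "frequency_penalty": 0,
        },
    )
    # Language models on Replicate stream output as chunks of text.
    print("".join(output))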

Output

Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?
  • Input tokens: 5
  • Output tokens: 24
  • Tokens per second: 40.50 tokens / second

Pricing

Model pricing for meta/llama-4-maverick-instruct. Looking for volume pricing? Get in touch.

  • $0.95 per million output tokens, or around 1,052,631 tokens for $1
  • $0.25 per million input tokens, or around 4,000,000 tokens for $1
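
As a quick illustration of how these per-token rates combine, the cost of a single request can be estimated from its token counts. The helper below is illustrative arithmetic only, not an official billing formula.

    # Illustrative only: estimate request cost from the per-token rates above.
    INPUT_RATE = 0.25 / 1_000_000    # dollars per input token
    OUTPUT_RATE = 0.95 / 1_000_000   # dollars per output token

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

    # Example request from the Output section above: 5 input, 24 output tokens.
    print(f"${estimate_cost(5, 24):.8f}")  # roughly $0.000024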

Official models are always on, maintained, and have predictable pricing. Learn more.

Check out our docs for more information about how pricing works on Replicate.

Readme

The Llama 4 collection consists of natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to deliver industry-leading performance in text and image understanding.

These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series: Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts.
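
The "17 billion parameter" figure refers to active parameters: a mixture-of-experts layer routes each token to a small subset of its experts, so only a fraction of the total weights participate in any one forward pass. The snippet below is a generic, simplified sketch of top-k expert routing to illustrate the idea; it is not a description of Llama 4's actual routing code, and the top_k and layer sizes are toy choices.

    import numpy as np

    # Generic top-k mixture-of-experts routing sketch (illustrative only).
    num_experts, top_k, d_model = 128, 1, 16  # 128 experts mirrors Maverick; rest is toy

    rng = np.random.default_rng(0)
    router = rng.standard_normal((d_model, num_experts))             # router weights
    experts = rng.standard_normal((num_experts, d_model, d_model))   # one toy matrix per expert

    def moe_layer(x: np.ndarray) -> np.ndarray:
        logits = x @ router                    # score each expert for this token
        top = np.argsort(logits)[-top_k:]      # pick the top-k experts
        weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
        # Only the selected experts' parameters are used for this token.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    token = rng.standard_normal(d_model)
    print(moe_layer(token).shape)  # (16,)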