Official

meta/llama-4-scout-instruct

A 17 billion parameter model with 16 experts

  • Public
  • 2.6M runs
  • Commercial use
  • Weights
  • License

Input

  • Prompt (string). Default: ""
  • Maximum output tokens (integer, minimum: 0, maximum: 131072): the maximum number of tokens the model should generate as output. Default: 4096
  • Temperature (number): the value used to modulate the next token probabilities. Default: 0.6
  • Presence penalty (number). Default: 0
  • Frequency penalty (number). Default: 0
  • Top-p (nucleus) sampling (number). Default: 1
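
The listing above maps directly onto an API call. The sketch below assumes Replicate's Python client and snake_case input names (prompt, max_tokens, temperature, top_p, presence_penalty, frequency_penalty) inferred from the field labels rather than read from the model's published schema, so treat it as illustrative:

    import replicate

    # Requires a REPLICATE_API_TOKEN environment variable.
    # The input names below are assumed from the field labels above, not
    # copied from the model's schema; adjust them if the API rejects them.
    output = replicate.run(
        "meta/llama-4-scout-instruct",
        input={
            "prompt": "Hello, who are you?",
            "max_tokens": 4096,
            "temperature": 0.6,
            "top_p": 1,
            "presence_penalty": 0,
            "frequency_penalty": 0,
        },
    )

    # Language models on Replicate typically stream output as string chunks,
    # so join them to get the full completion.
    print("".join(output))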

Output

Hello! It's nice to meet you. I'm Llama, a large language model developed by Meta. How can I assist you today?

  • Input tokens: 5
  • Output tokens: 29
  • Tokens per second: 53.15

Pricing

Model pricing for meta/llama-4-scout-instruct. Looking for volume pricing? Get in touch.

  • $0.65 per million output tokens, or around 1,538,461 tokens for $1
  • $0.17 per million input tokens, or around 5,882,352 tokens for $1
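
To make the arithmetic explicit, here is a small back-of-the-envelope sketch in plain Python that applies the listed rates to the token counts from the example prediction above (5 input tokens, 29 output tokens):

    # Listed rates, in dollars per million tokens.
    INPUT_PRICE_PER_M = 0.17
    OUTPUT_PRICE_PER_M = 0.65

    def cost_usd(input_tokens: int, output_tokens: int) -> float:
        """Cost of a single prediction at the listed per-token rates."""
        return (input_tokens * INPUT_PRICE_PER_M
                + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

    print(f"${cost_usd(5, 29):.8f}")  # the example run costs about $0.00001970
    print(f"{int(1_000_000 / OUTPUT_PRICE_PER_M):,} output tokens for $1")  # 1,538,461
    print(f"{int(1_000_000 / INPUT_PRICE_PER_M):,} input tokens for $1")    # 5,882,352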

Official models are always on, maintained, and have predictable pricing.

Check out our docs for more information about how pricing works on Replicate.

Readme

The Llama 4 collection consists of natively multimodal models that enable text and multimodal experiences. They use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series: Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts.
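
To make the mixture-of-experts phrase concrete, here is a toy, purely illustrative sketch of top-1 expert routing in NumPy; the dimensions, routing rule, and expert structure are stand-ins and are not Llama 4's actual implementation:

    import numpy as np

    # Toy mixture-of-experts layer: a router scores each token against a pool
    # of 16 expert MLPs and only the winning expert runs for that token.
    # Illustrative only; not Llama 4's real dimensions or routing.
    rng = np.random.default_rng(0)
    d_model, d_ff, n_experts = 64, 256, 16

    router_w = rng.normal(size=(d_model, n_experts))
    experts = [
        (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
        for _ in range(n_experts)
    ]

    def moe_layer(tokens: np.ndarray) -> np.ndarray:
        """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
        logits = tokens @ router_w        # (n_tokens, n_experts) routing scores
        choice = logits.argmax(axis=-1)   # top-1 routing: one expert per token
        out = np.empty_like(tokens)
        for i, e in enumerate(choice):
            w_in, w_out = experts[e]
            hidden = np.maximum(tokens[i] @ w_in, 0.0)  # simple ReLU MLP expert
            out[i] = hidden @ w_out
        return out

    print(moe_layer(rng.normal(size=(4, d_model))).shape)  # (4, 64)

The point of the sketch is that each token only exercises the parameters of the expert it is routed to, which is how a mixture-of-experts model can hold many more total parameters than it activates for any single token.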