Official

meta / llama-2-13b

Base version of Llama 2 13B, a 13 billion parameter language model

  • Public
  • 201.7K runs
  • Priced per token
  • GitHub
  • Paper
  • License

Input

prompt
*string

Prompt to send to the model.

max_tokens
integer
(minimum: 1)

Maximum number of tokens to generate. A word is generally 2-3 tokens.

Default: 512

min_tokens
integer
(minimum: -1)

Minimum number of tokens to generate. To disable, set to -1. A word is generally 2-3 tokens.

temperature
number
(minimum: 0, maximum: 5)

Adjusts the randomness of outputs: 0 is deterministic, values greater than 1 are increasingly random; 0.75 is a good starting value.

Default: 0.7

top_p
number
(minimum: 0, maximum: 1)

When decoding text, samples from the top p fraction of most likely tokens; lower values ignore less likely tokens.

Default: 0.95

top_k
integer
(minimum: -1)

When decoding text, samples from the top k most likely tokens; lower values ignore less likely tokens.

Default: 0
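The temperature, top-k, and top-p controls described above interact during sampling. The sketch below is illustrative only, not the model's actual decoder: it scales logits by temperature, then truncates the distribution by top-k (0 disables) and nucleus (top-p) cutoffs before renormalizing.

```python
# Illustrative sketch of temperature + top-k + top-p filtering.
# The real server-side decoder may differ in details (e.g. temperature 0
# is typically implemented as greedy argmax rather than division).
import math

def filter_and_renormalize(logits, temperature=0.7, top_k=0, top_p=0.95):
    """Return a sampling distribution after temperature scaling,
    top-k truncation (0 disables), and top-p (nucleus) truncation."""
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens in descending probability, keeping until cutoffs hit.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set()
    cumulative = 0.0
    for rank, i in enumerate(order):
        if top_k > 0 and rank >= top_k:
            break
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # nucleus cutoff (keeps the crossing token)
            break
    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0
            for i in range(len(probs))]
```

Lower top_p or top_k narrows the candidate pool, trading diversity for coherence.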

stop_sequences
string

A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
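The stop-sequence behavior can be sketched client-side. This is a hypothetical illustration (the service applies stop sequences server-side during generation): the output is truncated at the earliest occurrence of any listed sequence.

```python
# Hypothetical sketch of how a comma-separated stop_sequences value
# is applied; the actual service stops generation during decoding.

def apply_stop_sequences(text: str, stop_sequences: str) -> str:
    """Truncate text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop_sequences.split(","):
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```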

length_penalty
number
(minimum: 0, maximum: 5)

Controls how long the outputs are: values < 1 make the model tend to generate shorter outputs, and values > 1 tend to generate longer outputs.

Default: 1

presence_penalty
number

Penalizes tokens that have already appeared in the output, regardless of how many times. Higher values make the model less likely to repeat tokens.

Default: 0

seed
integer

Random seed. Leave blank to randomize the seed.

prompt_template
string

Template for formatting the prompt. Can be an arbitrary string, but must contain the substring `{prompt}`.

Default: "{prompt}"
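A minimal sketch of how the template is combined with the prompt before generation, assuming simple substring substitution (the `render_prompt` helper name is hypothetical):

```python
# Hypothetical helper showing prompt_template substitution.
# The template must contain the literal substring "{prompt}".

def render_prompt(template: str, prompt: str) -> str:
    if "{prompt}" not in template:
        raise ValueError("prompt_template must contain the substring '{prompt}'")
    return template.replace("{prompt}", prompt)
```

For example, a Llama-2 chat-style wrapper such as `"[INST] {prompt} [/INST]"` would produce `"[INST] hi [/INST]"` for the prompt `"hi"`.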

debug
boolean

Default: false

max_new_tokens
integer
(minimum: 1)

This parameter has been renamed to max_tokens. max_new_tokens exists only for backwards compatibility. We recommend using max_tokens instead; the two may not both be specified.

min_new_tokens
integer
(minimum: -1)

This parameter has been renamed to min_tokens. min_new_tokens exists only for backwards compatibility. We recommend using min_tokens instead; the two may not both be specified.
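The schema above can be checked client-side before making a request. The validator below is a hypothetical sketch, assuming the parameter names and ranges documented on this page (including the rule that a renamed parameter and its deprecated alias may not both be specified):

```python
# Hypothetical client-side validator for the documented input ranges.
# Parameter names follow the schema listed above; this is not an
# official Replicate client utility.

def validate_input(inputs: dict) -> None:
    if "prompt" not in inputs:
        raise ValueError("prompt is required")
    if "max_tokens" in inputs and "max_new_tokens" in inputs:
        raise ValueError("max_tokens and max_new_tokens may not both be specified")
    if "min_tokens" in inputs and "min_new_tokens" in inputs:
        raise ValueError("min_tokens and min_new_tokens may not both be specified")
    ranges = {
        "max_tokens": (1, None),
        "min_tokens": (-1, None),
        "temperature": (0, 5),
        "top_p": (0, 1),
        "top_k": (-1, None),
        "length_penalty": (0, 5),
    }
    for name, (lo, hi) in ranges.items():
        if name in inputs:
            v = inputs[name]
            if v < lo or (hi is not None and v > hi):
                raise ValueError(f"{name} out of range")
```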

Output

the wilderness with his mother. He was hungry and looked for food. He searched everywhere but could not find anything to eat. Finally, he found some lettuce growing in the wild. He began eating it happily when suddenly there was a terrible rumble of thunder and the ground shook underneath him! The llama quickly ran away from this dangerous place before lightning struck down on top of him or worse yet…being eaten alive by an angry bear! The moral of the story is that you should always be prepared for disasters like these because they can happen at any time without

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output, making pricing more predictable.

This language model is priced by the number of input tokens sent and the number of output tokens generated.

Type     Per unit            Per $1
Input    $0.10 / 1M tokens   10M tokens / $1
Output   $0.50 / 1M tokens   2M tokens / $1

For example, for $10 you can run around 28,571 predictions where the input is a sentence or two (15 tokens) and the output is a few paragraphs (700 tokens).
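The arithmetic behind that estimate can be worked through directly, counting both input and output tokens at the rates in the table above:

```python
# Cost estimate for per-token pricing, using the rates from the table
# above and the example prediction size from the text (15 input tokens,
# 700 output tokens).

INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.50 / 1_000_000  # dollars per output token

def prediction_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single prediction."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

cost = prediction_cost(15, 700)            # about $0.00035 per prediction
predictions_per_10_dollars = int(10 / cost)
```

Output tokens dominate the cost here, which is why the short input contributes almost nothing to the total.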

Check out our docs for more information about how per-token pricing works on Replicate.