Official

meta / llama-2-13b

Base version of Llama 2 13B, a 13 billion parameter language model

  • Public
  • 201.7K runs
  • Priced per token
  • GitHub
  • Paper
  • License

Input

prompt
*string

Prompt to send to the model.

max_tokens
integer
(minimum: 1)

Maximum number of tokens to generate. A word is generally 2-3 tokens.

Default: 512

min_tokens
integer
(minimum: -1)

Minimum number of tokens to generate. To disable, set to -1. A word is generally 2-3 tokens.

temperature
number
(minimum: 0, maximum: 5)

Adjusts the randomness of outputs: 0 is deterministic, values greater than 1 are increasingly random; 0.75 is a good starting value.

Default: 0.7

top_p
number
(minimum: 0, maximum: 1)

When decoding text, samples from the top p fraction of most likely tokens; lower values ignore less likely tokens.

Default: 0.95

top_k
integer
(minimum: -1)

When decoding text, samples from the top k most likely tokens; lower values ignore less likely tokens.

Default: 0
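The temperature, top-k, and top-p controls described above interact during sampling. The sketch below is illustrative only, not the model's actual decoder: it scales logits by temperature, then truncates the distribution by top-k (0 disables) and nucleus (top-p) cutoffs before renormalizing.

```python
# Illustrative sketch of temperature + top-k + top-p filtering.
# The real server-side decoder may differ in details (e.g. temperature 0
# is typically implemented as greedy argmax rather than division).
import math

def filter_and_renormalize(logits, temperature=0.7, top_k=0, top_p=0.95):
    """Return a sampling distribution after temperature scaling,
    top-k truncation (0 disables), and top-p (nucleus) truncation."""
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens in descending probability, keeping until cutoffs hit.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set()
    cumulative = 0.0
    for rank, i in enumerate(order):
        if top_k > 0 and rank >= top_k:
            break
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # nucleus cutoff (keeps the crossing token)
            break
    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0
            for i in range(len(probs))]
```

Lower top_p or top_k narrows the candidate pool, trading diversity for coherence.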

stop_sequences
string

A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
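The stop-sequence behavior can be sketched client-side. This is a hypothetical illustration (the service applies stop sequences server-side during generation): the output is truncated at the earliest occurrence of any listed sequence.

```python
# Hypothetical sketch of how a comma-separated stop_sequences value
# is applied; the actual service stops generation during decoding.

def apply_stop_sequences(text: str, stop_sequences: str) -> str:
    """Truncate text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop_sequences.split(","):
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```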

length_penalty
number
(minimum: 0, maximum: 5)

Controls how long the outputs are: values < 1 make the model tend to generate shorter outputs, and values > 1 tend to generate longer outputs.

Default: 1

presence_penalty
number

Penalizes tokens that have already appeared in the output, regardless of how many times. Higher values make the model less likely to repeat tokens.

Default: 0

seed
integer

Random seed. Leave blank to randomize the seed.

prompt_template
string

Template for formatting the prompt. Can be an arbitrary string, but must contain the substring `{prompt}`.

Default: "{prompt}"
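A minimal sketch of how the template is combined with the prompt before generation, assuming simple substring substitution (the `render_prompt` helper name is hypothetical):

```python
# Hypothetical helper showing prompt_template substitution.
# The template must contain the literal substring "{prompt}".

def render_prompt(template: str, prompt: str) -> str:
    if "{prompt}" not in template:
        raise ValueError("prompt_template must contain the substring '{prompt}'")
    return template.replace("{prompt}", prompt)
```

For example, a Llama-2 chat-style wrapper such as `"[INST] {prompt} [/INST]"` would produce `"[INST] hi [/INST]"` for the prompt `"hi"`.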

debug
boolean

Default: false

max_new_tokens
integer
(minimum: 1)

This parameter has been renamed to max_tokens. max_new_tokens exists only for backwards compatibility. We recommend using max_tokens instead; the two may not both be specified.

min_new_tokens
integer
(minimum: -1)

This parameter has been renamed to min_tokens. min_new_tokens exists only for backwards compatibility. We recommend using min_tokens instead; the two may not both be specified.
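The schema above can be checked client-side before making a request. The validator below is a hypothetical sketch, assuming the parameter names and ranges documented on this page (including the rule that a renamed parameter and its deprecated alias may not both be specified):

```python
# Hypothetical client-side validator for the documented input ranges.
# Parameter names follow the schema listed above; this is not an
# official Replicate client utility.

def validate_input(inputs: dict) -> None:
    if "prompt" not in inputs:
        raise ValueError("prompt is required")
    if "max_tokens" in inputs and "max_new_tokens" in inputs:
        raise ValueError("max_tokens and max_new_tokens may not both be specified")
    if "min_tokens" in inputs and "min_new_tokens" in inputs:
        raise ValueError("min_tokens and min_new_tokens may not both be specified")
    ranges = {
        "max_tokens": (1, None),
        "min_tokens": (-1, None),
        "temperature": (0, 5),
        "top_p": (0, 1),
        "top_k": (-1, None),
        "length_penalty": (0, 5),
    }
    for name, (lo, hi) in ranges.items():
        if name in inputs:
            v = inputs[name]
            if v < lo or (hi is not None and v > hi):
                raise ValueError(f"{name} out of range")
```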

Output

the wilderness with his mother. He was hungry and looked for food. He searched everywhere but could not find anything to eat. Finally, he found some lettuce growing in the wild. He began eating it happily when suddenly there was a terrible rumble of thunder and the ground shook underneath him! The llama quickly ran away from this dangerous place before lightning struck down on top of him or worse yet…being eaten alive by an angry bear! The moral of the story is that you should always be prepared for disasters like these because they can happen at any time without

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output, making pricing more predictable.

This language model is priced by the number of input tokens sent and the number of output tokens generated.

Type     Per unit            Per $1
Input    $0.10 / 1M tokens   10M tokens / $1
Output   $0.50 / 1M tokens   2M tokens / $1

For example, for $10 you can run around 28,571 predictions where the input is a sentence or two (15 tokens) and the output is a few paragraphs (700 tokens).
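The arithmetic behind that estimate can be worked through directly, counting both input and output tokens at the rates in the table above:

```python
# Cost estimate for per-token pricing, using the rates from the table
# above and the example prediction size from the text (15 input tokens,
# 700 output tokens).

INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.50 / 1_000_000  # dollars per output token

def prediction_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single prediction."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

cost = prediction_cost(15, 700)            # about $0.00035 per prediction
predictions_per_10_dollars = int(10 / cost)
```

Output tokens dominate the cost here, which is why the short input contributes almost nothing to the total.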

Check out our docs for more information about how per-token pricing works on Replicate.