meta/llama-2-13b-chat:56acad22

Input

prompt
*string

Prompt to send to Llama v2.

max_new_tokens
integer
(minimum: 1)

Maximum number of tokens to generate. A word is generally 2-3 tokens.

Default: 500

temperature
number
(minimum: 0.01, maximum: 5)

Adjusts randomness of outputs: greater than 1 is more random, 0 is deterministic; 0.75 is a good starting value.

Default: 0.75

top_p
number
(minimum: 0.01, maximum: 1)

When decoding text, samples from the top-p percentage of most likely tokens; lower this to ignore less likely tokens.

Default: 1

repetition_penalty
number
(minimum: 0.01, maximum: 5)

Penalty for repeated words in generated text; 1 means no penalty, values greater than 1 discourage repetition, and values less than 1 encourage it.

Default: 1

debug
boolean

Provide debugging output in logs.

Default: false
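
As a minimal sketch of how these inputs might be supplied through the Replicate Python client: the parameter names below (prompt, max_new_tokens, temperature, top_p, repetition_penalty, debug) are inferred from the descriptions above, since this page lists only types and defaults.

```python
import replicate

# Sketch only: parameter names are assumptions inferred from the
# field descriptions above, not confirmed by this page.
output = replicate.run(
    "meta/llama-2-13b-chat",
    input={
        "prompt": "Can you explain the self-attention mechanism in Transformers?",
        "max_new_tokens": 500,       # Default: 500
        "temperature": 0.75,         # Default: 0.75
        "top_p": 1,                  # Default: 1
        "repetition_penalty": 1,     # Default: 1
        "debug": False,              # Default: false
    },
)

# The client streams the response as chunks of text; join them for the full reply.
print("".join(output))
```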

Output

Sure, I'd be happy to explain the self-attention mechanism in a way that's easy to understand! So, you know how sometimes you might be talking to someone, and you want to make sure they understand what you're saying? Maybe you repeat a word or phrase to make sure they get it, or you look them in the eye to make sure they're paying attention. That's kind of like what the self-attention mechanism does in Transformers.

In a Transformer, the self-attention mechanism helps the model figure out which parts of the input it should focus on more when it's processing a sentence or paragraph. It does this by looking at all the different words in the input, and deciding which ones are the most important for understanding what the sentence means.

For example, imagine you're trying to understand the sentence "The cat chased the mouse." The self-attention mechanism would help the model figure out which words are the most important for understanding this sentence. It might decide that "cat" and "mouse" are the most important words, because they're the nouns that the sentence is about. It might also decide that "chased" is important, because it's the verb that tells us what the cat did.

So, the self-attention mechanism helps the model focus on the most important parts of the input, and ignore the less important parts. This helps the model understand the sentence better, and make more accurate predictions about what comes next. Does that make sense?
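
To make the intuition in the sample output concrete, here is a toy scaled dot-product self-attention computation in plain NumPy over the same example sentence. The embeddings and projection matrices are random stand-ins for learned weights, so the printed attention values are illustrative only, not what Llama 2 would actually compute.

```python
import numpy as np

# Toy self-attention over the sentence from the sample output above.
# All weights are random stand-ins, not the model's learned parameters.
tokens = ["The", "cat", "chased", "the", "mouse"]
d = 8  # illustrative embedding size
rng = np.random.default_rng(0)

X = rng.normal(size=(len(tokens), d))           # token embeddings
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v             # queries, keys, values

scores = Q @ K.T / np.sqrt(d)                   # each token scores every other token
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1

# Each row is one token's attention distribution over the whole sentence,
# i.e. how much that token "focuses on" every other word.
for tok, row in zip(tokens, weights.round(2)):
    print(f"{tok:>7}: {row}")

attended = weights @ V                          # attention-weighted mix of value vectors
```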