Input
{
"frequency_penalty": 0,
"max_tokens": 1024,
"min_tokens": 0,
"presence_penalty": 0,
"prompt": "Tina has one brother and one sister. How many sisters do Tina's siblings have?",
"system_prompt": "You are a helpful assistant.",
"temperature": 0.6,
"top_p": 0.9
}
Install Replicate’s Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_Wvf**********************************
This is your API token. Keep it to yourself.
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
Run meta/meta-llama-3.1-405b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const input = {
frequency_penalty: 0,
max_tokens: 1024,
min_tokens: 0,
presence_penalty: 0,
prompt: "Tina has one brother and one sister. How many sisters do Tina's siblings have?",
system_prompt: "You are a helpful assistant.",
temperature: 0.6,
top_p: 0.9
};
for await (const event of replicate.stream("meta/meta-llama-3.1-405b-instruct", { input })) {
process.stdout.write(event.toString());
}
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_Wvf**********************************
This is your API token. Keep it to yourself.
Import the client:
import replicate
Run meta/meta-llama-3.1-405b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
# The meta/meta-llama-3.1-405b-instruct model can stream output as it's running.
for event in replicate.stream(
"meta/meta-llama-3.1-405b-instruct",
input={
"frequency_penalty": 0,
"max_tokens": 1024,
"min_tokens": 0,
"presence_penalty": 0,
"prompt": "Tina has one brother and one sister. How many sisters do Tina's siblings have?",
"system_prompt": "You are a helpful assistant.",
"temperature": 0.6,
"top_p": 0.9
},
):
print(str(event), end="")
To learn more, take a look at the guide on getting started with Python.
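The streaming loop above prints each event as it arrives. If you instead want the whole response as a single string, the events can be accumulated and joined. A minimal sketch (the collect helper is our own illustration, not part of the replicate library):

```python
def collect(events):
    """Join an iterable of str-convertible stream events into one string."""
    return "".join(str(e) for e in events)

# With the real client this would be, for example:
# text = collect(replicate.stream("meta/meta-llama-3.1-405b-instruct", input={...}))
print(collect(["Tina", " has", " one", " brother", "."]))  # prints "Tina has one brother."
```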
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_Wvf**********************************
This is your API token. Keep it to yourself.
Run meta/meta-llama-3.1-405b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Prefer: wait" \
-d $'{
"input": {
"frequency_penalty": 0,
"max_tokens": 1024,
"min_tokens": 0,
"presence_penalty": 0,
"prompt": "Tina has one brother and one sister. How many sisters do Tina\'s siblings have?",
"system_prompt": "You are a helpful assistant.",
"temperature": 0.6,
"top_p": 0.9
}
}' \
https://api.replicate.com/v1/models/meta/meta-llama-3.1-405b-instruct/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Example output:
Tina has one brother and one sister. From the brother's perspective, he has one sister, Tina, and also the other sister. So, Tina's brother has 2 sisters. From the sister's perspective, she also has one sister, Tina. So, Tina's siblings have a total of 2 sisters (from the brother's perspective) and 1 sister (from the sister's perspective).
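The counting in the model's answer can be sanity-checked mechanically. A small sketch (the family representation and helper are illustrative, not from the API):

```python
# Tina's family: three children in total.
children = [("Tina", "F"), ("Tina's brother", "M"), ("Tina's sister", "F")]

def sisters_of(name):
    # A child's sisters are all the female children other than themselves.
    return [n for n, sex in children if sex == "F" and n != name]

print(len(sisters_of("Tina's brother")))  # 2 (Tina and the other sister)
print(len(sisters_of("Tina's sister")))   # 1 (Tina)
```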
The full prediction object returned by the API:
{
"id": "zjq6szssxdrm20cgw0fsntgvbg",
"model": "meta/meta-llama-3.1-405b-instruct",
"version": "hidden",
"input": {
"frequency_penalty": 0,
"max_tokens": 1024,
"min_tokens": 0,
"presence_penalty": 0,
"prompt": "Tina has one brother and one sister. How many sisters do Tina's siblings have?",
"system_prompt": "You are a helpful assistant.",
"temperature": 0.6,
"top_p": 0.9
},
"logs": "INFO 07-23 15:04:21 async_llm_engine.py:173] Added request 3cd670c9b87b469ea9721ddfbcc6b67a.\nINFO 07-23 15:04:21 metrics.py:396] Avg prompt throughput: 7.8 tokens/s, Avg generation throughput: 17.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.\nINFO 07-23 15:04:24 async_llm_engine.py:140] Finished request 3cd670c9b87b469ea9721ddfbcc6b67a.\nGeneration took 1721746720.08sFormatted prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nTina has one brother and one sister. How many sisters do Tina's siblings have?<|eot_id|><|start_header_id|>assistant<|end_header_id|>",
"output": [
"T",
"ina",
" has",
" one",
" brother",
" and",
" one",
" sister",
".",
" From",
" the",
" brother",
"'s",
" perspective",
",",
" he",
" has",
" one",
" sister",
",",
" Tina",
",",
" and",
" also",
" the",
" other",
" sister",
".",
" So",
",",
" Tina",
"'s",
" brother",
" has",
" ",
"2",
" sisters",
".\n\n",
"From",
" the",
" sister",
"'s",
" perspective",
",",
" she",
" also",
" has",
" one",
" sister",
",",
" Tina",
".",
" \n\n",
"So",
",",
" Tina",
"'s",
" siblings",
" have",
" a",
" total",
" of",
" ",
"2",
" sisters",
" (",
"from",
" the",
" brother",
"'s",
" perspective",
")",
" and",
" ",
"1",
" sister",
" (",
"from",
" the",
" sister",
"'s",
" perspective",
").",
""
],
"data_removed": false,
"error": null,
"source": "web",
"status": "succeeded",
"created_at": "2024-07-23T15:04:21.227Z",
"started_at": "2024-07-23T15:04:21.263665Z",
"completed_at": "2024-07-23T15:04:24.09009Z",
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/zjq6szssxdrm20cgw0fsntgvbg/cancel",
"get": "https://api.replicate.com/v1/predictions/zjq6szssxdrm20cgw0fsntgvbg",
"stream": "https://streaming-api.svc.sea.v.replicate.net/v1/streams/6zoqtc6qgpxby7xzauee36qv6bktfaxugelxfqs6ifgg6q4s77sa",
"web": "https://replicate.com/p/zjq6szssxdrm20cgw0fsntgvbg"
},
"metrics": {
"batch_size": 1.000229398266064,
"input_token_count": 26,
"output_token_count": 84,
"predict_time": 2.826425468,
"predict_time_share": 2.8200430870056152,
"time_to_first_token": 0.112068943,
"tokens_per_second": 29.74134035953616,
"total_time": 2.86309
}
}
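In the metrics block, tokens_per_second appears to be roughly the output token count divided by the predict time (the exact accounting may differ slightly). A quick check against the values above:

```python
metrics = {
    "output_token_count": 84,
    "predict_time": 2.826425468,
    "tokens_per_second": 29.74134035953616,
}
approx = metrics["output_token_count"] / metrics["predict_time"]
print(round(approx, 2))  # close to the reported 29.74
```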