nateraw / llama-2-7b-samsum
- Public
- 36 runs
- Hardware: L40S
Prediction
nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397

- ID: 4yew3gdbntrgltnwgtj36eg22y
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
- Total duration: about 6.9 seconds
- Created: 2023-11-28T07:55:48.985119Z

Input
- debug: false
- top_p: 0.95
- prompt: [INST] <<SYS>> Use the Input to provide a summary of a conversation. <</SYS>> Input: Gary: Hey, don't forget about Tom's bday party! Lara: I won't! What time should I show up? Gary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up. Lara: You're such a great boyfriend. He will be so happy! Gary: Yep, I am :) Lara: So I'll just pick up the cake and get the balloons... Gary: Thanks, you're so helpful. I've already paid for the cake. Lara: No problem, see you at 5 pm! Gary: See you! [/INST] Summary:
- temperature: 0.7
- return_logits: false
- max_new_tokens: 128
- min_new_tokens: -1
- repetition_penalty: 1.15
{ "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
  {
    input: {
      debug: false,
      top_p: 0.95,
      prompt: "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
      temperature: 0.7,
      return_logits: false,
      max_new_tokens: 128,
      min_new_tokens: -1,
      repetition_penalty: 1.15
    }
  }
);

console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    input={
        "debug": False,
        "top_p": 0.95,
        "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
        "temperature": 0.7,
        "return_logits": False,
        "max_new_tokens": 128,
        "min_new_tokens": -1,
        "repetition_penalty": 1.15
    }
)

# The nateraw/llama-2-7b-samsum model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/nateraw/llama-2-7b-samsum/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
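replicate.run blocks until the prediction finishes (or streams, as above). If you would rather create the prediction and poll it yourself, the Python client also exposes a predictions API; the sketch below is a rough outline under that assumption, so check the client docs for exact method names and fields.

import time

import replicate

# Create the prediction without blocking on the result.
prediction = replicate.predictions.create(
    version="7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    input={"prompt": "...", "max_new_tokens": 128},  # same input fields as above
)

# Poll until the prediction reaches a terminal state.
while prediction.status not in ("succeeded", "failed", "canceled"):
    time.sleep(1)
    prediction = replicate.predictions.get(prediction.id)

print("".join(prediction.output or []))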
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    "input": {
      "debug": false,
      "top_p": 0.95,
      "prompt": "[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don\'t forget about Tom\'s bday party!\\nLara: I won\'t! What time should I show up?\\nGary: Around 5 pm. He\'s supposed to be back home at 5:30, so we\'ll have just enough time to prep things up.\\nLara: You\'re such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I\'ll just pick up the cake and get the balloons...\\nGary: Thanks, you\'re so helpful. I\'ve already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: ",
      "temperature": 0.7,
      "return_logits": false,
      "max_new_tokens": 128,
      "min_new_tokens": -1,
      "repetition_penalty": 1.15
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
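Each prediction also gets a stable ID (this one is 4yew3gdbntrgltnwgtj36eg22y, as shown in the response below), so you can fetch it again later instead of re-running the model. A small sketch with the Python client, assuming its Prediction object exposes status, output, and metrics as attributes:

import replicate

# Look up the stored prediction from this page by its ID.
prediction = replicate.predictions.get("4yew3gdbntrgltnwgtj36eg22y")

print(prediction.status)            # "succeeded"
print(prediction.metrics)           # e.g. {"predict_time": 1.995227, ...}
print("".join(prediction.output))   # the generated summary text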
Output
Lara will pick up the cake and balloons for Tom's bday party and meet Gary at 5 pm.

{ "completed_at": "2023-11-28T07:55:55.905428Z", "created_at": "2023-11-28T07:55:48.985119Z", "data_removed": false, "error": null, "id": "4yew3gdbntrgltnwgtj36eg22y", "input": { "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }, "logs": "Your formatted prompt is:\n[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\nSummary:\nprevious weights were different, switching to https://replicate.delivery/pbxt/EY7Ew28BebUf6E4e2e1wgm2SnrQiFW1hFcN6vxZ2kIB8kTzHB/training_output.zip\nDownloading peft weights\nDownloaded training_output.zip as 1 8240 kB chunks in 0.401s with 0 retries\nDownloaded peft weights in 0.620\nUnzipped peft weights in 0.003\nInitialized peft model in 0.060\nOverall initialize_peft took 1.238\nExllama: False\nINFO 11-28 07:55:55 async_llm_engine.py:371] Received request 0: prompt: \"[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don't forget about Tom's bday party!\\nLara: I won't! What time should I show up?\\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\\nLara: You're such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I'll just pick up the cake and get the balloons...\\nGary: Thanks, you're so helpful. I've already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: \", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-28 07:55:55 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%\nINFO 11-28 07:55:55 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-77dde5d6-e494c37392ea209c-gpu-a40-7f448dcbc8-78nv9", "metrics": { "predict_time": 1.995227, "total_time": 6.920309 }, "output": [ "\n", "L", "ara", " will", " pick", " up", " the", " c", "ake", " and", " bal", "lo", "ons", " for", " Tom", "'", "s", " b", "day", " party", " and", " meet", " Gary", " at", " ", "5", " pm", ".", "" ], "started_at": "2023-11-28T07:55:53.910201Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.us.c.replicate.net/v1/predictions/4yew3gdbntrgltnwgtj36eg22y", "get": "https://api.replicate.com/v1/predictions/4yew3gdbntrgltnwgtj36eg22y", "cancel": "https://api.replicate.com/v1/predictions/4yew3gdbntrgltnwgtj36eg22y/cancel" }, "version": "7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397" }
Generated in 2.0 seconds
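Note that the output field in the prediction JSON above is a list of streamed text chunks (roughly one token each), not a single string. To reconstruct the summary from the raw response, join the chunks, for example:

import json

# response_text holds the prediction JSON shown above (e.g. from the HTTP API).
prediction = json.loads(response_text)
summary = "".join(prediction["output"]).strip()
print(summary)  # Lara will pick up the cake and balloons for Tom's bday party and meet Gary at 5 pm.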
Prediction
nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397

- ID: bzsonglb4jwxhr2yjswvutd43i
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
- Total duration: about 4.4 seconds
- Created: 2023-11-28T07:57:47.266607Z

Input
- debug: false
- top_p: 0.95
- prompt: [INST] <<SYS>> Use the Input to provide a summary of a conversation. <</SYS>> Input: Gary: Hey, don't forget about Tom's bday party! Lara: I won't! What time should I show up? Gary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up. Lara: You're such a great boyfriend. He will be so happy! Gary: Yep, I am :) Lara: So I'll just pick up the cake and get the balloons... Gary: Thanks, you're so helpful. I've already paid for the cake. Lara: No problem, see you at 5 pm! Gary: See you! [/INST] Summary:
- temperature: 0.3
- return_logits: false
- max_new_tokens: 128
- min_new_tokens: -1
- repetition_penalty: 1.15
{ "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.3, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
  {
    input: {
      debug: false,
      top_p: 0.95,
      prompt: "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
      temperature: 0.3,
      return_logits: false,
      max_new_tokens: 128,
      min_new_tokens: -1,
      repetition_penalty: 1.15
    }
  }
);

console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    input={
        "debug": False,
        "top_p": 0.95,
        "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
        "temperature": 0.3,
        "return_logits": False,
        "max_new_tokens": 128,
        "min_new_tokens": -1,
        "repetition_penalty": 1.15
    }
)

# The nateraw/llama-2-7b-samsum model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/nateraw/llama-2-7b-samsum/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    "input": {
      "debug": false,
      "top_p": 0.95,
      "prompt": "[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don\'t forget about Tom\'s bday party!\\nLara: I won\'t! What time should I show up?\\nGary: Around 5 pm. He\'s supposed to be back home at 5:30, so we\'ll have just enough time to prep things up.\\nLara: You\'re such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I\'ll just pick up the cake and get the balloons...\\nGary: Thanks, you\'re so helpful. I\'ve already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: ",
      "temperature": 0.3,
      "return_logits": false,
      "max_new_tokens": 128,
      "min_new_tokens": -1,
      "repetition_penalty": 1.15
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
Gary and Lara are going to Tom's bday party. They will arrive at 5 pm. Gary has already paid for the cake and Lara will pick it up and get the balloons.

{ "completed_at": "2023-11-28T07:57:51.619603Z", "created_at": "2023-11-28T07:57:47.266607Z", "data_removed": false, "error": null, "id": "bzsonglb4jwxhr2yjswvutd43i", "input": { "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.3, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }, "logs": "Your formatted prompt is:\n[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\nSummary:\ncorrect lora is already loaded\nOverall initialize_peft took 0.000\nExllama: False\nINFO 11-28 07:57:50 async_llm_engine.py:371] Received request 0: prompt: \"[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don't forget about Tom's bday party!\\nLara: I won't! What time should I show up?\\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\\nLara: You're such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I'll just pick up the cake and get the balloons...\\nGary: Thanks, you're so helpful. I've already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: \", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.3, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-28 07:57:50 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%\nINFO 11-28 07:57:51 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-77dde5d6-e494c37392ea209c-gpu-a40-568d84f685-pjznp", "metrics": { "predict_time": 1.17975, "total_time": 4.352996 }, "output": [ " Gary", " and", " L", "ara", " are", " going", " to", " Tom", "'", "s", " b", "day", " party", ".", " They", " will", " arrive", " at", " ", "5", " pm", ".", " Gary", " has", " already", " paid", " for", " the", " c", "ake", " and", " L", "ara", " will", " pick", " it", " up", " and", " get", " the", " bal", "lo", "ons", ".", "" ], "started_at": "2023-11-28T07:57:50.439853Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.us.c.replicate.net/v1/predictions/bzsonglb4jwxhr2yjswvutd43i", "get": "https://api.replicate.com/v1/predictions/bzsonglb4jwxhr2yjswvutd43i", "cancel": "https://api.replicate.com/v1/predictions/bzsonglb4jwxhr2yjswvutd43i/cancel" }, "version": "7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397" }
Generated in 1.2 seconds
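The metrics block lets you do a back-of-the-envelope throughput check: this run spent about 1.18 s generating (predict_time) and the output array holds 45 chunks (counting the trailing empty string), so roughly 38 chunks per second. Chunks only approximate tokens, so treat this as a rough figure:

# Rough throughput estimate from the prediction JSON above.
predict_time = 1.17975   # metrics["predict_time"], in seconds
num_chunks = 45          # len(prediction["output"]) for this run
print(f"{num_chunks / predict_time:.1f} chunks/s")  # ~38.1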
Want to make some of these yourself?
Run this model