Readme
This model doesn't have a readme.
Run this model in Node.js with one line of code:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
"nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
{
input: {
debug: false,
top_p: 0.95,
prompt: "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
temperature: 0.7,
return_logits: false,
max_new_tokens: 128,
min_new_tokens: -1,
repetition_penalty: 1.15
}
}
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Run this model in Python with one line of code:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
"nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
input={
"debug": False,
"top_p": 0.95,
"prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
"temperature": 0.7,
"return_logits": False,
"max_new_tokens": 128,
"min_new_tokens": -1,
"repetition_penalty": 1.15
}
)
# The nateraw/llama-2-7b-samsum model can stream output as it's running.
# replicate.run() returns an iterator, and you can iterate over that output.
for item in output:
# https://replicate.com/nateraw/llama-2-7b-samsum/api#output-schema
print(item, end="")
To learn more, take a look at the guide on getting started with Python.
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Prefer: wait" \
-d $'{
"version": "7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
"input": {
"debug": false,
"top_p": 0.95,
"prompt": "[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don\'t forget about Tom\'s bday party!\\nLara: I won\'t! What time should I show up?\\nGary: Around 5 pm. He\'s supposed to be back home at 5:30, so we\'ll have just enough time to prep things up.\\nLara: You\'re such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I\'ll just pick up the cake and get the balloons...\\nGary: Thanks, you\'re so helpful. I\'ve already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: ",
"temperature": 0.7,
"return_logits": false,
"max_new_tokens": 128,
"min_new_tokens": -1,
"repetition_penalty": 1.15
}
}' \
https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Example prediction response:
{
"completed_at": "2023-11-28T07:55:55.905428Z",
"created_at": "2023-11-28T07:55:48.985119Z",
"data_removed": false,
"error": null,
"id": "4yew3gdbntrgltnwgtj36eg22y",
"input": {
"debug": false,
"top_p": 0.95,
"prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
"temperature": 0.7,
"return_logits": false,
"max_new_tokens": 128,
"min_new_tokens": -1,
"repetition_penalty": 1.15
},
"logs": "Your formatted prompt is:\n[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\nSummary:\nprevious weights were different, switching to https://replicate.delivery/pbxt/EY7Ew28BebUf6E4e2e1wgm2SnrQiFW1hFcN6vxZ2kIB8kTzHB/training_output.zip\nDownloading peft weights\nDownloaded training_output.zip as 1 8240 kB chunks in 0.401s with 0 retries\nDownloaded peft weights in 0.620\nUnzipped peft weights in 0.003\nInitialized peft model in 0.060\nOverall initialize_peft took 1.238\nExllama: False\nINFO 11-28 07:55:55 async_llm_engine.py:371] Received request 0: prompt: \"[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don't forget about Tom's bday party!\\nLara: I won't! What time should I show up?\\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\\nLara: You're such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I'll just pick up the cake and get the balloons...\\nGary: Thanks, you're so helpful. I've already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: \", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-28 07:55:55 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%\nINFO 11-28 07:55:55 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-77dde5d6-e494c37392ea209c-gpu-a40-7f448dcbc8-78nv9",
"metrics": {
"predict_time": 1.995227,
"total_time": 6.920309
},
"output": [
"\n",
"L",
"ara",
" will",
" pick",
" up",
" the",
" c",
"ake",
" and",
" bal",
"lo",
"ons",
" for",
" Tom",
"'",
"s",
" b",
"day",
" party",
" and",
" meet",
" Gary",
" at",
" ",
"5",
" pm",
".",
""
],
"started_at": "2023-11-28T07:55:53.910201Z",
"status": "succeeded",
"urls": {
"stream": "https://streaming-api.svc.us.c.replicate.net/v1/predictions/4yew3gdbntrgltnwgtj36eg22y",
"get": "https://api.replicate.com/v1/predictions/4yew3gdbntrgltnwgtj36eg22y",
"cancel": "https://api.replicate.com/v1/predictions/4yew3gdbntrgltnwgtj36eg22y/cancel"
},
"version": "7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397"
}
Logs from this prediction:
Your formatted prompt is:
[INST] <<SYS>>
Use the Input to provide a summary of a conversation.
<</SYS>>
Input:
Gary: Hey, don't forget about Tom's bday party!
Lara: I won't! What time should I show up?
Gary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.
Lara: You're such a great boyfriend. He will be so happy!
Gary: Yep, I am :)
Lara: So I'll just pick up the cake and get the balloons...
Gary: Thanks, you're so helpful. I've already paid for the cake.
Lara: No problem, see you at 5 pm!
Gary: See you! [/INST]
Summary:
previous weights were different, switching to https://replicate.delivery/pbxt/EY7Ew28BebUf6E4e2e1wgm2SnrQiFW1hFcN6vxZ2kIB8kTzHB/training_output.zip
Downloading peft weights
Downloaded training_output.zip as 1 8240 kB chunks in 0.401s with 0 retries
Downloaded peft weights in 0.620
Unzipped peft weights in 0.003
Initialized peft model in 0.060
Overall initialize_peft took 1.238
Exllama: False
INFO 11-28 07:55:55 async_llm_engine.py:371] Received request 0: prompt: "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 11-28 07:55:55 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%
INFO 11-28 07:55:55 async_llm_engine.py:111] Finished request 0.
hostname: model-hs-77dde5d6-e494c37392ea209c-gpu-a40-7f448dcbc8-78nv9
This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.
This model is cold. You'll get a fast response if the model is warm and already running, and a slower response if the model is cold and starting up.